Overview

Najko Jahn

2021-08-24

What is searched?

Europe PMC is a repository of life science literature. Europe PMC ingests all PubMed content and extends its index with other literature and patent sources.

For more background on Europe PMC, see:

https://europepmc.org/About

Levchenko, M., Gou, Y., Graef, F., Hamelers, A., Huang, Z., Ide-Smith, M., … McEntyre, J. (2017). Europe PMC in 2017. Nucleic Acids Research, 46(D1), D1254–D1260. https://doi.org/10.1093/nar/gkx1005

How to search Europe PMC with R?

This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, check out the Europe PMC query builder, a tool that helps you build queries interactively. To use a Europe PMC query in R, copy and paste the search string into the search functions of this package.

In the following, some examples demonstrate how to search Europe PMC with R.
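As a minimal sketch of passing a search string to the package, the snippet below assembles a query from two parts that also appear elsewhere in this document (a quoted phrase and the FIRST_PDATE field). The variable names are illustrative only, and the network call is wrapped in `if (interactive())` so that the snippet can be sourced offline.

```r
# Single quotes on the R side leave room for double-quoted phrases
# inside the Europe PMC query.
terms <- c('"Human malaria parasites"', "FIRST_PDATE:2016")

# Combine the parts with the boolean operator AND.
query <- paste(terms, collapse = " AND ")

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  europepmc::epmc_search(query = query, limit = 10)
}
```

Building the string in R is mainly useful when the same query has to be repeated with different terms; a string copied from the query builder can be passed in the same way.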

Managing search results

By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.

europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 × 28
#>    id        source pmid     doi   title authorString journalTitle pubYear journalIssn
#>    <chr>     <chr>  <chr>    <chr> <chr> <chr>        <chr>        <chr>   <chr>      
#>  1 34415329  MED    34415329 10.1… Func… Kimata-Arig… J Biochem    2021    "0021-924x…
#>  2 34087264  MED    34087264 10.1… Dive… Goh XT, Lim… Mol Biochem… 2021    "0166-6851…
#>  3 34400833  MED    34400833 10.1… A he… Tintó-Font … Nat Microbi… 2021    "2058-5276"
#>  4 33789941  MED    33789941 10.1… Addi… Kwon H, Sim… mSphere      2021    "2379-5042"
#>  5 34211355  MED    34211355 <NA>  An E… Clark NF, T… Yale J Biol… 2021    "0044-0086…
#>  6 34362867  MED    34362867 10.4… High… Lai MY, Raf… Trop Biomed  2021    "0127-5720…
#>  7 33693917  MED    33693917 10.1… Non-… Antinori S,… J Travel Med 2021    "1195-1982…
#>  8 32470136  MED    32470136 10.1… C-te… Kimata-Arig… J Biochem    2020    "0021-924x…
#>  9 PPR353209 PPR    <NA>     10.1… 5-me… Liu M, Guo … <NA>         2021     <NA>      
#> 10 33797521  MED    33797521 10.4… Comp… Mat Salleh … Trop Biomed  2021    "0127-5720…
#> # … with 19 more variables: pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, journalVolume <chr>, pageInfo <chr>,
#> #   issue <chr>, pmcid <chr>

Results are sorted by relevance by default. Other orderings can be requested via the sort parameter.
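A short sketch of changing the sort order follows. It assumes that `sort` accepts the values "cited" (most cited first) and "date" (most recent first); please check `?europepmc::epmc_search` for the values supported by your installed version.

```r
# Assumed values of `sort` (verify against ?europepmc::epmc_search):
#   "cited" - records with the most citations first
#   "date"  - most recently published records first
sort_by <- "cited"

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  europepmc::epmc_search('"Human malaria parasites"', limit = 10, sort = sort_by)
}
```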

Search by DOIs

Sometimes you want to check whether articles are indexed in Europe PMC using DOI names, a widely used identifier for scholarly articles. Use epmc_search_by_doi() for this purpose.

my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9"
  )
europepmc::epmc_search_by_doi(doi = my_dois)
#> # A tibble: 4 × 28
#>   id       source pmid     doi   title authorString journalTitle issue journalVolume
#>   <chr>    <chr>  <chr>    <chr> <chr> <chr>        <chr>        <chr> <chr>        
#> 1 28957815 MED    28957815 10.1… Clin… Schnieder M… Eur Neurol   5-6   78           
#> 2 28941317 MED    28941317 10.1… Conc… Doeppner TR… Stem Cells … 11    6            
#> 3 29018132 MED    29018132 10.1… One-… Psychogios … Stroke       11    48           
#> 4 28623611 MED    28623611 10.1… Defe… Carboni E, … Neuromolecu… 2-3   19           
#> # … with 19 more variables: pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> #   pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> #   hasBook <chr>, hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>
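To see which DOIs were not found, the requested DOIs can be compared with the doi column of the result. The helper below is a hypothetical sketch, not part of the package; it compares in lower case because DOI names are case-insensitive.

```r
# Hypothetical helper: return the requested DOIs that are absent from
# the DOIs found by the search.
missing_dois <- function(requested, found) {
  requested[!tolower(requested) %in% tolower(found)]
}

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  my_dois <- c("10.1159/000479962", "10.1002/sctm.17-0081")
  res <- europepmc::epmc_search_by_doi(doi = my_dois)
  missing_dois(my_dois, res$doi)
}
```

An empty character vector means that every requested DOI matched a record.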

Output options

By default, a non-nested data frame printed as a tibble is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.
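The output formats can be requested as sketched below. The name "parsed" for the default format is an assumption taken from the function's help page, and a small limit is used to keep the raw list manageable.

```r
# Output formats; "parsed" is assumed to be the name of the default.
formats <- c("parsed", "id_list", "raw")

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  # IDs and sources only
  ids <- europepmc::epmc_search("malaria", output = "id_list", limit = 10)
  # Full metadata as a nested list, one element per record
  raw <- europepmc::epmc_search("malaria", output = "raw", limit = 10)
  length(raw)
}
```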

More advanced options to search Europe PMC

Annotations

Europe PMC provides text-mined annotations contained in abstracts and open access full-text articles.

These automatically identified concepts and terms can be retrieved at the article level:

europepmc::epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601"))
#> # A tibble: 774 × 13
#>    source ext_id   pmcid      prefix exact postfix name  uri   id    type  section
#>    <chr>  <chr>    <chr>      <chr>  <chr> <chr>   <chr> <chr> <chr> <chr> <chr>  
#>  1 MED    28585529 PMC5467160 "tive… Beta… " allo… Beta… http… http… Clin… Title …
#>  2 MED    28585529 PMC5467160 "nomi… genes ".\nRa… gene  http… http… Sequ… Title …
#>  3 MED    28585529 PMC5467160 "nomi… genes " is o… gene  http… http… Sequ… Abstra…
#>  4 MED    28585529 PMC5467160 " One… genes " are … gene  http… http… Sequ… Abstra…
#>  5 MED    28585529 PMC5467160 " ide… beet  " (Bet… Beta… http… http… Clin… Abstra…
#>  6 MED    28585529 PMC5467160 "ify … Beta… " ssp.… Beta… http… http… Clin… Abstra…
#>  7 MED    28585529 PMC5467160 "ulga… gene  " Rz2 … gene  http… http… Sequ… Abstra…
#>  8 MED    28585529 PMC5467160 "e ge… geno… " sequ… geno… http… http… Sequ… Abstra…
#>  9 MED    28585529 PMC5467160 "eque… beet  ". Our… Beta… http… http… Clin… Abstra…
#> 10 MED    28585529 PMC5467160 "disc… genes " rele… gene  http… http… Sequ… Abstra…
#> # … with 764 more rows, and 2 more variables: provider <chr>, subType <chr>

To obtain a list of articles for which Europe PMC has text-mined annotations, either subset the resulting data.frame

tt <- europepmc::epmc_search("malaria")
tt[tt$hasTextMinedTerms == "Y" | tt$hasTMAccessionNumbers == "Y",]
#> # A tibble: 94 × 29
#>    id        source pmid     doi   title authorString journalTitle issue journalVolume
#>    <chr>     <chr>  <chr>    <chr> <chr> <chr>        <chr>        <chr> <chr>        
#>  1 34100426  MED    34100426 10.4… New … Lima MN, Ba… Neural Rege… 1     17           
#>  2 33535760  MED    33535760 10.3… THE … Damiani E, … Acta Med Hi… 2     18           
#>  3 33530764  MED    33530764 10.1… Disc… Hoarau M, V… J Enzyme In… 1     36           
#>  4 33372863  MED    33372863 10.1… ATP2… Lamy A, Mac… Emerg Micro… 1     10           
#>  5 33594960  MED    33594960 10.1… Mana… Kambale-Kom… Hematology   1     26           
#>  6 34283002  MED    34283002 10.1… <i>P… Alhassan AM… Pharm Biol   1     59           
#>  7 34184352  MED    34184352 10.1… Stru… Chhibber-Go… Protein Sci  9     30           
#>  8 34362867  MED    34362867 10.4… High… Lai MY, Raf… Trop Biomed  3     38           
#>  9 34399767  MED    34399767 10.1… Inve… Njau J, Sil… Malar J      1     20           
#> 10 PPR385006 PPR    <NA>     10.2… Temp… Ingholt MM,… <NA>         <NA>  <NA>         
#> # … with 84 more rows, and 20 more variables: pubYear <chr>, journalIssn <chr>,
#> #   pageInfo <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> #   inPMC <chr>, hasPDF <chr>, hasBook <chr>, hasSuppl <chr>,
#> #   citedByCount <int>, hasReferences <chr>, hasTextMinedTerms <chr>,
#> #   hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>, pmcid <chr>, versionNumber <int>

or expand the query by choosing an annotation type or provider from the Europe PMC Advanced Search query builder.

europepmc::epmc_search('malaria AND (ANNOTATION_TYPE:"Cell") AND (ANNOTATION_PROVIDER:"Europe PMC")')
#> # A tibble: 100 × 28
#>    id       source pmid     pmcid  doi   title  authorString  journalTitle issue
#>    <chr>    <chr>  <chr>    <chr>  <chr> <chr>  <chr>         <chr>        <chr>
#>  1 31782768 MED    31782768 PMC79… 10.1… Incre… Jongo SA, Ch… Clin Infect… 11   
#>  2 31808816 MED    31808816 PMC76… 10.1… Retin… Villaverde C… J Pediatric… 5    
#>  3 30989220 MED    30989220 PMC73… 10.1… Clini… Enane LA, Su… J Pediatric… 3    
#>  4 31300826 MED    31300826 PMC72… 10.1… Black… Opoka RO, Wa… Clin Infect… 11   
#>  5 31807752 MED    31807752 <NA>   10.1… Malar… Marcombe S, … J Med Entom… 3    
#>  6 31505001 MED    31505001 <NA>   10.1… Acute… Oshomah-Bell… J Trop Pedi… 2    
#>  7 31687768 MED    31687768 <NA>   10.1… Evalu… Ferdinand DY… Trans R Soc… 3    
#>  8 31693130 MED    31693130 PMC71… 10.1… Reduc… Kingston HWF… J Infect Dis 9    
#>  9 31679146 MED    31679146 <NA>   10.1… A Sys… Thiengsusuk … Eur J Drug … 2    
#> 10 30852586 MED    30852586 <NA>   10.1… An Ex… Woodford J, … J Infect Dis 6    
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>

Data integrations

Another nice feature of Europe PMC is the ability to search for cross-references between Europe PMC and other databases. For instance, to get publications cited by entries in the Protein Data Bank in Europe and published in 2016:

europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 × 28
#>    id       source pmid     pmcid  doi   title  authorString  journalTitle issue
#>    <chr>    <chr>  <chr>    <chr>  <chr> <chr>  <chr>         <chr>        <chr>
#>  1 27989121 MED    27989121 PMC58… 10.1… Short… Lin J, Pozha… Biochemistry 2    
#>  2 27815281 MED    27815281 PMC52… 10.1… Struc… Wakamatsu T,… Appl Enviro… 2    
#>  3 28035004 MED    28035004 PMC53… 10.1… Struc… Waz S, Nakam… J Biol Chem  7    
#>  4 28030602 MED    28030602 PMC51… 10.1… Struc… Christensen … PLoS One     12   
#>  5 28066558 MED    28066558 PMC51… 10.1… Struc… Gai Z, Wang … Cell Discov  <NA> 
#>  6 28024149 MED    28024149 PMC53… 10.1… Cryst… Kuk AC, Mash… Nat Struct … 2    
#>  7 28031486 MED    28031486 PMC52… 10.1… Struc… Sevrioukova … Proc Natl A… 3    
#>  8 28011634 MED    28011634 PMC53… 10.1… Struc… Levdikov VM,… J Biol Chem  7    
#>  9 28009010 MED    28009010 PMC51… 10.1… Struc… Zhao H, Wei … Sci Rep      <NA> 
#> 10 28197319 MED    28197319 PMC53… 10.1… Struc… Johannes JW,… ACS Med Che… 2    
#> # … with 90 more rows, and 19 more variables: journalVolume <chr>,
#> #   pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> #   isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>, hasBook <chr>,
#> #   hasSuppl <chr>, citedByCount <int>, hasReferences <chr>,
#> #   hasTextMinedTerms <chr>, hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> #   hasTMAccessionNumbers <chr>, firstIndexDate <chr>,
#> #   firstPublicationDate <chr>

Cross-references to several external databases are supported.

To retrieve metadata about these external database links, use europepmc::epmc_db().
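A hedged sketch of such a lookup follows. It assumes that `europepmc::epmc_db()` takes the article ID as `ext_id` and the database name as `db`, and that "pdb" is an accepted value; please verify both against `?europepmc::epmc_db`. The PMID is the first record returned by the HAS_PDB query above.

```r
# PMID taken from the first row of the HAS_PDB search result above.
pmid <- "27989121"

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  # Assumed arguments: `ext_id` (article ID) and `db` (database name).
  europepmc::epmc_db(ext_id = pmid, db = "pdb", limit = 50)
}
```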

Citations and reference sections

Europe PMC also lets us obtain citation metadata and reference sections. To retrieve citation metadata for an article, use

europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 233 × 11
#>    id     source citationType title authorString journalAbbrevia… pubYear volume
#>    <chr>  <chr>  <chr>        <chr> <chr>        <chr>              <int> <chr> 
#>  1 33353… MED    review-arti… Xeno… Galow AM, G… Int J Mol Sci       2020 21    
#>  2 31565… MED    research-ar… Regu… Chung HC, N… J Vet Sci           2019 20    
#>  3 30230… MED    research su… Bioe… Legallais C… Adv Healthc Mat…    2018 7     
#>  4 30264… MED    research su… Porc… Fiebig U, F… Xenotransplanta…    2018 25    
#>  5 29756… MED    historical … Infe… Weiss RA.    Xenotransplanta…    2018 25    
#>  6 29642… MED    research su… Trac… Kawasaki J,… Viruses             2018 10    
#>  7 28768… MED    research su… Pres… Kawasaki J,… J Virol             2017 91    
#>  8 28437… MED    research su… Thre… Colon-Moran… Virology            2017 507   
#>  9 28054… MED    research su… Anti… Inoue Y, Yo… Ann Biomed Eng      2017 45    
#> 10 27832… MED    research-ar… Tran… Kim N, Choi… PLoS One            2016 11    
#> # … with 223 more rows, and 3 more variables: issue <chr>, citedByCount <int>,
#> #   pageInfo <chr>

To retrieve the reference section of an article:

europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 × 19
#>    id       source citationType title authorString journalAbbrevia… issue pubYear
#>    <chr>    <chr>  <chr>        <chr> <chr>        <chr>            <chr>   <int>
#>  1 12002480 MED    JOURNAL ART… Tric… Adolfsson-E… Chemosphere      9-10     2002
#>  2 18795164 MED    JOURNAL ART… In v… Ahn KC, Zha… Environ Health … 9        2008
#>  3 18556606 MED    JOURNAL ART… Effe… Aiello AE, … Am J Public Hea… 8        2008
#>  4 17683018 MED    JOURNAL ART… Cons… Aiello AE, … Clin Infect Dis  <NA>     2007
#>  5 15273108 MED    JOURNAL ART… Rela… Aiello AE, … Antimicrob Agen… 8        2004
#>  6 18207219 MED    JOURNAL ART… The … Allmyr M, H… Sci Total Envir… 1        2008
#>  7 17007908 MED    JOURNAL ART… Tric… Allmyr M, A… Sci Total Envir… 1        2006
#>  8 26948762 MED    JOURNAL ART… Pres… Alvarez-Riv… J Chromatogr A   <NA>     2016
#>  9 23192912 MED    JOURNAL ART… Expo… Anderson SE… Toxicol Sci      1        2012
#> 10 25837385 MED    JOURNAL ART… Obse… Vladar EK, … Methods Cell Bi… <NA>     2015
#> # … with 159 more rows, and 11 more variables: volume <chr>, pageInfo <chr>,
#> #   citedOrder <int>, match <chr>, essn <chr>, issn <chr>,
#> #   publicationTitle <chr>, publisherLoc <chr>, publisherName <chr>,
#> #   externalLink <chr>, doi <chr>

Fulltext access

Europe PMC gives access not only to metadata, but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also has the full text.
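As a small sketch, the open access filter can be appended to an existing query string with `paste()`. The variable names are illustrative only.

```r
# Any existing search string, here the phrase search used earlier.
base_query <- '"Human malaria parasites"'

# Append the open access filter; paste() inserts a single space.
oa_query <- paste(base_query, "AND (OPEN_ACCESS:y)")

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  oa <- europepmc::epmc_search(oa_query, limit = 10)
  # The pmcid column holds the IDs of the matching records.
  oa$pmcid
}
```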

The full text can be accessed as an XML document via the PMID or the PubMed Central ID (PMCID):

europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n  <journal-meta>\n    <journal-id journal-id-type="nlm-ta">PLoS  ...
#> [2] <body>\n  <sec id="s1">\n    <title>Introduction</title>\n    <p>Atmosphe ...
#> [3] <back>\n  <ack>\n    <p>We would like to thank Dr. C. Gourlay and Dr. T.  ...
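The returned object can be queried with the xml2 package. The helper below is a hypothetical sketch that extracts top-level section headings; it assumes the `<body><sec><title>` layout visible in the printed output above and that xml2 is installed.

```r
# Hypothetical helper: headings of the top-level sections in the body.
section_titles <- function(doc) {
  xml2::xml_text(xml2::xml_find_all(doc, "//body/sec/title"))
}

# Only contact Europe PMC in an interactive session.
if (interactive()) {
  doc <- europepmc::epmc_ftxt("PMC3257301")
  section_titles(doc)
}
```

Other parts of the document, such as the front matter, can be reached in the same way by adjusting the XPath expression.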