repoRter.nih — R Interface to the ‘NIH RePORTER Project’ API


repoRter.nih is an R package for requesting project award data from the RePORTER Application Programming Interface (API) of the U.S. National Institutes of Health (NIH). The RePORTER API gives the public access to project and grantee information for all projects funded through NIH grants (around $42 billion annually as of 2021). The package provides methods for building requests in the non-standard JSON schema that the RePORTER API requires, and for retrieving and processing the results.

Quick Tour

Installation

repoRter.nih can be installed easily through CRAN or GitHub.

CRAN

install.packages('repoRter.nih')

GitHub

The latest stable release (consistent with CRAN version) can be installed with

devtools::install_github('bikeactuary/repoRter.nih')

A development version of this package will also be available and may contain bugfixes, updates to match upstream RePORTER API changes (for example, changes to the request schema), and/or new functionality not yet pushed to CRAN. To install the latest dev version, specify the ‘dev’ branch:

devtools::install_github('bikeactuary/repoRter.nih@dev')

API Basics

The repoRter.nih package supports v2 of the RePORTER API. This API does not require registration or authorization of any kind. There are some hard and soft limits on request rate, page size, and total result set size to be aware of, detailed below:

| Item | Value | Note |
|---|---|---|
| Daily query limit | None | NIH requests you limit large jobs to US weekends and weekdays between 9PM and 5AM EST |
| Request rate limit | None | NIH asks you to not post more than 1 request per second - repoRter.nih enforces this |
| Records per page (max) | 500 | Default in get_nih_data() |
| offset parameter (max) | 9999 | Effectively limits you to retrieving only the first 10,000 records from any result |
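The limits in the table can be turned into simple paging arithmetic. The following is a minimal base R sketch, not part of the package; it assumes that `offset` counts records rather than pages and that the first page starts at offset 0.

```r
# Assumptions: `offset` counts records (not pages) and paging starts at 0.
page_size  <- 500   # maximum records per page (from the table above)
max_offset <- 9999  # largest offset the API accepts (from the table above)

# Every offset that can be requested when each page is full-sized
offsets <- seq(0, max_offset, by = page_size)

n_pages     <- length(offsets)          # number of requests needed
max_records <- max(offsets) + page_size # records reachable in one result set
min_seconds <- n_pages - 1              # waiting time at 1 request per second

n_pages     # 20
max_records # 10000
```

So a single query is exhausted after 20 full pages, which is why larger result sets need to be split into several narrower queries.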

Note that you can work around the offset parameter limitation by breaking a large request into several smaller requests, each returning fewer than 10,000 records. With some thought and a little programming, this lets you retrieve complete large result sets; an example is provided in the vignette.
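One possible way to break up a large request is to split it by date range, so that each window returns fewer than 10,000 records. The sketch below is illustrative only and may differ from the vignette's example. `split_date_range()` is a hypothetical helper written in base R, and the criteria name `award_notice_date` with its `from_date`/`to_date` elements is an assumption to check against `?make_req`.

```r
# Hypothetical helper: split [from, to] into consecutive,
# non-overlapping windows of at most `by_days` days each.
split_date_range <- function(from, to, by_days = 7) {
  from   <- as.Date(from)
  to     <- as.Date(to)
  starts <- seq(from, to, by = by_days)
  ends   <- pmin(starts + by_days - 1, to)  # clip the last window at `to`
  data.frame(from_date = format(starts),
             to_date   = format(ends),
             stringsAsFactors = FALSE)
}

windows <- split_date_range("2021-01-01", "2021-01-31", by_days = 7)

# Not run: one request per window, combined afterwards.
# The `award_notice_date` criteria shape is an assumption, see ?make_req.
# results <- lapply(seq_len(nrow(windows)), function(i) {
#   req <- make_req(criteria = list(
#     award_notice_date = list(from_date = windows$from_date[i],
#                              to_date   = windows$to_date[i])),
#     message = FALSE)
#   get_nih_data(req)
# })
```

If any single window still exceeds 10,000 records, a smaller `by_days` value would be needed for that stretch.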

Sample Code

Example 1

If no criteria are provided, the default request is for all projects with fiscal_year = lubridate::year(Sys.Date()). This search will often return over 10,000 results so the full result set will not be retrievable without some programming.

req <- make_req()
res <- get_nih_data(req)

Example 2

Different fields are available for you to search and filter projects.

req <- make_req(criteria =
                  list(covid_response = c("All")),
                message = FALSE)
res <- get_nih_data(req,
                    flatten_result = TRUE)

When TRUE, the flatten_result argument un-nests nested data.frames and collapses nested atomic vectors into a single string, with the values pasted together and delimited by “;”.
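To make the atomic-vector case concrete, here is a small base R illustration of the kind of collapsing described. It is a sketch of the idea, not the package's implementation, and the column values are made up for the example.

```r
# Illustrative data: a list column holding one atomic vector per row.
nested <- data.frame(appl_id = c(1L, 2L))
nested$covid_response <- list(c("C3", "C4"), "C5")

# Collapse each vector into a single ";"-delimited string.
flat <- nested
flat$covid_response <- vapply(nested$covid_response,
                              function(x) paste(x, collapse = ";"),
                              character(1))

flat$covid_response
# "C3;C4" "C5"
```

A row that carried several values ends up with a combined string such as "C3;C4", which will not match any single code in a lookup table.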

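library(dplyr)   # %>%, left_join(), mutate(), case_when(), summarise(), ...
library(scales)  # dollar(), percent()
library(ggrepel) # geom_text_repel()
library(ggplot2)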
res %>%
  left_join(covid_response_codes, by = "covid_response") %>%
  mutate(covid_code_desc = case_when(!is.na(fund_src) ~ paste0(covid_response, ": ", fund_src),
                                     TRUE ~ paste0(covid_response, " (Multiple)"))) %>%
  group_by(covid_code_desc) %>%
  summarise(total_awards = sum(award_amount) / 1e6) %>%
  ungroup() %>%
  arrange(desc(covid_code_desc)) %>%
  mutate(prop = total_awards / sum(total_awards),
         csum = cumsum(prop),
         ypos = csum - prop/2 ) %>%
  ggplot(aes(x = "", y = prop, fill = covid_code_desc)) +
  geom_bar(stat="identity") +
  geom_text_repel(aes(label =
                        paste0(dollar(total_awards,
                                      accuracy = 1,
                                      suffix = "M"),
                               "\n", percent(prop, accuracy = .01)),
                      y = ypos),
                  show.legend = FALSE,
                  nudge_x = .8,
                  size = 3, color = "grey25") +
  coord_polar(theta ="y") +
  theme_void() +
  theme(legend.position = "right",
        legend.title = element_text(colour = "grey25"),
        legend.text = element_text(colour="blue", size=6, 
                                   face="bold"),
        plot.title = element_text(color = "grey25"),
        plot.caption = element_text(size = 6)) +
  labs(caption = "Data Source: NIH RePORTER API v2") +
  ggtitle("Legislative Source for NIH Covid Response Project Funding")
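The label positions in the chart come from the `prop`, `csum`, and `ypos` columns computed in the pipeline. As a minimal worked example with made-up totals (the values are illustrative, not NIH data), each label is placed at the midpoint of its slice:

```r
# Illustrative totals in $ millions; not real award data.
total_awards <- c(A = 10, B = 30, C = 60)

prop <- total_awards / sum(total_awards)  # share of each slice
csum <- cumsum(prop)                      # upper edge of each slice
ypos <- csum - prop / 2                   # midpoint of each slice

round(ypos, 2)
#    A    B    C
# 0.05 0.25 0.70
```

Subtracting half of each share from the running total moves the label from the slice's upper edge back to its center.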

Learning More

With the basics described above you can get started with the NIH RePORTER API right away. To learn more, see:

Getting Help

If you’ve exhausted the resources above and require help, please open an issue in this repository. Include the problematic code, a description of the expected behavior, the unintended behavior actually observed, and your investigations so far.

Feel free also to email me with feedback or to notify me of new issues.