Introduction to the R package covid19br

Introduction

This vignette shows how to use the R package covid19br for downloading and exploring data from the COVID-19 pandemic in Brazil and the globe as well. The package downloads datasets from the following repositories:

The last repository has data on the COVID-19 pandemic at the global level (daily counts of confirmed cases, deaths, and recovered patients by countries and territories), and has been widely used all over the world as a reliable source of data information on the COVID-19 pandemic. The former repository, on the other hand, possesses data on the Brazilian territory by city, state, region, and national levels.

We hope that this package may be helpful to other researchers and scientists to understand and fight this terrible pandemic that has been plaguing the world.

Getting started with R package covid19br

We will get started by showing how to use the package to load into R data sets of the COVID-19 pandemic by downloading the COVID-19 data set from the official Brazilian repository https://covid.saude.gov.br

library(covid19br)
library(tidyverse)

# downloading the data (at national level):
brazil <- downloadCovid19("brazil")

# looking at the downloaded data:
glimpse(brazil)
#> Rows: 764
#> Columns: 9
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…

# plotting the accumulative number of deaths:
ggplot(brazil, aes(x = date, y = accumDeaths)) +
  geom_point() +
  geom_path()

Next, will show how to draw a plot with the daily count of new deaths along with its respective moving averarge. Here, we will use the function pracma::movavg() to compute the moving average.

library(pracma)

# computing the moving average:
brazil <- brazil %>%
  mutate(
    ma_newDeaths = movavg(newDeaths, n = 7, type = "s")
  )

# looking at the transformed data:
glimpse(brazil)
#> Rows: 764
#> Columns: 10
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 1, 0, 0, 1, 0, 0, 0, 1, 4, 6, 6, 6, 0, 9, 18, 25, 21, …
#> $ accumCases   <int> 0, 1, 1, 1, 2, 2, 2, 2, 3, 7, 13, 19, 25, 25, 34, 52, 77,…
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> 0, 1, 1, 0, 1, 1, 0, 0, 1, 4, 6, 7, 6, 1, 6, 16, 23, 24, …
#> $ newFollowup  <int> 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 7, 12, 19, 24, 28, 36, 54, …
#> $ pop          <dbl> 210147125, 210147125, 210147125, 210147125, 210147125, 21…
#> $ ma_newDeaths <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.…

After computing the desired moving average, it is convenient to reorganize the data to fit the so-called tidy data format. This task can be easily done with the aid of the function pivot_long():

deaths <- brazil %>%
  select(date, newDeaths, ma_newDeaths) %>%
  pivot_longer(
    cols = c("newDeaths", "ma_newDeaths"),
    values_to = "deaths", names_to = "type"
  ) %>%
  mutate(
    type = recode(type, 
           ma_newDeaths = "moving average",
           newDeaths = "count",
    )
  )

# looking at the (tidy) data:
glimpse(deaths)
#> Rows: 1,528
#> Columns: 3
#> $ date   <date> 2020-02-25, 2020-02-25, 2020-02-26, 2020-02-26, 2020-02-27, 20…
#> $ type   <chr> "count", "moving average", "count", "moving average", "count", …
#> $ deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

# drawing the desired plot:
ggplot(deaths, aes(x = date, y=deaths, color = type)) +
  geom_point() +
  geom_path() + 
  theme(legend.position="bottom")

When dealing with epidemiological data we are often interested in computing quantities such as incidence, mortality and lethality rates. The function covid19br::add_epi_rates() can be used to add those rates to the downloaded data, as shown below:


# downloading the data (region level):
regions <- downloadCovid19("regions") 

# adding the rates to the downloaded data:
regions <- regions %>%
  add_epi_rates()

# looking at the data:
glimpse(regions)
#> Rows: 3,820
#> Columns: 13
#> $ region       <chr> "Midwest", "Midwest", "Midwest", "Midwest", "Midwest", "M…
#> $ date         <date> 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28, 2020-02-…
#> $ epi_week     <int> 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 11…
#> $ newCases     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 3, 4, …
#> $ accumCases   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 5, 9, …
#> $ newDeaths    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ accumDeaths  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ newRecovered <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ newFollowup  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ pop          <dbl> 16297074, 16297074, 16297074, 16297074, 16297074, 1629707…
#> $ incidence    <dbl> 0.000000000, 0.000000000, 0.000000000, 0.000000000, 0.000…
#> $ lethality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ mortality    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …

The function plotly::ggplotly() can be used to draw an interactive plot as follows:

library(plotly)

p <- ggplot(regions, aes(x = date, y = mortality, color = region)) +
  geom_point() +
  geom_path()

ggplotly(p)

In our last example, we will obtain a table summarizing the for the 27 Brazilian capitals in 2022-03-29.

library(kableExtra)

cities <- downloadCovid19("cities")

capitals <- cities %>%
  filter(capital == TRUE, date == max(date)) %>%
  add_epi_rates() %>%
  select(region, state, city, newCases, newDeaths, accumCases, accumDeaths, incidence, mortality, lethality) %>%
  arrange(desc(lethality), desc(mortality), desc(incidence))

# printing the table:
capitals %>%
 kable(
    full_width = F,
    caption = "Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states."
  )
Summary of the COVID-19 pandemic in the 27 capitals of Brazilian states.
region state city newCases newDeaths accumCases accumDeaths incidence mortality lethality
Northeast MA São Luís 216 37 58090 2698 5271.880 244.8534 4.64
Southeast SP São Paulo 441 25 1049194 42085 8563.435 343.4943 4.01
Southeast RJ Rio de Janeiro 1097 14 950318 36653 14143.946 545.5206 3.86
North PA Belém 124 0 137601 5300 9217.984 355.0506 3.85
North AM Manaus 28 0 290265 9694 13298.054 444.1160 3.34
South PR Curitiba 24 2 245942 8166 12722.641 422.4292 3.32
Northeast CE Fortaleza 28 13 361022 10974 13524.756 411.1126 3.04
Northeast BA Salvador 129 5 292684 8606 10189.716 299.6156 2.94
Midwest MT Cuiabá 79 1 129374 3668 21120.665 598.8112 2.84
Northeast PE Recife 736 7 222818 6087 13539.184 369.8669 2.73
Northeast AL Maceió 35 2 115474 3045 11332.669 298.8376 2.64
Midwest GO Goiânia 699 2 285625 7443 18839.295 490.9265 2.61
North RO Porto Velho 122 0 105807 2658 19980.776 501.9413 2.51
South RS Porto Alegre 224 4 246318 6189 16600.810 417.1129 2.51
Midwest MS Campo Grande 42 3 188098 4419 20993.502 493.2019 2.35
Northeast PI Teresina 0 4 119192 2764 13781.892 319.5948 2.32
Northeast RN Natal 0 0 131402 2931 14862.428 331.5153 2.23
Northeast PB João Pessoa 24 0 146214 3178 18073.089 392.8234 2.17
Southeast MG Belo Horizonte 2183 0 373286 7617 14859.697 303.2161 2.04
North AP Macapá 2 0 82691 1578 16428.882 313.5139 1.91
North AC Rio Branco 0 0 62933 1178 15450.544 289.2082 1.87
Northeast SE Aracaju 1 2 150168 2537 22856.169 386.1415 1.69
Midwest DF Brasília 285 3 691980 11579 22949.204 384.0123 1.67
North RR Boa Vista 9 0 119553 1618 29947.171 405.2974 1.35
Southeast ES Vitória 98 2 107806 1404 29772.685 387.7414 1.30
South SC Florianópolis 115 0 120285 1231 24010.276 245.7218 1.02
North TO Palmas 4 0 73745 727 24653.408 243.0406 0.99