README

EpiLPS: a fast and flexible Bayesian tool for approximate real-time estimation of the time-varying reproduction number ================ Oswaldo Gressani

Aim and scope

EpiLPS (Gressani et al. 2021), is the acronym for Epidemiological modeling (tool) with Laplacian-P-Splines. It proposes a new Bayesian methodology for estimating the instantaneous reproduction number $R_t$ , i.e. the average number of secondary cases generated by an infectious agent at time $t$ (White et al., 2020) ; a key metric for assessing the transmission dynamics of an infectious disease and a useful reference for guiding interventions of governmental institutions in a public health crisis. The EpiLPS project builds upon two strong pillars in the statistical literature, namely Bayesian P-splines and Laplace approximations, to deliver a fast and flexible methodology for inference on $R_t$ . EpiLPS requires two (external) inputs: (1) a time series of incidence counts and (2) a serial or generation interval distribution.

The underlying model for smoothing incidence counts is based on the Negative Binomial distribution to account for possible overdispersion in the data. EpiLPS has a two-phase engine for estimating $R_t$ . First, Bayesian P-splines are used to smooth the epidemic curve and to compute estimates of the mean incidence count of the susceptible population at each day $t$ of the outbreak. Second, in the philosophy of Fraser (2007), the renewal equation is used as a bridge to link the estimated mean incidence and the reproduction number. As such, the second phase delivers a closed-form expression of $R_t$ as a function of the B-spline coefficients and the serial interval distribution.

Another key strength of EpiLPS is that two different strategies can be used to estimate $R_t$ . The first approach called LPSMAP is completely sampling-free and fully relies on Laplace approximations and maximum a posteriori computation of model hyperparameters for estimation. Routines for Laplace approximations and B-splines evaluations are typically the ones that are computationally most intensive and are therefore coded in C++ and integrated in R via the Rcpp package, making them time irrelevant. The second approach is called LPSMALA (Laplacian-P-splines with a Metropolis-adjusted Langevin algorithm) and is fully stochastic. It samples the posterior of the model by using Langevin diffusions in a Metropolis-within-Gibbs framework. Of course, LPSMAP has a computational advantage over LPSMALA. Thanks to the lightning fast implementation of Laplace approximations, LPSMAP typically delivers estimate of $R_t$ in a fraction of a second. The chart below, summarizes the skeleton and mechanisms behind EpiLPS for estimating $R_t$ .

Getting started

As the EpiLPS package includes C++ code, Windows users will need to install Rtools to include the required compilers for a smooth experience. Rtools is free and can be downloaded from https://cran.r-project.org/bin/windows/Rtools/. To install the Github version of EpiLPS (with devtools) type the following lines in the R console:

install.packages("devtools")
devtools::install_github("oswaldogressani/EpiLPS")

The EpiLPS package structure is fairly simple. It has three main routines and an S3 method for plots:

A simulated example

To simulate data with episim(), a serial interval distribution and a pattern for the true reproduction number curve has to be specified. Six patterns are available for the moment. The data generating process is based on Poisson counts or negative binomial counts and the epidemic renewal equation for establishing the link between the mean number of infections and the reproduction number. The default duration of the simulated outbreak is 50 days but other choices are possible. The code below simulates an epidemic according to pattern 4 and gives summarizing plots by setting the option plotsim = TRUE:

si <- c(0.12, 0.28, 0.30, 0.25, 0.05) # Specify a serial interval distribution
simepi <- episim(serial_interval = si, Rpattern = 4, plotsim = TRUE)

The epilps() routine can be used to fit the epidemic data. By default, the LPSMAP approach is used with 30 B-splines in the interval $[1;50]$ and a second order penalty. The plot() routine on the epifit_LPSMAP object can be used to plot the estimated reproduction number.

epifit_LPSMAP <- epilps(incidence = simepi$y, serial_interval = si, tictoc = TRUE)

plot(epifit_LPSMAP)

Several options can be specified in the plot() routine. For instance, graphical parameters such as themetype and rtcol can be used to control the theme and color of the fitted $R_t$ curve. In addition, the option overlayEpiestim can be set to TRUE to overlay the estimated $R_t$ curve with the EpiEstim package of Cori et al., (2013) .

plot(epifit_LPSMAP, themetype = "light", rtcol = "steelblue", overlayEpiestim = TRUE)

The numerical values of the estimated $R(t)$ at days $t=8,\dots,14$ obtained with LPSMAP and the associated $95\%$ credible interval can be obtained by typing:

knitr::kable(epifit_LPSMAP$epifit[8:14,2:4])

	R_estim	R95CI_low	R95CI_up
8	2.400739	2.151855	2.678409
9	2.223205	2.023275	2.442890
10	2.049500	1.882000	2.231908
11	1.912324	1.774338	2.061040
12	1.818282	1.696983	1.948251
13	1.759342	1.654123	1.871254
14	1.714138	1.618886	1.814994

A smooth estimate of the epidemic curve can be obtained with the code below. The option epicol controls the color of the curve and incibars can be set to TRUE or FALSE to show or not the bar plot of the incidence counts.

plot(epifit_LPSMAP, plotout = "epicurve", themetype = "light", epicol = "orange", incibars = TRUE)

Real data examples

To illustrate EpiLPS on real data, we work with the Covid19 R Interface Data Hub https://covid19datahub.io/. Four countries are considered (Luxembourg, Italy, Canada and Japan) and the reproduction number is estimated with LPSMAP over the period April 2020 - October 2021 with a uniform serial interval over 5 days. For Japan, option overlayEpiestim is TRUE to compare the EpiLPS and EpiEstim fits.

library("COVID19")

# Uniform serial interval over 5 days
si <- c(0.2, 0.2, 0.2, 0.2, 0.2)

# Luxembourg
Luxembourg <- covid19(country = "LUX", level = 1, verbose = FALSE)
dateLUX <- Luxembourg$date[75:649]
inciLUX <- Luxembourg$hosp[75:649]

# Italy
Italy <- covid19(country = "ITA", level = 1, verbose = FALSE)
dateITA <- Italy$date[42:616]
inciITA <- Italy$hosp[42:616]

# Canada
Canada <- covid19(country = "CAN", level = 1, verbose = FALSE)
dateCAN <- Canada$date[71:645]
inciCAN <- Canada$hosp[71:645]

# Japan
Japan<- covid19(country = "JPN", level = 1, verbose = FALSE)
dateJPN <- Japan$date[75:649]
inciJPN <- Japan$hosp[75:649]

# Fit with EpiLPS
epiLUX <- epilps(incidence = inciLUX, serial_interval = si, verbose = FALSE)
epiITA <- epilps(incidence = inciITA, serial_interval = si, verbose = FALSE)
epiCAN <- epilps(incidence = inciCAN, serial_interval = si, verbose = FALSE)
epiJPN <- epilps(incidence = inciJPN, serial_interval = si, verbose = FALSE)

gridExtra::grid.arrange(
plot(epiLUX, dates = dateLUX, datelab = "3m", rtcol = "steelblue",
     Rtitle = "Estimated R Luxembourg"),
plot(epiITA, dates = dateITA, datelab = "3m", rtcol = "chartreuse4",
     Rtitle = "Estimated R Italy"),
plot(epiCAN, dates = dateCAN, datelab = "3m", rtcol = "brown2",
     Rtitle = "Estimated R Canada"),
plot(epiJPN, dates = dateJPN, datelab = "3m", rtcol = "darkorchid1",
     overlayEpiestim = TRUE, Rtitle = "Estimated R Japan"),
nrow = 2, ncol = 2)

Validation

To check the (statistical) performance of EpiLPS, the perfcheck() routine can be used to simulate epidemic outbreaks under four different scenarios. Each scenario has a different $R_t$ curve to be compared with the estimated trajectories fitted by EpiLPS. For comparative reasons, the trajectories of EpiEstim (with a weekly sliding window) are also shown. The code below simulates 25 epidemic outbreaks with a data generating process following scenario 3 and a given serial interval distribution. A seed can also be specified for reproducibility.

simexample <- perfcheck(S = 20, method = "LPSMALA",
                        serial_interval = c(0.2, 0.4, 0.2, 0.1, 0.1),
                        scenario = 3, ci_level = 0.95,  seed = 1234,
                        dist = "negbin", overdisp = 1000,
                        themetype = "gray")

Package version

Acknowledgments

This project is funded by the European Union’s Research and Innovation Action under the H2020 work programme, EpiPose (grant number 101003688).

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

References

Gressani, O., Wallinga, J., Althaus, C., Hens, N. and Faes, C. (2021). EpiLPS: a fast and flexible Bayesian tool for near real-time estimation of the time-varying reproduction number. MedRxiv preprint, https://doi.org/10.1101/2021.12.02.21267189

White, L. F., Moser, C. B., Thompson, R. N., & Pagano, M. (2021). Statistical estimation of the reproductive number from case notification data. American Journal of Epidemiology, 190(4), 611-620.

Cori, A., Ferguson, N.M., Fraser, C., Cauchemez, S. (2013) A new framework and software to estimate time-varying reproduction numbers during epidemics, American Journal of Epidemiology, 178(9), 1505–1512. https://doi.org/10.1093/aje/kwt133