NobBS: Nowcasting by Bayesian Smoothing

NobBS is Bayesian approach to estimate the number of occurred-but-not-yet-reported cases from incomplete, time-stamped reporting data for disease outbreaks. NobBS learns the reporting delay distribution and the time evolution of the epidemic curve to produce smoothed nowcasts in both stable and time-varying case reporting settings. For details, see the pre-print by McGough et al. 2019.

Installation

You can install the released version of NobBS from CRAN with:

install.packages("NobBS")

The development version of NobBS can be installed with:

library(devtools) 
install_github("sarahhbellum/NobBS") 

NobBS requires that JAGS (Just Another Gibbs Sampler) is downloaded to the system or computer on which NobBS will run. Download JAGS here: http://mcmc-jags.sourceforge.net/.

Usage

NobBS accepts a line list of case onset and case reporting dates as the input data, and users are required to specify:

For example, in the data denguedat, cases are reported on a weekly basis:

library(NobBS)
data(denguedat)
head(denguedat)
#>   onset_week report_week
#> 1 1990-01-01  1990-01-01
#> 2 1990-01-01  1990-01-01
#> 3 1990-01-01  1990-01-01
#> 4 1990-01-01  1990-01-08
#> 5 1990-01-01  1990-01-08
#> 6 1990-01-01  1990-01-15

For this dengue report line list, the user would specify units=“1 week”, onset_date=“onset_week”, and report_date=“report_week” to produce a nowcast for the Date now of choice.

The user may optionally specify arguments such as:

Bayesian parameters

The user may specify whether underlying cases occur as part of a Poisson or a Negative Binomial process (Default: Poisson), and may specify the priors on the log-linear model as described in McGough et al. 2019, where default values are set to be weakly informative. In addition, arguments governing the JAGS model MCMC sampling may be specified: number of chains (nChains, default=1), number of samples (nSamp, default=10,000), number of iterations discarded as burn-in (nBurnin, default=1000), number of thinned iterations (nThin, default=1), and adaptation period (nAdapt, default=1000). The percentile of the prediction interval may be specified with conf (default = 0.95 for a 95% prediction interval).

For recommendations on priors and moving windows, see the Materials and Methods sub-section in McGough et al. 2019 entitled “NobBS R package implementation and recommendations.”

Model output

Altogether, a simple model may be run by specifying only the required arguments:

NobBS(denguedat, as.Date("1990-10-01"),units="1 week",onset_date="onset_week",report_date="report_week"). The user should save the output to an R object because many levels of information are stored in the output.

test_nowcast <- NobBS(data=denguedat, now=as.Date("1990-10-01"),
                      units="1 week",onset_date="onset_week",report_date="report_week")
#> [1] "Computing a nowcast for  1990-10-01"
#> Compiling model graph
#>    Resolving undeclared variables
#>    Allocating nodes
#> Graph information:
#>    Observed stochastic nodes: 820
#>    Unobserved stochastic nodes: 822
#>    Total graph size: 5090
#> 
#> Initializing model

NobBS returns the list test_nowcast, which contains the following elements:

  1. estimates, accessed via test_nowcast$estimates. This is a 5-column data frame containing case estimates for each date in the historical time series through now, with corresponding date of case onset, lower and upper bounds of the prediction interval, and the number of cases for that onset date reported up to now.
  2. estimates.inflated, accessed via test_nowcast$estimates.inflated. This is a 5-column data frame containing case estimates for each date in the historical time series through now, inflated by the proportion_reported, with corresponding date of case onset, lower and upper bounds of the prediction interval, and the number of cases for that onset date reported up to now.
  3. nowcast.post.samples, accessed via test_nowcasts$nowcast.post.samples. This is a vector of 10,000 samples from the posterior predictive distribution of the nowcast itself.
  4. params.post, accessed via test_nowcast$params.post. This is a data frame containing 10,000 posterior samples for all parameters of the Bayesian model, including αt = now, {log(βd),…,log(β*max**D) }, λt, d, and τα*2. This data frame can get increasingly large as the length of the time series of available data increases, so the user can control which parameter’s posterior samples are outputted in specs$param_names (Default: all). See NobBS help file for details of specs.
tail(test_nowcast$estimates)
#>    estimate lower upper onset_date n.reported
#> 35       48    44    53 1990-08-27         44
#> 36       59    55    65 1990-09-03         54
#> 37       45    40    52 1990-09-10         38
#> 38       90    81   101 1990-09-17         71
#> 39       80    65    99 1990-09-24         40
#> 40      113    66   187 1990-10-01         10

Plotting results

The NobBS output can be used in a variety of ways- for example, to inspect the posterior distribution of the nowcast estimate or to plot the nowcast and “hindcasts” (estimates produced for weeks in the past).

Inspecting the posterior distribution of key parameters

Posterior of the nowcast:

library(ggplot2)
nowcast_posterior <- data.frame(sample_estimate=test_nowcast$nowcast.post.samps)
ggplot(nowcast_posterior, aes(sample_estimate)) + 
  geom_histogram(binwidth=10) + 
  theme_bw() + 
  xlab("Cases (number)") +
  ggtitle("Samples from the posterior distribution of the nowcast for Oct 1, 1990")

Posterior of βd = 0 (the estimated probability of a case being reported with no delay):

library(ggplot2)
library(dplyr)
beta_0 <- data.frame(prob=exp(test_nowcast$params.post$`Beta 0`))
ggplot(beta_0, aes(prob)) + 
  geom_histogram() + 
  theme_bw() + 
  xlab("Probability of delay=0") +
  ggtitle("Posterior distribution of the probability of delay d=0,
         estimated over the period Jan 1, 1990 to Oct 1, 1990")

Plotting the sequence of nowcast and hindcasts

nowcasts <- data.frame(test_nowcast$estimates)

ggplot(nowcasts) + geom_line(aes(onset_date,estimate,col="Nowcast estimate"),linetype="longdash") +
  geom_line(aes(onset_date,n.reported,col="Reported to date"),linetype="solid") +
  scale_colour_manual(name="",values=c("indianred3","black"))+
  theme_classic()+
  geom_ribbon(fill="indianred3",aes(x = onset_date,ymin=nowcasts$lower, 
                             ymax=nowcasts$upper),alpha=0.3)+
  xlab("Case onset date") + ylab("Estimated cases") +
  ggtitle("Observed and predicted number of cases \nat the week of nowcast (Oct 1990) and weeks prior")