is needed. The code then looks as follows: ```{r,EPICURVE-SURVEILLANCE,message=FALSE} ##Load surveillance pkg. library("surveillance") ##Range of the symptom onset date variable so_range <- range(linelist$Date.of.symptoms.onset, na.rm=TRUE) ##Create an sts time series from the linelist, which contains daily counts. sts <- linelist2sts(linelist, dateCol="Date.of.symptoms.onset", aggregate.by="1 day", dRange=so_range) ##Show the resulting time series using the plot functionality for sts objects. plot(sts,legend.opts=NULL, xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2), ylab="No. symptom onsets") ``` ## Nowcasting We now move on to the nowcasts. ```{r} ##State which date to nowcast now <- as.Date("2015-06-12") ``` Say (in a mathematical sense) we move back time to `r now`. In the notation of @hoehle_anderheiden2014 this means $T=`r as.character(now)`$. We want to illustrate what the WHO could see at this point and, on the basis on the available reports at time $T$, estimate the delay distribution and adjust the observed cases accordingly. We shall here only use the right-truncation delay adjusted procedure operating on the generalized Dirichlet distribution. Since the nowcasts for the time points very close to now are very volatile (i.e. have very large credibility regions), it's opportune to not display these casts as they can be very hard to communicate. Also note that the selected date for `now` is selected such that enough cases are available to give a sufficiently reliable estimate for the delay distribution. ```{r,echo=FALSE,eval=FALSE} ##Remove this once the pkg installs again. # source("~/Surveillance/surveillance/pkg/R/nowcast.R") # source("~/Surveillance/surveillance/pkg/R/gd.R") # source("~/Surveillance/surveillance/pkg/R/AllGeneric.R") # source("~/Surveillance/surveillance/pkg/R/stsNC.R") # source("~/Surveillance/surveillance/pkg/R/stsNClist_animate.R") ``` ```{r} ##Nowcasts are displayed up to time (now - safePredictLag) safePredictLag <- 3 nowcastDates <- seq(from = so_range[1], to = now - safePredictLag, by = 1) ``` We now perform right-truncation adjusted Bayesian nowcasting using the generalized Dirichlet distribution. An important choice is here the prior for the expected number of cases per day, i.e. $\lambda_t$ in the notation of @hoehle_anderheiden2014. In the conjugate case this is specified by assuming an iid. Gamma-distribution for $\lambda_t$, which is specified through prior mean and prior variance of the Gamma distribution. We here select here an empirical Bayes inspired approach and estimate these parameters from the currently available data. However, note: These data are by definition of the problem incomplete. As a dirty fix we therefore just inflate the prior variance by a factor - as future work this needs to be improved upon by following a proper marginal likelihood approach. ```{r,results='hide',warning=FALSE} nc.control <- list( N.tInf.prior = structure("poisgamma", mean.lambda = mean(observed(sts)), var.lambda = 5*var(observed(sts)) ), ##compute predictive distribution as wel, which is needed for some of the ##animations. predPMF = TRUE, dRange = so_range) ## Now run the nowcast (NA dates are removed from the dataset). nc <- nowcast(now = now, when = nowcastDates, data = linelist, dEventCol = "Date.of.symptoms.onset", dReportCol = "Date.of.notification.to.WHO", method = "bayes.trunc", #use the conjugate generalized dirichlet aggregate.by = "1 day", D = 14, #adjust cases up to 2 weeks back. control = nc.control) ``` The resulting object is of class `stsNC`, which is just a class inheriting from the `sts` class. Hence, all the usual plotting functions apply to it. In addition, a plot of an `stsNC` object as shown below, contains the median of the pointwise predictive distribution (thick blue line) as well as equi-tailed 95% credibility regions (dashed orange lines). ```{r,NOWCASTPLOT} plot(nc, legend.opts=NULL, xaxis.tickFreq=list("%d"=atChange,"%m"=atChange), xaxis.labelFreq=list("%d"=at2ndChange),xaxis.labelFormat="%d-%b", xlab="Time (days)",lty=c(1,1,1,1),lwd=c(1,1,2), ylab="No. symptom onsets",ylim=c(0,max(observed(nc),upperbound(nc),predint(nc),na.rm=TRUE))) ``` Finally, we can for `stsNC` objects show a simple non-parametric estimate of the delay distribution as a function of time using a window-smoothed approach with window width $2w+1$, see @hoehle_anderheiden2014 for details. ```{r,NOWCASTDELAY} plot(nc, type="delay", dates=seq(so_range[1],now,by="1 day"),w=1) ``` The figure shows for each time point $t$ the median as well as the 10% and 90% quantile of the empirical distribution of delays within the window of $t-w,\ldots,t+w$. Note: This simple estimate ignores the right- truncation, hence, within the period of `(now-D):now` there will be a bias of these estimates towards shorter delays. This biased period is illustrated in the figure by the light-gray shaded area. Furthermore, the median of the model based estimate for the delay distribution is shown for the period of `(now-m:now)`. From the figure one has the suspicion that the delay decreased a bit over time, but the decrease totally at the end could also be due to right-truncation. However, assuming a **time-invariant delay distribution** for the entire outbreak would probably give unsatisfactory results. Hence, we shall for each time point use only a moving window consisting of all observations occuring within the period `(now-m):now`. We select $m=14$ for estimating the delay distribution. This means that only the $m+1$=15 most recent days of the reporting triangle are considered at each instance of "now" when estimating the delay distribution, i.e. there will be 15 observations on number of reports with a delay of 0 days, there will be 14 observationson on number of reports with a delay of 1 day, ..., there will be 1 observation on number of reports with a delay 14 days. Based on the right-truncation adjusted estimation procedure these observations then lead to an estimate of the PMF of the delay. Due to the sliding window this estimate will be more uncertain (especially for the long delays), but if the delay reduces over time, this is more appropriately reflected, because old cases do not contribute to the delay estimation anymore. ## Showing a sequence of nowcasts Once a couple of nowcasts have been performed it can also be helpful to visualize the sequence of nowcasts using an animation. This is easily done by first generating a list of nowcast results followed by a call to `surveillance::animate_nowcasts`. ```{r,cache=TRUE,results='hide',warning=FALSE,message=FALSE} ##Nowcast all time points (except for the first three weeks). This might take a while. nowcasts <- list() animRange <- seq(so_range[1]+21,so_range[2],by="1 day") ##Do nowcasts for the rage of dates for (i in seq_len(length(animRange))) { today <- animRange[i] print(as.character(today)) new_nowcastDates <- seq(from = so_range[1], to = today - safePredictLag, by = 1) nowcasts[[as.character(today)]] <- nowcast( now = today, when = new_nowcastDates, data = linelist, dEventCol = "Date.of.symptoms.onset", dReportCol = "Date.of.notification.to.WHO", method = "bayes.trunc", aggregate.by = "1 day", D = 14, m = 14, ##moving window of 14+1 days for estimation control = nc.control) } ``` We will use the `animation` package to wrap the call to `animate_nowcasts` in order to generate an animated GIF. Better control over the obtained animation can be obtained using the `animation::saveHTML` function. If one wants to include the animation into a Power-Point presentation, I recommend the use of Flash animations (`animation::saveSWF`). Note that the animation package requires [ImageMagick](http://www.imagemagick.org/script/index.php) to be installed on your system. ```{r,ANIMNOWCASTS,warning=FALSE,fig.width=8,fig.height=5,message=FALSE,results='hide'} animation::saveGIF( { par(mar=c(5.5, 4, 2, 2) + 0.1) ; ##add extra space at the bottom and remove at top animate_nowcasts(nowcasts = nowcasts, linelist_truth = linelist, method = "bayes.trunc", #nowcast method to use (has to be in the casts) control = list(sys.sleep=0,dRange=nc.control$dRange,anim.dRange=range(animRange),ylim=c(0,40))) }, movie.name="animate-nowcasts.gif", ani.width=800, ani.height=500) ``` ```{r,results='asis',echo=FALSE} ##setwd(curwd) invisible(file.rename("animate-nowcasts.gif",file.path(fullFigPath,"animate-nowcasts.gif"))) cat(paste0("![]({{ site.baseurl }}/",knitr::opts_chunk$get("fig.path"),"animate-nowcasts.gif",")")) ``` As a final comparison we can also obtain an animation of how the delay distribution changes with time. From the animation we notice that the delay appears to steadily decrease, which is a typical behavior for high-profiled outbreaks. However, this also questions the assumption of a **time-invariant delay distribution**. Instead, one could use a window-limited estimation approach or one could try to model the delay distribution using a discrete time survival model as done in @hoehle_anderheiden2014. The animation below uses the above mentioned sliding-window approach, which is noticed by the estimate of $q_{0.5}^{\text{bayes.trunc(T)}}$ changing for each $T$. ```{r,ANIMDELAYS,results='hide',message=FALSE} animation::saveGIF( for (i in seq_len(length(nowcasts))) { plot(nowcasts[[i]], w=3, type="delay") }, movie.name="animate-delays.gif", ani.width=800, ani.height=500) ``` ```{r,results='asis',echo=FALSE} invisible(file.rename("animate-delays.gif",file.path(fullFigPath,"animate-delays.gif"))) cat(paste0("![]({{ site.baseurl }}/",knitr::opts_chunk$get("fig.path"),"animate-delays.gif"),")") ``` # Discussion The adjustment of occurred-but-not-yet-reported-events applies to many other application areas besides **real-time public health monitoring**. For example, direct links to claims reserve modelling in actuarial sciences exist, but many other areas of application, where delays play a role, appear of interest. For the methodological details of the nowcasting procedures see @hoehle_anderheiden2014, which is available as open access document. The present blog entry focused on getting methods operational using R - the R code of this blog post is available from [github](`r paste0("https://raw.githubusercontent.com/hoehleatsu/hoehleatsu.github.io/master/_source/",knitr::current_input())`). # Acknowledgements Work on the surveillance package was supported by the Swedish Research Council (2015_05182_VR). # References