--- title: "Birth Death Skyline post-processing" author: "Louis du Plessis" date: '`r format(Sys.time(), "Last modified: %d %b %Y")`' output: html_document params: logfile: "../precooked_runs/hcv_bdsky.log" logfile_long: "../precooked_runs/hcv_bdsky_30M.log" gridsize: 100 mostrecent: 1993 --- This notebook steps through the post-processing of `hcv_bdsky.xml`, the analysis of the Egyptian HCV data with a Birth Death Skyline Contemporary model. ```{r install-packages, eval=FALSE, echo=FALSE} # Install the required packages by evaluating this chunk # (only needs to be evaluated the first time) install.packages("devtools") install.packages("coda") install.packages("RColorBrewer") devtools::install_github("laduplessis/bdskytools") devtools::install_github("laduplessis/beastio") ``` ```{r setup, include=FALSE} # Load the required packages and set global options library(coda) library(bdskytools) library(beastio) library(RColorBrewer) knitr::opts_chunk$set(echo = TRUE, fig.path="figs/", dev='png', fig.width=7, fig.height=5) # Set up some colours cols <- list(blue = RColorBrewer::brewer.pal(12,"Paired")[2], orange = RColorBrewer::brewer.pal(12,"Paired")[8]) set_alpha <- function(c, alpha=1.0) paste0(c,format(as.hexmode(round(alpha*255)), width=2)) ``` # Load the trace file and check convergence We can load the log file using the `readLog()` function. ```{r load-data} bdsky_trace <- beastio::readLog(params$logfile, burnin=0.1) ``` With the log file loaded as a `coda::mcmc` object we can use functions from the `coda` package to explore the trace. (see the `coda` [manual](https://cran.r-project.org/web/packages/coda/index.html) for more examples) ```{r, eval=FALSE} summary(bdsky_trace) varnames(bdsky_trace) ``` We can use the `checkESS()` function to find which parameters have ESS < 200, ```{r check-convergence} beastio::checkESS(bdsky_trace) ``` or use the same function to plot the ESS values of all parameters. ```{r plot-ESS, results='hide'} beastio::checkESS(bdsky_trace, cutoff=200, plot=TRUE, log='y', ylim=c(1,10000), title="All parameters", plot.grid=TRUE) ``` We see that most parameters have very low ESS values, because we didn't run the chain long enough. We can also investigate the ESS values of the analysis we ran for 30 million steps, where most parameters should have an ESS > 200: ```{r ESS-bdsky-long} bdsky_trace_long <- beastio::readLog(params$logfile_long, burnin=0.1) beastio::checkESS(bdsky_trace_long, cutoff=200, plot=TRUE, log='y', ylim=c(1,10000), title="All parameters", plot.grid=TRUE) ``` # Extract parameter estimates and HPDs Next we can extract the $R_e$ parameter values and their HPDs. ```{r extract-Re} Re_sky <- beastio::getLogFileSubset(bdsky_trace, "reproductiveNumber_BDSKY_Contemp") Re_hpd <- t(beastio::getHPDMedian(Re_sky)) delta_hpd <- beastio::getHPDMedian(bdsky_trace[, "becomeUninfectiousRate_BDSKY_Contemp"]) ``` # Plotting non-gridded BDSKY estimates We can plot the raw $R_e$ HPD intervals. This is equivalent to the output in Tracer. ```{r plot-Re-ungridded} bdskytools::plotSkyline(1:10, Re_hpd, type='step', ylab="Re") ``` Since the intervals in this analysis are equidistant between the origin and the present and the origin was also estimated, they should probably not be plotted in this way. Use this only when the interval times in bdsky are fixed (requires editing the XML) or when the origin is fixed (automatically fixes the shift-times). When doing this do NOT plot with type='smooth' as it gives a misleading result!!! # Plotting a "smooth" skyline In order to plot the smooth skyline we have to marginalize our $R_e$ estimates on a regular time grid and calculate the HPD at each grid point. It is usually a good idea to use a grid with more cells than the dimension of $R_e$ (but using too many can result in noisy estimates). To do this we first calculate the marginal posterior at every time of interest using the function `bdskytools::gridSkyline()` and then calculate the HPD for each of the finer time intervals. Here we choose to look at `params$gridsize` equidistantly spaced points between the median tMRCA and the most recent sequence (`params$mostrecent`). The times to grid the skyline on (`gridTimes`), refers to years in the past. ```{r grid-Re} tmrca_med <- median(bdsky_trace[, "Tree.height"]) gridTimes <- seq(0, median(tmrca_med), length.out=params$gridsize) Re_gridded <- mcmc(bdskytools::gridSkyline(Re_sky, bdsky_trace[, "origin_BDSKY_Contemp"], gridTimes)) Re_gridded_hpd <- t(getHPDMedian(Re_gridded)) ``` Now we are ready to plot the smooth skyline. ```{r plot-Re} times <- params$mostrecent - gridTimes plotSkyline(times, Re_gridded_hpd, xlab="Date", ylab="Re", type="smooth") plotSkyline(times, Re_gridded_hpd, xlab="Date", ylab="Re", type="lines") ``` We can plot the gridded $R_e$ skyline (not its HPDs) for a few of the states in our MCMC chain to see what it really looks like as the Markov chain samples parameters. Note that the intervals overlap between different posterior samples. This is because the origin estimate is a different sample from the origin's posterior density in each of the plotted samples. As we add more samples to the plot we start to see the smooth skyline appear. ```{r plot-Re-traces, fig.height=10} layout(matrix(c(1:4), nrow=4)) plotSkyline(times, Re_gridded, type='steplines', traces=1, col=cols$blue,ylims=c(0,3.5), xlab="Time", ylab="Re", main="1 random sample") plotSkyline(times, Re_gridded, type='steplines', traces=10, col=set_alpha(cols$blue,0.5),ylims=c(0,3.5), xlab="Time", ylab="Re", main="10 random samples") plotSkyline(times, Re_gridded, type='steplines', traces=100, col=set_alpha(cols$blue,0.5),ylims=c(0,3.5), xlab="Time", ylab="Re", main="100 random samples") plotSkyline(times, Re_gridded, type='steplines', traces=1000, col=set_alpha(cols$blue,0.1),ylims=c(0,3.5), xlab="Time", ylab="Re", main="1000 random samples") ``` # Combined plot Finally, we can plot both Re and delta on one set of axes for comparison. In this analysis delta has a dimension of 1, so the skyline is not really insightful. We can also use this type of plot to compare skylines of the same parameter between different models (eg. changing the priors or number of shifts). ```{r combined} par(mar=c(4,4,1,4)) plot(1, type='n', xlim=c(1700,2000), ylim=c(0,1), xlab='Year', ylab="", yaxt='n', xaxs='i', yaxs='i') plotSkyline(range(times), as.matrix(delta_hpd), type='step', lwd=2, xlab="", ylab="", add=TRUE, new=FALSE, axes=FALSE, fill=set_alpha(cols$blue, 0.5), col=cols$blue) axis(4, las=1) mtext(expression(delta), side=4, line=3) plotSkyline(times, Re_gridded_hpd, lwd=2, xlims=c(1700,2000), xaxs='i', yaxs='i', xlab="", ylab="", add=TRUE, new=TRUE, axes=FALSE, fill=set_alpha(cols$orange, 0.5), col=cols$orange) axis(2, las=1) mtext(expression("R"[e]), side=2, line=3) abline(h = 1, lty=2, col=cols$red, lwd=2) legend("topright", legend=c(expression("R"[e]), expression(delta)), bty='n', fill=set_alpha(c(cols$orange, cols$blue), 0.5), border=c(cols$orange, cols$blue)) ```