This notebook steps through the post-processing of hcv_bdsky.xml
, the analysis of the Egyptian HCV data with a Birth Death Skyline Contemporary model.
We can load the log file using the readLog()
function.
bdsky_trace <- beastio::readLog(params$logfile, burnin=0.1)
With the log file loaded as a coda::mcmc
object we can use functions from the coda
package to explore the trace. (see the coda
manual for more examples)
summary(bdsky_trace)
varnames(bdsky_trace)
We can use the checkESS()
function to find which parameters have ESS < 200,
beastio::checkESS(bdsky_trace)
## posterior prior
## 138.37906 60.38113
## Tree.height Tree.treeLength
## 78.67775 71.17478
## gammaShape rateAC
## 116.30581 99.85803
## rateAG rateAT
## 80.79500 93.91429
## rateCG rateGT
## 159.01652 123.13430
## freqParameter.1 freqParameter.2
## 50.14282 62.07893
## freqParameter.3 freqParameter.4
## 56.17979 49.70493
## BDSKY_Contemp origin_BDSKY_Contemp
## 71.79820 41.83838
## becomeUninfectiousRate_BDSKY_Contemp reproductiveNumber_BDSKY_Contemp.1
## 30.05652 58.51121
## reproductiveNumber_BDSKY_Contemp.2 reproductiveNumber_BDSKY_Contemp.3
## 64.45153 50.02300
## reproductiveNumber_BDSKY_Contemp.4 reproductiveNumber_BDSKY_Contemp.5
## 39.84365 66.48599
## reproductiveNumber_BDSKY_Contemp.6 reproductiveNumber_BDSKY_Contemp.7
## 61.24581 136.80401
## reproductiveNumber_BDSKY_Contemp.8 reproductiveNumber_BDSKY_Contemp.9
## 109.00646 42.14392
## reproductiveNumber_BDSKY_Contemp.10
## 36.00705
or use the same function to plot the ESS values of all parameters.
beastio::checkESS(bdsky_trace, cutoff=200, plot=TRUE, log='y', ylim=c(1,10000), title="All parameters", plot.grid=TRUE)
We see that most parameters have very low ESS values, because we didn’t run the chain long enough. We can also investigate the ESS values of the analysis we ran for 30 million steps, where most parameters should have an ESS > 200:
bdsky_trace_long <- beastio::readLog(params$logfile_long, burnin=0.1)
beastio::checkESS(bdsky_trace_long, cutoff=200, plot=TRUE, log='y', ylim=c(1,10000), title="All parameters", plot.grid=TRUE)
## becomeUninfectiousRate_BDSKY_Contemp reproductiveNumber_BDSKY_Contemp.9
## 111.4578 196.5225
## reproductiveNumber_BDSKY_Contemp.10
## 191.1606
Next we can extract the \(R_e\) parameter values and their HPDs.
Re_sky <- beastio::getLogFileSubset(bdsky_trace, "reproductiveNumber_BDSKY_Contemp")
Re_hpd <- t(beastio::getHPDMedian(Re_sky))
delta_hpd <- beastio::getHPDMedian(bdsky_trace[, "becomeUninfectiousRate_BDSKY_Contemp"])
We can plot the raw \(R_e\) HPD intervals. This is equivalent to the output in Tracer.
bdskytools::plotSkyline(1:10, Re_hpd, type='step', ylab="Re")
Since the intervals in this analysis are equidistant between the origin and the present and the origin was also estimated, they should probably not be plotted in this way. Use this only when the interval times in bdsky are fixed (requires editing the XML) or when the origin is fixed (automatically fixes the shift-times).
When doing this do NOT plot with type=‘smooth’ as it gives a misleading result!!!
In order to plot the smooth skyline we have to marginalize our \(R_e\) estimates on a regular time grid and calculate the HPD at each grid point. It is usually a good idea to use a grid with more cells than the dimension of \(R_e\) (but using too many can result in noisy estimates).
To do this we first calculate the marginal posterior at every time of interest using the function bdskytools::gridSkyline()
and then calculate the HPD for each of the finer time intervals. Here we choose to look at params$gridsize
equidistantly spaced points between the median tMRCA and the most recent sequence (params$mostrecent
). The times to grid the skyline on (gridTimes
), refers to years in the past.
tmrca_med <- median(bdsky_trace[, "Tree.height"])
gridTimes <- seq(0, median(tmrca_med), length.out=params$gridsize)
Re_gridded <- mcmc(bdskytools::gridSkyline(Re_sky, bdsky_trace[, "origin_BDSKY_Contemp"], gridTimes))
Re_gridded_hpd <- t(getHPDMedian(Re_gridded))
Now we are ready to plot the smooth skyline.
times <- params$mostrecent - gridTimes
plotSkyline(times, Re_gridded_hpd, xlab="Date", ylab="Re", type="smooth")
plotSkyline(times, Re_gridded_hpd, xlab="Date", ylab="Re", type="lines")
We can plot the gridded \(R_e\) skyline (not its HPDs) for a few of the states in our MCMC chain to see what it really looks like as the Markov chain samples parameters. Note that the intervals overlap between different posterior samples. This is because the origin estimate is a different sample from the origin’s posterior density in each of the plotted samples. As we add more samples to the plot we start to see the smooth skyline appear.
layout(matrix(c(1:4), nrow=4))
plotSkyline(times, Re_gridded, type='steplines', traces=1,
col=cols$blue,ylims=c(0,3.5), xlab="Time", ylab="Re", main="1 random sample")
plotSkyline(times, Re_gridded, type='steplines', traces=10,
col=set_alpha(cols$blue,0.5),ylims=c(0,3.5), xlab="Time", ylab="Re", main="10 random samples")
plotSkyline(times, Re_gridded, type='steplines', traces=100,
col=set_alpha(cols$blue,0.5),ylims=c(0,3.5), xlab="Time", ylab="Re", main="100 random samples")
plotSkyline(times, Re_gridded, type='steplines', traces=1000,
col=set_alpha(cols$blue,0.1),ylims=c(0,3.5), xlab="Time", ylab="Re", main="1000 random samples")
Finally, we can plot both Re and delta on one set of axes for comparison. In this analysis delta has a dimension of 1, so the skyline is not really insightful. We can also use this type of plot to compare skylines of the same parameter between different models (eg. changing the priors or number of shifts).
par(mar=c(4,4,1,4))
plot(1, type='n', xlim=c(1700,2000), ylim=c(0,1),
xlab='Year', ylab="", yaxt='n', xaxs='i', yaxs='i')
plotSkyline(range(times), as.matrix(delta_hpd), type='step', lwd=2,
xlab="", ylab="", add=TRUE, new=FALSE, axes=FALSE, fill=set_alpha(cols$blue, 0.5), col=cols$blue)
axis(4, las=1)
mtext(expression(delta), side=4, line=3)
plotSkyline(times, Re_gridded_hpd, lwd=2, xlims=c(1700,2000), xaxs='i', yaxs='i',
xlab="", ylab="", add=TRUE, new=TRUE, axes=FALSE, fill=set_alpha(cols$orange, 0.5), col=cols$orange)
axis(2, las=1)
mtext(expression("R"[e]), side=2, line=3)
abline(h = 1, lty=2, col=cols$red, lwd=2)
legend("topright", legend=c(expression("R"[e]), expression(delta)), bty='n',
fill=set_alpha(c(cols$orange, cols$blue), 0.5), border=c(cols$orange, cols$blue))