--- title: "Using JAGS to run MCMC" author: "Math 315, Fall 2019" geometry: "margin=.8in" header-includes: - \usepackage{multicol} output: pdf_document: default html_document: default --- ```{r setup, include=FALSE, cache=TRUE} knitr::opts_chunk$set(echo = TRUE) ``` For the remainder of the term we will use Just Another Gibbs Sampler (JAGS) to draw MCMC samples from our posterior distributions. In class, we won't be able to discuss every distribution or function you may need to specify you JAGS model, so be sure to keep the URL to the JAGS user's manual handy: https://github.com/aloy/math315-fall2019/blob/master/jags_user_manual.pdf. ## Key steps Every JAGS program has five key components: 1. Specify the model as a string (or save it to a text file) 2. Compile the model using `jags.model()` 3. Draw burn-in samples using `update()` 4. Draw posterior samples using `coda.samples()` 5. Inspect/summarize the results using the `plot()` and `summary()` ## Example: Rocket launches Let's look at a small example first by revisiting our rocket launch example. Let $Y= \#$ successful launches. Then the model is given by: - Likelihood: We have $n$ iid ${\rm Bernoulli} (\theta)$ trials - Prior: $\theta \sim \text{Unif}(0.1, 0.9)$ ### Step 0. Load rjags and data set ```{r message=FALSE} library(rjags) # Load data launches <- read.table("https://www4.stat.ncsu.edu/~wilson/BR/table21.txt", header = TRUE) y <- launches$Outcome n <- nrow(launches) ``` ### Step 1. Specify the model as a string ```{r eval=FALSE} model_string <- textConnection("model{ # Likelihood for(i in 1:n) { y[i] ~ dbern(theta) } # Prior theta ~ dunif(a, b) }") ``` - Note: `~` indicates that a variable follow a given distribution - Be sure to use `textConnection()` if you are not saving this model string as a separate text file. ### Step 2. Compile the MCMC code ```{r eval=FALSE} inits <- list(theta = 0.5) data <- list(y = y, n = n, a = 0.1, b = 0.9) model <- jags.model(model_string, data = data, inits = inits, n.chains = 1) ``` Note: inits` and `data` should be **lists** ### Step 3. Draw burn-in samples ```{r eval=FALSE} update(model, 1000) ``` Note: Add the argument `progress.bar = "none"` to silence the progress bar in your .Rmd files ### Step 4. Draw posterior samples ```{r eval=FALSE} samples <- coda.samples( model, variable.names = "theta", n.iter = 10000, progress.bar = "none" ) ``` Note: `variable.names` expects a character vector with the names of the variables to be monitored ### Step 5. Inspect/summarize the results ```{r eval=FALSE} summary(samples) plot(samples) ``` Note: `samples` will be a matrix with one column in this case. ## Your turn 1: Two-sample model (redux) In this first example, we'll return to the two-sample model we explored last week. Recall that the researchers were interested in whether a color distracter slowed student reaction time. The model is given by: $$ \begin{aligned} Y_i &\overset{\rm iid}{\sim} \mathcal{N}(\mu, \sigma^2), \ i = 1, \ldots, 20 & \text{(standard group)}\\ Y_j &\overset{\rm iid}{\sim} \mathcal{N}(\mu + \delta, \sigma^2), \ j = 21, \ldots, 40 & \text{(color group)}\\ \mu &\sim \mathcal{N}(0, 100^2)\\ \delta &\sim \mathcal{N}(0, 100^2)\\ \sigma^2 &\sim {\rm InvGamma}(0.01, 0.01)\\ \end{aligned} $$ ```{r} stroop <- read.csv("http://aloy.rbind.io/data/stroop_game.csv") ``` \clearpage Complete the following code to do the following: i. Specify the model as a text string. Note that `tau =` $1 / \sigma^2$. i. Compile the MCMC code so that is runs a single chain. i. Update the model to draw 1000 burn-in samples. i. Draw 10,000 MCMC samples after burn-in, and monitor the parameters $\mu$, $\delta$, and $\sigma$. ```{r eval=FALSE} # Data management y <- stroop$Time y.std <- y[stroop$Type == "Standard"] y.col <- y[stroop$Type == "Color"] n <- length(y.std) m <- length(y.col) # Specify the model stroop_model_string <- ___("model{ # Specify the likelihood for(i in 1:n) { y.std[i] ~ dnorm(mu, tau) } for(i in 1:m) { y.col[i] ~ dnorm(mu + delta, tau) } # Specify the priors mu ~ ___(0, 1 / pow(100, 2)) delta ~ ___(0, 1 / pow(100, 2)) tau ~ ___(.01, .01) # Calculate sigma sigma <- ___ / sqrt(___) }") # Compile the MCMC code inits <- ___(mu = mean(y), delta = 0, tau = 1 / sd(y)) stroop_data <- ___(y.std = y.std, y.col = y.col, n = n, m = m) stroop_model <- jags.model(___, data = ___, inits = ___, n.chains = ___) # Draw 1000 burn-in samples ___(___, 1000) # Draw 10000 posterior samples stroop_samples <- ___(___, variable.names = ___(___, ___, ___), n.iter = 10000, progress.bar = "none" ) # Inspect/summarize the results summary(___) plot(___) ``` \clearpage **Follow-up questions:** 1. Have the chains for $\mu$, $\delta$, and $\sigma$ converged? How do you know? \vspace{.5in} 2. Report the posterior mean and SD for each parameter. \vspace{.5in} 3. Report a 95% credible interval for $\delta$. What does this say about the difference in average completion time between groups? \vspace{.5in} ## Your turn 2: Survival analysis Researchers are interested in modeling the survival times, measured in weeks, of patients who were diagnosed with leukemia. The patients were classified according to two characteristics of white bloods cells. In this example, we will only examine those whose blood is AG+. The sample consists of $n=17$ times in weeks from diagnosis to death. ```{r} times <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65) ``` After discussing the problem with the researchers, you decide to use the Weibull distribution to model the survival time of patient $i$, $Y_i$. The Weibull PDF and CDF are given below for your reference, but are available as functions in JAGS. \begin{align*} f(y_i | v, \lambda) &= \lambda v y_i^{v-1} e^{-\lambda y_i^{v}},\ y_i, v, \lambda > 0 \\ F(y_i | v, \lambda) &= 1 - e^{-\lambda y_i^{v}} \end{align*} In addition, you decide to place independent gamma priors on $\lambda$ and $v$, and since you have little prior information about the parameters, so you decide to use $\text{Gamma}(0.01, 0.01)$ for both. **Set up question:** According to the JAGS user manual, what function should you use for the Weibull density. **Your task:** Fit the above model in JAGS and plot the posterior density of the 24-week survival rate, which is defined as $1-F(24 | v, \lambda)$ where $F$ denotes the CDF of a Weibull distribution. (Check the manual for the function that gives the Weibull CDF!) Summarize your findings about the 24-hour survival rate. ```{r eval=FALSE} # Specify the model string survival_model_string <- textConnection("model{ # Specify the likelihood for(i in 1:n) { y[i] ~ ___(v, lambda) } # Specify the priors ___ ~ ___(0.01, 0.01) ___ ~ ___(0.01, 0.01) # Calculate 24-week survival rate s24 <- 1 - ___(___, v, lambda) }") # Compile the MCMC code inits <- ___(v = 1, lambda = 1) survival_data <- ___(y = ___, n = ___) survival_model <- ___(___, data = ___, inits = ___, n.chains = 3) # Draw 1000 burn-in samples update(___, 1000) # Draw 5000 posterior samples survival_samples <- ___( ___, variable.names = ___, n.iter = ___, progress.bar = "none" ) # Inspect/summarize the results ```