#' --- #' title: "Introduction to partially observed Markov processes" #' subtitle: 'ICTP Workshop on Mathematical Models of Climate Variability, Environmental Change and Infectious Diseases' #' author: "Aaron A. King" #' output: #' html_document: #' toc: yes #' toc_depth: 4 #' bibliography: ../course.bib #' csl: ../ecology.csl #' nocite: | #' @King2008, @Romero-Severson2015, @He2010, #' @Laneri2010, @King2015 #' --- #' #' \newcommand\prob[1]{\mathbb{P}\left[{#1}\right]} #' \newcommand\expect[1]{\mathbb{E}\left[{#1}\right]} #' \newcommand\var[1]{\mathrm{Var}\left[{#1}\right]} #' \newcommand\cov[1]{\mathrm{Cov}\left[{#1}\right]} #' \newcommand\dist[2]{\mathrm{#1}\left(#2\right)} #' \newcommand\dlta[1]{{\Delta}{#1}} #' \newcommand{\dd}[1]{\mathrm{d}{#1}} #' \newcommand{\transpose}{\mathrm{T}} #' \newcommand\lik{\mathcal{L}} #' \newcommand\loglik{\ell} #' \newcommand{\scinot}[2]{#1{\times}10^{#2}} #' \newcommand{\pd}[3][]{\frac{\partial^{#1}{#2}}{\partial{#3}^{#1}}} #' \newcommand{\deriv}[3][]{\frac{\mathrm{d}^{#1}{#2}}{\mathrm{d}{#3}^{#1}}} #' #' [Licensed under the Creative Commons Attribution-NonCommercial license](http://creativecommons.org/licenses/by-nc/4.0/). #' Please share and remix noncommercially, mentioning its origin. #' ![CC-BY_NC](../graphics/cc-by-nc.png) #' This document has its origins in the [SISMID short course on Simulation-based Inference](http://kingaa.github.io/sbied/intro/intro.html) given by Aaron King and Edward Ionides. #' #' Produced in **R** version `r getRversion()` using **pomp** version `r packageVersion("pomp")`. #' #' #' #'

#' **Important Note:** #' These materials have been updated for use with version `r packageVersion("pomp")`. #' As of version 2, **pomp** syntax has changed substantially. #' These changes [are documented](http://kingaa.github.io/pomp/vignettes/upgrade_guide.html) on the **pomp** website. #'

#' ## ----opts,include=FALSE,cache=FALSE------------------------------------------- options(stringsAsFactors=FALSE) library(ggplot2) theme_set(theme_bw()) set.seed(2028866059L) #' #' ## Introduction: ecological and epidemiological dynamics #' #' - Ecological systems are complex, open, nonlinear, and nonstationary. #' - It is useful to model them as stochastic systems. #' - "Laws of Nature" are unavailable except in the most general form. #' - For any observable phenomenon, multiple competing explanations are possible. #' - Central scientific goals: #' - Which explanations are most favored by the data? #' - Which kinds of data are most informative? #' - Central applied goals: #' - How to design ecological or epidemiological intervention? #' - How to make accurate forecasts? #' - Time series are particularly useful sources of data. #' #' #' ### Noisy clockwork: Time series analysis of population fluctuations in animals #' #' ##### Six problems of @Bjornstad2001 #' #' Obstacles for **ecological** modeling and inference via nonlinear mechanistic models: #' #' 1. Combining measurement noise and process noise. #' 2. Including covariates in mechanistically plausible ways. #' 3. Using continuous-time models. #' 4. Modeling and estimating interactions in coupled systems. #' 5. Dealing with unobserved variables. #' 6. Modeling spatial-temporal dynamics. #' #' The same issues arise for **epidemiological** modeling and inference via nonlinear mechanistic models. #' #' ## Objectives #' #' 1. To show how stochastic dynamical systems models can be used as scientific instruments. #' 1. To give students the ability to formulate models of their own. #' 1. To teach efficient approaches for performing scientific inference using POMP models. #' 1. To familiarize students with the **pomp** package. #' 1. To give students opportunities to work with such inference methods. #' 1. To provide documented examples for student re-use. #' #' #' ### Questions and answers #' #' 1. [What roles are played by asymptomatic infection and waning immunity in cholera epidemics?](http://dx.doi.org/10.1038/nature07084) #' 7. [Do subclinical infections of pertussis play an important epidemiological role?](http://dx.doi.org/10.1371/journal.pone.0072086) #' 3. [What explains the seasonality of measles?](http://dx.doi.org/10.1098/rsif.2009.0151) #' 2. [What is the contribution to the HIV epidemic of dynamic variation in sexual behavior of an individual over time? How does this compare to the role of heterogeneity between individuals?](http://dx.doi.org/10.1093/aje/kwv044) #' 5. [What explains the interannual variability of malaria?](http://dx.doi.org/10.1371/journal.pcbi.1000898) #' 6. [What will happen next in an Ebola outbreak?](http://dx.doi.org/10.1098/rspb.2015.0347) #' #' ------------------------------- #' #' ## Partially observed Markov process (POMP) models #' #' * Data $y^*_1,\dots,y^*_N$ collected at times $t_1<\dots="2.8") #' #' A pre-built `pomp` object encoding the Ricker model comes included with the package. #' Construct it by executing ## ----load-ricker,cache=FALSE,results="hide"----------------------------------- ricker() -> rick #' This has the effect of creating a `pomp` object named `rick` in your workspace. #' We can plot the data by doing ## ----plot-ricker-------------------------------------------------------------- plot(rick) #' We can simulate by doing ## ----sim-ricker1-------------------------------------------------------------- x <- simulate(rick) #' What kind of object have we created? ## ----------------------------------------------------------------------------- class(x) plot(x) #' Why do we see more time series in the simulated `pomp` object? #' #' We can turn a `pomp` object into a data frame: ## ----------------------------------------------------------------------------- y <- as.data.frame(rick) head(y) head(simulate(rick,format="data.frame")) #' #' We can also run multiple simulations simultaneously: ## ----------------------------------------------------------------------------- x <- simulate(rick,nsim=10) class(x) sapply(x,class) x <- simulate(rick,nsim=10,format="data.frame") head(x) str(x) #' Also, ## ----fig.height=8------------------------------------------------------------- x <- simulate(rick,nsim=9,format="d",include.data=TRUE) ggplot(data=x,aes(x=time,y=y,group=.id,color=(.id=="data")))+ geom_line()+guides(color=FALSE)+ facet_wrap(~.id,ncol=2) #' #' We refer to the deterministic map as the "skeleton" of the stochastic map. #' We can compute a trajectory of the the deterministic skeleton using `trajectory`: ## ----traj-ricker-------------------------------------------------------------- y <- trajectory(rick) dim(y) dimnames(y) plot(time(rick),y["N",1,],type="l") #' #' Notice that `rick` has parameters associated with it: ## ----coef-ricker-------------------------------------------------------------- coef(rick) #' These are the parameters at which the simulations and deterministic trajectory computations above were done. #' We can run these at different parameters: ## ----------------------------------------------------------------------------- theta <- coef(rick) theta[c("r","N.0")] <- c(5,3) y <- trajectory(rick,params=theta) plot(time(rick),y["N",1,],type="l") x <- simulate(rick,params=theta) plot(x,var="y") #' #' We can also change the parameters stored inside of `rick`: ## ----------------------------------------------------------------------------- coef(rick,c("r","N.0","sigma")) <- c(39,0.5,1) coef(rick) plot(simulate(rick),var="y") #' #' In all of the above, it's possible to work with more than one set of parameters at a time. #' For example: ## ----bifdiag------------------------------------------------------------------ p <- parmat(coef(rick),500) dim(p); dimnames(p) p["r",] <- seq(from=2,to=40,length=500) y <- trajectory(rick,params=p,times=200:1000) matplot(p["r",],y["N",,],pch=".",col='black',xlab='r',ylab='N',log='x') #' #' How do you interpret the above plot? #' This is called a *one-parameter bifurcation diagram". #' #' More information on manipulating and extracting information from `pomp` objects can be viewed in the help pages (`methods?pomp`). #' #' There are a number of other examples included with the package. #' Do `pompExample()` to see a list of these. #' More examples can be found in the **pompExamples** package: #' #' ### Inference algorithms in **pomp** #' #' **pomp** provides a wide range of inference algorithms. #' We'll learn about these in detail soon, but for now, let's just look at some of their general features. #' #' The `pfilter` function runs a simple particle filter. #' It can be used to evaluate the likelihood at a particular set of parameters. #' One uses the `Np` argument to specify the number of particles to use: ## ----pfilter1----------------------------------------------------------------- pf <- pfilter(rick,Np=1000) class(pf) plot(pf) logLik(pf) #' #' Note that `pfilter` returns an object of class `pfilterd.pomp`. #' This is the general rule: inference algorithms return objects that are `pomp` objects with additional information. #' The package provides tools to extract this information. #' We can run the particle filter again by doing ## ----pfilter2----------------------------------------------------------------- pf <- pfilter(pf) logLik(pf) #' which has the result of running the same computation again. #' Note that, because the particle filter is a Monte Carlo algorithm, we get a slightly different estimate of the log likelihood. #' #' Note that, by default, running `pfilter` on a `pfilterd.pomp` object causes the computation to be re-run with the same parameters as before. #' Any additional arguments we add override these defaults. #' This is the general rule in **pomp**. #' For example, ## ----pfilter3----------------------------------------------------------------- pf <- pfilter(pf,Np=100) logLik(pf) #' Here, the particle filtering has been performed with only `r unique(pf@Np)` particles. #' #' ### Building a custom `pomp` object #' #' A real **pomp** data analysis begins with constructing one or more `pomp` objects to hold the data and the model or models under consideration. #' We'll illustrate this process a dataset of *Parus major* abundance in Wytham Wood, near Oxford [@McCleery1991]. #' #' Download and plot the data: ## ----parus-data--------------------------------------------------------------- dat <- read.csv("http://kingaa.github.io/clim-dis/intro/parus.csv") head(dat) plot(pop~year,data=dat,type='o') #' #' Let's suppose that we want to fit the stochastic Ricker model discussed above to these data. #' #' The call to construct a `pomp` object is, naturally enough, `pomp`. #' Documentation on this function can be had by doing `?pomp`. #' Learn about the various things you can do once you have a `pomp` object by doing `methods?pomp` and following the links therein. #' Read an overview of the package as a whole with links to its main features by doing `package?pomp`. #' A complete index of the functions in **pomp** is returned by the command `library(help=pomp)`. #' Finally, the home page for the `pomp` project is http://kingaa.github.io/pomp; #' there you have access to the complete source code, tutorials, manuals, issues page, news blog, etc. #' #' Now, to construct our `pomp` object: ## ----parus-pomp1-------------------------------------------------------------- library(pomp) parus <- pomp(dat,times="year",t0=1959) #' The `times` argument specifies that the column of `dat` labelled "year" gives the measurement times; #' `t0` is the "zero-time", the time at which the state process will be initialized. #' We've set it to one year prior to the beginning of the data. #' Plot the new `pomp` object: ## ----parus-plot1-------------------------------------------------------------- plot(parus) #' #' #### Adding in the process model simulator #' #' We can add the stochastic Ricker model to `parus` by writing a Csnippet that simulates one realization of the stochastic process, from an arbitary time $t$ to $t+1$, given arbitrary states and parameters. #' We provide this to `pomp` in the form of a `Csnippet`, a little snippet of C code that performs the computation. #' The following does this. ## ----parus-sim-defn----------------------------------------------------------- stochStep <- Csnippet(" e = rnorm(0,sigma); N = r*N*exp(-N+e); ") pomp(parus,rprocess=discrete_time(step.fun=stochStep,delta.t=1), paramnames=c("r","sigma"),statenames=c("N","e")) -> parus #' Note that in the above, we use the `exp` and `rnorm` functions from the [**R** API](http://cran.r-project.org/doc/manuals/r-release/R-exts.html#The-R-API). #' In general any C function provided by **R** is available to you. #' **pomp** also provides a number of C functions that are documented in the header file, `pomp.h`, that is installed with the package. #' See the `Csnippet` documentation (`?Csnippet`) to read more about how to write them. #' Note too that we use `discrete.time.sim` here because the model is a stochastic map. #' We specify that the time step of the discrete-time process is `delta.t`, here, 1 yr. #' #' At this point, we have what we need to simulate the state process: ## ----ricker-first-sim--------------------------------------------------------- sim <- simulate(parus,params=c(N.0=1,e.0=0,r=12,sigma=0.5),format="d") plot(N~year,data=sim,type='o') #' #' #### Adding in the measurement model and parameters #' #' We complete the specification of the POMP by specifying the measurement model. #' To obtain the Poisson measurement model described above, we write two Csnippets. #' The first simulates: ## ----parus-rmeas-defn--------------------------------------------------------- rmeas <- Csnippet("pop = rpois(phi*N);") #' The second computes the likelihood of observing `pop` birds given a true density of `N`: ## ----parus-dmeas-defn--------------------------------------------------------- dmeas <- Csnippet("lik = dpois(pop,phi*N,give_log);") #' [Note the `give_log` argument. #' When this code is evaluated, `give_log` will be set to 1 if the log likelihood is desired, and 0 else.] #' We add these into the `pomp` object: ## ----parus-add-meas----------------------------------------------------------- pomp(parus,rmeasure=rmeas,dmeasure=dmeas,statenames=c("N"),paramnames=c("phi")) -> parus #' Now we can simulate the whole POMP. #' First, let's add some parameters: ## ----ricker-add-params-------------------------------------------------------- coef(parus) <- c(N.0=1,e.0=0,r=20,sigma=0.1,phi=200) ## ----ricker-second-sim,results='markup'--------------------------------------- library(ggplot2) sims <- simulate(parus,nsim=3,format="d",include.data=TRUE) ggplot(data=sims,mapping=aes(x=year,y=pop))+geom_line()+ facet_wrap(~.id,ncol=1,scales="free_y") #' #' #### Adding in the deterministic skeleton #' #' We can add the Ricker model deterministic skeleton to the `parus` `pomp` object. #' Since the Ricker model is a discrete-time model, its skeleton is a map that takes $N_t$ to $N_{t+1}$ according to the Ricker model equation #' $$N_{t+1} = r\,N_{t}\,\exp(-N_{t}).$$ #' The following implements this. ## ----parus-skel-defn---------------------------------------------------------- skel <- Csnippet("DN = r*N*exp(-N);") #' We then add this to the `pomp` object: ## ----parus-add-skel----------------------------------------------------------- parus <- pomp(parus,skeleton=map(skel),paramnames=c("r"),statenames=c("N")) #' Note that we have to inform **pomp** as to which of the variables we've referred to in `skel` is a state variable (`statenames`) and which is a parameter (`paramnames`). #' In writing a `Csnippet` for the deterministic skeleton, we use `D` to designate the map's value. #' The `map` call tells **pomp** that the skeleton is a discrete-time dynamical system (a map) rather than a continuous-time system (a vectorfield). #' #' With just the skeleton defined, we are in a position to compute the trajectories of the deterministic skeleton at any point in parameter space. #' For example, here we compute the trajectory and superimpose it on a plot of one simulation: ## ----parus-first-traj,results='markup'---------------------------------------- traj <- trajectory(parus,params=c(N.0=1,r=12),format="d") plot(N~year,data=sim,type='o') lines(N~year,data=traj,type='l',col='red') #' #' #### A note on terminology #' #' If we know the state, $x(t_0)$, of the system at time $t_0$, it makes sense to speak about the entire trajectory of the system for all $t>t_0$. #' This is true whether we are thinking of the system as deterministic or stochastic. #' Of course, in the former case, the trajectory is uniquely determined by $x(t_0)$, while in the stochastic case, only the probability distribution of $x(t)$, $t>t_0$ is determined. #' To avoid confusion, we use the term "trajectory" exclusively to refer to *trajectories of a deterministic process*. #' Thus, the `trajectory` command iterates or integrates the deterministic skeleton forward in time, returning the unique trajectory determined by the specified parameters. #' When we want to speak about sample paths of a stochastic process, we use the term *simulation*. #' Accordingly, the `simulate` command always returns individual sample paths from the POMP. #' In particular, we avoid "simulating a set of differential equations", preferring instead to speak of "integrating" the equations, or "computing trajectories". #' #' ## Exercises #' #' -------------------------- #' #' ##### Exercise: Ricker model parameters #' Fiddle with the parameters to try and make the simulations look more like the data. #' This will help you build some intuition for what the various parameters do. #' #' -------------------------- #' #' ##### Exercise: Reformulating the Ricker model #' Reparameterize the Ricker model so that the scaling of $N$ is explicit: #' $$N_{t+1} = r\,N_{t}\,\exp\left(-\frac{N_{t}}{K}+\varepsilon_t\right).$$ #' #' Modify the `pomp` object we created above to reflect this reparameterization. #' #' Modify the measurement model so that #' $$\mathrm{pop}_t \sim \dist{Negbin}{\phi\,N_t,k},$$ #' i.e., $\mathrm{pop}_t$ is negative-binomially distributed with mean $\phi\,N_t$ and clumping parameter $k$. #' See `?NegBinomial` for documentation on the negative binomial distribution and [the **R** Extensions Manual section on distribution functions](http://cran.r-project.org/doc/manuals/r-release/R-exts.html#Distribution-functions) for information on how to access these in C. #' #' -------------------------- #' #' ##### Exercise: Beverton-Holt #' Construct a `pomp` object for the *Parus major* data and the stochastic Beverton-Holt model #' $$N_{t+1} = \frac{a\,N_t}{1+b\,N_t}\,\varepsilon_t,$$ #' where $a$ and $b$ are parameters and #' $$\varepsilon_t \sim \dist{Lognormal}{-\tfrac{1}{2}\sigma^2,\sigma}.$$ #' Assume the same measurement model as before. #' #' ------------------------------ #' #' ## [Back to course homepage](../) #' ## [**R** codes for this document](http://raw.githubusercontent.com/kingaa/clim-dis/master/intro/intro.R) #' #' ---------------------- #' #' ## References