#' ---
#' title: "Case Study: Polio in Wisconsin"
#' author: "Edward Ionides and Aaron A. King"
#' output:
#'   html_document:
#'     toc: yes
#'     toc_depth: 4
#' bibliography: ../course.bib
#' csl: ../ecology.csl
#' ---
#' 
#' \newcommand\prob[1]{\mathbb{P}\left[{#1}\right]}
#' \newcommand\expect[1]{\mathbb{E}\left[{#1}\right]}
#' \newcommand\var[1]{\mathrm{Var}\left[{#1}\right]}
#' \newcommand\dist[2]{\mathrm{#1}\left(#2\right)}
#' \newcommand\dlta[1]{{\Delta}{#1}}
#' \newcommand\lik{\mathcal{L}}
#' \newcommand\loglik{\ell}
#' 
#' Licensed under the [Creative Commons attribution-noncommercial license](http://creativecommons.org/licenses/by-nc/3.0/).
#' Please share and remix noncommercially, mentioning its origin.  
#' ![CC-BY_NC](../graphics/cc-by-nc.png)
#' This document has its origins in the [SISMID short course on Simulation-based Inference](https://kingaa.github.io/sbied/polio/polio.html) given by Aaron King and Edward Ionides.
#' 
#' Produced with **R** version `r getRversion()` and **pomp** version `r packageVersion("pomp")`.
#' 
#' 
#' 
## ----prelims,include=FALSE-----------------------------------------------
library(pomp)
stopifnot(packageVersion("pomp")>="1.4.5")
set.seed(5996485L)

#' 
#' ## Objectives
#' 
#' In this lesson, we aim to 
#' 
#' 1. show how partially observed Markov process (POMP) methods can be used to understand transmission dynamics of polio.
#' 2. discuss the use of POMP methods for compartmental models for biological systems having age structure, seasonality and other covariates.
#' 3. get some practice maximizing the likelihood for such models. How does one set up a *global* search for a maximum likelihood estimate? 
#' 4. see how to test whether such a search has been successful.
#' 
#' ## Introduction
#' 
#' The massive global polio eradication initiative (GPEI) has brought polio from a major global disease to the brink of extinction. 
#' Finishing this task is proving hard, and an improved understanding of polio ecology might assist.
#' A recent paper investigated this using extensive state level pre-vaccination era data from the USA [[@Martinez-Bakker2015]](http://dx.doi.org/10.1371/journal.pbio.1002172). 
#' We will follow the approach of @Martinez-Bakker2015 for one state (Wisconsin). 
#' In the context of the @Martinez-Bakker2015 model, we can quantify seasonality of transmission, the role of the birth rate in explaining the transmission dynamics, and the persistence mechanism of polio. 
#' @Martinez-Bakker2015 carrried out this analysis for all 48 contigous states and District of Columbia, and their data and code are all publicly available. 
#' The data we study, in consist of `cases`, the monthly reported polio cases; `births`, the  monthly recorded births; `pop`, the annual census; `time`, date in years.
#' 
## ----data----------------------------------------------------------------
polio_data <- read.table("http://kingaa.github.io/sbied/polio/polio_wisconsin.csv")

#' 
#' ## A polio transmission model
#' 
#' ### Model formulation
#' 
#' We implement the discrete-time compartmental model of @Martinez-Bakker2015.
#' It has compartments representing susceptible babies in each of six one-month birth cohorts ($S^B_1$,...,$S^B_6$), susceptible older individuals ($S^O$), infected babies ($I^B$), infected older individuals ($I^O$), and individuals who have recovered with lifelong immunity ($R$). 
#' The state vector of the disease transmission model consists of numbers of individuals in each compartment at each time, 
#' $$X(t)=\big(S^B_1(t),...,S^B_6(t), I^B(t),I^O(t),R(t) \big).$$
#' Babies under six months are modeled as fully protected from symptomatic poliomyelitis; 
#' older infections lead to reported cases (usually paralysis) at a rate $\rho$. 
#' 
#' The flows through the compartments are graphically represented as follows (Figure 1A of @Martinez-Bakker2015):
#' 
#' ![Polio model diagram](./polio_fig1A.png)
#' 
#' Since duration of infection is comparable to the one-month reporting aggregation, a discrete time model may be appropriate. 
#' @Martinez-Bakker2015 fitted monthly observations from May 1932 through January 1953, so we define $t_n=1932+ (4+n)/12$ for $n=0,\dots,N$, and we write $$X_n=X(t_n)=\big(S^B_{1,n},...,S^B_{6,n}, I^B_n,I^O_n,R_n \big).$$
#' The mean force of infection, in units of $\mathrm{yr}^{-1}$, is modeled as
#' $$\bar\lambda_n=\left( \beta_n \frac{I^O_n+I^B_n}{P_n} + \psi \right)$$
#' where $P_n$ is census population interpolated to time $t_n$ and seasonality of transmission is modeled as
#' $$\beta_n=\exp\left\{ \sum_{k=1}^K b_k\xi_k(t_n) \right\},$$
#' with $\{\xi_k(t),k=1,\dots,K\}$ being a periodic B-spline basis.
#' We set $K=6$. The force of infection has a stochastic perturbation,
#' $$\lambda_n = \bar\lambda_n \epsilon_n,$$
#' where $\epsilon_n$ is a Gamma random variable with mean 1 and variance $\sigma^2_{\mathrm{env}} + \sigma^2_{\mathrm{dem}}\big/\bar\lambda_n$.
#' These two terms capture variation on the environmental and demographic scales, respectively.
#' All compartments suffer a mortality rate, set at $\delta=1/60\mathrm{yr}^{-1}$. 
#' Within each month, all susceptible individuals are modeled as having exposure to constant competing hazards of mortality and polio infection.
#' The chance of remaining in the susceptible population when exposed to these hazards for one month is therefore
#' $$p_n = \exp\left\{-\frac{\delta+\lambda_n}{12}\right\},$$
#' with the chance of polio infection being 
#' $$q_n = (1-p_n)\,\frac{\lambda_n}{\lambda_n+\delta}.$$
#' We employ a continuous population model, with no demographic stochasticity (in some sense, the demographic-scale stochasticity in $\lambda_n$ is in fact environmental stochasticity since it modifies a rate that affects all compartments equally).
#' Writing $B_n$ for births in month $n$, the full set of model equations is:
#' $$\begin{aligned}
#' S^B_{1,n+1} &= B_{n+1}\\
#' S^B_{k,n+1} &= p_nS^B_{k-1,n} \quad\mbox{for $k=2,\dots,6$}\\
#' S^O_{n+1} &= p_n(S^O_n+S^B_{6,n})\\
#' I^B_{n+1} &= q_n \sum_{k=1}^6 S^B_{k,n}\\
#' I^O_{n+1} &= q_n S^O_n
#' \end{aligned}$$
#' The model for the reported observations, conditional on the state, is a discretized normal distribution truncated at zero, with both environmental and Poisson-scale contributions to the variance:
#' $$Y_n= \max\{\mathrm{round}(Z_n),0\}, \quad Z_n\sim\dist{Normal}{\rho I^O_n, \rho I^O_n+\left(\tau I^O_n\right)^2}.$$
#' Additional parameters are used to specify initial state values at time $t_0=1932+ 4/12$. 
#' We will suppose there are parameters $\big(\tilde S^B_{1,0},...,\tilde S^B_{6,0}, \tilde I^B_0,\tilde I^O_0,\tilde S^O_0\big)$ that specify the population in each compartment at time $t_0$ via
#' $$ S^B_{1,0}= {\tilde S}^B_{1,0} ,...,S^B_{6,0}= \tilde S^B_{6,0}, \quad I^B_{0}= P_0 \tilde I^B_{0},\quad S^O_{0}= P_0 \tilde S^O_{0}, \quad I^O_{0}= P_0 \tilde I^O_{0}.$$
#' Following @Martinez-Bakker2015, we make an approximation for the initial conditions of ignoring infant infections at time $t_0$. 
#' Thus, we set $\tilde I^B_{0}=0$ and use monthly births in the preceding months (ignoring infant mortality) to fix $\tilde S^B_{k,0}=B_{1-k}$ for $k=1,\dots,6$.
#' The estimated initial conditions are then defined by the two parameters $\tilde I^O_{0}$ and $\tilde S^O_{0}$, since the initial recovered population, $R_0$, is specified by subtraction of all the other compartments from the total initial population, $P_0$. 
#' Note that it is convenient to parameterize the estimated initial states as fractions of the population, whereas the initial states fixed at births are parameterized directly as a count.
#' 
#' 
#' ### Model implementation
#' 
#' **pomp** is an **R** package for time series data analysis, focusing on the use of POMP models [@King2016].
#' pomp is available from [CRAN](http://cran.r-project.org/web/packages/pomp), with development versions available from [github](http://kingaa.github.io/pomp). 
#' Here, we use pomp version `r packageVersion("pomp")`.
#' 
## ----load-package--------------------------------------------------------
library(pomp)
packageVersion("pomp")

#' 
#' Observations are monthly case reports, $y^*_{1:N}$, occurring at times $t_{1:N}$.
#' Since our model is in discrete time, we only really need to consider the discrete time state process,. 
#' However, the model and POMP methods extend naturally to the possibility of a continuous-time model specification. 
#' We code the list of state variables, and the choice of $t_0$, as
## ----statenames----------------------------------------------------------
statenames <- c("SB1","SB2","SB3","SB4","SB5","SB6","IB","SO","IO")
t0 <- 1932+4/12

#' 
#' We do not explictly code $R$, since it is defined implicitly as the total population minus the sum of the other compartments. 
#' Due to lifelong immunity, individuals in $R$ play no role in the dynamics. 
#' Even occasional negative values of $R$ (due to a discrepancy between the census and the mortality model) would not be a fatal flaw.
#' 
#' Now, let's define the covariates. `time` gives the time at which the covariates are defined. 
#' `P` is a smoothed interpolation of the annual census. 
#' `B` is monthly births. 
#' The B-spline basis is coded as `xi1,...,xi6`.
## ----covariates----------------------------------------------------------
bspline_basis <- periodic.bspline.basis(
  polio_data$time,nbasis=6,degree=3,period=1,names="xi%d")
covartable <- data.frame(
  time=polio_data$time,
  B=polio_data$births,
  P=predict(smooth.spline(x=1931:1954,y=polio_data$pop[12*(1:24)]),
            x=polio_data$time)$y,
  bspline_basis
)

#' 
#' The parameters $b_1,\dots,b_6,\psi,\rho,\tau,\sigma_\mathrm{dem}, \sigma_\mathrm{env}$  in the model above are _regular parameters_ (RPs), meaning that they are real-valued parameters that affect the dynamics and/or the measurement of the process.
## ----rp_names------------------------------------------------------------
rp_names <- c("b1","b2","b3","b4","b5","b6","psi","rho","tau","sigma_dem","sigma_env")

#' 
#' The _initial value parameters_ (IVPs), $\tilde I^O_{0}$ and  $\tilde S^O_{0}$, are coded for each state named by adding `_0` to the state name:
## ----ivp_names-----------------------------------------------------------
ivp_names <- c("SO_0","IO_0")

#' 
#' Finally, there are two quantities in the dynamic model specification, $\delta=1/60 \mathrm{yr}^{-1}$ and $K=6$, that we are not estimating. 
#' In addition, there are six other initial-value quantities, $\{\tilde S^B_{1,0},\dots,\tilde S^B_{6,0}\}$, which we are treating as _fixed parameters_ (FPs). 
#' These represent the initial numbers in the 6 one-month infant age classes.
#' We initialize these using the first 6 months of data:
#' 
## ----fixed_params--------------------------------------------------------
i <- which(abs(covartable$time-t0)<0.01)
initial_births <- as.numeric(covartable$B[i-0:5])
names(initial_births) <- c("SB1_0","SB2_0","SB3_0","SB4_0","SB5_0","SB6_0") 
fixed_params <- c(delta=1/60,initial_births)
fp_names <- c("delta","SB1_0","SB2_0","SB3_0","SB4_0","SB5_0","SB6_0")

#' 
#' To begin with, we'll use a crude estimate of the parameters, based on earlier work.
## ----param_guess---------------------------------------------------------
params <- c(b1=3,b2=0,b3=1.5,b4=6,b5=5,b6=3,psi=0.002,rho=0.01,tau=0.001,
            sigma_dem=0.04,sigma_env=0.5,SO_0=0.12,IO_0=0.001,fixed_params)

#' 
#' The process model is implemented by a `Csnippet` that simulates a single step from time `t` to time `t+dt`:
## ----rprocess------------------------------------------------------------
rproc <- Csnippet("
  double beta = exp(dot_product(K, &xi1, &b1));
  double lambda = (beta * (IO+IB) / P + psi);
  double var_epsilon = pow(sigma_dem,2)/lambda +  sigma_env*sigma_env;
  lambda *= (var_epsilon < 1.0e-6) ? 1 : rgamma(1/var_epsilon,var_epsilon);
  double p = exp(-(delta+lambda)/12);
  double q = (1-p)*lambda/(delta+lambda);
  SB1 = B;
  SB2 = SB1*p;
  SB3 = SB2*p;
  SB4 = SB3*p;
  SB5 = SB4*p;
  SB6 = SB5*p;
  SO = (SB6+SO)*p;
  IB = (SB1+SB2+SB3+SB4+SB5+SB6)*q;
  IO = SO*q;
")

#' 
#' The measurement model is
## ----measure-------------------------------------------------------------
dmeas <- Csnippet("
  double tol = 1.0e-25;
  double mean_cases = rho*IO;
  double sd_cases = sqrt(pow(tau*IO,2) + mean_cases);
  if (cases > 0.0) {
    lik = pnorm(cases+0.5,mean_cases,sd_cases,1,0) - pnorm(cases-0.5,mean_cases,sd_cases,1,0) + tol; 
  } else{
    lik = pnorm(cases+0.5,mean_cases,sd_cases,1,0) + tol;
  }
  if (give_log) lik = log(lik);
")

rmeas <- Csnippet("
  cases = rnorm(rho*IO, sqrt( pow(tau*IO,2) + rho*IO ) );
  if (cases > 0.0) {
    cases = nearbyint(cases);
  } else {
    cases = 0.0;
  }
")

#' 
#' The map from the initial value parameters to the initial value of the states at time $t_0$ is coded by the initializer function:
#' 
## ----initializer---------------------------------------------------------
init <- Csnippet("
  SB1 = SB1_0;
  SB2 = SB2_0;
  SB3 = SB3_0;
  SB4 = SB4_0;
  SB5 = SB5_0;
  SB6 = SB6_0;
  IB = 0;
  IO = IO_0 * P;
  SO = SO_0 * P;
")

#' 
#' To carry out parameter estimation, it is also helpful to have transformations that map each parameter into the whole real line:
#' 
## ----trans---------------------------------------------------------------
toEst <- Csnippet("
 Tpsi = log(psi);
 Trho = logit(rho);
 Ttau = log(tau);
 Tsigma_dem = log(sigma_dem);
 Tsigma_env = log(sigma_env);
 TSO_0 =  logit(SO_0);
 TIO_0 = logit(IO_0);
")

fromEst <- Csnippet("
 Tpsi = exp(psi);
 Trho = expit(rho);
 Ttau = exp(tau);
 Tsigma_dem = exp(sigma_dem);
 Tsigma_env = exp(sigma_env);
 TSO_0 =  expit(SO_0);
 TIO_0 = expit(IO_0);
")

#' 
#' We can now put these pieces together into a pomp object. 
#' 
#' 
## ----pomp----------------------------------------------------------------
polio <- pomp(
  data=subset(polio_data, 
              (time > t0 + 0.01) & (time < 1953+1/12+0.01),	
              select=c("cases","time")),
  times="time",
  t0=t0,
  params=params,
  rprocess = euler.sim(step.fun = rproc, delta.t=1/12),
  rmeasure = rmeas,
  dmeasure = dmeas,
  covar=covartable,
  tcovar="time",
  statenames = statenames,
  paramnames = c(rp_names,ivp_names,fp_names),
  initializer=init,
  toEstimationScale=toEst, 
  fromEstimationScale=fromEst,
  globals="int K = 6;"
)

#' 
#' 
## ----first_sim,include=FALSE---------------------------------------------
library(reshape2)
library(ggplot2)
nsim <- 9
x <- simulate(polio,nsim=nsim,as.data.frame=TRUE,include.data=TRUE)
ggplot(data=x,mapping=aes(x=time,y=cases,group=sim,color=(sim=="data")))+
  geom_line()+
  scale_color_manual(values=c(`TRUE`="blue",`FALSE`="red"))+
  guides(color=FALSE)+
  facet_wrap(~sim,ncol=2)+
  scale_y_sqrt()+
  theme_bw()+theme(strip.text=element_blank()) -> pl

#' 
#' To test the codes, let's run some simulations.
#' In the following, we plot the data (in blue) and `r nsim` simulations.
#' 
#' To test the `dmeasure` portion of the likelihood, we'll run a particle filter.
#' This will compute the likelihood at our parameter guess.
## ----first_pf------------------------------------------------------------
pf <- pfilter(polio,Np=1000)
logLik(pf)

#' 
#' To get an idea of the Monte Carlo error in this estimate, we can run several realization of the particle filter.
#' Since most modern machines have multiple cores, it is easy to do this in parallel.
#' We accomplish this via the **doParallel** and **foreach** packages.
## ----parallel-setup,cache=FALSE------------------------------------------
library(foreach)
library(doParallel)
registerDoParallel()

#' 
## ----pf1-----------------------------------------------------------------
set.seed(493536993,kind="L'Ecuyer")
t1 <- system.time(
  pf1 <- foreach(i=1:10,.packages='pomp',
                 .options.multicore=list(set.seed=TRUE)
  ) %dopar% {
    pfilter(polio,Np=5000)
  }
)
(L1 <- logmeanexp(sapply(pf1,logLik),se=TRUE))

#' Notice that we set up a parallel random number generator (RNG).
#' In particular, we use the L'Ecuyer RNG, which is recommended for use with **doParallel**.
#' Note, too, that the replications are averaged using the `logmeanexp` function, which counteracts Jensen's inequality.
#' It is helpful to plot the *effective sample size* and conditional log likelihoods:
## ----pf1-plot,echo=FALSE-------------------------------------------------
library(plyr)
library(reshape2)
library(magrittr)
library(ggplot2)
pf1 %>% 
  setNames(seq_along(pf1)) %>% 
  ldply(as.data.frame,.id='rep') %>%
  subset(select=c(time,rep,ess,cond.loglik)) %>% 
  melt(id=c('time','rep')) %>%
  ggplot(aes(x=time,y=value,group=variable))+
  geom_line()+
  facet_wrap(~variable,ncol=1,scales='free_y')+
  guides(color=FALSE)+
  theme_bw()

#' These calculations took only `r round(t1[3],0)` sec and produced a log likelihood estimate with a standard error of `r round(L1[2],2)`.
#' 
#' ## Parameter estimation
#' 
#' ### Likelihood maximization: A local approach
#' 
#' Now, let us see if we can improve on our initial guess. 
#' We use the iterated filtering algorithm IF2 of @ionides15, which uses a random walk in parameter space to approach the MLE. 
#' We set a constant random walk standard deviation for each of the regular parameters and a larger constant for each of the initial value parameters. 
## ----local_search--------------------------------------------------------
stew(file="local_search.rda",{
  w1 <- getDoParWorkers()
  t1 <- system.time({
    m1 <- foreach(i=1:90,
                  .packages='pomp',.combine=rbind,
                  .options.multicore=list(set.seed=TRUE)
    ) %dopar% {
      mf <- mif2(polio,
                 Np=1000,
                 Nmif=50,
                 cooling.type="geometric",
                 cooling.fraction.50=0.5,
                 transform=TRUE,
                 rw.sd=rw.sd(
                   b1=0.02, b2=0.02, b3=0.02, b4=0.02, b5=0.02, b6=0.02,
                   psi=0.02, rho=0.02, tau=0.02, sigma_dem=0.02, sigma_env=0.02,
                   IO_0=ivp(0.2), SO_0=ivp(0.2)
                 )
      )
      ll <- logmeanexp(replicate(10,logLik(pfilter(mf,Np=5000))),se=TRUE)
      data.frame(as.list(coef(mf)),loglik=ll[1],loglik.se=ll[2])
    }
  }
  )
},seed=318817883,kind="L'Ecuyer")

#' 
#' This investigation took `r round(t1["elapsed"]/60,1)` minutes on a machine with `r w1` cores.
#' The maximum likelihood we obtain is `r with(subset(m1,loglik==max(loglik)),round(loglik,1))`.
#' These repeated stochastic maximizations can also show us the geometry of the likelihood surface in a neighborhood of this point estimate:
## ----pairs_local,fig.width=6,fig.height=6--------------------------------
pairs(~loglik+psi+rho+tau+sigma_dem+sigma_env,data=subset(m1,loglik>max(loglik)-20))

#' 
#' Because it is so useful to build up a view of the geometry of the likelihood surface, we save these likelihood estimates in a file for later use:
## ----param_file1---------------------------------------------------------
write.csv(m1,file="polio_params.csv",row.names=FALSE,na="")

#' 
#' We see what look like tradeoffs between $\psi$, $\rho$, and $\sigma_\mathrm{dem}$. 
#' By itself, in the absence of other assumptions, the pathogen immigration rate $\psi$ is fairly weakly identified. 
#' However, the reporting rate $\rho$ is essentially the fraction of poliovirus infections leading to acute flaccid paralysis, which is known to be around 1%. 
#' This plot suggests that fixing an assumed value of $\rho$ might lead to much more precise inference on $\psi$; 
#' the rate of pathogen immigration presumably being important for understanding disease persistence. 
#' These hypotheses could be investigated more formally by construction of profile likelihood plots and likelihood ratio tests.
#' 
#' ### Benchmark likelihoods for non-mechanistic models
#' 
#' The most basic statistical model for data is independent, identically distributed (IID). 
#' Picking a negative binomial model, 
#' 
## ----nbinom--------------------------------------------------------------
nb_lik <- function(theta) {
  -sum(dnbinom(as.vector(obs(polio)),size=exp(theta[1]),prob=exp(theta[2]),log=TRUE))
} 
nb_mle <- optim(c(0,-5),nb_lik)
-nb_mle$value

#' 
#' This shows us that a model with likelihood below `r round(-nb_mle$value,1)` is objectively worse than this simple IID model, and therefore scientifically unreasonable.
#' This explains a cutoff around this value in the global searches:
#' in these cases, the model is finding essentially IID explanations for the data.
#' 
#' Linear, Gaussian auto-regressive moving-average (ARMA) models provide non-mechanistic fits to the data including flexible dependence relationships. 
#' We fit to $\log(y_n^*+1)$ and correct the likelihood back to the scale appropriate for the untransformed data:
#' 
## ----arma----------------------------------------------------------------
log_y <- log(as.vector(obs(polio))+1)
arma_fit <- arima(log_y,order=c(2,0,2),seasonal=list(order=c(1,0,1),period=12))
arma_fit$loglik-sum(log_y)

#' 
#' This 7-parameter model, which knows nothing of susceptible depletion, attains a likelihood of `r round( arma_fit$loglik-sum(log_y),1)`. 
#' Although our goal is not to beat non-mechanstic models, it is comforting that we're competitive against them.
#' 
#' ### Global likelihood maximization: parameter estimation using randomized starting values.
#' 
#' When carrying out parameter estimation for dynamic systems, we need to specify beginning values for both the dynamic system (in the state space) and the parameters (in the parameter space). By convention, we use  *initial values* for the initialization of the dynamic system and *starting values* for initialization of the parameter search.
#' 
#' Practical parameter estimation involves trying many starting values for the parameters. One can specify a large box in parameter space that contains all parameter vectors which seem remotely sensible. If an estimation method gives stable conclusions with starting values drawn randomly from this box, this gives some confidence that an adequate global search has been carried out. 
#' 
#' For our polio model, a box containing reasonable parameter values might be
#' 
## ----box-----------------------------------------------------------------
polio_box <- rbind(
  b1=c(-2,8),
  b2=c(-2,8),
  b3=c(-2,8),
  b4=c(-2,8),
  b5=c(-2,8),
  b6=c(-2,8),
  psi=c(0,0.1),
  rho=c(0,0.1),
  tau=c(0,0.1),
  sigma_dem=c(0,0.5),
  sigma_env=c(0,1),
  SO_0=c(0,1),
  IO_0=c(0,0.01)
)

#' 
#' We then carry out a search identical to the local one except for the starting parameter values.
#' 
## ----global_search-------------------------------------------------------
stew(file="global_search.rda",{
  w2 <- getDoParWorkers()
  t2 <- system.time({
    m2 <- foreach(i=1:400,.packages='pomp',.combine=rbind,
                  .options.multicore=list(set.seed=TRUE)
    ) %dopar% {
      guess <- apply(polio_box,1,function(x)runif(1,x[1],x[2]))
      mf <- mif2(polio,
                 start=c(guess,fixed_params),
                 Np=2000,
                 Nmif=300,
                 cooling.type="geometric",
                 cooling.fraction.50=0.5,
                 transform=TRUE,
                 rw.sd=rw.sd(
                   b1=0.02, b2=0.02, b3=0.02, b4=0.02, b5=0.02, b6=0.02,
                   psi=0.02, rho=0.02, tau=0.02, sigma_dem=0.02, sigma_env=0.02,
                   IO_0=ivp(0.2), SO_0=ivp(0.2)
                 ))
      ll <- logmeanexp(replicate(10,logLik(pfilter(mf,Np=5000))),se=TRUE)
      data.frame(as.list(coef(mf)),loglik=ll[1],loglik.se=ll[2])
    }
  })
},seed=290860873,kind="L'Ecuyer")
library(plyr)
params <- arrange(rbind(m1,m2[names(m1)]),-loglik)
write.csv(params,file="polio_params.csv",row.names=FALSE,na="")

#' 
#' This search gives a maximized likelihood estimate of `r with(m2,round(max(loglik),1))` with a standard error of `r with(subset(m2,loglik==max(loglik)),round(loglik.se,2))`.
#' It took about `r round(t2[3]/60)` mins on a `r w2`-core machine.
#' This compares with the results of @Martinez-Bakker2015, who report a maximized likelihood of -794.3 with a standard error of 0.11.
#' 
#' Plotting these diverse parameter estimates can help to give a feel for the global geometry of the likelihood surface.
## ----pairs_global,fig.width=6,fig.height=6-------------------------------
pairs(~loglik+psi+rho+tau+sigma_dem+sigma_env,data=subset(m2,loglik>max(loglik)-20))

#' 
#' To understand these global searches, many of which may correspond to parameter values having no meaningful scientific interpretation, it is helpful to put the log likelihoods in the context of some non-mechanistic benchmarks.
#' 
#' It is also good practice to look at simulations from the fitted model:
#' 
## ----plot_simulated,echo=F-----------------------------------------------
library(magrittr)
library(reshape2)
library(ggplot2)
nsim <- 9
params %>%
  subset(loglik==max(loglik)) %>%
  unlist() -> coef(polio)
polio %>%
  simulate(nsim=nsim,as.data.frame=TRUE,include.data=TRUE) %>%
  ggplot(aes(x=time,y=cases,group=sim,color=(sim=="data")))+
  geom_line()+
  scale_color_manual(values=c(`TRUE`="blue",`FALSE`="red"))+
  guides(color=FALSE)+
  facet_wrap(~sim,ncol=2)+
  scale_y_sqrt()+
  theme_bw()+theme(strip.text=element_blank())

#' 
#' We see from this simulation that the fitted model can generate report histories that look qualitatively similar to the data. 
#' However, there are oddities in the latent states. Specifically, the pool of older susceptibles, $S^O(t)$, is mostly increasing. 
#' The reduced case burden in the data in the time interval 1932--1945 is explained by a large initial recovered ($R$) population, which implies much higher levels of polio before 1932. 
#' A likelihood profile over the parameter $\tilde S^O_0$ could help to clarify to what extent this is a critical feature of how the model explains the data.
#' 
#' ### Mining previous investigations of the likelihood
#' 
#' Saving the results of previous searches, with likelihoods that have been repeatedly evaluated by particle filters, gives a resource for building up knowledge about the likelihood surface. Above, we have added our new results to the file `polio_params.csv`, which we now investigate.
#' 
## ----param_file,fig.width=6,fig.height=6---------------------------------
params <- read.csv("polio_params.csv")
pairs(~loglik+psi+rho+tau+sigma_dem+sigma_env,data=subset(params,loglik>max(loglik)-20))

#' 
#' Here, we see that the most successful searches have always led to models low reporting rate. 
#' Looking more closely:
#' 
## ----global_rho,echo=F---------------------------------------------------
plot(loglik~rho,data=subset(params,loglik>max(loglik)-10),log="x")

#' 
#' We see that reporting rates of 1--2% seem to provide a small but clear (several units of log likelihood) advantage in explaining the data.
#' These are the reporting rates for which depletion of susceptibles can help to explain the dynamics.
#' 
#' ### Profile likelihood
#' 
## ----profile_rho---------------------------------------------------------
library(plyr)
library(reshape2)
library(magrittr)

bake(file="profile_rho.rds",{
  params %>% 
    subset(loglik>max(loglik)-20,
           select=-c(loglik,loglik.se,rho)) %>% 
    melt(id=NULL) %>% 
    daply(~variable,function(x)range(x$value)) -> box
  
  starts <- profileDesign(rho=seq(0.01,0.025,length=30),
                          lower=box[,1],upper=box[,2],
                          nprof=10)
  
  foreach(start=iter(starts,"row"),
          .combine=rbind,
          .packages="pomp",
          .options.multicore=list(set.seed=TRUE),
          .options.mpi=list(seed=290860873,chunkSize=1)
  ) %dopar% {
    mf <- mif2(polio,
               start=unlist(start),
               Np=2000,
               Nmif=300,
               cooling.type="geometric",
               cooling.fraction.50=0.5,
               transform=TRUE,
               rw.sd=rw.sd(
                 b1=0.02, b2=0.02, b3=0.02, b4=0.02, b5=0.02, b6=0.02,
                 psi=0.02, tau=0.02, sigma_dem=0.02, sigma_env=0.02,
                 IO_0=ivp(0.2), SO_0=ivp(0.2)
               ))
    mf <- mif2(mf,Np=5000,Nmif=100,cooling.fraction.50=0.1)
    ll <- logmeanexp(replicate(10,logLik(pfilter(mf,Np=5000))),se=TRUE)
    data.frame(as.list(coef(mf)),loglik=ll[1],loglik.se=ll[2])
  }
}) -> m3

## ----save_profile_rho----------------------------------------------------
params <- arrange(rbind(params,m3[names(params)]),-loglik)
write.csv(params,file="polio_params.csv",row.names=FALSE,na="")

#' 
#' Note that $\rho$ is not perturbed in the IF iterations for the purposes of the profile calculation.
#' 
## ----profile_rho_plot1,echo=F,fig.width=6,fig.height=6-------------------
params %>%
  mutate(rho.bin=cut(rho,breaks=seq(0.01,0.025,by=0.0005),include=T)) %>%
  subset(rho>0.01 & rho<0.025) %>%
  ddply(~rho.bin,subset,rank(-loglik)<=3) -> pp

# pp %>%
#   ggplot(aes(x=rho,y=loglik))+
#   geom_point()+
#   # geom_smooth(method="loess")+
#   # lims(y=max(m3$loglik)+c(-10,2))+
#   labs(x=expression(rho))+
#   theme_bw()

pairs(~loglik+psi+rho+tau+sigma_dem+sigma_env,data=subset(pp,loglik>max(loglik)-20))

#' 
#' ### Simulation to investigate the fitted model: Local persistence
#' 
#' The scientific purpose of fitting a model typically involves analyzing properties of the fitted model, often investigated using simulation.
#' Following @Martinez-Bakker2015, we are interested in how often months with no reported cases ($Y_n=0$) correspond to months without any local asymptomatic cases, defined for our continuous state model as $I^B_n+I^O_n<\tfrac{1}{2}$.
#' For Wisconsin, using our model at the estimated MLE, we compute as follows:
#' 
## ----persistence---------------------------------------------------------
library(plyr)
library(magrittr)
library(ggplot2)
library(grid)

params %>%
  subset(loglik==max(loglik)) %>%
  unlist() -> mleparams

coef(polio) <- mleparams

bake(file="sims.rds",seed=398906785,
     simulate(polio,nsim=2000,as.data.frame=TRUE,include.data=TRUE)
) -> sims
ddply(sims,~sim,summarize,zeros=sum(cases==0)) -> num_zeros

num_zeros %>%
  subset(sim != "data") %>%
  ggplot(mapping=aes(x=zeros))+
  geom_density()+
  geom_vline(data=subset(num_zeros,sim=="data"),aes(xintercept=zeros))+
  labs(x="# zero-case months")+
  theme_bw() -> pl1

num_zeros %>%
  subset(sim=="data") %>%
  extract2("zeros") -> datz

num_zeros %>%
  ddply(.(data=sim=="data"),summarize,
        mean=mean(zeros)) -> mean_zeros

sims %>%
  subset(sim != "data") %>%
  ddply(~sim,summarize,
        fadeout1=sum(IB+IO<0.5),
        fadeout80=sum(IB+IO<80)
  ) -> fadeouts

fadeouts %>%
  ggplot(mapping=aes(x=fadeout1))+
  geom_histogram(binwidth=1,fill=NA,color='black',aes(y=..density..))+
  labs(x="# fadeouts")+
  theme_bw()+theme(legend.position="top") -> pl2

print(pl1)
print(pl2,vp=viewport(x=0.8,y=0.75,height=0.4,width=0.2))

sims %>%
  subset(sim!="data") %>%
  summarize(imports=coef(polio,"psi")*mean(SO+SB1+SB2+SB3+SB4+SB5+SB6)/12
  ) -> imports

#' 
#' Since only `r with(subset(num_zeros,sim!="data"),round(sum(zeros<datz)/length(zeros)*100,1))`% of the simulations had fewer zero-case months than did the data, the model does predict significantly more fadeouts than are seen in the data.
#' Months with no asyptomatic infections for the simulations were rare, on average `r round(mean(fadeouts$fadeout1),1)` months per simulation. 
#' Months with fewer than 80 infections averaged `r round(mean(fadeouts$fadeout80),1)` per simulation, which in the context of a reporting rate of `r signif(mleparams["rho"],3)` can explain the absences of case reports. 
#' 
#' For this model, the mean monthly infections due to importations (i.e., due to the term $\psi$) is `r round(imports,1)`. 
#' This does not give much opportunity for local elimination of poliovirus. 
#' One could profile over $\psi$ to investigate how sensitive this conclusion is to values of $\psi$ consistent with the data.
#' 
#' ## Exercises
#' 
#' #### Exercise: Initial values
#' 
#' When carrying out parameter estimation for dynamic systems, we need to specify beginning values for both the dynamic system (in the state space) and the parameters (in the parameter space). By convention, we use  *initial values* for the initialization of the dynamic system and *starting values* for initialization of the parameter search.
#' 
#' Discuss issues in specifying and inferring initial conditions, with particular reference to this polio example. 
#' 
#' Suggest a possible improvement in the treatment of initial conditions here, code it up and make some preliminary assessment of its effectiveness. How will you decide if it is a substantial improvement? 
#' 
#' #### Exercise: Parameter estimation using randomized starting values.
#' 
#' Comment on the computations above, for parameter estimation using randomized starting values. Propose and try out at least one modification of the procedure. How could one make a formal statement quantifying the error of the optimization procedure?
#' 
#' #### Exercise: Demography and discrete time
#' 
#' It can be surprisingly hard to include birth, death, immigration, emmigration and aging into a disease model in satisfactory ways. Consider the strengths and weaknesses of the analysis presented. For example, how does it compare to a continuous-time model? In an imperfect world, it is nice to check the extent to which the conclusions are insensitive to alternative modeling decisions. If you have some ideas to change the treatmentof demography (or an other aspect of the model) you could have a go at coding it up to see if it makes a difference.
#' 
#' #### Exercise: Diagnosing filtering and maximization convergence
#' 
#' Are there outliers in the data (i.e., observations that do not fit well with our model)? Are we using unnecessarily large amounts of computer time to get our results? Are there indications that we would should run our computations for longer? Or maybe with different choices of algorithmic settings?
#' 
#' In particular, `cooling.fraction.50` gives the fraction by which the random walk standard deviation is decreased ("cooled") in 50 iterations. If `cooling.fraction.50` is too small, the search will "freeze" too soon, evidenced by flat parallel lines in the convergence diagnostics. If `cooling.fraction.50` is too large, the researcher may run of of time, patience or computing budget (or all three) before the parameter trajectories approach an MLE.
#' 
#' Interpret the diagnostic plots below. Carry out some numerical experiments to test your interpretations. 
#' One could look at filtering diagnostics at the MLE, for example, `plot(pf1[[1]])` but the diagnostic plots for iterated filtering include filtering diagnostics for the last iteration anyhow, so let's just consider the `mif` diagnostic plot. 
#' Looking at several simultaneously permits assessment of Monte Carlo variability. 
#' In the following, `m4` is constructed by concatenating the results of several `mif2` computations using `c()`.
#' When we call `plot` on such an object, we obtain useful diagnostics.
#' 
## ----mif_diagnostics,fig.height=6,fig.width=6----------------------------
foreach(i=1:4,
        .packages='pomp',.combine=c,
        .options.multicore=list(set.seed=TRUE)
) %dopar% {
  mif2(polio,
       start=c(b1=3,b2=0,b3=1.5,b4=6,b5=5,b6=3,
               psi=0.002,rho=0.01,tau=0.001,
               sigma_dem=0.04,sigma_env=0.5,
               SO_0=0.12,IO_0=0.001,fixed_params),
       Np=1000,
       Nmif=50,
       cooling.type="geometric",
       cooling.fraction.50=0.5,
       transform=TRUE,
       rw.sd=rw.sd(
         b1=0.02, b2=0.02, b3=0.02, b4=0.02, b5=0.02, b6=0.02,
         psi=0.02, rho=0.02, tau=0.02, sigma_dem=0.02, sigma_env=0.02,
         IO_0=ivp(0.2), SO_0=ivp(0.2)
       )
  )
} -> m4

plot(m4)

#' 
#' The likelihood is particularly important to keep in mind. 
#' If parameter estimates are numerically unstable, that could be a consequence of a weakly identified parameter subspace. 
#' The presence of some weakly identified combinations of parameters is not fundamentally a scientific flaw; 
#' rather, our scientific inquiry looks to investigate which questions can and cannot be answered in the context of a set of data and modeling assumptions. 
#' Thus, as long as the search is demonstrably approaching the maximum likelihood region we should not necessarily be worried about the stability of parameter values (at least, from the point of view of diagnosing successful maximization). 
#' So, let's zoom in on the likelihood convergence:
#' 
## ----likelihood_convergence----------------------------------------------
llconv <- do.call(cbind,conv.rec(m4,"loglik"))
matplot(llconv,type="l",lty=1,ylim=max(llconv,na.rm=T)+c(-30,0))

#' 
#' --------------------------
#' 
#' ## [Back to course homepage](http://kingaa.github.io/short-course)
#' 
#' ## [**R** codes for this document](http://raw.githubusercontent.com/kingaa/short-course/master/polio/polio.R)
#' 
#' --------------------------
#' 
#' ## References