## Introduction to Infectious Disease Dynamics
## Clinic on the Meaningful Modeling of Epidemiological Data
## International Clinics on Infectious Disease Dynamics and Data (ICI3D) Program
## African Institute for Mathematical Sciences, Muizenberg, RSA
##
## Juliet R.C. Pulliam, 2012-2018
##
##
## The goal of this tutorial is to acquaint you with ways of
## analysing simple infectious disease models in R. By the end of
## this tutorial, you should be able to:
##
## * create a function describing a system of ODE's
## * use the package deSolve to numerically analyze a system of ODEs
## * plot the output of different types of functions
##
## NOTE: The comments will guide you through the tutorial but you
## should make sure you understand what the code is doing. Some
## function arguments are assigned to ?. This will give you an
## error. You should try out values for these arguments as
## suggested in the comments or find them yourselves in a help file.
######################################################################
## Section 1: Creating a function to describe the behavior of an ODE
######################################################################
## Before you start, it is a good idea to clear the workspace of things
## you have defined previously. Do this now:
rm(list=ls()) # Clear all variables and functions
## We will now define a function to describe a simple SIR model with
## constant population size and no population turnover (that is, no
## births and deaths). This model was discussed in "Introduction to dynamic
## modeling of infectious diseases," and slides are available here:
## https://ndownloader.figshare.com/files/8541817
##
## The version of the model we'll work with here has 3 state variables (though
## only two of these need to be specified to define the state of the system,
## since the population size is constant):
##
## S - the number of susceptibles in the population
## I - the number of infected/infectious individuals in the population
## R - the number of recovered/immune individuals in the population
##
## and 3 parameters:
##
## N - the total population size; N = S + I + R
## beta - the transmission coefficient; beta = the contact rate * infectivity
## gamma - 1 / the infectious period
##
## If you do not remember the model, review the lecture notes before you proceed
## with this tutorial. If you're not entirely comfortable with the distinction
## between state variables and parameters, you may also want to review the
## DAIDD Glossary:
## https://github.com/ICI3D/MMEDparticipants/raw/master/Resources/DAIDD2016_Glossary.pdf
## We will now define a function that describes the change in the state
## variables with time. The function takes the input t (the current time), y (a
## vector giving the current value of the state variables - that is, their
## value at time t), and params (a vector that defines the parameter values);
## the function outputs the values of dS/dt and dI/dt at time t:
##
##-- Model 1: SIR model --##
##
## sir() -- Function for numerical analysis of model 1
sir <- function(t,y,parms){
# The with() function gives access to the named values of parms within the
# local environment created by the function
with(c(as.list(y),parms),{
dSdt <- -beta*S*I/N
dIdt <- beta*S*I/N - gamma*I
# Note: Population size is constant, so don't need to specify dRdt
list(c(dSdt,dIdt))
})
}
######################################################################
## Section 2: Understanding the function that defines our ODE model
######################################################################
## We have defined the function, but we now need to make sure we
## understand how it works. To do this, let's think about how we would
## approximate the model using discrete timesteps, similar to the way we
## used spreadsheets to analyze epidemic models on Monday.
##
## Since the function takes t, y, and parms as inputs, we first need to define
## these. Let us start at time zero:
time <- 0
## We will use parameter values that reflect a measles outbreak that occurred
## in New York City (many years ago!), so let our initial population size (N0) be
N0 <- 7780000
## The second argument of the sir() function defined above is y, which I have
## said is a vector giving the current value of the state variables. Let's
## assume that the initial population has 20.5 infecteds and that 6.5% of the
## population is susceptible. (Remember that this model assumes a large
## population size and represents population averages, so it is ok that these
## values are not integers!) We will define a vector pop.SI that gives the
## initial values of our 2 state variables:
pop.SI <- c(S = 0.065*N0, # Initially 6.5% of the population is susceptible
I = ???) # ENTER THE NUMBER INITIALLY INFECTED (20.5)
## Notice that I've named the two values in the initial population vector to
## help us keep track of which value is which. This also allows us to take
## advantage of the with() function called within the sir() function because I
## have used the same names that are used to refer to the state variables
## within the function.
##
## The final input our function needs is a vector of parameter values, which we
## can create in the same way:
values <- c(beta = 3.6, # Transmission coefficient
gamma = 1/5, # 1 / infectious period = 1/5 days
N = N0) # population size (constant)
## Note that we can calculate the value of the basic reproduction number,
## R0 = beta / gamma, as follows:
values["beta"]/values["gamma"] # value of R0
# HINT: Try putting this command inside the
# function as.numeric() to keep R from
# confusingly retaining the name of the first
# value.
## Now that we have defined the inputs, we are ready to try out our function!
sir(t=time,y=pop.SI,parms=values)
## Look at the output. What does this mean?
##
## Recall that I said the function outputs the time derivatives of S and I at
## the time t. This means that in order to get the (approximate) values of S
## and I at some time delta.t in the future, I need to calculate the equivalent
## of
##
## S(t+delta.t) = S(t) - (beta*S(t)*I(t)/N) * delta.t
##
## I(t+delta.t) = I(t) + (beta*S(t)*I(t)/N - gamma*I(t)) * delta.t
##
## This can be done as follows:
delta.t <- 0.1 # Set a small value for delta.t (0.1 day)
pop.next <- pop.SI + unlist(sir(t=time,y=pop.SI,parms=values)) * delta.t
## We could then iterate this process, updating our values of pop.SI and time
## with each iteration to get a discrete-time approximation of the values of the
## state variables through time. Remember, however, that the differential
## equations describe the rates of change in the limit as delta.t goes to zero.
## It turns out that using the above discrete time approximation of this
## process, as we have done on the spreadsheets (Track B) and live coding
## example, leads to rapid accumulation of error in our estimate of the state
## variables through time, even if we set our value of delta.t to be very
## small. Luckily for us, there are a number of algorithms that have been
## developed to come up with better (both more accurate and more
## computationally efficient) approximations to the values of the state
## variables through time.
######################################################################
## Section 3: Numerical integration of our ODE model
######################################################################
## In R, we can access these algorithms through the deSolve library. We should
## now load this library so we can access its functions:
library(deSolve) # Load libary to be used for numerical integration
## The function within deSolve that we will be using is called lsoda(). Let's
## look at the help file for this function:
?lsoda
## This help file gives some detail on the history of the algorithm as well as
## describing the function's usage. For now, take a look at the "Usage" section
## to look at the different types of arguments the function can take. There are
## a lot of them! For now, we will focus on the first 4 inputs: y, times, func,
## and parms. Read through the "Arguments" section for the descriptions of these
## four inputs. Because we have already defined our function and seen how it
## works, these arguments should seem familiar.
##
## We have already defined the vectors that we will use for two of the arguments
## (y, the vector of the initial values of the state variables, and parms, the
## vector of parameter values). We have also defined a third argumet, func,
## which provides the model definition; it is our function sir(). All that
## remains is to define the vector of times, which tells lsoda() the timepoints
## for which we would like to know the values of S and I. (NOTE: This is
## different from specifying the time points at which to evaluate our function,
## which is determined within the lsoda() function and optimized to efficiently
## provide an accurate approximation.)
##
## Let's ask for output every 0.1 days for 365 days, starting at time 0:
time.out <- seq(0,365,???) # INCLUDE THE APPROPRIATE VALUE TO GET A SEQUENCE
# FROM 0 to 365 BY STEPS OF 0.1
## Now let's see what happens if we plug our inputs into lsoda()...
lsoda(
y = pop.SI, # Initial conditions for population
times = time.out, # Timepoints for evaluation
func = sir, # Function to evaluate
parms = values # Vector of parameters
)
## Well, that seems to have done something, but it's hard to understand the
## output. Let's make it easier to look at by saving the output as a data.frame
## named ts.sir ("ts" being short for timeseries).
ts.sir <- data.frame(lsoda(
y = pop.SI, # Initial conditions for population
times = time.out, # Timepoints for evaluation
func = sir, # Function to evaluate
parms = values # Vector of parameters
))
## Now let's look at the output:
head(ts.sir)
## Our data frame has 3 columns, conveniently labeled "time", "S", and "I",
## showing the values of time and the two state variables through time. We can
## now look at the output in other ways. For example, we saw yesterday that we
## expect the infection to go extinct because there is no source of new
## susceptibles in our equations (eg, through births). Let's see what has
## happened after a year:
subset(ts.sir,time==365)
## There are more infecteds after a year than there were at time 0, so it looks
## like it takes more than a year for the infection to burn out with these
## initial conditions. If we want to see how the epidemic has progressed
## through time, we can plot to output from lsoda(). Remember that the
## parameters we have used are the same as in the live coding example, so
## this is a model of a measles epidemic in New York. We'll label the plot
## accordingly. (You'll learn a lot more about plotting - and modifying the
## appearance of plots - in Tutorial 4.)
plot(ts.sir$time, # Time on the x axis
ts.sir$I, # Number infected (I) on the y axis
xlab = "Time in days", # Label the x axis
ylab = "Number infected", # Label the y axis
main = "Measles in New York", # Plot title
xlim = c(0,400), #
type = "l", # Use a line plot
bty = "n") # Remove the box around the plot
## While there are many more infecteds at time 365 than there were at time 0,
## there are a lot fewer than there were at time 200, and it looks like the
## epidemic is nearly done.
######################################################################
## BENCHMARK QUESTIONS
######################################################################
## Question 1:
##
## Run the SIR model above out to 5 years to convince yourself that the epidemic
## is ending and will not come back.
##
## Question 2:
##
## Create a new function called sir.bd() that describes the SIR model with
## births and deaths (but constant population size), and compare the output
## to the model defined above. You may want to refer to yesterday's lecture
## notes for the model definition. Assume that the life expectancy in the
## population is 60 years.
##
## Question 3:
##
## Use lsoda() to evaluate the values of S and I through time for your new
## function defining the model with births and deaths over 5 years, with the
## same initial conditions as before. How do the dynamics compare?
##
## Question 4:
##
## Now change the value of the life expectancy for the population in the model.
## Compare the long-term dynamics of the model with different values for the
## life expectancy.