---
layout: post
title: "Flatten the COVID-19 curve"
tags: [rstats, dataviz, R, COVID-19, epidemic model]
# bibliography: ~/Literature/Bibtex/jabref.bib
header-includes:
  - \usepackage{bm}
comments: true
editor_options:
  chunk_output_type: console
---

```{r,include=FALSE,echo=TRUE,message=FALSE}
## If default fig.path, then set it.
if (knitr::opts_chunk$get("fig.path") == "figure/") {
  knitr::opts_knit$set(base.dir = '/Users/hoehle/Sandbox/Blog/')
  knitr::opts_chunk$set(fig.path = "figure/source/2020-03-16-flatteningthecurve/")
}
fullFigPath <- paste0(knitr::opts_knit$get("base.dir"), knitr::opts_chunk$get("fig.path"))
filePath <- file.path("", "Users", "hoehle", "Sandbox", "Blog",
                      "figure", "source", "2020-03-16-flatteningthecurve")

knitr::opts_chunk$set(echo = TRUE, fig.width = 8, fig.height = 4, fig.cap = '',
                      fig.align = 'center', echo = FALSE, dpi = 72*2) #, global.par = TRUE)
options(width = 150, scipen = 1e3)

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(kableExtra))
suppressPackageStartupMessages(library(viridis))

# Non-CRAN packages
# devtools::install_github("hadley/emo")

## Configuration
options(knitr.table.format = "html")
theme_set(theme_minimal())
# If there are more than n rows in the tibble, print only the first m rows.
options(tibble.print_max = 10, tibble.print_min = 5)
```

## Abstract:

We discuss why the message of flattening the COVID-19 curve is right, but why some of the visualizations used to show the effect are wrong: reducing the basic reproduction number does not just stretch the outbreak, it also reduces the final size of the outbreak. A Shiny app exists to investigate different scenarios.
> Our #FlattenTheCurve graphic is now up on @Wikipedia with proper attribution & a CC-BY-SA licence. Please share far & wide and translate it into any language you can! Details in the thread below. #Covid_19 #COVID2019 #COVID19 #coronavirus Thanks to @XTOTL & @TheSpinoffTV pic.twitter.com/BQop7yWu1Q
>
> — Dr Siouxsie Wiles (@SiouxsieW) March 10, 2020
If we consider the dynamics of a disease in generation time, i.e. on a time scale where one time unit is the period between infection of the primary case and infection of the secondary case, then $R_0$ denotes the growth factor in the number of infected at the beginning of the outbreak. What is special about the beginning of the outbreak? More or less all contacts an infectious individual has are with susceptible individuals. However, once a large part of the population has already been infected, $R_0$ no longer describes the expected number of cases per primary case. For the COVID-19 outbreak, since little immunity against the disease is assumed to exist, initially all individuals are susceptible and hence almost all contacts an infectious individual has are with susceptible persons. At a later stage of the epidemic, the depletion of susceptibles lowers the number of secondary cases, and possible interventions as part of the epidemic response might lower it further.

In the SIR model with transmission rate $\beta$ and recovery rate $\gamma$, the dynamics of the susceptible and infectious compartments are given by the ODE system
$$
\frac{dS(t)}{dt} = -\beta S(t) I(t), \quad
\frac{dI(t)}{dt} = \beta S(t) I(t) - \gamma I(t),
$$
with $R_0 = \beta N / \gamma$. Assuming $R(0)=0$, i.e. no initially recovered individuals, and letting $I(0)=m$, we obtain $S(0) = N-m$. We can use this initial configuration together with a numerical solver for ODEs as implemented, e.g., in the `lsoda` function of the R package [`deSolve`](https://cran.r-project.org/web/packages/deSolve/index.html) [@desolve2010].
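Before solving the system numerically, the growth-factor interpretation of $R_0$ can be made concrete with a quick back-of-the-envelope calculation (a sketch of my own; the initial count $m$ and the $R_0$ value simply match the parameters used later in this post): at the beginning of an outbreak, the expected number of cases in generation $k$ is approximately $m R_0^k$.

```r
# Early-outbreak approximation: case numbers grow geometrically in
# generation time with growth factor R0.
m  <- 10    # initial number of infectious (as used below)
R0 <- 2.25  # basic reproduction number (as used below)
generations <- 0:5
cases <- m * R0^generations
round(cases)  # 10 22 51 114 256 577
```

Already after five generations the case count has grown more than fifty-fold, which is why interventions early in an outbreak have so much leverage.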
To solve the ODE system numerically, we need to implement a function which returns the derivative of the system:

```{r, echo=TRUE}
######################################################################
# Function to compute the derivative of the ODE system
#
#  t     - time
#  y     - current state vector of the ODE at time t
#  parms - parameter vector used by the ODE system
#
# Returns:
#  list with one component being a vector of length two containing
#  dS(t)/dt and dI(t)/dt
######################################################################
sir <- function(t, y, parms) {
  beta <- parms[1]
  gamma <- parms[2]
  S <- y[1]
  I <- y[2]
  return(list(c(S = -beta * S * I, I = beta * S * I - gamma * I)))
}
```

```{r, echo=TRUE}
# Population size
N <- 1e6
# Rate at which a person leaves the infectious compartment (disease and tracing specific)
gamma <- 1/5
# Infectious contact rate - beta = R0 / N * gamma, and with R0 \approx 2.25 this gives 2.25 / N * gamma
beta <- 4.5e-07
# R0 for the above beta and gamma values
R0 <- beta * N / gamma
```

Assuming a hypothetical population of $N = `r format(N, big.mark=",")`$, a contact rate of $\beta = `r beta`$ means that the contact rate with any given individual is `r beta` contacts per day. The choice of $\gamma = `r gamma`$ corresponds to an average length of the infectious period of 5 days. Altogether, this leads to an $R_0$ of `r R0`, which roughly corresponds to the $R_0$ of SARS-CoV-2.

We can now solve the ODE system using the above parameters and an initial number of infectious of, say, 10:

```{r, echo=TRUE}
# Load package to numerically solve ODEs
suppressPackageStartupMessages(library(deSolve))

# Grid where to evaluate the solution
max_time <- 150
times <- seq(0, max_time, by = 0.01)

# Solve ODE system using the Runge-Kutta numerical method.
ode_solution <- rk4(y = c(N - 10, 10), times = times, func = sir, parms = c(beta, gamma)) %>%
  as.data.frame() %>%
  setNames(c("t", "S", "I")) %>%
  mutate(beta = beta, gamma = gamma, R0 = N * beta / gamma,
         s = S / N, i = I / N,
         type = "without_intervention")
```

Here we have introduced $s(t) = S(t)/N$ and $i(t) = I(t)/N$ as, respectively, the proportions of susceptible and infectious individuals in the population. Note that $I(t)$ is the number of *currently* infectious persons. Since a person is usually infectious for more than one day, this curve is not equivalent to the number of **new** infections per day. If interest is in this quantity, which is typically what health authorities report, it can be computed as
$$
C(t) = \int_{t-1}^t \beta S(u) I(u) \, du,
$$
or, due to the SIR structure in which re-infections are not possible, simply as $C(t) = S(t-1) - S(t)$.

```{r}
ode_solution_daily <- ode_solution %>%
  filter(t %in% seq(0, max_time, by = 1)) %>%
  mutate(C = if_else(row_number() == 1, 0, lag(S) - S),
         c = C / N)
```

The epidemic curve of new infections per day is shown below:

```{r, echo=FALSE}
df_plot <- ode_solution_daily %>%
  select(t, c) %>%
  pivot_longer(-t, names_to = "Quantity", values_to = "Proportion")

ggplot(df_plot, aes(x = t, y = Proportion)) +
  geom_col() +
  xlab("Time (days)") +
  ylab("Daily new cases (as proportion of the population)") +
  scale_y_continuous(labels = scales::percent)
```

Another important quantity of the model is an estimate of how many individuals are ultimately infected by the disease, i.e. $1-s(\infty)$ in a population where initially everyone is susceptible. This can either be calculated numerically from the above output as

```{r, echo=TRUE}
(1 - tail(ode_solution, n = 1) %>% pull(s))
```

or by numerically solving the following recursive equation [@diekman_etal2013, p. 15]:
$$
s(\infty) = \exp(-R_0 (1-s(\infty)))
$$
using R:

```{r, echo=TRUE}
# Function to compute the final size.
s_inf <- function(R0) {
  # Target function whose root is the fixed point of the final size equation
  f_target <- function(x) { x - exp(-R0 * (1 - x)) }
  result <- uniroot(f_target, lower = 1e-12, upper = 1 - 1e-12)$root
  return(result)
}

# Final proportion of infected.
1 - s_inf(R0)
```

We can use the above equation to verify that the larger $R_0$ is, the larger the final size of the outbreak becomes:

```{r, echo=TRUE}
R0_grid <- c(1.25, 1.5, 1.75, 2, 2.25, 2.5, 3)
map_dbl(R0_grid, ~ 1 - s_inf(.x)) %>% setNames(R0_grid) %>% scales::percent(accuracy = 1)
```

However, even though $R_0 > 1$, not the entire population will be infected, because of the depletion of susceptibles. Hence, the exponential-growth interpretation of $R_0$ is only valid at the beginning of an outbreak.

## Reducing $R_0$

As we can see from the equation defining $R_0$ in our simple SIR model, there are two ways to reduce the $R_0$ of a disease:

1. Reduce the number of contacts persons have with each other, i.e. make $\beta$ smaller.
2. Reduce the duration for which people effectively spread the disease (e.g. by isolation and quarantine), i.e. increase $\gamma$ and thus shorten the average infectious period $1/\gamma$.

For simplicity we shall only consider the first case and pursue the following very simple strategy, where the measures depend only on time:
$$
\beta(t) = \left\{
\begin{array}{ll}
\beta_0 & \text{if } t \leq t_1, \\
\beta_1 & \text{if } t_1 < t \leq t_2, \\
\beta_2 & \text{if } t_2 < t,
\end{array}
\right.
$$
where $\beta_0$ is the ordinary $\beta$ value of the disease. We use a large reduction of the contacts within the interval $[t_1, t_2]$, hence $\beta_1 < \beta_0$. After some time, the control measures are relaxed slightly, i.e. $\beta_1 < \beta_2 < \beta_0$. We write $\beta_1 = r_1 \beta_0$ and $\beta_2 = r_2 \beta_0$ with $r_1 \leq r_2$.
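This piecewise-constant contact rate can be sketched as a small stand-alone helper (a sketch of my own, not the post's implementation; the defaults $t_1 = 30$, $t_2 = 60$, $r_1 = 0.6$ and $r_2 = 0.8$ are the values used in the next code chunk):

```r
# Piecewise-constant contact rate beta(t): full beta0 before t1, reduced
# by factor r1 on (t1, t2], and by factor r2 after t2.
beta_t <- function(t, beta0, t1 = 30, t2 = 60, r1 = 0.6, r2 = 0.8) {
  ifelse(t <= t1, beta0, ifelse(t <= t2, r1 * beta0, r2 * beta0))
}

beta0 <- 4.5e-7
beta_t(c(10, 45, 100), beta0)  # beta0, 0.6 * beta0, 0.8 * beta0
```

The same case distinction appears inside the derivative function below, where the ODE solver evaluates the contact rate at every time step.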
The two epidemic curves, with and without intervention, can now be computed and plotted as follows:

```{r}
sir_with_social_dist <- function(t, y, parms, social_dist_period, reduction) {
  beta0 <- parms[1]
  gamma <- parms[2]
  # Reduce the contact rate to beta0 * reduction[1] during the social
  # distancing period and to beta0 * reduction[2] thereafter
  beta_t <- if_else(t <= social_dist_period[1], beta0,
                    if_else(t <= social_dist_period[2],
                            beta0 * reduction[1],
                            beta0 * reduction[2]))
  S <- y[1]
  I <- y[2]
  return(list(c(S = -beta_t * S * I, I = beta_t * S * I - gamma * I)))
}

# Define parameters
social_dist_period <- c(30, 60)
reduction <- c(0.6, 0.8)

# Solve the ODE system with the time-dependent contact rate
ode_solution2 <- lsoda(y = c(N - 10, 10), times = times, func = sir_with_social_dist,
                       parms = c(beta, gamma),
                       social_dist_period = social_dist_period,
                       reduction = reduction) %>%
  as.data.frame() %>%
  setNames(c("t", "S", "I")) %>%
  mutate(beta = beta, gamma = gamma, R0 = N * beta / gamma,
         s = S / N, i = I / N,
         type = "with_intervention")

# Daily number of new cases
ode_solution2_daily <- ode_solution2 %>%
  filter(t %in% seq(0, max_time, by = 1)) %>%
  mutate(C = if_else(row_number() == 1, 0, lag(S) - S),
         c = C / N)

# Combine the two solutions into one dataset
ode_df <- rbind(ode_solution_daily, ode_solution2_daily)
```

```{r FLATTENTHECURVE, echo=FALSE}
ggplot(ode_df, aes(x = t, y = 0, xend = t, yend = c, color = type)) +
  geom_segment(alpha = 0.7) +
  geom_line(aes(x = t, y = c)) +
  xlab("time") + ylab("Daily new cases") +
  scale_y_continuous(labels = scales::percent) +
  theme_minimal() +
  theme(axis.text.x = element_blank(), axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_color_brewer(name = "Epidemic model:", type = "qual", palette = 1) +
  geom_vline(xintercept = social_dist_period, lty = 2) #+
  #geom_hline(yintercept = 0.02, lty = 2, color = "steelblue") #+
  #geom_text(data = data.frame(t = 120, c = 0.021, label = "Healthcare system capacity", type = "without_intervention"),
  #          aes(t, c, label = label), color = "steelblue")
```

The final size in the two cases:

```{r, echo=FALSE,
message=FALSE}
ode_df %>%
  group_by(type) %>%
  filter(row_number() == n()) %>%
  mutate(final_fraction = scales::percent(1 - s, accuracy = 1)) %>%
  select(final_fraction)
```

Here we used the very conservative assumption that $R_0$ can be reduced to `r scales::percent(reduction[1])` of its original value, i.e. to `r R0*reduction[1]`, for a few weeks; thereafter the reduction is to `r scales::percent(reduction[2])` of the original $R_0$, i.e. to `r R0*reduction[2]`. Things of course become more optimistic the larger the reduction is. One danger, though, is to reduce $R_0$ drastically and then lift the measures too early and too much: in that case the outbreak is delayed, but reaches almost the same peak size and final size, only later.

The simple analysis in this post shows that the final size proportion with interventions is several percentage points smaller than without interventions. The larger the interventions, if done right and timed right, the smaller the final size. In other words: the spread of an infectious disease in a population is a dynamic phenomenon. Time matters. The timing of interventions matters. If done correctly, they stretch the outbreak and reduce the final size!

## Discussion

The epidemic-model based approach to flattening the curve shows that the effect of reducing the basic reproduction number is not just to stretch out the outbreak, but also to limit its size. This aspect seems to be ignored in some visualizations of the effect.

The simple SIR model used in this post suffers from a number of limitations: it is a deterministic construction averaging over many stochastic phenomena. However, at the large scale of the outbreak we are now talking about, this simplification appears acceptable. Furthermore, it assumes homogeneous mixing between individuals, which is way too simple.
Dividing the population into age groups as well as geographic locations and modelling the interaction between these groups would be a more realistic reflection of how the population is shaped. Still, for the purpose of visualizing the flatten-the-curve effect, I think a simple model is OK. More involved modelling covering the establishment of the disease as endemic in the population is beyond this post, as is the effectiveness of contact tracing. For more background on the modelling see for example the YouTube video about [The Mathematics of the Coronavirus Outbreak](https://www.youtube.com/watch?time_continue=1&v=gSqIwXl6IjQ&feature=emb_logo) by my colleague Tom Britton, or the work by @fraser_etal2004.

It is worth pointing out that mathematical models are only tools to gain insight. They are based on assumptions which are likely to be wrong. The question is whether a violation is crucial or whether the component is still adequately captured by the model. A famous quote says: all models are wrong, but some are useful... **Useful** in this case is the message: **flatten the curve by reducing infectious contacts and by efficient contact tracing**.

**Update 2020-03-18**: Tinu Schneider from Switzerland used the GitHub code of this post to program a small [Shiny app](https://tinu.shinyapps.io/Flatten_the_Curve/) allowing one to study the impact of changing the parameters. Check it out!

## Literature