{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Modeling the COVID-19 Pandemic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reading\n",
    "\n",
    "1. Bregman D.J., A.D. Langmuir, 1990. Farr's law applied to AIDS projections. _JAMA_, 263:1522–1525. doi: 10.1001/jama.263.11.1522. \n",
    "\n",
    "1. King, A.A.,  M.D. de Cellès, F.M.G. Magpantay, and P. Rohani, 2015. Avoidable errors in the modelling of outbreaks of emerging pathogens, with special reference to Ebola, _Proc. R. Soc. B._ 28220150347 http://doi.org/10.1098/rspb.2015.0347.\n",
    "\n",
    "1. Murray, C.J.L., and IHME COVID-19 health service utilization forecasting team, 2020. Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months,\n",
    "https://www.medrxiv.org/content/10.1101/2020.03.27.20043752v1.full-text\n",
    "\n",
    "1. Jewell, N.P.,  J.A. Lewnard,  and B.L. Jewell, 2020.\n",
    "Predictive Mathematical Models of the COVID-19 Pandemic: Underlying Principles and Value of Projections,\n",
    "_JAMA_, 323(19):1893-1894. doi:10.1001/jama.2020.6585\n",
    "\n",
    "1. Froese, H., 2020. Infectious Disease Modelling: Fit Your Model to Coronavirus Data, Towards Data Science, https://towardsdatascience.com/infectious-disease-modelling-fit-your-model-to-coronavirus-data-2568e672dbc7; https://github.com/henrifroese/infectious_disease_modelling \n",
    "\n",
    "1. Smith, D., and L. Moore, 2004. The SIR model for spread of disease, _Convergence_, https://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-introduction\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Models of epidemics\n",
    "\n",
    "This notebook explores two basic approaches to modeling the spread of a communicable disease through a population. There are more complicated approaches than are considered here, but these are common, and the predictions of COVID-19 infections, ICU beds, and deaths are often based on variations of these models."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# SIR models\n",
    "\n",
    "Mathematical models of epidemics date back to the early 1900s.\n",
    "\n",
    "One of the simplest models of epidemics is the SIR model, which partitions the population into three disjoint groups: Susceptible, Infected, and Recovered (or Removed).\n",
    "\n",
    "The SIR model treats the population as fixed: no births, immigration, emigration, or deaths from other causes.\n",
    "It ignores the possibility of carriers, population heterogeneity, or geographic heterogeneity.\n",
    "It treats everyone as equivalent and assumes the population \"mixes\" perfectly.\n",
    "\n",
    "Here are the state variables of the SIR model:\n",
    "\n",
    "+ $N$: initial population size\n",
    "+ $S(t)$: number of susceptible individuals in the population at time $t$\n",
    "+ $I(t)$: number of infected individuals in the population at time $t$\n",
    "+ $R(t)$: number \"removed\" from risk (dead or recovered) in the population at time $t$\n",
    "\n",
    "Assumptions:\n",
    "\n",
    "+ Everyone is either susceptible, infected, or removed (recovered or dead). (There are more complex models with more \"compartments.\")\n",
    "+ There are no births, immigration, emigration, or deaths from other causes.\n",
    "+ If a susceptible person is infected, that person eventually dies or recovers. \n",
    "+ While a person is infected, the person is infectious.\n",
    "+ There is no incubation period between being infected and being infectious.\n",
    "+ If an infected person gets \"close enough\" to a susceptible person, the susceptible person becomes infected.\n",
    "+ Every unit of time, every infected person exposes $\\beta$ people.\n",
    "+ Every susceptible person exposed to an infected person becomes infected.\n",
    "+ Every infected person exposes a disjoint group of people.\n",
    "+ Every unit of time, a fraction $\\gamma$ of infected people recover or die.\n",
    "+ People who recover or die are not infectious.\n",
    "+ People who recover or die never become susceptible again.\n",
    "\n",
    "Then $S + I + R = N$ is constant, the initial population size.\n",
    "\n",
    "The basic reproductive number is $R_0 = \\beta/\\gamma$.\n",
    "Initially, when essentially the entire population is susceptible (the proportion who are infected or recovered is small compared to $N$), this is the number of people each infected person infects before recovering or dying. \n",
    "(Note that $R_0$ has no connection to $R$: the notation is unfortunate.)\n",
    "\n",
    "Define $s(t) \\equiv S(t)/N$, $i(t) \\equiv I(t)/N$, and $r(t) \\equiv R(t)/N$.\n",
    "Then $s+i+r = 1$ for all $t$.\n",
    "There is an epidemic if $s(0)\\beta/\\gamma > 1$: the rate at which susceptible people are\n",
    "getting infected is greater than the rate at which infected people are recoving. \n",
    "\n",
    "The parameters of the model are:\n",
    "\n",
    "+ $\\beta$, the number of exposed people per unit of time per infected persion\n",
    "+ $\\gamma$, the fraction of infected people who are removed (recover or die) per unit time\n",
    "\n",
    "The state of the population evolves according to three coupled differential equations:\n",
    "\n",
    "+ $dS/dt = -\\beta I S/N$\n",
    "+ $dI/dt = \\beta I S/N - \\gamma I$ (equivalently, $di/dt = \\beta is - \\gamma i$)\n",
    "+ $dR/dt = \\gamma I$\n",
    "\n",
    "Another interesting variable is the cumulative number of infections, $C$.\n",
    "\n",
    "+ $dC/dt = \\beta I S/N$; $C(0) = I(0)$.\n",
    "\n",
    "The SIR model has some built-in features that might not match reality.\n",
    "First, note that the derivative of $R$ is always non-negative and proportional to $I$.\n",
    "Since $R \\le N$ and $N$ is finite, $I$ must eventually be zero; that is, the SIR implies that eventually\n",
    "the entire population will be disease-free. \n",
    "Once $I=0$, it can never grow again, since the derivative of $I$ is proportional to $I$.\n",
    "(If nobody is infected, nobody can infect anyone.)\n",
    "Epidemics that follow the SIR model are necessarily self-limiting.\n",
    "\n",
    "That isn't the case if immunity wears off over time (or if the population is not isolated).\n",
    "In that case, we might we parametrize the transition from recovered to susceptible by \n",
    "assuming that a fraction $\\delta$ of $R$ return to $S$ in each time period. \n",
    "(There are other ways we could do this, for instance, having a variable \"lag\" between recovery and becoming susceptible again.)\n",
    "That yields:\n",
    "\n",
    "+ $dS/dt = -\\beta I S/N + \\delta R$\n",
    "+ $dI/dt = \\beta I S/N - \\gamma I$ (equivalently, $di/dt = \\beta is - \\gamma i$)\n",
    "+ $dR/dt = \\gamma I - \\delta R$\n",
    "\n",
    "In this model, $I$ does not necessarily go to zero eventually.\n",
    "\n",
    "Other generalizations include separate \"compartments\" for dead versus recovered), needing an ICU bed, etc., as well as time-varying values of $\\beta$ to account for lockdowns or other policy interventions, heterogeneity in the population, and more."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's run this model (in discrete time) for a population of $N=1,000,000$ people of whom 0.05% are initially infected.\n",
    "We will take the time interval to be 1 day, the period of infectiousness to be $1/\\gamma = 14$ days,\n",
    "and $R_0 = 2 = \\beta/\\gamma$, so $\\beta = 2 \\gamma = 1/7$. We will start with $\\delta=0$ (no re-infections)."
   ]
  },
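  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of what \"discrete time\" means here, assume a forward-difference (Euler) step of one time unit, so that each derivative is replaced by the change over one step. Writing $S_t$, $I_t$, $R_t$, and $C_t$ for the values at step $t$ (subscript notation introduced only for this illustration), the differential equations of the previous section become:\n",
    "\n",
    "+ $S_{t+1} = S_t - \\beta I_t S_t/N + \\delta R_t$\n",
    "+ $I_{t+1} = I_t + \\beta I_t S_t/N - \\gamma I_t$\n",
    "+ $R_{t+1} = R_t + \\gamma I_t - \\delta R_t$\n",
    "+ $C_{t+1} = C_t + \\beta I_t S_t/N$\n",
    "\n",
    "For the parameters above, the first step gives $\\beta I_0 S_0/N = (1/7)(500)(999{,}500)/10^6 \\approx 71.4$ new infections, so $C_1 \\approx 571.4$."
   ]
  },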
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# boilerplate\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "from scipy.integrate import odeint\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.dates as mdates\n",
    "\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import clear_output, display, HTML"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sir_model(N, i_0, beta=1, gamma=1, delta=0, steps=365):\n",
    "    '''\n",
    "    Run SIR model as coupled finite-difference equations.\n",
    "    \n",
    "    Assumes that initially a fraction i_0 of the population of N people is infected and \n",
    "    everyone else is susceptible (none has died or recovered).\n",
    "    \n",
    "    Parameters\n",
    "    -----------\n",
    "    N     : int\n",
    "        population size\n",
    "    i_0   : double in (0, 1)\n",
    "        initial fraction of population infected\n",
    "    beta  : double in [0, infty)\n",
    "        number of encounters each infected person has per time step\n",
    "    gamma : double in (0, 1)\n",
    "        fraction of infecteds who recover or die per time step\n",
    "    delta : double in (0, 1)\n",
    "        fraction of recovereds who become susceptible per time step\n",
    "    steps : int\n",
    "        number of steps in time to run the model\n",
    "    \n",
    "    Returns\n",
    "    --------\n",
    "    S, I, R, C \n",
    "    S     : list\n",
    "        time history of susceptibles\n",
    "    I     : list\n",
    "        time history of infecteds\n",
    "    R     : list\n",
    "        time history of recovered/dead    \n",
    "    C     : list\n",
    "        cumulative number of infections over time\n",
    "    '''\n",
    "    assert i_0 > 0, 'initial rate of infection is zero'\n",
    "    assert i_0 <= 1, 'infection rate greater than 1'\n",
    "    assert beta >= 0, 'beta must be nonnegative'\n",
    "    assert gamma > 0, 'gamma must be positive'\n",
    "    assert gamma < 1, 'gamma must be less than 1'\n",
    "    assert delta >= 0, 'delta must be nonnegative'\n",
    "    assert delta < 1, 'delta must be less than 1'\n",
    "    S = np.zeros(steps)\n",
    "    I = np.zeros(steps)\n",
    "    R = np.zeros(steps)\n",
    "    C = np.zeros(steps)\n",
    "    I[0] = int(N*i_0)\n",
    "    C[0] = I[0]\n",
    "    S[0] = N-I[0]\n",
    "    for i in range(steps-1):\n",
    "        new_i = beta*I[i]*S[i]/N\n",
    "        S[i+1] = max(0, S[i] - new_i + delta*R[i])\n",
    "        I[i+1] = max(0, I[i] + new_i - gamma*I[i])\n",
    "        R[i+1] = max(0, R[i] + gamma*I[i] - delta*R[i])\n",
    "        C[i+1] = C[i] + new_i\n",
    "    return S, I, R, C"
   ]
  },
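  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The finite-difference scheme in `sir_model` is one way to approximate the differential equations. As a cross-check, the continuous-time equations can also be integrated numerically. The next cell is a minimal sketch, assuming `scipy.integrate.odeint` with its default tolerances; the helper names `sir_rhs` and `sir_ode` are hypothetical and are not used elsewhere in this notebook. Its values at integer times need not agree exactly with a one-day finite-difference run."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.integrate import odeint\n",
    "\n",
    "def sir_rhs(y, t, N, beta, gamma, delta):\n",
    "    '''Right-hand side of the continuous-time equations; y = (S, I, R, C).'''\n",
    "    S, I, R, C = y\n",
    "    new_inf = beta*I*S/N                 # rate of new infections\n",
    "    return [-new_inf + delta*R,          # dS/dt\n",
    "            new_inf - gamma*I,           # dI/dt\n",
    "            gamma*I - delta*R,           # dR/dt\n",
    "            new_inf]                     # dC/dt\n",
    "\n",
    "def sir_ode(N, i_0, beta=2/14, gamma=1/14, delta=0, steps=365):\n",
    "    '''Integrate the equations and report S, I, R, C at t = 0, 1, ..., steps-1.'''\n",
    "    I0 = N*i_0\n",
    "    y0 = [N - I0, I0, 0.0, I0]           # everyone not infected starts out susceptible\n",
    "    t = np.arange(steps)\n",
    "    sol = odeint(sir_rhs, y0, t, args=(N, beta, gamma, delta))\n",
    "    return sol.T                         # rows are S, I, R, C\n",
    "\n",
    "S_c, I_c, R_c, C_c = sir_ode(10**6, 0.0005)\n",
    "print(np.allclose(S_c + I_c + R_c, 10**6))   # check that S + I + R stays equal to N\n",
    "print(int(np.argmax(I_c)))                   # day on which infections peak"
   ]
  },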
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_sir(N, i_0, beta=2/14, gamma=1/14, delta=0, steps=365, verbose=False):\n",
    "    '''\n",
    "    Plot the time history of an SIR model.\n",
    "    \n",
    "    Parameters\n",
    "    -----------\n",
    "    N     : int\n",
    "        population size\n",
    "    i_0   : double in (0, 1)\n",
    "        infection rate at time 0\n",
    "    beta  :  double in [0, infty)\n",
    "        number of encounters each infected person has per time step\n",
    "    gamma : double in (0, 1)\n",
    "        fraction of infecteds who recover or die per time step\n",
    "    delta : double in (0, 1)\n",
    "        fraction of recovereds who become susceptible per time step\n",
    "    steps : int\n",
    "        number of time steps to run the model\n",
    "    verbose : Boolean\n",
    "        if True, return the model predictions\n",
    "    \n",
    "    Returns (if verbose == True)\n",
    "    --------\n",
    "    S : list\n",
    "        susceptibles as a function of time\n",
    "    I : list\n",
    "        infecteds as a function of time\n",
    "    R : list\n",
    "        recovereds as a function of time\n",
    "    C : list\n",
    "        cumulative incidence as a function of time\n",
    "    '''\n",
    "    S, I, R, C = sir_model(N, i_0, beta=beta, gamma=gamma, delta=delta, steps=steps)\n",
    "    times = list(range(steps))\n",
    "    fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "    ax.plot(times, I, linestyle='--', color='r', label='Infected')\n",
    "    ax.plot(times, S, linestyle='-', color='b', label='Susceptible')\n",
    "    ax.plot(times, R, linestyle=':', color='g', label='Recovered')\n",
    "    ax.plot(times, C, linestyle='-.', color='k', label='Tot infect.')\n",
    "    ax.legend(loc='best')\n",
    "    plt.show()\n",
    "    if verbose:\n",
    "        return S, I, R, C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "i_0 = 0.0005\n",
    "N = int(10**6)\n",
    "steps = 365\n",
    "\n",
    "plot_sir(N, i_0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For these parameters, the number of infections is shaped like a bell curve and the cumulative number of infections is shaped like a CDF: it is \"sigmoidal\" (vaguely \"S\"-shaped). You might imagine fitting a scaled CDF to the curve. Essentially, the IHME model fits a scaled Gaussian CDF to the total number of infections.\n",
    "(More on this below.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if immunity wears off over time? Let's try a positive value of $\\delta$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plot_sir(N, i_0, delta=0.01)"
   ]
  },
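  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With $\\delta > 0$, the equations have a steady state in which the infection persists. Here is a sketch of the calculation in terms of proportions, assuming $\\beta > \\gamma$ and $i \\ne 0$ (the starred symbols are notation introduced only for this illustration):\n",
    "\n",
    "+ $di/dt = 0$ gives $s^* = \\gamma/\\beta$\n",
    "+ $dr/dt = 0$ gives $r^* = \\gamma i^*/\\delta$\n",
    "+ $s^* + i^* + r^* = 1$ then gives $i^* = \\delta(1 - \\gamma/\\beta)/(\\gamma + \\delta)$\n",
    "\n",
    "For $\\beta = 1/7$, $\\gamma = 1/14$, and $\\delta = 0.01$, this is $s^* = 0.5$ and $i^* = 0.005/(1/14 + 0.01) \\approx 0.061$, or about 61,000 infected people in a population of $10^6$. The calculation only locates the steady state; it does not show how quickly (or whether) a particular run approaches it."
   ]
  },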
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "da8e0e4dea6143c2932312d32071b4b6",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(FloatSlider(value=0.0005, description='i_0', max=1.0), FloatSlider(value=0.1428571428571…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<function __main__.plot_sir(N, i_0, beta=0.14285714285714285, gamma=0.07142857142857142, delta=0, steps=365, verbose=False)>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "interact(plot_sir, N=fixed(N),\n",
    "                   i_0 = widgets.FloatSlider(min=0, max=1, value=i_0),\n",
    "                   beta=widgets.FloatSlider(value=2/14, min=0.01, max=100),\n",
    "                   gamma=widgets.FloatSlider(value=1/14, min=0, max=1, step=0.01),\n",
    "                   delta=widgets.FloatSlider(value=0, min=0, max=1, step=0.01),\n",
    "                   steps=fixed(steps),\n",
    "                   verbose=fixed(False))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fitting the SIR model to data\n",
    "\n",
    "Now we will use least squares to fit the SIR model to a time history of infection data collected by researchers at Johns Hopkins University. See https://github.com/CSSEGISandData\n",
    "\n",
    "\n",
    "This exercise is for illustration. There are many reasons to be wary of this modeling:\n",
    "\n",
    "+ There are serious issues with the data quality: these are cases that were _reported_ according to the rules and circumstances of where they were detected. For mortality (rather than incidence), excess mortality might give a more accurate measure.\n",
    "+ The model is a cartoon, not \"physics\" of epidemics, and it omits many factors that plausibly matter. It is not clear what estimates of the parameters mean when the model is wrong.\n",
    "+ Absent a trustworthy generative model for the data, it is not clear how to assess or interpret the uncertainty of the estimates.\n",
    "+ If the estimated model is to be used for prediction, there is no obvious way to assign meaningful uncertainties to the predictions.\n",
    "+ The optimization problem to fit the parameters has no statistical content or motivation. It is not clear that it yields an estimate that is \"good\" in a useful sense. \n",
    "+ The objective function is not convex in the model parameters, and the optimization algorithm is not guaranteed to solve the optimization problem.\n",
    "\n",
    "The data record total \"confirmed\" cases as a function of time, deaths, and recoveries: $C(t)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.optimize import curve_fit  # nonlinear least squares\n",
    "\n",
    "def f(x, beta, gamma):\n",
    "    '''\n",
    "    Model cumulative infections\n",
    "    \n",
    "    This is just a wrapper for a call to sir_model, to generate a set of\n",
    "    predictions of the cumulative incidence for curve fitting\n",
    "    '''    \n",
    "    return sir_model(N, i_0, beta, gamma, delta=0, steps=len(x))[3]  # C"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 500.        ,  571.39285714,  647.87464122,  729.80689646,\n",
       "        817.5766761 ,  911.59831599, 1012.31532767, 1120.20241833,\n",
       "       1235.76764552, 1359.55471484, 1492.14542917, 1634.1622985 ,\n",
       "       1786.27131969, 1949.18493587, 2123.66518564, 2310.5270524 ,\n",
       "       2510.64202461, 2724.94187786, 2954.42268994, 3200.14910031,\n",
       "       3463.25882519, 3744.96743963, 4046.57343789, 4369.46358264,\n",
       "       4715.11855381])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test f\n",
    "N = int(10**6)\n",
    "i_0 = 0.0005\n",
    "x = range(25)\n",
    "y = f(x, 1/7, 1/14 )\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0.14285714, 0.07142857])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test curve_fit\n",
    "popt, pvoc = curve_fit(f, x, y, p0=[1, 0.5], bounds = (0, [np.inf, 1]))\n",
    "popt"
   ]
  },
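  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the objective function is not convex, one informal check is to restart the fit from several initial guesses and compare the results. The next cell is a sketch under stated assumptions: `cum_infections` is a hypothetical, self-contained re-implementation of the $\\delta = 0$ finite-difference model (so the cell does not depend on the globals `N` and `i_0`), and the starting points are arbitrary. Agreement across restarts does not prove that the global minimum was found."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import curve_fit\n",
    "\n",
    "N_demo, i0_demo = 10**6, 0.0005   # assumed values, matching the synthetic test above\n",
    "\n",
    "def cum_infections(x, beta, gamma):\n",
    "    '''Cumulative infections from the finite-difference SIR model with delta = 0.'''\n",
    "    I = N_demo*i0_demo\n",
    "    S, C = N_demo - I, I\n",
    "    out = np.empty(len(x))\n",
    "    for k in range(len(x)):\n",
    "        out[k] = C\n",
    "        new_i = beta*I*S/N_demo\n",
    "        S, I, C = max(0, S - new_i), max(0, I + new_i - gamma*I), C + new_i\n",
    "    return out\n",
    "\n",
    "x_demo = np.arange(25)\n",
    "y_demo = cum_infections(x_demo, 1/7, 1/14)   # synthetic data with known parameters\n",
    "\n",
    "fits = []\n",
    "for p0 in [(1, 0.5), (0.5, 0.1), (0.05, 0.9), (5, 0.01)]:   # arbitrary starting points\n",
    "    try:\n",
    "        p, _ = curve_fit(cum_infections, x_demo, y_demo, p0=p0, bounds=(0, [np.inf, 1]))\n",
    "        sse = np.sum((cum_infections(x_demo, *p) - y_demo)**2)\n",
    "        fits.append((p0, p, sse))\n",
    "    except RuntimeError:   # curve_fit raises this if the fit does not converge\n",
    "        fits.append((p0, None, np.inf))\n",
    "\n",
    "for p0, p, sse in fits:\n",
    "    print(p0, p, sse)   # starting point, fitted (beta, gamma), sum of squared errors"
   ]
  },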
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# try some real data from the JHU site\n",
    "import pandas as pd\n",
    "# data for countries. THIS IS A \"LIVE\" DATASET THAT IS UPDATED FREQUENTLY\n",
    "url = \"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv\"\n",
    "df = pd.read_csv(url, sep=\",\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Province/State</th>\n",
       "      <th>Country/Region</th>\n",
       "      <th>Lat</th>\n",
       "      <th>Long</th>\n",
       "      <th>1/22/20</th>\n",
       "      <th>1/23/20</th>\n",
       "      <th>1/24/20</th>\n",
       "      <th>1/25/20</th>\n",
       "      <th>1/26/20</th>\n",
       "      <th>1/27/20</th>\n",
       "      <th>...</th>\n",
       "      <th>3/4/22</th>\n",
       "      <th>3/5/22</th>\n",
       "      <th>3/6/22</th>\n",
       "      <th>3/7/22</th>\n",
       "      <th>3/8/22</th>\n",
       "      <th>3/9/22</th>\n",
       "      <th>3/10/22</th>\n",
       "      <th>3/11/22</th>\n",
       "      <th>3/12/22</th>\n",
       "      <th>3/13/22</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>33.93911</td>\n",
       "      <td>67.709953</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>174214</td>\n",
       "      <td>174331</td>\n",
       "      <td>174582</td>\n",
       "      <td>175000</td>\n",
       "      <td>175353</td>\n",
       "      <td>175525</td>\n",
       "      <td>175893</td>\n",
       "      <td>175974</td>\n",
       "      <td>176039</td>\n",
       "      <td>176201</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Albania</td>\n",
       "      <td>41.15330</td>\n",
       "      <td>20.168300</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>272030</td>\n",
       "      <td>272030</td>\n",
       "      <td>272210</td>\n",
       "      <td>272250</td>\n",
       "      <td>272337</td>\n",
       "      <td>272412</td>\n",
       "      <td>272479</td>\n",
       "      <td>272552</td>\n",
       "      <td>272621</td>\n",
       "      <td>272663</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Algeria</td>\n",
       "      <td>28.03390</td>\n",
       "      <td>1.659600</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>265186</td>\n",
       "      <td>265227</td>\n",
       "      <td>265265</td>\n",
       "      <td>265297</td>\n",
       "      <td>265323</td>\n",
       "      <td>265346</td>\n",
       "      <td>265366</td>\n",
       "      <td>265391</td>\n",
       "      <td>265410</td>\n",
       "      <td>265432</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Andorra</td>\n",
       "      <td>42.50630</td>\n",
       "      <td>1.521800</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>38434</td>\n",
       "      <td>38434</td>\n",
       "      <td>38434</td>\n",
       "      <td>38620</td>\n",
       "      <td>38710</td>\n",
       "      <td>38794</td>\n",
       "      <td>38794</td>\n",
       "      <td>38794</td>\n",
       "      <td>38794</td>\n",
       "      <td>38794</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>Angola</td>\n",
       "      <td>-11.20270</td>\n",
       "      <td>17.873900</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>98796</td>\n",
       "      <td>98796</td>\n",
       "      <td>98806</td>\n",
       "      <td>98806</td>\n",
       "      <td>98829</td>\n",
       "      <td>98855</td>\n",
       "      <td>98855</td>\n",
       "      <td>98855</td>\n",
       "      <td>98909</td>\n",
       "      <td>98927</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 786 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  Province/State Country/Region       Lat       Long  1/22/20  1/23/20  \\\n",
       "0            NaN    Afghanistan  33.93911  67.709953        0        0   \n",
       "1            NaN        Albania  41.15330  20.168300        0        0   \n",
       "2            NaN        Algeria  28.03390   1.659600        0        0   \n",
       "3            NaN        Andorra  42.50630   1.521800        0        0   \n",
       "4            NaN         Angola -11.20270  17.873900        0        0   \n",
       "\n",
       "   1/24/20  1/25/20  1/26/20  1/27/20  ...  3/4/22  3/5/22  3/6/22  3/7/22  \\\n",
       "0        0        0        0        0  ...  174214  174331  174582  175000   \n",
       "1        0        0        0        0  ...  272030  272030  272210  272250   \n",
       "2        0        0        0        0  ...  265186  265227  265265  265297   \n",
       "3        0        0        0        0  ...   38434   38434   38434   38620   \n",
       "4        0        0        0        0  ...   98796   98796   98806   98806   \n",
       "\n",
       "   3/8/22  3/9/22  3/10/22  3/11/22  3/12/22  3/13/22  \n",
       "0  175353  175525   175893   175974   176039   176201  \n",
       "1  272337  272412   272479   272552   272621   272663  \n",
       "2  265323  265346   265366   265391   265410   265432  \n",
       "3   38710   38794    38794    38794    38794    38794  \n",
       "4   98829   98855    98855    98855    98909    98927  \n",
       "\n",
       "[5 rows x 786 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "746 [      1       1       3       4       4       6      10      10      23\n",
      "      23      35      90     262     442     615     801     827     864\n",
      "     914     977    1057    1151    1255    1326    1395    1450    1591\n",
      "    1724    1877    2046    2201    2395    2577    2860    3107    3386\n",
      "    3757    4077    4369    4681    5071    5402    5635    5819    5996\n",
      "    6174    6318    6511    6681    6879    7073    7242    7384    7515\n",
      "    7695    7912    8073    8210    8445    8575    8698    8851    9008\n",
      "    9158    9311    9407    9523    9670    9821    9938   10083   10218\n",
      "   10319   10429   10513   10591   10667   10713   10791   10858   10927\n",
      "   10968   11044   11117   11182   11230   11289   11360   11387   11428\n",
      "   11480   11512   11593   11633   11669   11699   11734   11771   11811\n",
      "   11875   11924   11948   11962   12001   12016   12035   12099   12139\n",
      "   12193   12217   12250   12294   12344   12391   12391   12391   12527\n",
      "   12561   12615   12636   12675   12675   12675   12751   12768   12794\n",
      "   12815   12832   12832   12832   12878   12888   12900   12916   12946\n",
      "   12946   12946   13037   13061   13092   13124   13173   13173   13173\n",
      "   13262   13302   13350   13390   13438   13438   13438   13547   13577\n",
      "   13634   13725   13789   13789   13789   13996   14073   14185   14306\n",
      "   14442   14442   14442   14815   14959   15070   15214   15379   15483\n",
      "   15617   15740   15855   15940   16056   16127   16239   16317   16397\n",
      "   16480   16537   16627   16700   16779   16891   16985   17084   17195\n",
      "   17374   17547   17736   17883   18113   18356   18607   18924   19216\n",
      "   19557   19890   20237   20571   20571   21393   21847   22436   22905\n",
      "   23323   23799   24357   24916   25594   26213   26637   27072   27464\n",
      "   27998   28396   28932   29302   29680   30057   30379   30710   31156\n",
      "   31638   32082   32422   32811   33101   33593   34023   34441   34941\n",
      "   35392   35844   36373   37003   37763   38622   39411   40356   41412\n",
      "   42157   43174   44034   45225   46351   47299   48241   49594   50530\n",
      "   51753   53180   54230   55121   55892   56958   57952   58963   60000\n",
      "   61078   62136   63331   64551   65808   67105   68362   69635   70485\n",
      "   71654   73021   74204   75395   76718   78354   79352   80481   81949\n",
      "   83535   85140   86743   88858   90603   92649   94799   97357  100489\n",
      "  103564  107116  109758  113095  116087  119779  123813  128321  131606\n",
      "  134434  137632  140175  143472  146341  149333  151167  153347  155826\n",
      "  158447  161230  163479  165930  167541  168711  170787  172779  174995\n",
      "  176837  178497  180240  181486  182725  183801  185159  185159  187320\n",
      "  188199  189088  189895  190619  191505  192265  193038  193917  194671\n",
      "  195296  195948  196540  197208  197664  198095  198472  198960  199357\n",
      "  199782  200335  200773  201186  201621  202051  202417  202887  203365\n",
      "  203793  204067  204362  204799  205183  205597  206065  206617  207081\n",
      "  207577  208027  208556  209079  209682  210212  210732  211195  211692\n",
      "  212224  212798  213318  213932  214326  214839  215264  215791  217798\n",
      "  218660  219305  219918  220459  221071  221842  222629  223415  224258\n",
      "  224848  225505  226777  227031  225030  225844  226633  227049  228013\n",
      "  228692  229902  230603  231265  231973  232718  233318  233797  234317\n",
      "  234931  235648  236346  237101  237792  238306  238869  239532  240330\n",
      "  241007  241731  242633  243374  244065  244868  245761  246463  247010\n",
      "  247622  248326  248950  249785  250554  250554  252045  252912  253673\n",
      "  254482  254482  256482  257505  258182  259056  259988  260913  262159\n",
      "  263514  264465  265539  266503  267339  268255  269343  270557  271908\n",
      "  272659  273494  274413  275207  276280  277399  278396  279434  280383\n",
      "  281227  282135  283089  284117  285044  285636  286489  286948  287325\n",
      "  288229  288704  289122  289559  289874  290111  290333  290686  291017\n",
      "  291220  291463  291652  291801  291956  292179  292352  292574  292769\n",
      "  292943  293094  293337  293677  294152  294478  294925  295317  295654\n",
      "  296196  296885  297543  298094  298614  299223  300071  301126  302328\n",
      "  303469  304429  305459  306100  306944  307764  308615  309420  310127\n",
      "  310876  311520  312292  312851  314135  314983  316068  316807  317700\n",
      "  318485  319295  320222  321154  322019  322965  323786  324721  325725\n",
      "  326887  327972  329010  329927  330777  331736  332622  333815  334799\n",
      "  335722  336746  337466  338240  339580  340567  341549  342474  343351\n",
      "  344088  344850  345693  346518  347212  347909  348347  348979  349440\n",
      "  349891  350405  350996  351553  351939  352373  352636  353061  353431\n",
      "  353744  354068  354393  354645  354913  355257  355603  355944  356326\n",
      "  356684  357037  357370  357827  358369  358796  359237  359663  360031\n",
      "  360411  360888  361461  362068  362738  363356  363900  364464  365051\n",
      "  365840  366607  367307  368060  368651  369403  370159  371286  372533\n",
      "  373803  375065  376414  377825  379078  380949  382796  384580  386251\n",
      "  387783  389240  391221  393199  395797  398048  400145  402561  404855\n",
      "  407417  410434  412566  417151  420099  423322  426992  430891  434798\n",
      "  438811  442881  446676  450091  453802  458001  462427  466817  470893\n",
      "  474637  478927  483253  487401  492521  497201  501760  506085  509111\n",
      "  516257  522581  529210  536195  542794  548400  554389  562188  570502\n",
      "  579275  589274  600468  609062  617274  627356  647934  661320  673807\n",
      "  685036  685036  685036  726071  739071  762299  783702  802397  823282\n",
      "  831236  840037  865110  893393  919388  937649  950237  969485  983899\n",
      " 1006295 1030638 1056389 1080003 1105037 1131206 1159986 1193479 1232238\n",
      " 1272864 1319695 1355815 1397833 1438181 1484771 1531518 1582551 1636206\n",
      " 1677289 1713485 1742569 1787935 1842936 1887161 1927340 1966530 2003042\n",
      " 2037891 2087689 2142809 2196556 2244726 2289076 2327399 2356873 2399851\n",
      " 2442799 2483399 2519057 2552361 2578051 2606934 2637414 2666454 2691663\n",
      " 2714447 2731806 2748260 2764838 2783399 2803857 2803857 2837501 2853236\n",
      " 2864063 2876288 2891763 2908721 2923030 2936856 2943947 2950605]\n"
     ]
    }
   ],
   "source": [
    "# data for Denmark\n",
    "DK = df.loc[(df[\"Country/Region\"] == \"Denmark\") & df[\"Province/State\"].isna()]  # Denmark\n",
    "DK = DK.drop(['Province/State','Country/Region','Lat','Long'], axis=1).T        # remove fields we don't need\n",
    "y = DK.to_numpy()   # turn series into a vector\n",
    "y = y[np.nonzero(y)] # remove the data before the first detected case\n",
    "x = range(len(y))    # for model-fitting and plotting\n",
    "print(len(y),y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Text(0.5, 0, 'Days since first detected case')"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.plot(x,y, linestyle='-', color='r')\n",
    "plt.title('\"Confirmed\" COVID-19 Cases in Denmark')\n",
    "plt.xlabel('Days since first detected case')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 5792202 # estimated population of DK in 2020\n",
    "i_0 = y[0]/N # initial prevalence\n",
    "x = np.array(range(len(y)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([2.11506788e-02, 4.12605498e-16])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fit the model by nonlinear least squares\n",
    "popt, pvoc = curve_fit(f, x, y, p0=[1, 0.5], bounds = (0, [500, 1]), maxfev=10000)\n",
    "popt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "S, I, R, C = sir_model(N, i_0, beta=popt[0], gamma=popt[1], delta=0, steps=len(y))\n",
    "times = list(range(len(y)))\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "ax.plot(times, y, linestyle='-', color='r', label='Data')\n",
    "ax.plot(times, C, linestyle='-.', color='k', label='SIR Model')\n",
    "ax.legend(loc='best')\n",
    "plt.xlabel('Days since first detected case')\n",
    "plt.title('\"Confirmed\" COVID-19 Infections in Denmark')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What happens if we run the model into the future?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "S, I, R, C = plot_sir(N, i_0, beta=popt[0], gamma=popt[1], delta=0, steps=1000, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The total number of infections stabilizes, and the pandemic abates. The effective reproductive number $R_0$ is less than 1.\n",
    "\n",
    "Let's do a quick sanity check.\n",
    "Recall that $R_0 = \\beta/\\gamma$ for the SIR model _initially_, when the number infected, recovered, or dead are negligible compared to the total population.\n",
    "Once some of the population has recovered, some of the people an infectious person encounters wiil be people who have recovered and not susceptible to reinfection (according to the model).\n",
    "To first order, $\\beta$ is effectively reduced by $S/N$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "51261262636618.664"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# R_0 at the time the first infection is detected\n",
    "popt[0]/popt[1] # greater than 1. According to the model, the infection will spread if R_0 > 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "219047240451.27252"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# After 1000 timesteps, R_0 is different \n",
    "(S[-1]/N)*popt[0]/popt[1] # less than 1: According to the model, the infection will abate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fitting curves to the cumulative incidence function \n",
    "\n",
    "The SIR model and its variants attempt to capture the dynamics of transmission and recovery,\n",
    "For some parameter choices, that leads to a sigmoidal shape for the cumulative incidence of infections.\n",
    "\n",
    "Some pandemic predictions simply fit a sigmoidal function to the cumulative number of infections, an approach\n",
    "that dates back to William Farr in the 1800s. \n",
    "King et al. (2015) showed that this approach does not work well for AIDS or Ebola.\n",
    "Nonetheless, this is the approach the IHME takes: they fit a scaled and shifted Gaussian cdf to \n",
    "cumulative incidence.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sigmoidal Models\n",
    "\n",
    "\n",
    "Common sigmoidal models include CDFs of unimodal distributions (such as the Gaussian cdf $\\Phi(x)$) and the sigmoid function $1/(e^{-x} + 1)$, the inverse of the logistic function.\n",
    "\n",
    "To allow this function to be shifted and scaled, we can introduce the family of curves\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma(x; a, b, c) \\equiv \\frac{c}{e^{-b(x-a)}+1}.\n",
    "\\end{equation}\n",
    "\n",
    "The parameter $a$ controls where the function crosses $c/2$; $b$ controls how rapidly it increases, and $c$ is its asymptotic limit.\n",
    "\n",
    "Let's fit this to the Danish data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "def g(x, a, b, c):\n",
    "    '''\n",
    "    Wrapper for the sigmoid to pass to the curve-fitting function\n",
    "    '''    \n",
    "    return c/(np.exp(-b*(x-a))+1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7f98b0dde0a0>]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# test g visually.\n",
    "z = np.array(range(-8,15,1))\n",
    "plt.plot(z, g(z,3,1/2,5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5.00000000e+02, 1.11921669e-02, 1.16292975e+06])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fit the DK data\n",
    "popts, pvocs = curve_fit(g, x, y, p0=[len(x)/2, 0.02, np.max(y)/2], bounds = ([0, 0, 0], [500, 500, N]), maxfev=10000)\n",
    "popts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# plot\n",
    "Cs = g(x, popts[0], popts[1], popts[2])\n",
    "times = list(range(len(y)))\n",
    "fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "ax.plot(times, y, linestyle='-', color='r', label='Data')\n",
    "ax.plot(times, C[:len(times)], linestyle='-.', color='k', label='SIR Model')\n",
    "ax.plot(times, Cs, linestyle=':', color='b', label='Sigmoid Model')\n",
    "ax.legend(loc='best')\n",
    "plt.xlabel('Days since first detected case')\n",
    "plt.title('\"Confirmed\" COVID-19 Infections in Denmark')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Both models agree reasonably well with past data. But they have rather different predictions about the future.\n",
    "\n",
    "Let's look at how the predictions vary over time, as the dataset grows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_predictions(y, n, points):\n",
    "    '''\n",
    "    plot SIR and sigmoid models fitted to the first n data\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    y : list of floats\n",
    "        number of infections at each time point\n",
    "    n : int\n",
    "        fit the models to the first n elements of y\n",
    "    points : int\n",
    "        number of time points for which to plot the prediction\n",
    "        \n",
    "    Returns\n",
    "    -------\n",
    "    no return value\n",
    "    '''\n",
    "    # fit the two models\n",
    "    x = list(range(len(y)))\n",
    "    popt, pvoc = curve_fit(f, x[:n], y[:n], p0=[1, 0.5], bounds = (0, [500, 1]),\n",
    "                           maxfev=10000)\n",
    "    popts, pvocs = curve_fit(g, x[:n], y[:n], p0=[len(x)/2, 0.02, np.max(y)/2],\n",
    "                             bounds = ([0, 0, 0], [500, 500, N]), maxfev=10000)\n",
    "    times = list(range(points))\n",
    "    S, I, R, C = sir_model(N, i_0, beta=popt[0], gamma=popt[1], delta=0, steps=points)\n",
    "    Cs = g(times, popts[0], popts[1], popts[2])\n",
    "    fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "    ax.plot(x, y, linestyle='-', color='r', label='Data')\n",
    "    ax.plot(times, C, linestyle='-.', color='k', label='SIR Model')\n",
    "    ax.plot(times, Cs, linestyle=':', color='b', label='Sigmoid Model')\n",
    "    ax.vlines(x[n-1], 0, np.max([C, Cs]), label='truncation time')\n",
    "    ax.legend(loc='best')\n",
    "    plt.xlabel('Days since first detected case')\n",
    "    plt.title('\"Confirmed\" COVID-19 Infections in Denmark')\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "30e27a949c394f4ca345aa3ddebe4b93",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(IntSlider(value=100, description='n', max=746, min=20), Output()), _dom_classes=('widget…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<function __main__.plot_predictions(y, n, points)>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "interact(plot_predictions, y=fixed(y),\n",
    "                           n = widgets.IntSlider(min=20, max=len(y), value=100),\n",
    "                           points=fixed(2*len(y)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The predictions depend on the model and details of the data. \n",
    "A few additional days of data can drastically change the predictions. \n",
    "For some end dates, the two models have nearly identical predictions; for others, \n",
    "they differ by a factor of 20 or more.\n",
    "\n",
    "Unlike the SIR model, the sigmoid model does not \"know\" how many people are in the population, so even if recovery conveys immunity, the cumulative number of infections in the sigmoid model can exceed the size of the population.\n",
    "\n",
    "This mindless \"curve fitting\" approach is not a reliable basis for predicting the course of the pandemic, predicting ICU demand, predicting the effect of an intervention, allocating resources, or setting public policy.\n",
    "It is a mechanical application of models and algorithms to data, not science.\n",
    "\n",
    "Unsurprisingly, models of this type do not predict infections well in practice. \n",
    "The [most-cited predictions](https://covid19.healthdata.org/global?view=total-deaths&tab=trend) have been those promulgated by the [Institute for Health Metrics and Evaluation](http://www.healthdata.org/) (IHME) at the University of Washington.\n",
    "They say, \n",
    "\n",
    ">Our model is designed to be a planning tool for government officials who need to know how different policy decisions can radically alter the trajectory of COVID-19 for better or worse.\n",
    "\n",
    ">Our model aimed at helping hospital administrators and government officials understand when demand on health system resources will be greatest. (http://www.healthdata.org/covid/faqs#differences%20in%20modeling, last visited 23 January 2021)\n",
    "\n",
    "IHME's model is described [here](https://static-content.springer.com/esm/art%3A10.1038%2Fs41591-020-1132-9/MediaObjects/41591_2020_1132_MOESM1_ESM.pdf).\n",
    "Early in the pandemic, IHME predicted that there would be about 60,000 COVID-19 deaths in ths U.S.\n",
    "As of 18 February 2021, there have been [more than 490,000](https://coronavirus.jhu.edu/us-map).\n",
    "Between March and August, 2020, next-day deaths were within the IHME 95% prediction interval only about \n",
    "30% of the time, despite revisions of the model https://arxiv.org/abs/2004.04734.\n",
    "(We shall not delve into how IHME produces its prediction intervals.)\n",
    "See also https://www.vox.com/future-perfect/2020/5/2/21241261/coronavirus-modeling-us-deaths-ihme-pandemic.\n",
    "\n",
    "One feature of the sigmoidal models is that they predict rapid declines after the peak (the decline is just as rapid as the rise), a feature that\n",
    "some politicians like. \n",
    "That feature is built into the sigmoidal curve: it is an _assumption_, not something that comes from the data.\n",
    "\n",
    "There are also more complicated models of pandemics, including models with more \"compartments,\" models based on other sigmoidal functions (such as the Gaussian CDF), and models that allow various kinds of heterogeneity within the population and simulate interactions between individuals.\n",
    "Some have proved more accurate than others.\n",
    "Just how poorly some models perform is obscured by the practice of revising the model and projections frequently--even daily. \n",
    "\n",
    "Sometimes, a moddle with little substantive foundation can nonetheless predict the behavior of a system, at least if the system is not changing quickly. \n",
    "Accurate predictions of the effect of interventions (masks, lockdowns, vaccination, and so on) require a close tie between the world and the model: the model needs a substantive basis. \n",
    "For causal inferences, the model needs to be a _generative model_ or _response schedule_.\n",
    "\n",
    "George Box famously wrote, \"all models are wrong, but some are useful.\"\n",
    "While that quotation is often invoked as a general license to throw models at data, it should instead \n",
    "prompt questions:\n",
    "\n",
    "* useful for what?\n",
    "* how can you tell whether a model is useful for a particular goal?\n",
    "\n",
    "In the same paper, Box wrote, \"It is inappropriate to be concerned with mice when there are tigers abroad.\"\n",
    "Most of the fretting people do about statistical models is at the level of mice (e.g., asymptotics under unrealistic assumptions), while the problems with most \n",
    "statistical models are tigers: the models have little or no connection to the actual scientific question."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New variants\n",
    "\n",
    "As of this writing (10 February 2021), new strains of COVID-19 have been detected in the UK, South Africa, and the US, where cases of the new strain have a doubling time of about 10 days. The UK strain is estimated to be 50% to 70% more transmissible than the original virus. There is mounting evidence that the strain is also more lethal.\n",
    "https://www.wsj.com/articles/new-u-k-covid-19-variant-could-be-more-deadly-british-officials-say-11611338370 (last visited 22 January 2021).\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}