{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The impact of social distancing on COVID-19 growth in the US - an interactive notebook\n", "\n", "Do social distancing measures - now in effect in most of the US - work? This notebook uses a mobility index based on aggregated phone location data as a proxy for how well social distancing is practiced. It plots the mean daily growth in COVID-19 cases in a county against the mean mobility index in that county at an earlier time.\n", "\n", "It's interactive, scroll to the bottom to play around with the parameters that figure into the plot.\n", "\n", "[More background](https://blog.nikhaldimann.com/2020/04/14/lockdown-hack-2-towards-proving-that-social-distancing-works/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Attribution\n", "- US county-level mobility statistics from Descartes Labs (https://github.com/descarteslabs/DL-COVID-19)\n", "- US county-level COVID-19 data from the New York Times (https://github.com/nytimes/covid-19-data)\n", "- Inspiration from Soucy et al (https://www.medrxiv.org/content/10.1101/2020.04.05.20054288v1) \n", "- Copyright 2020 Nik Haldimann (nhaldimann@gmail.com). This code is released under an MIT license." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load data from the continuously updated sources (New York Times and Descartes Labs). Run this cell to get the latest datasets." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "nyt_us_counties_url = \"https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv\"\n", "dl_us_mobility_url = \"https://raw.githubusercontent.com/descarteslabs/DL-COVID-19/master/DL-us-mobility-daterow.csv\"\n", "\n", "us_counties = pd.read_csv(\n", " nyt_us_counties_url,\n", " dtype={\"fips\": str},\n", " parse_dates=[\"date\"]\n", ").set_index(\"date\")\n", "mobility = pd.read_csv(\n", " dl_us_mobility_url,\n", " dtype={\"fips\": str},\n", " parse_dates=[\"date\"]\n", ").set_index(\"date\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Implement the meat of the logic, aggregating and joining the two datasets and plotting the result." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from datetime import timedelta\n", "from matplotlib.ticker import AutoMinorLocator, MultipleLocator\n", "from textwrap import wrap\n", "\n", "def aggregate(\n", " us_state=\"New York\",\n", " min_cases_threshold=20,\n", " pandemic_growth_timespan=(\"2020-03-30\", \"2020-04-05\"),\n", " mobility_timespan=(\"2020-03-16\", \"2020-03-22\")\n", "):\n", " # We select one earlier day from the NYT data so we can calculate the\n", " # growth for the first day in the given timespan.\n", " interval_start = pd.Timestamp(pandemic_growth_timespan[0]) - timedelta(days=1)\n", " us_counties_timespan = us_counties.loc[interval_start:pandemic_growth_timespan[1]]\n", " mobility_march_timespan = mobility.loc[slice(*mobility_timespan)]\n", "\n", " us_counties_for_state = us_counties_timespan[\n", " (us_counties_timespan[\"state\"] == us_state)\n", " ]\n", " mobility_for_state = mobility_march_timespan[\n", " (mobility_march_timespan[\"admin_level\"] == 2)\n", " & (mobility_march_timespan[\"admin1\"] == us_state)\n", " ]\n", "\n", " def mean_daily_growth(series):\n", " return np.mean(series / series.shift(1) - 1) * 100\n", "\n", " us_counties_mean_growth = us_counties_for_state[[\"fips\", \"cases\", \"county\"]] \\\n", " .groupby(\"fips\") \\\n", " .aggregate(\n", " max_cases=pd.NamedAgg(column=\"cases\", aggfunc=np.max),\n", " mean_daily_growth=pd.NamedAgg(column=\"cases\", aggfunc=mean_daily_growth),\n", " county=pd.NamedAgg(column=\"county\", aggfunc=lambda x: x[0])\n", " )\n", " mobility_mean = mobility_for_state.groupby(\"fips\").mean()\n", "\n", " return pd.merge(\n", " us_counties_mean_growth[\n", " us_counties_mean_growth[\"max_cases\"] >= min_cases_threshold\n", " ][[\"county\", \"mean_daily_growth\"]],\n", " mobility_mean[\"m50_index\"],\n", " on=\"fips\"\n", " )\n", "\n", "def aggregate_and_plot(\n", " us_state=\"New York\",\n", " min_cases_threshold=20,\n", " pandemic_growth_timespan=(\"2020-03-30\", \"2020-04-05\"),\n", " mobility_timespan=(\"2020-03-16\", \"2020-03-22\")\n", "):\n", " aggregated_data = aggregate(\n", " us_state=us_state,\n", " min_cases_threshold=min_cases_threshold,\n", " pandemic_growth_timespan=pandemic_growth_timespan,\n", " mobility_timespan=mobility_timespan\n", " )\n", " \n", " slope, intercept = np.polyfit(\n", " aggregated_data.m50_index,\n", " aggregated_data.mean_daily_growth,\n", " deg=1\n", " )\n", " x_range = aggregated_data.m50_index.max() - aggregated_data.m50_index.min()\n", " fit_x = np.linspace(\n", " aggregated_data.m50_index.min() - x_range * 0.05,\n", " aggregated_data.m50_index.max() + x_range * 0.05,\n", " 100\n", " )\n", " fit_y = slope * fit_x + intercept\n", "\n", " ax = aggregated_data.plot.scatter(\n", " x=\"m50_index\",\n", " y=\"mean_daily_growth\",\n", " s=40,\n", " c=\"#090\",\n", " figsize=(9, 9),\n", " zorder=1000\n", " )\n", " for index, row in aggregated_data.iterrows():\n", " ax.text(\n", " x=row[\"m50_index\"] + 0.2,\n", " y=row[\"mean_daily_growth\"] + 0.2,\n", " s=row[\"county\"],\n", " fontsize=10,\n", " c=\"#090\",\n", " zorder=1000\n", " )\n", " ax.plot(fit_x, fit_y, c=\"#000\")\n", " caption = (\n", " f\"The association between the mean DL mobility index \"\n", " f\"(from {mobility_timespan[0]} to {mobility_timespan[1]}) \"\n", " f\"and the mean daily growth rate \"\n", " f\"(from {pandemic_growth_timespan[0]} to {pandemic_growth_timespan[1]}) \"\n", " f\"in the 4th week of March 2020 in {us_state} state counties \"\n", " f\"with at least {min_cases_threshold} COVID-19 cases by {pandemic_growth_timespan[1]}\"\n", " )\n", " ax.set_title(\n", " \"\\n\".join(wrap(caption, 80)),\n", " y=-0.27, x=-0.08, loc=\"left\", fontweight=\"bold\", wrap=True\n", " )\n", " ax.set_ylabel(\"Mean daily growth rate (%)\")\n", " ax.set_xlabel(\"Mean DL mobility index (%)\")\n", " ax.xaxis.set_major_locator(MultipleLocator(10))\n", " ax.yaxis.set_major_locator(MultipleLocator(10))\n", " ax.xaxis.set_minor_locator(AutoMinorLocator(2))\n", " ax.yaxis.set_minor_locator(AutoMinorLocator(2))\n", " ax.grid(which='major', color='#eee', linewidth=2)\n", " ax.grid(which='minor', color='#eee')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define widgets that will be used for the interactive plot." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import interact, Layout\n", "import ipywidgets as widgets\n", "\n", "us_states = sorted(set(us_counties[\"state\"].unique()) & set(mobility[\"admin1\"].unique()))\n", "pandemic_growth_timespan_options = [\n", " timestamp.strftime(\"%Y-%m-%d\")\n", " for timestamp in pd.date_range(us_counties.index.min(), us_counties.index.max())\n", "]\n", "mobility_index_timespan_options = [\n", " timestamp.strftime(\"%Y-%m-%d\")\n", " for timestamp in pd.date_range(mobility.index.min(), mobility.index.max())\n", "]\n", "styling = {\n", " 'layout': Layout(width='600px'),\n", " 'style': {'description_width': 'initial'}\n", "}\n", "\n", "us_state_widget = widgets.Dropdown(\n", " description=\"US state\",\n", " options=us_states,\n", " value=\"New York\",\n", " **styling\n", ")\n", "min_cases_threshold_widget = widgets.IntText(\n", " description=\"Min cases per county\",\n", " value=50,\n", " **styling\n", ")\n", "pandemic_growth_timespan_widget = widgets.SelectionRangeSlider(\n", " description=\"Pandemic growth timespan (y axis)\",\n", " options=pandemic_growth_timespan_options,\n", " value=(\"2020-03-30\", \"2020-04-05\"),\n", " continuous_update=False,\n", " **styling\n", ")\n", "mobility_timespan_widget = widgets.SelectionRangeSlider(\n", " description=\"Mobility index timespan (x axis)\",\n", " options=mobility_index_timespan_options,\n", " value=(\"2020-03-16\", \"2020-03-22\"),\n", " continuous_update=False,\n", " **styling\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot with input from interactive widgets. Use the widgets to update the plot in real time.\n", "\n", "(The plot will only show and the interaction will only work if you are looking at this in a real Jupyter runtime. The Github view on a notebook doesn't work, for example.)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3b334d525d004d148ff8f16d636d9611", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(Dropdown(description='US state', index=31, layout=Layout(width='600px'), options=('Alaba…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interact(\n", " aggregate_and_plot,\n", " us_state=us_state_widget,\n", " min_cases_threshold=min_cases_threshold_widget,\n", " pandemic_growth_timespan=pandemic_growth_timespan_widget,\n", " mobility_timespan=mobility_timespan_widget\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.1" } }, "nbformat": 4, "nbformat_minor": 4 }