{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "l4xnhqYTKaEK" }, "source": [ "# Dynamics of Disease Transmission and Human Behavior Project - Time Series Analysis\n" ] },
{ "cell_type": "markdown", "metadata": { "id": "Obg9u9ZrnEQG" }, "source": [ "\n", "## 1. Goal\n", "\n", "* Research question: how well can we predict the number of COVID-19 cases one day or one week in advance?\n", "* Math question: given the historical series of COVID-19 cases, what is the best estimator of the case count one day ahead or one week ahead, and what is the test-set mean squared error (MSE) of each estimate?\n", "\n" ] },
{ "cell_type": "markdown", "metadata": { "id": "Fpgfm6HknHBJ" }, "source": [ "## 2. Dataset\n", "\n", "The dataset contains daily confirmed COVID-19 case counts for each of the 50 U.S. states, as reported by Johns Hopkins University (JHU). Each state is stored in its own CSV file indexed by `date`. The case counts are in the `JHU_cases` column; the files also contain other columns, such as `JHU_deaths`, `JHU_hospitalizations`, and feature groups prefixed `gt_` and `neighbor_`." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "id": "aIb9NfsfL4Vg" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.metrics import mean_squared_error\n" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZLueIIW_J5hf", "outputId": "ab4f733c-5dfb-4eda-ec6b-3a6df7f72989" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount(\"/content/drive\")" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "id": "8fPMjUcIKGfY" }, "outputs": [], "source": [ "california = pd.read_csv(\"./drive/MyDrive/CS109 Project/state_level_data/California.csv\",parse_dates=['date'],index_col=['date'])\n", "new_york = pd.read_csv(\"./drive/MyDrive/CS109 Project/state_level_data/New York.csv\",parse_dates=['date'],index_col=['date'])\n", "ma = pd.read_csv(\"./drive/MyDrive/CS109 Project/state_level_data/Massachusetts.csv\",parse_dates=['date'],index_col=['date'])" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 487 }, "id": "sfQlKmIrPRVc", "outputId": "088b7bed-5055-4a7f-850f-4da286640a6c" }, "outputs": [ { "data": { "text/html": [ "\n", "

| date | JHU_cases | JHU_deaths | JHU_hospitalizations | up2date | gt_after covid vaccine | gt_side effects of vaccine | gt_effects of covid vaccine | gt_covid | gt_how long does covid last | gt_anosmia | ... | neighbor_South Dakota | neighbor_Tennessee | neighbor_Texas | neighbor_Utah | neighbor_Vermont | neighbor_Virginia | neighbor_Washington | neighbor_West Virginia | neighbor_Wisconsin | neighbor_Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-01-01 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-02 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-03 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-04 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 108.406849 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2020-01-05 | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

5 rows × 498 columns

| date | JHU_cases | JHU_deaths | JHU_hospitalizations | up2date | gt_after covid vaccine | gt_side effects of vaccine | gt_effects of covid vaccine | gt_covid | gt_how long does covid last | gt_anosmia | ... | neighbor_South Dakota | neighbor_Tennessee | neighbor_Texas | neighbor_Utah | neighbor_Vermont | neighbor_Virginia | neighbor_Washington | neighbor_West Virginia | neighbor_Wisconsin | neighbor_Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-01-09 | NaN | NaN | NaN | NaN | 1036.271881 | 79.980083 | 0.000000 | 191709.747309 | 880.874506 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-10 | NaN | NaN | NaN | NaN | 501.666380 | 359.532743 | 215.179826 | 222023.793840 | 719.959308 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-11 | NaN | NaN | NaN | NaN | 737.261085 | 221.918779 | 73.787815 | 224041.563172 | 1185.038030 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-12 | NaN | NaN | NaN | NaN | 978.821343 | 151.092032 | 150.713937 | 204037.539213 | 453.839532 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2022-01-13 | NaN | NaN | NaN | NaN | 984.362561 | 75.973690 | 0.000000 | 189624.711109 | 1141.021922 | 0.0 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

5 rows × 498 columns

| | JHU_cases | JHU_deaths | JHU_hospitalizations | up2date | gt_after covid vaccine | gt_side effects of vaccine | gt_effects of covid vaccine | gt_covid | gt_how long does covid last | gt_anosmia | ... | neighbor_South Dakota | neighbor_Tennessee | neighbor_Texas | neighbor_Utah | neighbor_Vermont | neighbor_Virginia | neighbor_Washington | neighbor_West Virginia | neighbor_Wisconsin | neighbor_Wyoming |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 718.000000 | 718.000000 | 544.000000 | 416.000000 | 739.000000 | 739.000000 | 739.000000 | 739.000000 | 739.000000 | 739.000000 | ... | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 | 718.000000 |
| mean | 8408.796657 | 107.302228 | 654.398897 | 266.588302 | 1031.942497 | 267.194675 | 185.924657 | 90028.915062 | 261.722963 | 29.050446 | ... | 263.803621 | 2037.598886 | 6963.263231 | 949.910864 | 106.435933 | 1700.607242 | 1296.756267 | 495.643454 | 1651.902507 | 166.931755 |
| std | 12651.020881 | 138.156802 | 557.909209 | 194.928115 | 1390.723867 | 345.901801 | 257.298072 | 48033.048924 | 261.124650 | 55.944590 | ... | 377.790566 | 3500.653803 | 9728.762811 | 1276.364174 | 238.298140 | 2672.792276 | 2221.208234 | 696.125703 | 2286.520080 | 248.876616 |
| min | -3935.000000 | -364.000000 | 116.000000 | 0.305869 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | -4.000000 | -42.000000 | 0.000000 | 0.000000 | -1.000000 | -232.000000 | -39.000000 | -4.000000 | 0.000000 | -8.000000 |
| 25% | 1808.000000 | 25.000000 | 321.000000 | 148.166794 | 0.000000 | 0.000000 | 0.000000 | 59592.925370 | 85.080935 | 0.000000 | ... | 8.000000 | 80.250000 | 1405.250000 | 164.000000 | 4.000000 | 338.500000 | 179.000000 | 37.250000 | 170.000000 | 4.000000 |
| 50% | 4261.500000 | 71.000000 | 432.500000 | 218.744994 | 452.973872 | 106.062503 | 76.018212 | 85254.949587 | 202.678864 | 0.000000 | ... | 103.000000 | 1138.500000 | 4353.500000 | 469.500000 | 28.500000 | 985.000000 | 662.000000 | 245.000000 | 722.500000 | 59.000000 |
| 75% | 9263.500000 | 126.000000 | 767.250000 | 372.607233 | 1545.638911 | 433.671560 | 303.376619 | 119532.223229 | 343.857587 | 74.962550 | ... | 391.750000 | 2448.250000 | 9097.250000 | 1342.250000 | 133.000000 | 1947.500000 | 1644.750000 | 800.500000 | 2457.750000 | 214.000000 |
| max | 133669.000000 | 1174.000000 | 2580.000000 | 1038.286656 | 7638.661166 | 2015.938064 | 1288.312759 | 279350.860227 | 1762.854735 | 489.629451 | ... | 3047.000000 | 41464.000000 | 162871.000000 | 14754.000000 | 2779.000000 | 40246.000000 | 33069.000000 | 9164.000000 | 16956.000000 | 1541.000000 |

8 rows × 498 columns
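
The goal asks for the test-set MSE of one-day-ahead and one-week-ahead forecasts. As a reference point, here is a minimal sketch of that computation for the simplest possible estimator, a persistence (naive) forecast that predicts the count `horizon` days ahead with the most recently observed count. The helper name `persistence_mse`, the 80/20 chronological train/test split, and the linear interpolation of interior gaps are illustrative assumptions, not the project's method. The sketch also trims the leading and trailing all-NaN rows, since the head and tail shown above have no `JHU_cases` values.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error


def persistence_mse(cases: pd.Series, horizon: int, test_frac: float = 0.2) -> float:
    # Hypothetical helper (not from the project): test-set MSE of the naive
    # persistence forecast, which predicts y[t + horizon] with y[t].
    if horizon < 1:
        raise ValueError('horizon must be a positive number of days')
    start, end = cases.first_valid_index(), cases.last_valid_index()
    if start is None:
        raise ValueError('series has no non-missing values')
    # Drop the NaN rows before the first and after the last observed value.
    series = cases.loc[start:end]
    # Assumption: fill any interior gaps by linear interpolation.
    series = series.interpolate()
    # The forecast for day t is the value observed horizon days earlier.
    forecast = series.shift(horizon)
    # Chronological split: the last test_frac of the days form the test set.
    split = int(len(series) * (1 - test_frac))
    if split < horizon:
        raise ValueError('training period is shorter than the forecast horizon')
    y_true = series.iloc[split:]
    y_pred = forecast.iloc[split:]
    return mean_squared_error(y_true, y_pred)


# Toy usage on a synthetic series (not the project's data).
dates = pd.date_range('2020-01-01', periods=10, freq='D')
toy = pd.Series([np.nan, 1, 2, 3, 4, 5, 6, 7, 8, np.nan], index=dates, dtype=float)
print(persistence_mse(toy, horizon=1))  # prints 1.0
```

On the notebook's data the same helper would be called as `persistence_mse(california['JHU_cases'], horizon=1)` and with `horizon=7`, giving baseline errors that any candidate estimator can be compared against. The sketch leaves the negative daily counts visible in the `min` row untouched.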
\n", "