{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Bayesian Modelling to Predict the Number of COVID-19 Cases in Brazil" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Abstract**. One of the main challenges in predicting the number of infected people in a large country is the discrepancies between states. Consequently, the number of infected people in the data set is uncertain. For this reason, I decided to create a simple model that predicts the number of confirmed cases by state using Bayesian modelling. One benefit of this model is that it yields an estimate of the daily growth rate. This work is based on the notebook [2]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2020-04-02T16:09:47.831763Z", "iopub.status.busy": "2020-04-02T16:09:47.831129Z", "iopub.status.idle": "2020-04-02T16:09:50.072181Z", "shell.execute_reply": "2020-04-02T16:09:50.070893Z" }, "papermill": { "duration": 2.25873, "end_time": "2020-04-02T16:09:50.072328", "exception": false, "start_time": "2020-04-02T16:09:47.813598", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING (theano.configdefaults): install mkl with `conda install mkl-service`: No module named 'mkl'\n" ] } ], "source": [ "#hide\n", "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import numpy as np\n", "from IPython.display import display, Markdown\n", "import matplotlib.pyplot as plt\n", "import matplotlib\n", "import pandas as pd\n", "import seaborn as sns\n", "import arviz as az\n", "import pymc3 as pm\n", "import altair as alt\n", "import dask.dataframe as dd\n", "import sys\n", "from pathlib import Path\n", "from itertools import product\n", "from pprint import pprint\n", "import requests\n", 
"sns.set_context('talk')\n", "plt.style.use('seaborn-whitegrid')\n", "\n", "# Set this to True to see legacy charts\n", "debug = False" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" } }, "source": [ "Set up paths" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "PROJECT_ROOT = Path.cwd().parent\n", "PATH_DATA = PROJECT_ROOT / \"data\" / \"csv\"\n", "sys.path.append(str(PROJECT_ROOT))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data I am using is provided by `brasil.io`, a Brazilian open data initiative [1]." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "raw_data = pd.read_csv(str(PATH_DATA / \"covid19.csv\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For my analysis, the relevant columns are:\n", "\n", "* Date\n", "* State\n", "* Confirmed: the number of confirmed cases on a given day" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Auditing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is contained in the dataset?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Dictionary" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "data_dict = {x: x for x in list(raw_data.dtypes.index)}" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "data_dict[\"confirmed\"] = \"Number of Confirmed Cases\"\n", "data_dict[\"is_last\"] = \"Latest Update\"\n", "data_dict[\"confirmed_per_100k_inhabitants\"] = \"Number of Confirmed Cases per 100k Inhabitants\"" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['city',\n", " 'city_ibge_code',\n", " 'date',\n", " 'estimated_population_2019',\n", " 'is_repeated',\n", " 'is_last',\n", " 'last_available_confirmed',\n", " 'last_available_confirmed_per_100k_inhabitants',\n", " 'last_available_date',\n", " 'last_available_death_rate',\n", " 'last_available_deaths',\n", " 'place_type',\n", " 'state',\n", " 'new_confirmed',\n", " 'new_deaths',\n", " 'confirmed',\n", " 'confirmed_per_100k_inhabitants']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(data_dict.keys())" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | Description |\n", "
|---|---|\n", "
| city | city |\n", "
| city_ibge_code | city_ibge_code |\n", "
| date | date |\n", "
| estimated_population_2019 | estimated_population_2019 |\n", "
| is_repeated | is_repeated |\n", "
| is_last | Latest Update |\n", "
| last_available_confirmed | last_available_confirmed |\n", "
| last_available_confirmed_per_100k_inhabitants | last_available_confirmed_per_100k_inhabitants |\n", "
| last_available_date | last_available_date |\n", "
| last_available_death_rate | last_available_death_rate |\n", "
| last_available_deaths | last_available_deaths |\n", "
| place_type | place_type |\n", "
| state | state |\n", "
| new_confirmed | new_confirmed |\n", "
| new_deaths | new_deaths |\n", "
| confirmed | Number of Confirmed Cases |\n", "
| confirmed_per_100k_inhabitants | Number of Confirmed Cases per 100k Inhabitants |\n", "