{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Causal Analysis of *How is the implementation of existing strategies affecting the rates of COVID-19 infection?*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*(Better displayed in [nbviewer](https://nbviewer.jupyter.org/), since red warnings in font tags may not be displayed on GitHub)*\n", "\n", "We believe that a question such as *How is the implementation of existing strategies affecting the rates of COVID-19 infection?* requires a proper causal analysis of the data.\n", "\n", "Such a question requires considering what the *causal effect* of certain strategies may be, and an evaluation of *counterfactual effects*.\n", "\n", "To answer this question we rely on the formalism of [*causal models*](http://bayes.cs.ucla.edu/BOOK-2K/) and the [*dowhy*](https://github.com/microsoft/dowhy) library.\n", "\n", "\n", "### Disclaimer\n", "In this notebook we set up a **preliminary template** for a causal analysis of this question. In particular, we set up a simple model that tries to estimate the **average causal effect** of a chosen policy.\n", "\n", "**This is a work in progress.** This notebook is published with the sole aim of sharing an initial analysis and drawing suggestions and critiques for improvement.\n", "\n", "At the moment, these results should in no way be taken to be significant. The data considered are limited, and the model is simplistic and debatable. **NO conclusions on real world policies should be drawn from this notebook**. \n", "\n", "Limitations will be highlighted in the notebook in RED (coloured fonts may not be displayed on GitHub)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup and testing of dowhy\n", "\n", "The aim of this notebook is to (i) gather some data from relevant sources, (ii) set up a mock causal model, and (iii) run a standard causal analysis (identification, estimation, refutation) using *dowhy*." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Data Part\n", "In this part we collect the data that will be used in the analysis.\n", "\n", "We will take into consideration four *variables* for different countries over different days:\n", "- *deaths*: number of reported deaths from covid19\n", "- *confirmed*: number of confirmed cases of infection from covid19\n", "- *treatment*/*measure*/*policy*: the implementation of a specific policy\n", "- *hospital beds*: number of hospital beds per 1000 people\n", "\n", "**The selection of these variables is arbitrary and should be enriched.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Importing libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import dowhy as dy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading data\n", "\n", "We will use three data sets:\n", "- **HDE**/HDE/acaps-covid-19-government-measures-dataset: this dataset provides information on the adoption of different measures to contain covid19.\n", "- **johns_hopkins**/johns-hopkins-covid-19-daily-dashboard-cases-over-time: this dataset provides daily information on the effects of covid19 (confirmed cases and deaths).\n", "- **world_bank**/: these datasets provide background information about a country." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "HDE_df = pd.read_csv('HDE/acaps-covid-19-government-measures-dataset.csv')\n", "JHU_df = pd.read_csv('johns_hopkins/johns-hopkins-covid-19-daily-dashboard-cases-over-time.csv')\n", "WB_df = pd.read_csv('world_bank/hospital-beds-per-1-000-people.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding the common countries\n", "\n", "We manually align some country names so that they match across the HDE and JHU datasets.\n", "\n", "**This alignment is incomplete.**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def replace_HDE(original,replacement):\n", " HDE_df.loc[HDE_df['country']==original, 'country'] = replacement\n", "\n", "replace_HDE('United States of America','US')\n", "replace_HDE('Czech Republic','Czechia')\n", "replace_HDE('Korea Republic of','Korea, South')\n", "replace_HDE('North Macedonia Republic Of','North Macedonia') \n", "replace_HDE('kenya','Kenya') \n", "replace_HDE('Viet Nam','Vietnam') \n", "replace_HDE('Russian Federation','Russia') " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "countriesHDE = set(HDE_df['country'].unique())\n", "countriesJHU = set(JHU_df['country_region'].unique())\n", "countries = list(countriesHDE.intersection(countriesJHU))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Finding the date range\n", "\n", "We convert the dates in the HDE dataset to datetimes, and we extract the range of dates from the JHU dataset.\n", "\n", "**Some dates in the HDE dataset are inconsistent** (hence the *errors='coerce'* option)\n", "\n", "**We rely on the JHU dataset providing the same range of dates for all countries**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "HDE_df['date_implemented'] = 
pd.to_datetime(HDE_df['date_implemented'],format=\"%m-%d-%y\",errors='coerce')\n", "\n", "dates = JHU_df.loc[JHU_df['country_region']==countries[0],'last_update'].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Defining the measure/policy of interest\n", "\n", "We define a measure/policy of interest, which will be the *treatment* of interest in the causal analysis. Use *list(HDE_df['measure'].unique())* to list the 32 measures tracked in the HDE dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "treatment = 'Introduction of quarantine policies'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a new dataframe\n", "\n", "We create a new dataframe, cast the dates to datetime, and add a new binary column that will track when the chosen policy of interest is active." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df = JHU_df.copy()\n", "df['last_update'] = pd.to_datetime(df['last_update'])\n", "df[treatment] = 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We set the new column/field as active (=1) from the day **after** the policy has been called for by a government." 
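, "\n", "\n", "As a compact restatement of this rule, here is a sketch in notation that is not part of the original analysis: $c$ indexes countries, $d$ indexes the dates in *last_update*, and $d^{\\star}_{c}$ is the *date_implemented* value that the HDE dataset reports for the chosen measure in country $c$. The treatment column is then the indicator\n", "\n", "$$T_{c,d} = \\mathbb{1}\\left[\\, d > d^{\\star}_{c} \\,\\right],$$\n", "\n", "with $T_{c,d} = 0$ on every date for countries that have no recorded implementation of the measure."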
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Country Switzerland has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Djibouti has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Bhutan has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Trinidad and Tobago has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country US has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Rwanda has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Mongolia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Libya has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Tanzania has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Pakistan has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Burkina Faso has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Turkey has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Zimbabwe has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Dominica has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Venezuela has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Morocco has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Luxembourg has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Maldives has not implemented policy Introduction of quarantine policies to date 
2020-03-30\n", "Country Singapore has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Ireland has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Lithuania has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Portugal has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Bolivia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Kazakhstan has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Indonesia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Antigua and Barbuda has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Papua New Guinea has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Slovenia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Mexico has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Saudi Arabia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Togo has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Timor-Leste has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Eswatini has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Eritrea has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Senegal has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Guinea-Bissau has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Botswana has not implemented policy Introduction of quarantine policies to 
date 2020-03-30\n", "Country Zambia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Denmark has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Nicaragua has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Sweden has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Japan has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Liechtenstein has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Kuwait has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Cambodia has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Belgium has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Sudan has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "Country Paraguay has not implemented policy Introduction of quarantine policies to date 2020-03-30\n", "\n", "113 countries have implemented policy Introduction of quarantine policies to date 2020-03-30\n" ] } ], "source": [ "ncountries_treatment = 0\n", "for i in countries:\n", " try:\n", " activationdate = HDE_df.loc[(HDE_df['country']==i) & (HDE_df['measure']==treatment), 'date_implemented'].values[0]\n", " df.loc[(df['country_region']==i) & (df['last_update'] > activationdate), treatment ] = 1\n", " ncountries_treatment += 1\n", " except IndexError:\n", " print('Country {0} has not implemented policy {1} to date {2}'.format(i,treatment,dates[-1]))\n", " \n", "print('\\n{0} countries have implemented policy {1} to date {2}'.format(ncountries_treatment,treatment,dates[-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieving further data\n", "We extract more background information, specifically considering 
what could be *confounders* between our treatment and effect.\n", "In this notebook we take the last recorded value of *hospital beds per 1000 people* from the *world_bank* data set." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Country Egypt has no reported hospital beds between 1960-2019\n", "Country Saint Lucia has no reported hospital beds between 1960-2019\n", "Country US has no reported hospital beds between 1960-2019\n", "Country Bahamas has no reported hospital beds between 1960-2019\n", "Country Kyrgyzstan has no reported hospital beds between 1960-2019\n", "Country Syria has no reported hospital beds between 1960-2019\n", "Country Korea, South has no reported hospital beds between 1960-2019\n", "Country Venezuela has no reported hospital beds between 1960-2019\n", "Country Iran has no reported hospital beds between 1960-2019\n", "Country Czechia has no reported hospital beds between 1960-2019\n", "Country Saint Vincent and the Grenadines has no reported hospital beds between 1960-2019\n", "Country Russia has no reported hospital beds between 1960-2019\n", "Country Slovakia has no reported hospital beds between 1960-2019\n", "Country Liechtenstein has no reported hospital beds between 1960-2019\n", "Country Gambia has no reported hospital beds between 1960-2019\n", "\n", "146 countries have reported hospital beds between 1960-2019\n" ] } ], "source": [ "ncountries_confounder = 0\n", "\n", "confounder = 'hospital beds'\n", "df[confounder] = np.nan\n", "for i in countries:\n", " df1 = WB_df.loc[WB_df['country_name']==i]\n", " beds_nan = df1.iloc[:,4:].values\n", " try:\n", " beds = beds_nan[np.isfinite(beds_nan)][-1]\n", " df.loc[(df['country_region']==i), confounder] = beds\n", " ncountries_confounder += 1\n", " except:\n", " print('Country {0} has no reported {1} between 1960-2019'.format(i,confounder))\n", " \n", "print('\\n{0} countries have 
reported {1} between 1960-2019'.format(ncountries_confounder,confounder))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion\n", "\n", "In this part we collected the data that will be fed into our model.\n", "\n", "**Richer and more relevant data should be collected**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Model Part\n", "In this section we rely on the *dowhy* library to build a causal model.\n", "\n", "**This sort of data implicitly requires a time-dependent analysis, but we will analyze data statically on a per-day basis.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing the data for the causal model\n", "We extract the data that is relevant for our analysis and we drop rows containing *nan*.\n", "\n", "**Missing values could be handled better than by simply dropping them**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dropped 2208 rows with nan values\n" ] } ], "source": [ "data = pd.DataFrame(data=\n", " {'confirmed': df['confirmed'],\n", " 'deaths': df['deaths'],\n", " 'hospital beds': df[confounder],\n", " 'Introduction of quarantine policies': df[treatment].astype('bool'),\n", " }\n", " )\n", "rows1 = data.shape[0]\n", "data = data.dropna()\n", "print('Dropped {0} rows with nan values'.format(rows1 - data.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting up a Causal Model\n", "\n", "We now set up a causal model as illustrated by the graph below." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. 
Adding a node named \"Unobserved Confounders\" to reflect this.\n", "INFO:dowhy.causal_model:Model to find the causal effect of treatment ['Introduction of quarantine policies'] on outcome ['deaths']\n" ] } ], "source": [ "model= dy.CausalModel(\n", " data = data,\n", " treatment = treatment,\n", " outcome = 'deaths',\n", " graph = \"\"\"graph[directed 1 node[id \"confirmed\" label \"confirmed\"]\n", " node[id \"deaths\" label \"deaths\"]\n", " node[id \"Introduction of quarantine policies\" label \"Introduction of quarantine policies\"]\n", " node[id \"hospital beds\" label \"hospital beds\"]\n", " edge[source \"confirmed\" target \"deaths\"]\n", " edge[source \"hospital beds\" target \"deaths\"]\n", " edge[source \"Introduction of quarantine policies\" target \"deaths\"]\n", " edge[source \"hospital beds\" target \"Introduction of quarantine policies\"]\n", " edge[source \"Introduction of quarantine policies\" target \"confirmed\"]]\n", " \"\"\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Image, display\n", "model.view_model()\n", "display(Image(filename=\"causal_model.png\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This causal model is arbitrary, and proposed mainly as an example.\n", "\n", "We assume that the number of deaths is influenced by the number of already confirmed cases, by the availability of hospital beds (the number of beds is taken as a proxy for the availability and quality of healthcare), by the adoption of the policy, and by other unobserved confounders. The number of confirmed cases is taken to depend on the adoption of the chosen policy. The introduction of the chosen policy is in turn affected by the quality of healthcare, expressed by the number of beds per capita, and by other unobserved confounders.\n", "\n", "We take as **outcome** variable 
the number of deaths, as **treatment** variable the specific policy under study, as **confounder** the number of hospital beds (plus potentially unobserved confounders). The number of confirmed cases constitutes a **mediator**.\n", "\n", "This model will allow us to answer (within this model) the question: *What is the effect of the chosen policy on the number of deaths caused by covid19?*\n", "\n", "Several questions may be asked about the meaningfulness of this model: does the choice of number of beds make sense as a confounder? Is the number of deaths affected only through the mediator of confirmed cases? Is this model complete at all? These are valid questions, but we do not provide an answer as we lack domain knowledge. Instead we will study this model to provide an example of such an analysis.\n", "\n", "**Proper modelling, using knowledge from epidemiology and sociology, should be done here.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Analysis\n", "\n", "We now go through the standard steps of causal analysis (after modelling): (i) *identification*, (ii) *estimation*, and (iii) *refutation*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Identification\n", "\n", "We evaluate the identifiability of the causal effect, that is, whether we can evaluate the causal effect of the policy under consideration on the outcome given the model we have specified." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'hospital beds']\n", "WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARN: Do you want to continue by ignoring any unobserved confounders? 
(use proceed_when_unidentifiable=True to disable this prompt) [y/n] y\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]\n" ] } ], "source": [ "identified_estimand = model.identify_effect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that *dowhy* warns us about the presence of unobserved confounders that may affect our analysis. We go on with this analysis even if we are obviously aware of the fact that our simple model ignores many confounders.\n", "\n", "**Proper modelling, using knowledge from epidemiology and sociology, is necessary to identify confounders.**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "──────────────────────────────────────(Expectation(deaths|hospital beds))\n", "d[Introduction of quarantine policies] \n", "Estimand assumption 1, Unconfoundedness: If U→{Introduction of quarantine policies} and U→deaths then P(deaths|Introduction of quarantine policies,hospital beds,U) = P(deaths|Introduction of quarantine policies,hospital beds)\n", "### Estimand : 2\n", "Estimand name: iv\n", "No such variable found!\n", "\n" ] } ], "source": [ "print(identified_estimand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*dowhy* allowed us to discover an estimand based on the *backdoor* criterion to assess the causal effect of the treatment on the outcome; notice that *dowhy* reminds us that this estimand is based on an *unconfoundedness* assumption." 
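, "\n", "\n", "For reference, the textbook backdoor adjustment behind this estimand can be written as follows. The notation is introduced here for illustration and is not produced by *dowhy*: $T$ is the treatment, $Y$ the outcome (*deaths*), and $Z$ the observed confounder (*hospital beds*). Under the unconfoundedness assumption,\n", "\n", "$$E[Y \\mid do(T=t)] = \\sum_{z} E[Y \\mid T=t, Z=z]\\, P(Z=z),$$\n", "\n", "with the sum replaced by an integral when $Z$ is continuous. The average treatment effect is then\n", "\n", "$$\\mathrm{ATE} = E[Y \\mid do(T=1)] - E[Y \\mid do(T=0)].$$"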
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Estimation\n", "\n", "We run three methods for the estimation of the causal effect: a *propensity score matching* (by default, *dowhy* uses LogisticRegression for computing propensity scores and k-NN (k=1) to find a match), *propensity score stratification* (by default, *dowhy* sets up 50 clipped strata), and *inverse probability weighting*." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Matching Estimator\n", "INFO:dowhy.causal_estimator:b: deaths~Introduction of quarantine policies+hospital beds\n", "/home/fmzennaro/miniconda2_1/envs/covid/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "*** Causal Estimate ***\n", "\n", "## Target estimand\n", "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "──────────────────────────────────────(Expectation(deaths|hospital beds))\n", "d[Introduction of quarantine policies] \n", "Estimand assumption 1, Unconfoundedness: If U→{Introduction of quarantine policies} and U→deaths then P(deaths|Introduction of quarantine policies,hospital beds,U) = P(deaths|Introduction of quarantine policies,hospital beds)\n", "### Estimand : 2\n", "Estimand name: iv\n", "No such variable found!\n", "\n", "## Realized estimand\n", "b: deaths~Introduction of quarantine policies+hospital beds\n", "## Estimate\n", "Value: 401.1329164185031\n", "\n" ] } ], "source": [ "estimate = model.estimate_effect(identified_estimand, method_name=\"backdoor.propensity_score_matching\")\n", "print(estimate)" ] }, { 
"cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator\n", "INFO:dowhy.causal_estimator:b: deaths~Introduction of quarantine policies+hospital beds\n", "/home/fmzennaro/miniconda2_1/envs/covid/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "*** Causal Estimate ***\n", "\n", "## Target estimand\n", "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "──────────────────────────────────────(Expectation(deaths|hospital beds))\n", "d[Introduction of quarantine policies] \n", "Estimand assumption 1, Unconfoundedness: If U→{Introduction of quarantine policies} and U→deaths then P(deaths|Introduction of quarantine policies,hospital beds,U) = P(deaths|Introduction of quarantine policies,hospital beds)\n", "### Estimand : 2\n", "Estimand name: iv\n", "No such variable found!\n", "\n", "## Realized estimand\n", "b: deaths~Introduction of quarantine policies+hospital beds\n", "## Estimate\n", "Value: 1667.7588405151612\n", "\n" ] } ], "source": [ "estimate = model.estimate_effect(identified_estimand, method_name=\"backdoor.propensity_score_stratification\")\n", "print(estimate)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n", "INFO:dowhy.causal_estimator:b: deaths~Introduction of quarantine policies+hospital beds\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "*** Causal Estimate ***\n", 
"\n", "## Target estimand\n", "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "──────────────────────────────────────(Expectation(deaths|hospital beds))\n", "d[Introduction of quarantine policies] \n", "Estimand assumption 1, Unconfoundedness: If U→{Introduction of quarantine policies} and U→deaths then P(deaths|Introduction of quarantine policies,hospital beds,U) = P(deaths|Introduction of quarantine policies,hospital beds)\n", "### Estimand : 2\n", "Estimand name: iv\n", "No such variable found!\n", "\n", "## Realized estimand\n", "b: deaths~Introduction of quarantine policies+hospital beds\n", "## Estimate\n", "Value: 1440.5259837012877\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/fmzennaro/miniconda2_1/envs/covid/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] } ], "source": [ "estimate = model.estimate_effect(identified_estimand, method_name=\"backdoor.propensity_score_weighting\", method_params={\"weighting_scheme\":\"ips_weight\"})\n", "print(estimate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the models seems to agree on a strong effect of the treatment on the outcome, although in the opposite way we would expect. This outcome is reasonable given our simple model and limited data: the data cover only a limited span of time; very likely the number of deaths is following its trend and the introduction of a policy like a quarantine which has a delayed effect was not registered in the data. Again, this should remark the current **NON-relevance** of this preliminary results." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Refutation\n", "\n", "A last step in a rigorous causal analysis would be a refutation step. Our model is just a dummy model, so we are not really taking its conclusions into consideration. However, we present the formal refutation procedure provided by *dowhy*.\n", "\n", "We run a refutation process based on shuffling the values of our treatment; if the treatment has a real causal effect, the average treatment effect estimated with the shuffled (placebo) treatment is expected to drop to 0." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator\n", "INFO:dowhy.causal_estimator:b: deaths~placebo+hospital beds\n", "/home/fmzennaro/miniconda2_1/envs/covid/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Refute: Use a Placebo Treatment\n", "Estimated effect:(1667.7588405151612,)\n", "New effect:(26.190793359377157,)\n", "\n" ] } ], "source": [ "from dowhy.causal_refuters.placebo_treatment_refuter import PlaceboTreatmentRefuter\n", "refuter = PlaceboTreatmentRefuter(data=data, identified_estimand=identified_estimand, estimate=estimate, placebo_type='permute')\n", "print(refuter.refute_estimate())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Further work\n", "\n", "This notebook has presented an outline for a static causal analysis of covid19 data. 
However, this draft is far from complete:\n", "\n", "- **Proper modelling and inclusion of meaningful and relevant variables are essential.**\n", "- **Validation of standard causal hypotheses and assumptions is required.**\n", "- **A time-dependent model would be needed to handle this scenario properly.**" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }