{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# RESEARCH IN PYTHON: USING IVE TO RECOVER THE TREATMENT EFFECT\n", "# by J. NATHAN MATIAS March 18, 2015\n", "\n", "# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n", "# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n", "# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n", "# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n", "# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n", "# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\n", "# THE SOFTWARE." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Instrumental-Variables Estimation to Recover the Treatment Effect in Quasi-Experiments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section is taken from [Chapter 11](http://www.ats.ucla.edu/stat/stata/examples/methods_matter/chapter11/default.htm) of [Methods Matter](http://www.ats.ucla.edu/stat/examples/methods_matter/) by Richard Murnane and John Willett.\n", "\n", "In Chapter 10, Murnane and Willett introduce instrumental-variables estimation (IVE) as a method for carving out causal claims from observational data ([chapter summary](http://acawiki.org/Introducing_Instrumental-Variables_Estimation)) ([example code](http://nbviewer.ipython.org/github/natematias/research_in_python/blob/master/instrumental_variables_estimation/Instrumental-Variables%20Estimation.ipynb)).\n", "\n", "In Chapter 11, the authors explain how IVE can be used to \"recover\" the treatment effect when random assignment applies only to an offer to participate: not everyone who receives the offer takes it, and some people who do not receive it participate by other means. They illustrate this with research on the effect of a financial aid offer on whether a student finishes 8th grade, using a subset of the Bogotá data from \"[Vouchers for Private Schooling in Colombia](http://www.nber.org/papers/w8343)\" (2002) by Joshua Angrist, Eric Bettinger, Erik Bloom, Elizabeth King, and Michael Kremer ([full data here](http://economics.mit.edu/faculty/angrist/data1/data/angetal02), [subset data here](http://www.ats.ucla.edu/stat/stata/examples/methods_matter/chapter11/default.htm)).\n", "\n", "The dataset includes the following variables:\n", "* *finish8th*: whether the student finished 8th grade (the outcome)\n", "* *won_lottry*: whether the student won the lottery for an offer of financial aid (the instrument)\n", "* *use_fin_aid*: whether the student used financial aid of any kind, not only aid from the lottery (the treatment)\n", "* *base_age*: student age\n", "* *male*: whether the student is male" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# THINGS TO IMPORT\n", "# This is a baseline set of libraries I import by default if I'm rushed for time.\n", "\n", "import codecs # load UTF-8 Content\n", "import json # load JSON files\n", "import pandas as pd # Pandas handles dataframes\n", "import numpy as np # Numpy handles lots of basic maths operations\n", "import matplotlib.pyplot as plt # Matplotlib for plotting\n", "import seaborn as sns # Seaborn for beautiful plots\n", "from dateutil import * # I prefer dateutil for parsing dates\n", "import math # transformations\n", "import statsmodels.formula.api as smf # for doing statistical regression\n", "import statsmodels.api as sm # access to the wider statsmodels library, including R datasets\n", "from collections import Counter # Counter is useful for grouping and counting\n", "import scipy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Acquire Dataset from Methods Matter" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import urllib2\n", "import os.path\n", "\n", "# Download the Stata file once and cache it locally\n", "DATA_FILE = \"colombia_voucher.dta\"\n", "if not os.path.isfile(DATA_FILE):\n", "    response = urllib2.urlopen(\"http://www.ats.ucla.edu/stat/stata/examples/methods_matter/chapter11/colombia_voucher.dta\")\n", "    if response.getcode() == 200:\n", "        # .dta is a binary format, so write the file in binary mode\n", "        with open(DATA_FILE, \"wb\") as f:\n", "            f.write(response.read())\n", "voucher_df = pd.read_stata(DATA_FILE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary Statistics" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==============================================================================\n", " OVERALL SUMMARY\n", "==============================================================================\n", " id won_lottry male base_age finish8th \\\n", "count 1171.000000 1171.000000 1171.000000 1171.000000 1171.000000 \n", "mean 1357.010248 0.505551 0.504697 12.004270 0.681469 \n", "std 890.711584 0.500183 0.500192 1.347038 0.466106 \n", "min 3.000000 0.000000 0.000000 7.000000 0.000000 \n", "25% 616.000000 0.000000 0.000000 11.000000 0.000000 \n", "50% 1280.000000 1.000000 1.000000 12.000000 1.000000 \n", "75% 1982.500000 1.000000 1.000000 13.000000 1.000000 \n", "max 4030.000000 1.000000 1.000000 17.000000 1.000000 \n", "\n", " use_fin_aid \n", "count 1171.000000 \n", "mean 0.581554 \n", "std 0.493515 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 1.000000 \n", "75% 1.000000 \n", "max 1.000000 \n", "==============================================================================\n", " LOTTERY = 0\n", "==============================================================================\n", " id won_lottry male base_age finish8th \\\n", "count 579.000000 579 579.000000 579.000000 579.000000 \n", "mean 1460.998273 0 0.504318 12.036269 0.625216 \n", "std 960.839468 0 0.500414 1.351814 0.484486 
\n", "min 4.000000 0 0.000000 7.000000 0.000000 \n", "25% 650.500000 0 0.000000 11.000000 0.000000 \n", "50% 1392.000000 0 1.000000 12.000000 1.000000 \n", "75% 2122.500000 0 1.000000 13.000000 1.000000 \n", "max 4030.000000 0 1.000000 16.000000 1.000000 \n", "\n", " use_fin_aid \n", "count 579.000000 \n", "mean 0.240069 \n", "std 0.427495 \n", "min 0.000000 \n", "25% 0.000000 \n", "50% 0.000000 \n", "75% 0.000000 \n", "max 1.000000 \n", "==============================================================================\n", " LOTTERY = 1\n", "==============================================================================\n", " id won_lottry male base_age finish8th \\\n", "count 592.000000 592 592.000000 592.000000 592.000000 \n", "mean 1255.305743 1 0.505068 11.972973 0.736486 \n", "std 804.217066 0 0.500397 1.342755 0.440911 \n", "min 3.000000 1 0.000000 9.000000 0.000000 \n", "25% 578.750000 1 0.000000 11.000000 0.000000 \n", "50% 1210.000000 1 1.000000 12.000000 1.000000 \n", "75% 1707.250000 1 1.000000 13.000000 1.000000 \n", "max 4006.000000 1 1.000000 17.000000 1.000000 \n", "\n", " use_fin_aid \n", "count 592.000000 \n", "mean 0.915541 \n", "std 0.278311 \n", "min 0.000000 \n", "25% 1.000000 \n", "50% 1.000000 \n", "75% 1.000000 \n", "max 1.000000 \n" ] } ], "source": [ "# Describe the full sample, then each lottery group separately\n", "print \"==============================================================================\"\n", "print \" OVERALL SUMMARY\"\n", "print \"==============================================================================\"\n", "\n", "print voucher_df.describe()\n", "\n", "for i in range(2):\n", "    print \"==============================================================================\"\n", "    print \" LOTTERY = %d\" % i\n", "    print \"==============================================================================\"\n", "    print voucher_df[voucher_df['won_lottry']==i].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Two-Stage Logistic Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're interested in learning more about the rationale and process for this kind of analysis, see Chapter 10 of Methods Matter, where Murnane and Willett introduce instrumental-variables estimation (IVE) as a method for carving out causal claims from observational data ([chapter summary](http://acawiki.org/Introducing_Instrumental-Variables_Estimation)) ([example code](http://nbviewer.ipython.org/github/natematias/research_in_python/blob/master/instrumental_variables_estimation/Instrumental-Variables%20Estimation.ipynb)).\n", "\n", "The code below fits two logistic regressions. In the first stage, *use_fin_aid* is regressed on the instrument *won_lottry* and the covariates *male* and *base_age*, and the predicted probabilities are stored as *use_fin_aid_fitted*. In the second stage, *finish8th* is regressed on *use_fin_aid_fitted* and the same covariates. Because the second-stage fit treats *use_fin_aid_fitted* as if it were observed data, the standard errors it reports do not account for the uncertainty in the first-stage estimates.\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==============================================================================\n", " FIRST STAGE\n", "==============================================================================\n", " Generalized Linear Model Regression Results \n", "==============================================================================\n", "Dep. Variable: use_fin_aid No. Observations: 1171\n", "Model: GLM Df Residuals: 1167\n", "Model Family: Binomial Df Model: 3\n", "Link Function: logit Scale: 1.0\n", "Method: IRLS Log-Likelihood: -488.00\n", "Date: Thu, 19 Mar 2015 Deviance: 975.99\n", "Time: 23:08:46 Pearson chi2: 1.16e+03\n", "No. Iterations: 7 \n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.3455 0.731 0.472 0.637 -1.088 1.779\n", "won_lottry 3.5514 0.178 19.934 0.000 3.202 3.901\n", "male -0.1622 0.164 -0.992 0.321 -0.483 0.158\n", "base_age -0.1184 0.061 -1.946 0.052 -0.238 0.001\n", "==============================================================================\n", "\n", "\n", "==============================================================================\n", " SECOND STAGE\n", "==============================================================================\n", " Generalized Linear Model Regression Results \n", "==============================================================================\n", "Dep. Variable: finish8th No. Observations: 1171\n", "Model: GLM Df Residuals: 1167\n", "Model Family: Binomial Df Model: 3\n", "Link Function: logit Scale: 1.0\n", "Method: IRLS Log-Likelihood: -696.65\n", "Date: Thu, 19 Mar 2015 Deviance: 1393.3\n", "Time: 23:08:46 Pearson chi2: 1.17e+03\n", "No. Iterations: 6 \n", "======================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "--------------------------------------------------------------------------------------\n", "Intercept 4.0756 0.604 6.753 0.000 2.893 5.258\n", "use_fin_aid_fitted 0.7743 0.192 4.036 0.000 0.398 1.150\n", "male -0.4175 0.130 -3.208 0.001 -0.673 -0.162\n", "base_age -0.2919 0.048 -6.077 0.000 -0.386 -0.198\n", "======================================================================================\n" ] } ], "source": [ "# FIRST STAGE: logistic regression of aid use on the instrument (won_lottry) and covariates\n", "print \"==============================================================================\"\n", "print \" FIRST STAGE\"\n", "print \"==============================================================================\"\n", "result = smf.glm(formula=\"use_fin_aid ~ won_lottry + male + base_age\",\n", "                 data=voucher_df,\n", "                 family=sm.families.Binomial()).fit()\n", "# Save each student's predicted probability of using financial aid\n", "voucher_df['use_fin_aid_fitted'] = result.predict()\n", "print result.summary()\n", "\n", "print\n", "print\n", "# SECOND STAGE: logistic regression of the outcome on the fitted aid probability and covariates\n", "print \"==============================================================================\"\n", "print \" SECOND STAGE\"\n", "print \"==============================================================================\"\n", "result = smf.glm(formula=\"finish8th ~ use_fin_aid_fitted + male + base_age\",\n", "                 data=voucher_df,\n", "                 family=sm.families.Binomial()).fit()\n", "print result.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Interpreting the Local Average Treatment Effect" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we use IVE to \"recover\" the treatment effect, how should we describe the results? According to Murnane and Willett, \"an estimate of a treatment effect obtained by IV methods should be regarded as an estimated *local average treatment effect* (LATE).\" The chapter walks readers through the kinds of groups involved:\n", "\n", "
| **won_lottery = 1** | **won_lottery = 0** | **Group** |\n", "|---|---|---|\n", "| use_fin_aid=1 (*used financial aid from some source*) | use_fin_aid=0 (*did not use financial aid from any source*) | \"**Compliers**\" |\n", "| use_fin_aid=1 (*used financial aid from some source*) | use_fin_aid=1 (*used financial aid from some source*) | \"**Always-Takers**\" |\n", "| use_fin_aid=0 (*did not use financial aid from any source*) | use_fin_aid=0 (*did not use financial aid from any source*) | \"**Never-Takers**\" |
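
Under the no-defiers assumption discussed below, the lottery-group means in the summary statistics indicate how large each group in the table is. Among lottery losers, only always-takers use financial aid, so the share of losers with *use_fin_aid* = 1 estimates the always-taker share. Among winners, only never-takers go without aid, so the share of winners with *use_fin_aid* = 0 estimates the never-taker share. Compliers are the rest, and their share equals the first-stage difference in aid use between winners and losers. Dividing the lottery's effect on *finish8th* by that complier share gives the Wald form of the IV estimate. The minimal sketch below copies the rounded means from the summary statistics and ignores the covariates *male* and *base_age*. Its result is therefore on the probability scale and not directly comparable to the log-odds coefficient on *use_fin_aid_fitted*.

```python
# Group shares and the Wald (ratio) IV estimate from the lottery-group means
# printed in the summary statistics. Assumptions: no defiers, and no covariates
# (male and base_age are ignored), so everything is on the probability scale.
mean_use_fin_aid = {0: 0.240069, 1: 0.915541}  # mean of use_fin_aid, keyed by won_lottry
mean_finish8th = {0: 0.625216, 1: 0.736486}    # mean of finish8th, keyed by won_lottry

# Losers who use aid can only be always-takers
always_takers = mean_use_fin_aid[0]
# Winners who do not use aid can only be never-takers
never_takers = 1 - mean_use_fin_aid[1]
# Everyone else is a complier; this equals the first-stage difference in aid use
compliers = 1 - always_takers - never_takers

# Reduced form (intent-to-treat): effect of winning the lottery on finishing 8th grade
itt = mean_finish8th[1] - mean_finish8th[0]
# Wald estimate of the LATE: reduced form divided by the complier share
late = itt / compliers

print('always-takers: %.3f' % always_takers)
print('never-takers:  %.3f' % never_takers)
print('compliers:     %.3f' % compliers)
print('reduced form:  %.4f' % itt)
print('Wald LATE:     %.4f' % late)
```

With these inputs, roughly 68% of families are compliers, 24% always-takers, and 8% never-takers. The Wald estimate is about 0.165, or roughly a 16.5 percentage-point effect of using financial aid on the probability of finishing 8th grade, for compliers only and under the assumptions above.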
\n", "\n", "Murnane and Willett describe a model, based on the paper \"[Identification of Causal Effects Using Instrumental Variables](http://business.baylor.edu/scott_cunningham/teaching/angrist-imbens-and-rubin.pdf)\" (1996) by Angrist, Imbens and Rubin, that distinguishes among groups by their compliance with \"the intent of the lottery\" (277):\n", "* *Compliers* \"are willing to have their behavior determined by the outcomes of the lottery, regardless of the particular experimental conditions to which they were assigned\" (278).\n", "* *Always-Takers* \"are families who will find and make use of financial aid to pay private-school fees\" regardless of the lottery outcome; if they lose the lottery, they find aid from another source.\n", "* *Never-Takers* are the mirror image of always-takers: \"they will not make use of financial aid to pay children's fees at a private secondary school under any circumstances\" (278).\n", "* Other groups are possible, such as \"defiers\" (Gennetian et al., 2005), who always do the opposite of what investigators ask them to do. Here we assume there are no defiers in this dataset.\n", "\n", "In this context, the IV estimate of the **local average treatment effect** (LATE) for this quasi-experiment applies only to compliers, not to never-takers or always-takers. The lottery changes financial-aid use only for compliers; always-takers and never-takers make the same choice whichever lottery outcome they receive." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }