{ "metadata": { "name": "", "signature": "sha256:daf845121b68fedc6c20fe71f5f45290441e727ef67b53809f4a1ba57a794f9d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Created 7/9/2014 by KO\n", "\n", "Implements propensity-score matching and eventually will implement balance diagnostics" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import math\n", "import numpy as np\n", "import scipy\n", "from scipy.stats import binom, hypergeom\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.linear_model import LogisticRegression" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goal: find the average treatment effect in the treatment group (ATT) on RE78.\n", "\n", "Import the data: controls and treated from Lalonde/Dehejia papers. Here's what the site says about the data:\n", "\n", "The variables from left to right are: treatment indicator (1 if treated, 0 if not treated), age, education, Black (1 if black, 0 otherwise), Hispanic (1 if Hispanic, 0 otherwise), married (1 if married, 0 otherwise), nodegree (1 if no degree, 0 otherwise), RE74 (earnings in 1974), RE75 (earnings in 1975), and RE78 (earnings in 1978).\n", "\n", "http://users.nber.org/%7Erdehejia/nswdata2.html\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "names = ['Treated', 'Age', 'Education', 'Black', 'Hispanic', 'Married',\n", " 'Nodegree', 'RE74', 'RE75', 'RE78']\n", "treated = pd.read_table('nswre74_treated.txt', sep = '\\s+',\n", " header = None, names = names)\n", "control = pd.read_table('nswre74_control.txt', sep='\\s+', \n", " header = None, names = names)\n", "data = pd.concat([treated, control])\n", "data.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
|   | Treated | Age | Education | Black | Hispanic | Married | Nodegree | RE74 | RE75 | RE78 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 37 | 11 | 1 | 0 | 1 | 1 | 0 | 0 | 9930.0460 |
| 1 | 1 | 22 | 9 | 0 | 1 | 0 | 1 | 0 | 0 | 3595.8940 |
| 2 | 1 | 30 | 12 | 1 | 0 | 0 | 0 | 0 | 0 | 24909.4500 |
| 3 | 1 | 27 | 11 | 1 | 0 | 0 | 1 | 0 | 0 | 7506.1460 |
| 4 | 1 | 33 | 8 | 1 | 0 | 0 | 1 | 0 | 0 | 289.7899 |