{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook Tasks\n",
    "\n",
    "<br>\n",
    "**_Possible Samples for Statistical Tests_**:\n",
    "- Given the above, there are a number of possible tests:<br><br>\n",
    "\n",
    "<table>\n",
    "  <tr>\n",
    "    <th>IV: SOX Policies</th>\n",
    "    <th>DV: Donor Advisory</th>\n",
    "    <th>N</th>\n",
    "    <th>Notes</th>\n",
    "    <th>TO DO</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>2011</td>\n",
    "    <td>2016</td>\n",
    "    <td>4,857</td>\n",
    "    <td>47 donor advisories on these organizations; associational test (we don't know when the SOX policies were added); also, DV is 'current donor advisory'</td>\n",
    "    <td>ready to run<br></td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>2011</td>\n",
    "    <td>2012-2016</td>\n",
    "    <td>4,857</td>\n",
    "    <td>47 2016 advisories plus probably another dozen or so advisories over the 2012-2015 period; associational test as above, but adds in donor advisories that were put in place then dropped between 2012 and 2015.</td>\n",
    "    <td>some minor work creating this new DV but not very burdensome</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>2011</td>\n",
    "    <td>2011</td>\n",
    "    <td>5,439</td>\n",
    "    <td>39 donor advisories; pure cross-sectional test<br></td>\n",
    "    <td>Download the '2011' 990 data (SOX policies + controls) for the 39 orgs with a 2011 donor advisory; a few hours work to download and enter the data</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>2016</td>\n",
    "    <td>2016</td>\n",
    "    <td>8,304</td>\n",
    "    <td>328 donor advisories; pure cross-sectional test</td>\n",
    "    <td>ready to run</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>change 2011-2016</td>\n",
    "    <td>2016</td>\n",
    "    <td>4,857</td>\n",
    "    <td>'Divide 4,857 orgs into three groups: i) those with no SOX policies in 2011 and still no SOX policies in 2016; ii) those with SOX policies in 2011 and 2016; and iii) those with no SOX policies in 2011 but SOX policies in 2016. Create dummy variables for each group and see whether those in group iii) do better than i) or ii). This is a relatively low cost 'pre-post' test.</td>\n",
    "    <td>moderate amount of work to create the new dummies but not too burdensome</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>change 2011-2016</td>\n",
    "    <td>2012-2016</td>\n",
    "    <td>TBD</td>\n",
    "    <td>Similar to above option, but would need to take a sample of organizations in group iii) and go through their 990s to find out exactly when they added the SOX policies</td>\n",
    "    <td>Resource-intensive 990 searches</td>\n",
    "  </tr>\n",
    "</table>\n",
    "\n",
    "\n",
    "<br><br>\n",
    "**_Notes from Meeting with Dan:_**\n",
    "- Do not do 3rd or 6th test -- benefit not worth the cost\n",
    "- 1st and 2nd tests can be robustness analyses\n",
    "- Focus on 4th and 5th tests\n",
    "- Control variables:\n",
    "  - Size: total revenues best (probably logged)\n",
    "    - will need 2011 and 2016 versions for the 4th and 5th tests\n",
    "  - efficiency ratio\n",
    "  - age (from BMF)\n",
    "  - complexity (could be a good control from Erica's paper)\n",
    "  - fixed effects:\n",
    "    - state\n",
    "    - category\n",
    "      - I need to scrape the category dummies for the new orgs in the 2016 database\n",
    "        - CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area\n",
    "  - The focus of our paper is on SOC policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:\n",
    "    - *independent board* --> related to Erica's *independence of key actors\" concept\n",
    "    - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n",
    "    - we could include other governance variables as needed.\n",
    "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n",
    "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.\n",
    "\n",
    "\n",
    "\n",
    "<br><br>\n",
    "**_To Do (beyond notes listed in table above):_**\n",
    "- For all above tests, we need to decide on controls, then find/merge/create any not currently in dataset\n",
    "- Run a selection model?\n",
    "- Code the *type* of advisory? Maybe save for future study\n",
    "- There are 53 orgs on the CN 'Watchlist' -- we probably don't need to look at these but it's a possible future move.\n",
    "\n",
    "<br>\n",
    "**_Notes on 2011 data:_**\n",
    "- Only 47 of 329 current donor advisories are on orgs that were rated in 2011\n",
    "- Number of 2011 orgs (n=5,349) missing from 2016 ratings: 582\n",
    "- Number of 2016 orgs (n=8,304) not in 2011 ratings: 3,447\n",
    "- In 2011 when I scraped the current ratings there are 39 blank rows. Specifically, I checked the following spreadsheet: *Charity Navigator - current ratings, October 18, 2011 (WITH UPDATES FOR DONOR ADVISORY ORGS).xlsx*  -- 39 rows were blank for all ratings information, so I checked against the historical ratings on the CN website. (So far) all rows were either 1) dropped from CN, 2) had a donor advisory, or 3) still have a donor advisory. I have 5,439 orgs in the 2011 database. 39 seem to have had donor advisories on them at that time. So, the 2011 sample is the 5,400 orgs that did not have an advisory on them at the time. This conforms with the *n* of 5,400 in the above logit."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "\n",
    "### Import Packages\n",
    "First, we will import several necessary Python packages. We will be using the <a href=\"http://pandas.pydata.org/\">Python Data Analysis Library,</a> or <i>PANDAS</i>, extensively for our data manipulations. It is invaluable for analyzing datasets. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import of basic elements of PANDAS and numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from pandas import DataFrame\n",
    "from pandas import Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "\n",
    "We can check which version of various packages we're using. You can see I'm running PANDAS 0.17 here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.18.1\n"
     ]
    }
   ],
   "source": [
    "print pd.__version__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>\n",
    "PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#http://pandas.pydata.org/pandas-docs/stable/options.html\n",
    "pd.set_option('display.max_columns', None)\n",
    "pd.set_option('max_colwidth', 500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Read in Data\n",
    "Let's read in the merged historical/current/2011 dataset we created in the last notebook. First we'll change the working directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/Users/gregorysaxton/Google Drive/SOX\n"
     ]
    }
   ],
   "source": [
    "cd '/Users/gregorysaxton/Google Drive/SOX'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Logit Tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "35\n",
      "4863\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>donor_advisory</th>\n",
       "      <th>donor_advisory_2016</th>\n",
       "      <th>donor_advisory_2011_to_2016</th>\n",
       "      <th>org_id</th>\n",
       "      <th>EIN</th>\n",
       "      <th>FYE</th>\n",
       "      <th>Form 990 FYE</th>\n",
       "      <th>ratings_system</th>\n",
       "      <th>2011_data</th>\n",
       "      <th>2016_data</th>\n",
       "      <th>conflict_of_interest_policy_v2</th>\n",
       "      <th>records_retention_policy_v2</th>\n",
       "      <th>whistleblower_policy_v2</th>\n",
       "      <th>SOX_policies</th>\n",
       "      <th>SOX_policies_binary</th>\n",
       "      <th>SOX_policies_all_binary</th>\n",
       "      <th>program_efficiency</th>\n",
       "      <th>complexity</th>\n",
       "      <th>complexity_2011</th>\n",
       "      <th>age</th>\n",
       "      <th>total_revenue_logged</th>\n",
       "      <th>category</th>\n",
       "      <th>state</th>\n",
       "      <th>tot_rev</th>\n",
       "      <th>category_Animals</th>\n",
       "      <th>category_Arts, Culture, Humanities</th>\n",
       "      <th>category_Community Development</th>\n",
       "      <th>category_Education</th>\n",
       "      <th>category_Environment</th>\n",
       "      <th>category_Health</th>\n",
       "      <th>category_Human Services</th>\n",
       "      <th>category_Human and Civil Rights</th>\n",
       "      <th>category_International</th>\n",
       "      <th>category_Religion</th>\n",
       "      <th>category_Research and Public Policy</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>50715</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5954</td>\n",
       "      <td>010202467</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>2009-12</td>\n",
       "      <td>CN 2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.788895</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>62.0</td>\n",
       "      <td>15.947563</td>\n",
       "      <td>Research and Public Policy</td>\n",
       "      <td>ME</td>\n",
       "      <td>8432154.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       donor_advisory  donor_advisory_2016  donor_advisory_2011_to_2016  \\\n",
       "50715             0.0                  0.0                          0.0   \n",
       "\n",
       "      org_id        EIN     FYE Form 990 FYE ratings_system  2011_data  \\\n",
       "50715   5954  010202467  FY2009      2009-12         CN 2.0        1.0   \n",
       "\n",
       "       2016_data  conflict_of_interest_policy_v2  records_retention_policy_v2  \\\n",
       "50715        0.0                             1.0                          1.0   \n",
       "\n",
       "       whistleblower_policy_v2  SOX_policies  SOX_policies_binary  \\\n",
       "50715                      1.0           3.0                  1.0   \n",
       "\n",
       "       SOX_policies_all_binary  program_efficiency  complexity  \\\n",
       "50715                      1.0            0.788895         0.0   \n",
       "\n",
       "       complexity_2011   age  total_revenue_logged  \\\n",
       "50715              3.0  62.0             15.947563   \n",
       "\n",
       "                         category state    tot_rev  category_Animals  \\\n",
       "50715  Research and Public Policy    ME  8432154.0               0.0   \n",
       "\n",
       "       category_Arts, Culture, Humanities  category_Community Development  \\\n",
       "50715                                 0.0                             0.0   \n",
       "\n",
       "       category_Education  category_Environment  category_Health  \\\n",
       "50715                 0.0                   0.0              0.0   \n",
       "\n",
       "       category_Human Services  category_Human and Civil Rights  \\\n",
       "50715                      0.0                              0.0   \n",
       "\n",
       "       category_International  category_Religion  \\\n",
       "50715                     0.0                0.0   \n",
       "\n",
       "       category_Research and Public Policy  \n",
       "50715                                  1.0  "
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2011 = pd.read_pickle('Tests 1-2 data.pkl')\n",
    "print len(df_2011.columns)\n",
    "print len(df_2011)\n",
    "df_2011.head(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n"
     ]
    }
   ],
   "source": [
    "print df_2011.columns.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.0    4816\n",
      "1.0      47\n",
      "Name: donor_advisory_2016, dtype: int64\n",
      "0.0    4755\n",
      "1.0     108\n",
      "Name: donor_advisory_2011_to_2016, dtype: int64\n"
     ]
    }
   ],
   "source": [
    "print df_2011['donor_advisory_2016'].value_counts()\n",
    "print df_2011['donor_advisory_2011_to_2016'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n"
     ]
    }
   ],
   "source": [
    "#DVs = ['donor_advisory', \n",
    "DVs = ['donor_advisory_2016', 'donor_advisory_2011_to_2016']\n",
    "indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data']\n",
    "IVs = ['conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2',\n",
    "       'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n",
    "controls = ['program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']\n",
    "fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n",
    "                 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n",
    "                 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n",
    "                 'category_Research and Public Policy']\n",
    "SOI_check = ['tot_rev']\n",
    "\n",
    "merge_cols = ['_merge_v1', '_merge_v2', '_merge_v3', '_merge_v4', '_merge_47', '_merge_efile']\n",
    "\n",
    "#+ SOI_check\n",
    "logit_cols = DVs + indicators + IVs + controls  + fixed_effects\n",
    "print logit_cols"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>donor_advisory</th>\n",
       "      <th>donor_advisory_2016</th>\n",
       "      <th>donor_advisory_2011_to_2016</th>\n",
       "      <th>org_id</th>\n",
       "      <th>EIN</th>\n",
       "      <th>FYE</th>\n",
       "      <th>Form 990 FYE</th>\n",
       "      <th>ratings_system</th>\n",
       "      <th>2011_data</th>\n",
       "      <th>2016_data</th>\n",
       "      <th>conflict_of_interest_policy_v2</th>\n",
       "      <th>records_retention_policy_v2</th>\n",
       "      <th>whistleblower_policy_v2</th>\n",
       "      <th>SOX_policies</th>\n",
       "      <th>SOX_policies_binary</th>\n",
       "      <th>SOX_policies_all_binary</th>\n",
       "      <th>program_efficiency</th>\n",
       "      <th>complexity</th>\n",
       "      <th>complexity_2011</th>\n",
       "      <th>age</th>\n",
       "      <th>total_revenue_logged</th>\n",
       "      <th>category</th>\n",
       "      <th>state</th>\n",
       "      <th>tot_rev</th>\n",
       "      <th>category_Animals</th>\n",
       "      <th>category_Arts, Culture, Humanities</th>\n",
       "      <th>category_Community Development</th>\n",
       "      <th>category_Education</th>\n",
       "      <th>category_Environment</th>\n",
       "      <th>category_Health</th>\n",
       "      <th>category_Human Services</th>\n",
       "      <th>category_Human and Civil Rights</th>\n",
       "      <th>category_International</th>\n",
       "      <th>category_Religion</th>\n",
       "      <th>category_Research and Public Policy</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [donor_advisory, donor_advisory_2016, donor_advisory_2011_to_2016, org_id, EIN, FYE, Form 990 FYE, ratings_system, 2011_data, 2016_data, conflict_of_interest_policy_v2, records_retention_policy_v2, whistleblower_policy_v2, SOX_policies, SOX_policies_binary, SOX_policies_all_binary, program_efficiency, complexity, complexity_2011, age, total_revenue_logged, category, state, tot_rev, category_Animals, category_Arts, Culture, Humanities, category_Community Development, category_Education, category_Environment, category_Health, category_Human Services, category_Human and Civil Rights, category_International, category_Religion, category_Research and Public Policy]\n",
       "Index: []"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2011[df_2011.duplicated()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#df_2011.to_excel('df_2011.xls')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test Logit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6.1\n"
     ]
    }
   ],
   "source": [
    "import statsmodels\n",
    "import statsmodels.api as sm\n",
    "import statsmodels.formula.api as smf   #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS\n",
    "print statsmodels.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged +  program_efficiency + age + complexity_2011 + C(category)\n"
     ]
    }
   ],
   "source": [
    "#IVs = '%s + ' % IV\n",
    "#DV = '%s ~ ' % DV  \n",
    "IVs = 'SOX_policies '\n",
    "#IVs = 'SOX_policies_binary'\n",
    "#DV = 'advisory ~ '\n",
    "#DV = 'donor_advisory_2016 ~ '\n",
    "DV = 'donor_advisory_2011_to_2016 ~ '\n",
    "controls = '+ total_revenue_logged +  program_efficiency + age + complexity_2011 + C(category)'\n",
    "\n",
    "#admin_expense_percent + leader_comp_percent + budget_surplus\n",
    "logit_formula = DV+IVs+controls\n",
    "print logit_formula\n",
    "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit()   \n",
    "#print globals()[\"mod%s\" % model_num].summary()\n",
    "# #print model_num.summary()\n",
    "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n"
     ]
    }
   ],
   "source": [
    "print df_2011.columns.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4838"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit_cols1 = ['donor_advisory_2016', 'donor_advisory_2011_to_2016', \n",
    "               'SOX_policies', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']\n",
    "len(df_2011[logit_cols1].dropna())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1242"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df_2011.dropna())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Maximum number of iterations has been exceeded.\n",
      "         Current function value: 0.075203\n",
      "         Iterations: 35\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n",
      "  \"Check mle_retvals\", ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2011_to_2016</td> <th>  No. Observations:  </th>  <td>  4833</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                    <td>Logit</td>            <th>  Df Residuals:      </th>  <td>  4817</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                    <td>MLE</td>             <th>  Df Model:          </th>  <td>    15</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>               <td>Tue, 06 Sep 2016</td>       <th>  Pseudo R-squ.:     </th>  <td>0.08962</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                   <td>12:35:26</td>           <th>  Log-Likelihood:    </th> <td> -363.45</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>                <td>False</td>            <th>  LL-Null:           </th> <td> -399.24</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                           <td> </td>              <th>  LLR p-value:       </th> <td>2.351e-09</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                      <td></td>                         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                                 <td>   -2.6828</td> <td>    1.438</td> <td>   -1.865</td> <td> 0.062</td> <td>   -5.502     0.136</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Arts, Culture, Humanities]</th>  <td>   -0.5078</td> <td>    0.515</td> <td>   -0.986</td> <td> 0.324</td> <td>   -1.517     0.502</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Community Development]</th>      <td>   -0.3612</td> <td>    0.575</td> <td>   -0.629</td> <td> 0.530</td> <td>   -1.487     0.765</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Education]</th>                  <td>   -1.2152</td> <td>    0.794</td> <td>   -1.531</td> <td> 0.126</td> <td>   -2.771     0.340</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Environment]</th>                <td>  -18.8719</td> <td> 4856.123</td> <td>   -0.004</td> <td> 0.997</td> <td>-9536.698  9498.954</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Health]</th>                     <td>   -1.0195</td> <td>    0.548</td> <td>   -1.862</td> <td> 0.063</td> <td>   -2.093     0.054</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human Services]</th>             <td>   -0.1790</td> <td>    0.415</td> <td>   -0.431</td> <td> 0.666</td> <td>   -0.993     0.635</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human and Civil Rights]</th>     <td>   -0.6823</td> <td>    0.687</td> <td>   -0.993</td> <td> 0.321</td> <td>   -2.029     0.664</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.International]</th>              <td>   -0.3676</td> <td>    0.511</td> <td>   -0.719</td> <td> 0.472</td> <td>   -1.369     0.634</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Religion]</th>                   <td>    0.3141</td> <td>    0.449</td> <td>    0.700</td> <td> 0.484</td> <td>   -0.566     1.194</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Research and Public Policy]</th> <td>   -0.5298</td> <td>    0.796</td> <td>   -0.665</td> <td> 0.506</td> <td>   -2.090     1.031</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies</th>                              <td>   -0.4256</td> <td>    0.110</td> <td>   -3.876</td> <td> 0.000</td> <td>   -0.641    -0.210</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_logged</th>                      <td>    0.2806</td> <td>    0.101</td> <td>    2.785</td> <td> 0.005</td> <td>    0.083     0.478</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_efficiency</th>                        <td>   -3.0563</td> <td>    0.767</td> <td>   -3.982</td> <td> 0.000</td> <td>   -4.561    -1.552</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                                       <td>   -0.0065</td> <td>    0.007</td> <td>   -0.981</td> <td> 0.327</td> <td>   -0.019     0.006</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>                           <td>   -0.7391</td> <td>    0.274</td> <td>   -2.700</td> <td> 0.007</td> <td>   -1.275    -0.203</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                                Logit Regression Results                               \n",
       "=======================================================================================\n",
       "Dep. Variable:     donor_advisory_2011_to_2016   No. Observations:                 4833\n",
       "Model:                                   Logit   Df Residuals:                     4817\n",
       "Method:                                    MLE   Df Model:                           15\n",
       "Date:                         Tue, 06 Sep 2016   Pseudo R-squ.:                 0.08962\n",
       "Time:                                 12:35:26   Log-Likelihood:                -363.45\n",
       "converged:                               False   LL-Null:                       -399.24\n",
       "                                                 LLR p-value:                 2.351e-09\n",
       "=============================================================================================================\n",
       "                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------------------------------\n",
       "Intercept                                    -2.6828      1.438     -1.865      0.062        -5.502     0.136\n",
       "C(category)[T.Arts, Culture, Humanities]     -0.5078      0.515     -0.986      0.324        -1.517     0.502\n",
       "C(category)[T.Community Development]         -0.3612      0.575     -0.629      0.530        -1.487     0.765\n",
       "C(category)[T.Education]                     -1.2152      0.794     -1.531      0.126        -2.771     0.340\n",
       "C(category)[T.Environment]                  -18.8719   4856.123     -0.004      0.997     -9536.698  9498.954\n",
       "C(category)[T.Health]                        -1.0195      0.548     -1.862      0.063        -2.093     0.054\n",
       "C(category)[T.Human Services]                -0.1790      0.415     -0.431      0.666        -0.993     0.635\n",
       "C(category)[T.Human and Civil Rights]        -0.6823      0.687     -0.993      0.321        -2.029     0.664\n",
       "C(category)[T.International]                 -0.3676      0.511     -0.719      0.472        -1.369     0.634\n",
       "C(category)[T.Religion]                       0.3141      0.449      0.700      0.484        -0.566     1.194\n",
       "C(category)[T.Research and Public Policy]    -0.5298      0.796     -0.665      0.506        -2.090     1.031\n",
       "SOX_policies                                 -0.4256      0.110     -3.876      0.000        -0.641    -0.210\n",
       "total_revenue_logged                          0.2806      0.101      2.785      0.005         0.083     0.478\n",
       "program_efficiency                           -3.0563      0.767     -3.982      0.000        -4.561    -1.552\n",
       "age                                          -0.0065      0.007     -0.981      0.327        -0.019     0.006\n",
       "complexity_2011                              -0.7391      0.274     -2.700      0.007        -1.275    -0.203\n",
       "=============================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df_2011).fit() \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df_2011['donor_advisory_2016'] = df_2011['donor_advisory_2016'].astype('int')\n",
    "df_2011['donor_advisory_2011_to_2016'] = df_2011['donor_advisory_2011_to_2016'].astype('int')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "donor_advisory                         float64\n",
       "donor_advisory_2016                      int64\n",
       "donor_advisory_2011_to_2016              int64\n",
       "org_id                                  object\n",
       "EIN                                     object\n",
       "FYE                                     object\n",
       "Form 990 FYE                            object\n",
       "ratings_system                          object\n",
       "2011_data                              float64\n",
       "2016_data                              float64\n",
       "conflict_of_interest_policy_v2         float64\n",
       "records_retention_policy_v2            float64\n",
       "whistleblower_policy_v2                float64\n",
       "SOX_policies                           float64\n",
       "SOX_policies_binary                    float64\n",
       "SOX_policies_all_binary                float64\n",
       "program_efficiency                     float64\n",
       "complexity                             float64\n",
       "age                                    float64\n",
       "total_revenue_logged                   float64\n",
       "category                                object\n",
       "state                                   object\n",
       "tot_rev                                float64\n",
       "category_Animals                       float64\n",
       "category_Arts, Culture, Humanities     float64\n",
       "category_Community Development         float64\n",
       "category_Education                     float64\n",
       "category_Environment                   float64\n",
       "category_Health                        float64\n",
       "category_Human Services                float64\n",
       "category_Human and Civil Rights        float64\n",
       "category_International                 float64\n",
       "category_Religion                      float64\n",
       "category_Research and Public Policy    float64\n",
       "dtype: object"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2011.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write Function for Logits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "#def new_logit(IV,model_num):\n",
    "def new_logit_clustered(data, DV, columns, FE, model_num):\n",
    "    #IVs = '%s + ' % IV\n",
    "    #DV = 'RTs_binary ~ '\n",
    "    DV = '%s ~ ' % DV  \n",
    "    #controls = 'from_user_followers_count + time_on_twitter_days + CSR_sustainability +  \\\n",
    "    #         URLs_binary + photo'\n",
    "    IVs = ' + '.join(columns)\n",
    "    FE = '%s ' % FE\n",
    "    logit_formula = DV+IVs+FE\n",
    "    print logit_formula\n",
    "    globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',\n",
    "                                                        cov_kwds={'groups': df['firm_from_user_screen_name']})   \n",
    "    print globals()[\"mod%s\" % model_num].summary()\n",
    "    #print model_num.summary()\n",
    "    print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE\n",
    "    #print '\\n', \"Pseudo R-squared:\", globals()[\"mod%s\" % model_num].prsquared #TO GET THE PSEUDO-R-SQUARED"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1157,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged +  program_efficiency + age + complexity_2011 + C(category)\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 1089,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4863\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4813"
      ]
     },
     "execution_count": 1089,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit_variables = ['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged', \n",
    "                   'program_efficiency', 'age', 'complexity', 'state', 'category']\n",
    "df_2011 = df[logit_variables]\n",
    "df_2011 = df_2011[df_2011['2011_data']==1]\n",
    "print len(df_2011)\n",
    "len(df_2011.dropna())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1100,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2011_data</th>\n",
       "      <td>84958</td>\n",
       "      <td>0.057240</td>\n",
       "      <td>0.232302</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>donor_advisory_2016</th>\n",
       "      <td>84958</td>\n",
       "      <td>0.004332</td>\n",
       "      <td>0.065672</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SOX_policies</th>\n",
       "      <td>21894</td>\n",
       "      <td>2.724582</td>\n",
       "      <td>0.689867</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_revenue_logged</th>\n",
       "      <td>21825</td>\n",
       "      <td>15.911470</td>\n",
       "      <td>1.458552</td>\n",
       "      <td>11.616123</td>\n",
       "      <td>14.781636</td>\n",
       "      <td>15.706608</td>\n",
       "      <td>16.864409</td>\n",
       "      <td>22.042788</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_efficiency</th>\n",
       "      <td>21894</td>\n",
       "      <td>0.805400</td>\n",
       "      <td>0.103635</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.756568</td>\n",
       "      <td>0.817758</td>\n",
       "      <td>0.871105</td>\n",
       "      <td>1.010186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>83830</td>\n",
       "      <td>39.508147</td>\n",
       "      <td>19.310175</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>24.000000</td>\n",
       "      <td>35.000000</td>\n",
       "      <td>52.000000</td>\n",
       "      <td>108.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>complexity</th>\n",
       "      <td>84958</td>\n",
       "      <td>0.373031</td>\n",
       "      <td>1.220945</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>8.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>complexity_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>2.466791</td>\n",
       "      <td>0.514468</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                      count       mean        std        min        25%  \\\n",
       "2011_data             84958   0.057240   0.232302   0.000000   0.000000   \n",
       "donor_advisory_2016   84958   0.004332   0.065672   0.000000   0.000000   \n",
       "SOX_policies          21894   2.724582   0.689867   0.000000   3.000000   \n",
       "total_revenue_logged  21825  15.911470   1.458552  11.616123  14.781636   \n",
       "program_efficiency    21894   0.805400   0.103635   0.000000   0.756568   \n",
       "age                   83830  39.508147  19.310175   0.000000  24.000000   \n",
       "complexity            84958   0.373031   1.220945   0.000000   0.000000   \n",
       "complexity_2011        4833   2.466791   0.514468   1.000000   2.000000   \n",
       "\n",
       "                            50%        75%         max  \n",
       "2011_data              0.000000   0.000000    1.000000  \n",
       "donor_advisory_2016    0.000000   0.000000    1.000000  \n",
       "SOX_policies           3.000000   3.000000    3.000000  \n",
       "total_revenue_logged  15.706608  16.864409   22.042788  \n",
       "program_efficiency     0.817758   0.871105    1.010186  \n",
       "age                   35.000000  52.000000  108.000000  \n",
       "complexity             0.000000   0.000000    8.000000  \n",
       "complexity_2011        2.000000   3.000000    3.000000  "
      ]
     },
     "execution_count": 1100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged', \n",
    "                   'program_efficiency', 'age', 'complexity', 'complexity_2011', 'state', 'category']].describe().T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1158,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Maximum number of iterations has been exceeded.\n",
      "         Current function value: 0.075575\n",
      "         Iterations: 35\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n",
      "  \"Check mle_retvals\", ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2011_to_2016</td> <th>  No. Observations:  </th>  <td>  4808</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                    <td>Logit</td>            <th>  Df Residuals:      </th>  <td>  4792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                    <td>MLE</td>             <th>  Df Model:          </th>  <td>    15</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>               <td>Fri, 02 Sep 2016</td>       <th>  Pseudo R-squ.:     </th>  <td>0.08892</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                   <td>12:11:05</td>           <th>  Log-Likelihood:    </th> <td> -363.36</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>                <td>False</td>            <th>  LL-Null:           </th> <td> -398.83</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                           <td> </td>              <th>  LLR p-value:       </th> <td>3.049e-09</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                      <td></td>                         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                                 <td>   -2.6649</td> <td>    1.444</td> <td>   -1.845</td> <td> 0.065</td> <td>   -5.495     0.166</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Arts, Culture, Humanities]</th>  <td>   -0.5071</td> <td>    0.515</td> <td>   -0.985</td> <td> 0.325</td> <td>   -1.516     0.502</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Community Development]</th>      <td>   -0.3595</td> <td>    0.575</td> <td>   -0.626</td> <td> 0.532</td> <td>   -1.486     0.767</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Education]</th>                  <td>   -1.2142</td> <td>    0.794</td> <td>   -1.530</td> <td> 0.126</td> <td>   -2.770     0.341</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Environment]</th>                <td>  -20.2785</td> <td> 9810.033</td> <td>   -0.002</td> <td> 0.998</td> <td>-1.92e+04  1.92e+04</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Health]</th>                     <td>   -1.0181</td> <td>    0.548</td> <td>   -1.859</td> <td> 0.063</td> <td>   -2.092     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human Services]</th>             <td>   -0.1780</td> <td>    0.415</td> <td>   -0.429</td> <td> 0.668</td> <td>   -0.992     0.636</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human and Civil Rights]</th>     <td>   -0.6780</td> <td>    0.687</td> <td>   -0.987</td> <td> 0.324</td> <td>   -2.025     0.669</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.International]</th>              <td>   -0.3664</td> <td>    0.511</td> <td>   -0.717</td> <td> 0.474</td> <td>   -1.368     0.636</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Religion]</th>                   <td>    0.3200</td> <td>    0.449</td> <td>    0.712</td> <td> 0.476</td> <td>   -0.560     1.200</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Research and Public Policy]</th> <td>   -0.5291</td> <td>    0.796</td> <td>   -0.665</td> <td> 0.506</td> <td>   -2.090     1.031</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies</th>                              <td>   -0.4264</td> <td>    0.110</td> <td>   -3.882</td> <td> 0.000</td> <td>   -0.642    -0.211</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_logged</th>                      <td>    0.2790</td> <td>    0.101</td> <td>    2.753</td> <td> 0.006</td> <td>    0.080     0.478</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_efficiency</th>                        <td>   -3.0518</td> <td>    0.768</td> <td>   -3.976</td> <td> 0.000</td> <td>   -4.556    -1.547</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                                       <td>   -0.0064</td> <td>    0.007</td> <td>   -0.977</td> <td> 0.329</td> <td>   -0.019     0.006</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>                           <td>   -0.7374</td> <td>    0.274</td> <td>   -2.695</td> <td> 0.007</td> <td>   -1.274    -0.201</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                                Logit Regression Results                               \n",
       "=======================================================================================\n",
       "Dep. Variable:     donor_advisory_2011_to_2016   No. Observations:                 4808\n",
       "Model:                                   Logit   Df Residuals:                     4792\n",
       "Method:                                    MLE   Df Model:                           15\n",
       "Date:                         Fri, 02 Sep 2016   Pseudo R-squ.:                 0.08892\n",
       "Time:                                 12:11:05   Log-Likelihood:                -363.36\n",
       "converged:                               False   LL-Null:                       -398.83\n",
       "                                                 LLR p-value:                 3.049e-09\n",
       "=============================================================================================================\n",
       "                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------------------------------\n",
       "Intercept                                    -2.6649      1.444     -1.845      0.065        -5.495     0.166\n",
       "C(category)[T.Arts, Culture, Humanities]     -0.5071      0.515     -0.985      0.325        -1.516     0.502\n",
       "C(category)[T.Community Development]         -0.3595      0.575     -0.626      0.532        -1.486     0.767\n",
       "C(category)[T.Education]                     -1.2142      0.794     -1.530      0.126        -2.770     0.341\n",
       "C(category)[T.Environment]                  -20.2785   9810.033     -0.002      0.998     -1.92e+04  1.92e+04\n",
       "C(category)[T.Health]                        -1.0181      0.548     -1.859      0.063        -2.092     0.055\n",
       "C(category)[T.Human Services]                -0.1780      0.415     -0.429      0.668        -0.992     0.636\n",
       "C(category)[T.Human and Civil Rights]        -0.6780      0.687     -0.987      0.324        -2.025     0.669\n",
       "C(category)[T.International]                 -0.3664      0.511     -0.717      0.474        -1.368     0.636\n",
       "C(category)[T.Religion]                       0.3200      0.449      0.712      0.476        -0.560     1.200\n",
       "C(category)[T.Research and Public Policy]    -0.5291      0.796     -0.665      0.506        -2.090     1.031\n",
       "SOX_policies                                 -0.4264      0.110     -3.882      0.000        -0.642    -0.211\n",
       "total_revenue_logged                          0.2790      0.101      2.753      0.006         0.080     0.478\n",
       "program_efficiency                           -3.0518      0.768     -3.976      0.000        -4.556    -1.547\n",
       "age                                          -0.0064      0.007     -0.977      0.329        -0.019     0.006\n",
       "complexity_2011                              -0.7374      0.274     -2.695      0.007        -1.274    -0.201\n",
       "=============================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 1158,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df[df['2011_data']==1]).fit() \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test with standard errors clustered on state"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'donor_advisory_2016 ~ SOX_policies + total_revenue_logged +  program_efficiency + age + complexity_2011 + C(category)'"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit_formula"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cols1 = ['donor_advisory_2011_to_2016', 'SOX_policies', 'total_revenue_logged', 'program_efficiency', 'age',\n",
    "         'complexity_2011', 'category', 'state']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Maximum number of iterations has been exceeded.\n",
      "         Current function value: 0.075203\n",
      "         Iterations: 35\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n",
      "  \"Check mle_retvals\", ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2011_to_2016</td> <th>  No. Observations:  </th>  <td>  4833</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                    <td>Logit</td>            <th>  Df Residuals:      </th>  <td>  4817</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                    <td>MLE</td>             <th>  Df Model:          </th>  <td>    15</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>               <td>Tue, 06 Sep 2016</td>       <th>  Pseudo R-squ.:     </th>  <td>0.08962</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                   <td>12:35:43</td>           <th>  Log-Likelihood:    </th> <td> -363.45</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>                <td>False</td>            <th>  LL-Null:           </th> <td> -399.24</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                           <td> </td>              <th>  LLR p-value:       </th> <td>2.351e-09</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                      <td></td>                         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                                 <td>   -2.6828</td> <td>    1.038</td> <td>   -2.585</td> <td> 0.010</td> <td>   -4.717    -0.649</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Arts, Culture, Humanities]</th>  <td>   -0.5078</td> <td>    0.465</td> <td>   -1.092</td> <td> 0.275</td> <td>   -1.419     0.404</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Community Development]</th>      <td>   -0.3612</td> <td>    0.700</td> <td>   -0.516</td> <td> 0.606</td> <td>   -1.733     1.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Education]</th>                  <td>   -1.2152</td> <td>    0.683</td> <td>   -1.779</td> <td> 0.075</td> <td>   -2.554     0.123</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Environment]</th>                <td>  -18.8719</td> <td>    0.433</td> <td>  -43.552</td> <td> 0.000</td> <td>  -19.721   -18.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Health]</th>                     <td>   -1.0195</td> <td>    0.520</td> <td>   -1.959</td> <td> 0.050</td> <td>   -2.039     0.000</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human Services]</th>             <td>   -0.1790</td> <td>    0.370</td> <td>   -0.484</td> <td> 0.628</td> <td>   -0.904     0.546</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human and Civil Rights]</th>     <td>   -0.6823</td> <td>    0.483</td> <td>   -1.412</td> <td> 0.158</td> <td>   -1.629     0.265</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.International]</th>              <td>   -0.3676</td> <td>    0.487</td> <td>   -0.754</td> <td> 0.451</td> <td>   -1.323     0.588</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Religion]</th>                   <td>    0.3141</td> <td>    0.470</td> <td>    0.668</td> <td> 0.504</td> <td>   -0.607     1.235</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Research and Public Policy]</th> <td>   -0.5298</td> <td>    0.526</td> <td>   -1.007</td> <td> 0.314</td> <td>   -1.561     0.501</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies</th>                              <td>   -0.4256</td> <td>    0.093</td> <td>   -4.588</td> <td> 0.000</td> <td>   -0.607    -0.244</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_logged</th>                      <td>    0.2806</td> <td>    0.074</td> <td>    3.776</td> <td> 0.000</td> <td>    0.135     0.426</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_efficiency</th>                        <td>   -3.0563</td> <td>    0.816</td> <td>   -3.744</td> <td> 0.000</td> <td>   -4.656    -1.456</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                                       <td>   -0.0065</td> <td>    0.005</td> <td>   -1.199</td> <td> 0.231</td> <td>   -0.017     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>                           <td>   -0.7391</td> <td>    0.263</td> <td>   -2.814</td> <td> 0.005</td> <td>   -1.254    -0.224</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                                Logit Regression Results                               \n",
       "=======================================================================================\n",
       "Dep. Variable:     donor_advisory_2011_to_2016   No. Observations:                 4833\n",
       "Model:                                   Logit   Df Residuals:                     4817\n",
       "Method:                                    MLE   Df Model:                           15\n",
       "Date:                         Tue, 06 Sep 2016   Pseudo R-squ.:                 0.08962\n",
       "Time:                                 12:35:43   Log-Likelihood:                -363.45\n",
       "converged:                               False   LL-Null:                       -399.24\n",
       "                                                 LLR p-value:                 2.351e-09\n",
       "=============================================================================================================\n",
       "                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------------------------------\n",
       "Intercept                                    -2.6828      1.038     -2.585      0.010        -4.717    -0.649\n",
       "C(category)[T.Arts, Culture, Humanities]     -0.5078      0.465     -1.092      0.275        -1.419     0.404\n",
       "C(category)[T.Community Development]         -0.3612      0.700     -0.516      0.606        -1.733     1.010\n",
       "C(category)[T.Education]                     -1.2152      0.683     -1.779      0.075        -2.554     0.123\n",
       "C(category)[T.Environment]                  -18.8719      0.433    -43.552      0.000       -19.721   -18.023\n",
       "C(category)[T.Health]                        -1.0195      0.520     -1.959      0.050        -2.039     0.000\n",
       "C(category)[T.Human Services]                -0.1790      0.370     -0.484      0.628        -0.904     0.546\n",
       "C(category)[T.Human and Civil Rights]        -0.6823      0.483     -1.412      0.158        -1.629     0.265\n",
       "C(category)[T.International]                 -0.3676      0.487     -0.754      0.451        -1.323     0.588\n",
       "C(category)[T.Religion]                       0.3141      0.470      0.668      0.504        -0.607     1.235\n",
       "C(category)[T.Research and Public Policy]    -0.5298      0.526     -1.007      0.314        -1.561     0.501\n",
       "SOX_policies                                 -0.4256      0.093     -4.588      0.000        -0.607    -0.244\n",
       "total_revenue_logged                          0.2806      0.074      3.776      0.000         0.135     0.426\n",
       "program_efficiency                           -3.0563      0.816     -3.744      0.000        -4.656    -1.456\n",
       "age                                          -0.0065      0.005     -1.199      0.231        -0.017     0.004\n",
       "complexity_2011                              -0.7391      0.263     -2.814      0.005        -1.254    -0.224\n",
       "=============================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df_2011[cols1].dropna()).fit(cov_type='cluster', \n",
    "                                                    cov_kwds={'groups': df_2011[cols1].dropna()['state']}) \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Below this point: exploratory work creating a '2011' dataset and running a couple of logits"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<br>Here are the two variables that serve as indicators of the '2016' and '2011' CN data: `latest_entry` flags the 2016 rows and `2011 data` flags the 2011 rows. **_These 8,304 and 4,863 rows will serve as the base for the logit regressions_**. New variables should be created on these subsets of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 373,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "8304\n",
      "4863\n"
     ]
    }
   ],
   "source": [
    "print(len(df[df['latest_entry']=='True']))\n",
    "print(len(df[df['2011 data']==1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Control variables:\n",
    "  - Size: total revenues best (probably logged)\n",
    "    - will need 2011 and 2016 versions for the 4th and 5th tests\n",
    "  - efficiency ratio\n",
    "  - complexity (could be a good control from Erica's paper)\n",
    "  - fixed effects:\n",
    "    - state\n",
    "    - category\n",
    "      - I need to scrape the category dummies for the new orgs in the 2016 database\n",
    "        - CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area\n",
    "  - The focus of our paper is on SOX policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:\n",
    "    - *independent board* --> related to Erica's *independence of key actors* concept\n",
    "    - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n",
    "    - we could include other governance variables as needed.\n",
    "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n",
    "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 397,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']\n"
     ]
    }
   ],
   "source": [
    "cols_2011 = [col for col in list(df) if col.endswith('_2011')]\n",
    "print(cols_2011)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 398,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "charity_name_2011                    object\n",
       "category_2011                        object\n",
       "city_2011                            object\n",
       "state_2011                           object\n",
       "cause_2011                           object\n",
       "tag_line_2011                        object\n",
       "url_2011                             object\n",
       "ein_2011                             object\n",
       "fye_2011                             object\n",
       "overall_rating_2011                 float64\n",
       "overall_rating_star_2011            float64\n",
       "efficiency_rating_2011              float64\n",
       "AT_rating_2011                      float64\n",
       "financial_rating_star_2011          float64\n",
       "AT_rating_star_2011                 float64\n",
       "program_expense_percent_2011        float64\n",
       "admin_expense_percent_2011          float64\n",
       "fund_expense_percent_2011           float64\n",
       "fund_efficiency_2011                float64\n",
       "primary_revenue_growth_2011         float64\n",
       "program_expense_growth_2011         float64\n",
       "working_capital_ratio_2011          float64\n",
       "independent_board_2011               object\n",
       "no_material_division_2011            object\n",
       "audited_financials_2011              object\n",
       "no_loans_related_2011                object\n",
       "documents_minutes_2011               object\n",
       "form_990_2011                        object\n",
       "conflict_of_interest_policy_2011     object\n",
       "whistleblower_policy_2011            object\n",
       "records_retention_policy_2011        object\n",
       "CEO_listed_2011                      object\n",
       "process_CEO_compensation_2011        object\n",
       "no_board_compensation_2011           object\n",
       "donor_privacy_policy_2011            object\n",
       "board_listed_2011                    object\n",
       "audited_financials_web_2011          object\n",
       "form_990_web_2011                    object\n",
       "staff_listed_2011                    object\n",
       "primary_revenue_2011                float64\n",
       "other_revenue_2011                  float64\n",
       "total_revenue_2011                  float64\n",
       "govt_revenue_2011                    object\n",
       "program_expense_2011                float64\n",
       "admin_expense_2011                  float64\n",
       "fund_expense_2011                   float64\n",
       "total_functional_expense_2011       float64\n",
       "affiliate_payments_2011             float64\n",
       "budget_surplus_2011                 float64\n",
       "net_assets_2011                     float64\n",
       "leader_comp_2011                    float64\n",
       "leader_comp_percent_2011            float64\n",
       "email_2011                           object\n",
       "website_2011                         object\n",
       "dtype: object"
      ]
     },
     "execution_count": 398,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[cols_2011].dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 379,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>overall_rating_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>53.547455</td>\n",
       "      <td>8.933975e+00</td>\n",
       "      <td>8.040000e+00</td>\n",
       "      <td>48.73</td>\n",
       "      <td>54.61</td>\n",
       "      <td>59.97</td>\n",
       "      <td>6.996000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>overall_rating_star_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>2.869646</td>\n",
       "      <td>8.916785e-01</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>2.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>efficiency_rating_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>53.430780</td>\n",
       "      <td>1.048261e+01</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>46.53</td>\n",
       "      <td>54.88</td>\n",
       "      <td>61.73</td>\n",
       "      <td>6.997000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AT_rating_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>56.446100</td>\n",
       "      <td>1.152050e+01</td>\n",
       "      <td>-2.000000e+00</td>\n",
       "      <td>52.00</td>\n",
       "      <td>59.00</td>\n",
       "      <td>63.00</td>\n",
       "      <td>7.000000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>financial_rating_star_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>2.850403</td>\n",
       "      <td>1.007304e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>2.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>4.00</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AT_rating_star_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>3.128078</td>\n",
       "      <td>1.001005e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>4.00</td>\n",
       "      <td>4.000000e+00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_expense_percent_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>80.416325</td>\n",
       "      <td>1.055343e+01</td>\n",
       "      <td>2.200000e+00</td>\n",
       "      <td>75.50</td>\n",
       "      <td>81.60</td>\n",
       "      <td>87.00</td>\n",
       "      <td>9.970000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>admin_expense_percent_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>10.320608</td>\n",
       "      <td>6.665639e+00</td>\n",
       "      <td>-5.000000e-01</td>\n",
       "      <td>5.80</td>\n",
       "      <td>9.20</td>\n",
       "      <td>13.20</td>\n",
       "      <td>6.790000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fund_expense_percent_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>9.110180</td>\n",
       "      <td>8.056593e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>4.10</td>\n",
       "      <td>7.50</td>\n",
       "      <td>11.90</td>\n",
       "      <td>9.070000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fund_efficiency_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>0.129679</td>\n",
       "      <td>5.190376e-01</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.10</td>\n",
       "      <td>0.16</td>\n",
       "      <td>3.533000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>primary_revenue_growth_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>4.403580</td>\n",
       "      <td>1.554988e+01</td>\n",
       "      <td>-7.960000e+01</td>\n",
       "      <td>-3.50</td>\n",
       "      <td>2.60</td>\n",
       "      <td>10.20</td>\n",
       "      <td>2.452000e+02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_expense_growth_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>6.314918</td>\n",
       "      <td>2.073699e+01</td>\n",
       "      <td>-5.320000e+01</td>\n",
       "      <td>-1.10</td>\n",
       "      <td>4.10</td>\n",
       "      <td>11.10</td>\n",
       "      <td>1.007400e+03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>working_capital_ratio_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>1.788438</td>\n",
       "      <td>2.455707e+00</td>\n",
       "      <td>-3.260000e+00</td>\n",
       "      <td>0.44</td>\n",
       "      <td>1.01</td>\n",
       "      <td>2.20</td>\n",
       "      <td>5.842000e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>primary_revenue_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>16525465.992551</td>\n",
       "      <td>7.313324e+07</td>\n",
       "      <td>1.498700e+04</td>\n",
       "      <td>1989838.00</td>\n",
       "      <td>4430280.00</td>\n",
       "      <td>11346855.00</td>\n",
       "      <td>3.502077e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>other_revenue_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>651129.463066</td>\n",
       "      <td>6.235616e+06</td>\n",
       "      <td>-6.612359e+07</td>\n",
       "      <td>5922.00</td>\n",
       "      <td>99129.00</td>\n",
       "      <td>443689.00</td>\n",
       "      <td>2.392543e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_revenue_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>17176106.777985</td>\n",
       "      <td>7.570407e+07</td>\n",
       "      <td>-4.263887e+07</td>\n",
       "      <td>2103386.00</td>\n",
       "      <td>4673878.00</td>\n",
       "      <td>11721565.00</td>\n",
       "      <td>3.587775e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_expense_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>14784010.616387</td>\n",
       "      <td>6.451777e+07</td>\n",
       "      <td>2.848300e+04</td>\n",
       "      <td>1694422.00</td>\n",
       "      <td>3808132.00</td>\n",
       "      <td>9557716.00</td>\n",
       "      <td>3.091879e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>admin_expense_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>1477281.476929</td>\n",
       "      <td>7.805211e+06</td>\n",
       "      <td>-8.585600e+04</td>\n",
       "      <td>188863.00</td>\n",
       "      <td>412603.00</td>\n",
       "      <td>1010869.00</td>\n",
       "      <td>4.323913e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fund_expense_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>1133717.688599</td>\n",
       "      <td>5.382822e+06</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>160486.00</td>\n",
       "      <td>359811.00</td>\n",
       "      <td>830250.00</td>\n",
       "      <td>2.231224e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_functional_expense_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>17395155.368922</td>\n",
       "      <td>7.292829e+07</td>\n",
       "      <td>1.507310e+05</td>\n",
       "      <td>2188637.00</td>\n",
       "      <td>4769351.00</td>\n",
       "      <td>11766482.00</td>\n",
       "      <td>3.354177e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>affiliate_payments_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>60756.626733</td>\n",
       "      <td>1.812179e+06</td>\n",
       "      <td>-4.059500e+05</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.235951e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>budget_surplus_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>-219048.159321</td>\n",
       "      <td>9.895040e+06</td>\n",
       "      <td>-3.008552e+08</td>\n",
       "      <td>-477470.00</td>\n",
       "      <td>1652.00</td>\n",
       "      <td>452604.00</td>\n",
       "      <td>2.335980e+08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>net_assets_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>33372071.914339</td>\n",
       "      <td>1.651812e+08</td>\n",
       "      <td>-1.691832e+07</td>\n",
       "      <td>2013666.00</td>\n",
       "      <td>6004379.00</td>\n",
       "      <td>19206628.00</td>\n",
       "      <td>7.002755e+09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>leader_comp_2011</th>\n",
       "      <td>4657</td>\n",
       "      <td>153519.535753</td>\n",
       "      <td>1.313950e+05</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>80808.00</td>\n",
       "      <td>126250.00</td>\n",
       "      <td>191203.00</td>\n",
       "      <td>2.257910e+06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>leader_comp_percent_2011</th>\n",
       "      <td>4657</td>\n",
       "      <td>3.166133</td>\n",
       "      <td>3.015173e+00</td>\n",
       "      <td>0.000000e+00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>2.33</td>\n",
       "      <td>4.44</td>\n",
       "      <td>3.090000e+01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               count             mean           std  \\\n",
       "overall_rating_2011             4833        53.547455  8.933975e+00   \n",
       "overall_rating_star_2011        4833         2.869646  8.916785e-01   \n",
       "efficiency_rating_2011          4833        53.430780  1.048261e+01   \n",
       "AT_rating_2011                  4833        56.446100  1.152050e+01   \n",
       "financial_rating_star_2011      4833         2.850403  1.007304e+00   \n",
       "AT_rating_star_2011             4833         3.128078  1.001005e+00   \n",
       "program_expense_percent_2011    4833        80.416325  1.055343e+01   \n",
       "admin_expense_percent_2011      4833        10.320608  6.665639e+00   \n",
       "fund_expense_percent_2011       4833         9.110180  8.056593e+00   \n",
       "fund_efficiency_2011            4833         0.129679  5.190376e-01   \n",
       "primary_revenue_growth_2011     4833         4.403580  1.554988e+01   \n",
       "program_expense_growth_2011     4833         6.314918  2.073699e+01   \n",
       "working_capital_ratio_2011      4833         1.788438  2.455707e+00   \n",
       "primary_revenue_2011            4833  16525465.992551  7.313324e+07   \n",
       "other_revenue_2011              4833    651129.463066  6.235616e+06   \n",
       "total_revenue_2011              4833  17176106.777985  7.570407e+07   \n",
       "program_expense_2011            4833  14784010.616387  6.451777e+07   \n",
       "admin_expense_2011              4833   1477281.476929  7.805211e+06   \n",
       "fund_expense_2011               4833   1133717.688599  5.382822e+06   \n",
       "total_functional_expense_2011   4833  17395155.368922  7.292829e+07   \n",
       "affiliate_payments_2011         4833     60756.626733  1.812179e+06   \n",
       "budget_surplus_2011             4833   -219048.159321  9.895040e+06   \n",
       "net_assets_2011                 4833  33372071.914339  1.651812e+08   \n",
       "leader_comp_2011                4657    153519.535753  1.313950e+05   \n",
       "leader_comp_percent_2011        4657         3.166133  3.015173e+00   \n",
       "\n",
       "                                        min         25%         50%  \\\n",
       "overall_rating_2011            8.040000e+00       48.73       54.61   \n",
       "overall_rating_star_2011       0.000000e+00        2.00        3.00   \n",
       "efficiency_rating_2011         0.000000e+00       46.53       54.88   \n",
       "AT_rating_2011                -2.000000e+00       52.00       59.00   \n",
       "financial_rating_star_2011     0.000000e+00        2.00        3.00   \n",
       "AT_rating_star_2011            0.000000e+00        3.00        3.00   \n",
       "program_expense_percent_2011   2.200000e+00       75.50       81.60   \n",
       "admin_expense_percent_2011    -5.000000e-01        5.80        9.20   \n",
       "fund_expense_percent_2011      0.000000e+00        4.10        7.50   \n",
       "fund_efficiency_2011           0.000000e+00        0.05        0.10   \n",
       "primary_revenue_growth_2011   -7.960000e+01       -3.50        2.60   \n",
       "program_expense_growth_2011   -5.320000e+01       -1.10        4.10   \n",
       "working_capital_ratio_2011    -3.260000e+00        0.44        1.01   \n",
       "primary_revenue_2011           1.498700e+04  1989838.00  4430280.00   \n",
       "other_revenue_2011            -6.612359e+07     5922.00    99129.00   \n",
       "total_revenue_2011            -4.263887e+07  2103386.00  4673878.00   \n",
       "program_expense_2011           2.848300e+04  1694422.00  3808132.00   \n",
       "admin_expense_2011            -8.585600e+04   188863.00   412603.00   \n",
       "fund_expense_2011              0.000000e+00   160486.00   359811.00   \n",
       "total_functional_expense_2011  1.507310e+05  2188637.00  4769351.00   \n",
       "affiliate_payments_2011       -4.059500e+05        0.00        0.00   \n",
       "budget_surplus_2011           -3.008552e+08  -477470.00     1652.00   \n",
       "net_assets_2011               -1.691832e+07  2013666.00  6004379.00   \n",
       "leader_comp_2011               0.000000e+00    80808.00   126250.00   \n",
       "leader_comp_percent_2011       0.000000e+00        1.00        2.33   \n",
       "\n",
       "                                       75%           max  \n",
       "overall_rating_2011                  59.97  6.996000e+01  \n",
       "overall_rating_star_2011              3.00  4.000000e+00  \n",
       "efficiency_rating_2011               61.73  6.997000e+01  \n",
       "AT_rating_2011                       63.00  7.000000e+01  \n",
       "financial_rating_star_2011            4.00  4.000000e+00  \n",
       "AT_rating_star_2011                   4.00  4.000000e+00  \n",
       "program_expense_percent_2011         87.00  9.970000e+01  \n",
       "admin_expense_percent_2011           13.20  6.790000e+01  \n",
       "fund_expense_percent_2011            11.90  9.070000e+01  \n",
       "fund_efficiency_2011                  0.16  3.533000e+01  \n",
       "primary_revenue_growth_2011          10.20  2.452000e+02  \n",
       "program_expense_growth_2011          11.10  1.007400e+03  \n",
       "working_capital_ratio_2011            2.20  5.842000e+01  \n",
       "primary_revenue_2011           11346855.00  3.502077e+09  \n",
       "other_revenue_2011               443689.00  2.392543e+08  \n",
       "total_revenue_2011             11721565.00  3.587775e+09  \n",
       "program_expense_2011            9557716.00  3.091879e+09  \n",
       "admin_expense_2011              1010869.00  4.323913e+08  \n",
       "fund_expense_2011                830250.00  2.231224e+08  \n",
       "total_functional_expense_2011  11766482.00  3.354177e+09  \n",
       "affiliate_payments_2011               0.00  1.235951e+08  \n",
       "budget_surplus_2011              452604.00  2.335980e+08  \n",
       "net_assets_2011                19206628.00  7.002755e+09  \n",
       "leader_comp_2011                 191203.00  2.257910e+06  \n",
       "leader_comp_percent_2011              4.44  3.090000e+01  "
      ]
     },
     "execution_count": 379,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[cols_2011].describe().T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 380,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import division"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Program Efficiency Ratio\n",
    "efficiency = ProgExp/TotExp, where ProgExp is `program_expense_2011` and TotExp is `total_functional_expense_2011`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 386,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "34         NaN\n",
       "35         NaN\n",
       "36         NaN\n",
       "37         NaN\n",
       "38         NaN\n",
       "39    0.824939\n",
       "40         NaN\n",
       "41         NaN\n",
       "42         NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 386,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "org = df[df['org_id']=='12123']  # spot-check a single organization\n",
    "org['program_expense_2011'] / org['total_functional_expense_2011']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 388,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>charity_name_2011</th>\n",
       "      <th>category_2011</th>\n",
       "      <th>city_2011</th>\n",
       "      <th>state_2011</th>\n",
       "      <th>cause_2011</th>\n",
       "      <th>tag_line_2011</th>\n",
       "      <th>url_2011</th>\n",
       "      <th>ein_2011</th>\n",
       "      <th>fye_2011</th>\n",
       "      <th>overall_rating_2011</th>\n",
       "      <th>overall_rating_star_2011</th>\n",
       "      <th>efficiency_rating_2011</th>\n",
       "      <th>AT_rating_2011</th>\n",
       "      <th>financial_rating_star_2011</th>\n",
       "      <th>AT_rating_star_2011</th>\n",
       "      <th>program_expense_percent_2011</th>\n",
       "      <th>admin_expense_percent_2011</th>\n",
       "      <th>fund_expense_percent_2011</th>\n",
       "      <th>fund_efficiency_2011</th>\n",
       "      <th>primary_revenue_growth_2011</th>\n",
       "      <th>program_expense_growth_2011</th>\n",
       "      <th>working_capital_ratio_2011</th>\n",
       "      <th>independent_board_2011</th>\n",
       "      <th>no_material_division_2011</th>\n",
       "      <th>audited_financials_2011</th>\n",
       "      <th>no_loans_related_2011</th>\n",
       "      <th>documents_minutes_2011</th>\n",
       "      <th>form_990_2011</th>\n",
       "      <th>conflict_of_interest_policy_2011</th>\n",
       "      <th>whistleblower_policy_2011</th>\n",
       "      <th>records_retention_policy_2011</th>\n",
       "      <th>CEO_listed_2011</th>\n",
       "      <th>process_CEO_compensation_2011</th>\n",
       "      <th>no_board_compensation_2011</th>\n",
       "      <th>donor_privacy_policy_2011</th>\n",
       "      <th>board_listed_2011</th>\n",
       "      <th>audited_financials_web_2011</th>\n",
       "      <th>form_990_web_2011</th>\n",
       "      <th>staff_listed_2011</th>\n",
       "      <th>primary_revenue_2011</th>\n",
       "      <th>other_revenue_2011</th>\n",
       "      <th>total_revenue_2011</th>\n",
       "      <th>govt_revenue_2011</th>\n",
       "      <th>program_expense_2011</th>\n",
       "      <th>admin_expense_2011</th>\n",
       "      <th>fund_expense_2011</th>\n",
       "      <th>total_functional_expense_2011</th>\n",
       "      <th>affiliate_payments_2011</th>\n",
       "      <th>budget_surplus_2011</th>\n",
       "      <th>net_assets_2011</th>\n",
       "      <th>leader_comp_2011</th>\n",
       "      <th>leader_comp_percent_2011</th>\n",
       "      <th>email_2011</th>\n",
       "      <th>website_2011</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>100 Club of Arizona</td>\n",
       "      <td>Human Services</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>AZ</td>\n",
       "      <td>Multipurpose Human Service Organizations</td>\n",
       "      <td>Supporting families of public safety</td>\n",
       "      <td>http://www.charitynavigator.org/index.cfm?bay=search.summary&amp;orgid=12123</td>\n",
       "      <td>23-7172077</td>\n",
       "      <td>12/2009</td>\n",
       "      <td>63.84</td>\n",
       "      <td>4</td>\n",
       "      <td>66.58</td>\n",
       "      <td>62</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>82.4</td>\n",
       "      <td>14.4</td>\n",
       "      <td>3</td>\n",
       "      <td>0.03</td>\n",
       "      <td>6.2</td>\n",
       "      <td>6.3</td>\n",
       "      <td>1.06</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>NO</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>NO</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>1212051</td>\n",
       "      <td>-227543</td>\n",
       "      <td>984508</td>\n",
       "      <td>Note: This organization receives $0 in government support.</td>\n",
       "      <td>1019191</td>\n",
       "      <td>178385</td>\n",
       "      <td>37899</td>\n",
       "      <td>1235475</td>\n",
       "      <td>0</td>\n",
       "      <td>-250967</td>\n",
       "      <td>1316781</td>\n",
       "      <td>122623</td>\n",
       "      <td>9.92</td>\n",
       "      <td>info@100club.org</td>\n",
       "      <td>http://www.100club.org</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      charity_name_2011   category_2011 city_2011 state_2011  \\\n",
       "34                  NaN             NaN       NaN        NaN   \n",
       "35                  NaN             NaN       NaN        NaN   \n",
       "36                  NaN             NaN       NaN        NaN   \n",
       "37                  NaN             NaN       NaN        NaN   \n",
       "38                  NaN             NaN       NaN        NaN   \n",
       "39  100 Club of Arizona  Human Services   Phoenix         AZ   \n",
       "40                  NaN             NaN       NaN        NaN   \n",
       "41                  NaN             NaN       NaN        NaN   \n",
       "42                  NaN             NaN       NaN        NaN   \n",
       "\n",
       "                                  cause_2011  \\\n",
       "34                                       NaN   \n",
       "35                                       NaN   \n",
       "36                                       NaN   \n",
       "37                                       NaN   \n",
       "38                                       NaN   \n",
       "39  Multipurpose Human Service Organizations   \n",
       "40                                       NaN   \n",
       "41                                       NaN   \n",
       "42                                       NaN   \n",
       "\n",
       "                           tag_line_2011  \\\n",
       "34                                   NaN   \n",
       "35                                   NaN   \n",
       "36                                   NaN   \n",
       "37                                   NaN   \n",
       "38                                   NaN   \n",
       "39  Supporting families of public safety   \n",
       "40                                   NaN   \n",
       "41                                   NaN   \n",
       "42                                   NaN   \n",
       "\n",
       "                                                                    url_2011  \\\n",
       "34                                                                       NaN   \n",
       "35                                                                       NaN   \n",
       "36                                                                       NaN   \n",
       "37                                                                       NaN   \n",
       "38                                                                       NaN   \n",
       "39  http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=12123   \n",
       "40                                                                       NaN   \n",
       "41                                                                       NaN   \n",
       "42                                                                       NaN   \n",
       "\n",
       "      ein_2011   fye_2011  overall_rating_2011  overall_rating_star_2011  \\\n",
       "34         NaN        NaN                  NaN                       NaN   \n",
       "35         NaN        NaN                  NaN                       NaN   \n",
       "36         NaN        NaN                  NaN                       NaN   \n",
       "37         NaN        NaN                  NaN                       NaN   \n",
       "38         NaN        NaN                  NaN                       NaN   \n",
       "39  23-7172077   12/2009                 63.84                         4   \n",
       "40         NaN        NaN                  NaN                       NaN   \n",
       "41         NaN        NaN                  NaN                       NaN   \n",
       "42         NaN        NaN                  NaN                       NaN   \n",
       "\n",
       "    efficiency_rating_2011  AT_rating_2011  financial_rating_star_2011  \\\n",
       "34                     NaN             NaN                         NaN   \n",
       "35                     NaN             NaN                         NaN   \n",
       "36                     NaN             NaN                         NaN   \n",
       "37                     NaN             NaN                         NaN   \n",
       "38                     NaN             NaN                         NaN   \n",
       "39                   66.58              62                           4   \n",
       "40                     NaN             NaN                         NaN   \n",
       "41                     NaN             NaN                         NaN   \n",
       "42                     NaN             NaN                         NaN   \n",
       "\n",
       "    AT_rating_star_2011  program_expense_percent_2011  \\\n",
       "34                  NaN                           NaN   \n",
       "35                  NaN                           NaN   \n",
       "36                  NaN                           NaN   \n",
       "37                  NaN                           NaN   \n",
       "38                  NaN                           NaN   \n",
       "39                    4                          82.4   \n",
       "40                  NaN                           NaN   \n",
       "41                  NaN                           NaN   \n",
       "42                  NaN                           NaN   \n",
       "\n",
       "    admin_expense_percent_2011  fund_expense_percent_2011  \\\n",
       "34                         NaN                        NaN   \n",
       "35                         NaN                        NaN   \n",
       "36                         NaN                        NaN   \n",
       "37                         NaN                        NaN   \n",
       "38                         NaN                        NaN   \n",
       "39                        14.4                          3   \n",
       "40                         NaN                        NaN   \n",
       "41                         NaN                        NaN   \n",
       "42                         NaN                        NaN   \n",
       "\n",
       "    fund_efficiency_2011  primary_revenue_growth_2011  \\\n",
       "34                   NaN                          NaN   \n",
       "35                   NaN                          NaN   \n",
       "36                   NaN                          NaN   \n",
       "37                   NaN                          NaN   \n",
       "38                   NaN                          NaN   \n",
       "39                  0.03                          6.2   \n",
       "40                   NaN                          NaN   \n",
       "41                   NaN                          NaN   \n",
       "42                   NaN                          NaN   \n",
       "\n",
       "    program_expense_growth_2011  working_capital_ratio_2011  \\\n",
       "34                          NaN                         NaN   \n",
       "35                          NaN                         NaN   \n",
       "36                          NaN                         NaN   \n",
       "37                          NaN                         NaN   \n",
       "38                          NaN                         NaN   \n",
       "39                          6.3                        1.06   \n",
       "40                          NaN                         NaN   \n",
       "41                          NaN                         NaN   \n",
       "42                          NaN                         NaN   \n",
       "\n",
       "   independent_board_2011 no_material_division_2011 audited_financials_2011  \\\n",
       "34                    NaN                       NaN                     NaN   \n",
       "35                    NaN                       NaN                     NaN   \n",
       "36                    NaN                       NaN                     NaN   \n",
       "37                    NaN                       NaN                     NaN   \n",
       "38                    NaN                       NaN                     NaN   \n",
       "39                    yes                       yes                     yes   \n",
       "40                    NaN                       NaN                     NaN   \n",
       "41                    NaN                       NaN                     NaN   \n",
       "42                    NaN                       NaN                     NaN   \n",
       "\n",
       "   no_loans_related_2011 documents_minutes_2011 form_990_2011  \\\n",
       "34                   NaN                    NaN           NaN   \n",
       "35                   NaN                    NaN           NaN   \n",
       "36                   NaN                    NaN           NaN   \n",
       "37                   NaN                    NaN           NaN   \n",
       "38                   NaN                    NaN           NaN   \n",
       "39                   yes                    yes           yes   \n",
       "40                   NaN                    NaN           NaN   \n",
       "41                   NaN                    NaN           NaN   \n",
       "42                   NaN                    NaN           NaN   \n",
       "\n",
       "   conflict_of_interest_policy_2011 whistleblower_policy_2011  \\\n",
       "34                              NaN                       NaN   \n",
       "35                              NaN                       NaN   \n",
       "36                              NaN                       NaN   \n",
       "37                              NaN                       NaN   \n",
       "38                              NaN                       NaN   \n",
       "39                              yes                       yes   \n",
       "40                              NaN                       NaN   \n",
       "41                              NaN                       NaN   \n",
       "42                              NaN                       NaN   \n",
       "\n",
       "   records_retention_policy_2011 CEO_listed_2011  \\\n",
       "34                           NaN             NaN   \n",
       "35                           NaN             NaN   \n",
       "36                           NaN             NaN   \n",
       "37                           NaN             NaN   \n",
       "38                           NaN             NaN   \n",
       "39                            NO             yes   \n",
       "40                           NaN             NaN   \n",
       "41                           NaN             NaN   \n",
       "42                           NaN             NaN   \n",
       "\n",
       "   process_CEO_compensation_2011 no_board_compensation_2011  \\\n",
       "34                           NaN                        NaN   \n",
       "35                           NaN                        NaN   \n",
       "36                           NaN                        NaN   \n",
       "37                           NaN                        NaN   \n",
       "38                           NaN                        NaN   \n",
       "39                           yes                        yes   \n",
       "40                           NaN                        NaN   \n",
       "41                           NaN                        NaN   \n",
       "42                           NaN                        NaN   \n",
       "\n",
       "   donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011  \\\n",
       "34                       NaN               NaN                         NaN   \n",
       "35                       NaN               NaN                         NaN   \n",
       "36                       NaN               NaN                         NaN   \n",
       "37                       NaN               NaN                         NaN   \n",
       "38                       NaN               NaN                         NaN   \n",
       "39                       yes               yes                          NO   \n",
       "40                       NaN               NaN                         NaN   \n",
       "41                       NaN               NaN                         NaN   \n",
       "42                       NaN               NaN                         NaN   \n",
       "\n",
       "   form_990_web_2011 staff_listed_2011  primary_revenue_2011  \\\n",
       "34               NaN               NaN                   NaN   \n",
       "35               NaN               NaN                   NaN   \n",
       "36               NaN               NaN                   NaN   \n",
       "37               NaN               NaN                   NaN   \n",
       "38               NaN               NaN                   NaN   \n",
       "39               yes               yes               1212051   \n",
       "40               NaN               NaN                   NaN   \n",
       "41               NaN               NaN                   NaN   \n",
       "42               NaN               NaN                   NaN   \n",
       "\n",
       "    other_revenue_2011  total_revenue_2011  \\\n",
       "34                 NaN                 NaN   \n",
       "35                 NaN                 NaN   \n",
       "36                 NaN                 NaN   \n",
       "37                 NaN                 NaN   \n",
       "38                 NaN                 NaN   \n",
       "39             -227543              984508   \n",
       "40                 NaN                 NaN   \n",
       "41                 NaN                 NaN   \n",
       "42                 NaN                 NaN   \n",
       "\n",
       "                                             govt_revenue_2011  \\\n",
       "34                                                         NaN   \n",
       "35                                                         NaN   \n",
       "36                                                         NaN   \n",
       "37                                                         NaN   \n",
       "38                                                         NaN   \n",
       "39  Note: This organization receives $0 in government support.   \n",
       "40                                                         NaN   \n",
       "41                                                         NaN   \n",
       "42                                                         NaN   \n",
       "\n",
       "    program_expense_2011  admin_expense_2011  fund_expense_2011  \\\n",
       "34                   NaN                 NaN                NaN   \n",
       "35                   NaN                 NaN                NaN   \n",
       "36                   NaN                 NaN                NaN   \n",
       "37                   NaN                 NaN                NaN   \n",
       "38                   NaN                 NaN                NaN   \n",
       "39               1019191              178385              37899   \n",
       "40                   NaN                 NaN                NaN   \n",
       "41                   NaN                 NaN                NaN   \n",
       "42                   NaN                 NaN                NaN   \n",
       "\n",
       "    total_functional_expense_2011  affiliate_payments_2011  \\\n",
       "34                            NaN                      NaN   \n",
       "35                            NaN                      NaN   \n",
       "36                            NaN                      NaN   \n",
       "37                            NaN                      NaN   \n",
       "38                            NaN                      NaN   \n",
       "39                        1235475                        0   \n",
       "40                            NaN                      NaN   \n",
       "41                            NaN                      NaN   \n",
       "42                            NaN                      NaN   \n",
       "\n",
       "    budget_surplus_2011  net_assets_2011  leader_comp_2011  \\\n",
       "34                  NaN              NaN               NaN   \n",
       "35                  NaN              NaN               NaN   \n",
       "36                  NaN              NaN               NaN   \n",
       "37                  NaN              NaN               NaN   \n",
       "38                  NaN              NaN               NaN   \n",
       "39              -250967          1316781            122623   \n",
       "40                  NaN              NaN               NaN   \n",
       "41                  NaN              NaN               NaN   \n",
       "42                  NaN              NaN               NaN   \n",
       "\n",
       "    leader_comp_percent_2011        email_2011            website_2011  \n",
       "34                       NaN               NaN                     NaN  \n",
       "35                       NaN               NaN                     NaN  \n",
       "36                       NaN               NaN                     NaN  \n",
       "37                       NaN               NaN                     NaN  \n",
       "38                       NaN               NaN                     NaN  \n",
       "39                      9.92  info@100club.org  http://www.100club.org  \n",
       "40                       NaN               NaN                     NaN  \n",
       "41                       NaN               NaN                     NaN  \n",
       "42                       NaN               NaN                     NaN  "
      ]
     },
     "execution_count": 388,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Inspect the 2011 columns for org_id 12123 (100 Club of Arizona)\n",
    "df[df['org_id']=='12123'][cols_2011]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 395,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "9          NaN\n",
       "21    0.797448\n",
       "39    0.824939\n",
       "52    0.854655\n",
       "63    0.786945\n",
       "dtype: float64"
      ]
     },
     "execution_count": 395,
     "metadata": {},
     "output_type": "execute_result"
    }
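   ],
   "source": [
    "# Program expense ratio (program expenses / total functional expenses) for the first five orgs with 2011 data\n",
    "df[df['2011 data']==1]['program_expense_2011'][:5]/df[df['2011 data']==1]['total_functional_expense_2011'][:5]"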
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Complexity\n",
    "Measured as the number of revenue sources (donations, government grants, program service revenues)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 407,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4863\n",
      "4833\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Note: This organization receives $0 in government support.    2531\n",
       "GOVERNMENT SUPPORT MUST BE RECEIVED                           2302\n",
       "Name: govt_revenue_2011, dtype: int64"
      ]
     },
     "execution_count": 407,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(df[df['2011 data']==1])\n",
    "print len(df[df['govt_revenue_2011'].notnull()])\n",
    "df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 409,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "161                           GOVERNMENT SUPPORT MUST BE RECEIVED\n",
       "185    Note: This organization receives $0 in government support.\n",
       "208                           GOVERNMENT SUPPORT MUST BE RECEIVED\n",
       "228    Note: This organization receives $0 in government support.\n",
       "244    Note: This organization receives $0 in government support.\n",
       "255    Note: This organization receives $0 in government support.\n",
       "276    Note: This organization receives $0 in government support.\n",
       "Name: govt_revenue_2011, dtype: object"
      ]
     },
     "execution_count": 409,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'][5:12]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 410,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2531\n",
       "1    2302\n",
       "Name: govt_revenue_2011_binary, dtype: int64"
      ]
     },
     "execution_count": 410,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Dummy for government support: 1 = receives it, 0 = receives none, NaN = no 2011 value\n",
    "govt_map = {'Note: This organization receives $0 in government support.': 0,\n",
    "            'GOVERNMENT SUPPORT MUST BE RECEIVED': 1}\n",
    "df['govt_revenue_2011_binary'] = df['govt_revenue_2011'].map(govt_map)\n",
    "df['govt_revenue_2011_binary'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 420,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4863\n",
      "4833\n",
      "21     3914222\n",
      "39     1212051\n",
      "52      762512\n",
      "63     1140158\n",
      "148    1375169\n",
      "Name: primary_revenue_2011, dtype: float64\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4833"
      ]
     },
     "execution_count": 420,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(df[df['2011 data']==1])\n",
    "print len(df[df['primary_revenue_2011'].notnull()])\n",
    "print df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'][:5]\n",
    "df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'].value_counts().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 427,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "39        -227543\n",
       "63         -44041\n",
       "312       -572243\n",
       "362         -2666\n",
       "385         -7531\n",
       "477      -4653961\n",
       "558       -176202\n",
       "907      -3543041\n",
       "942      -1726139\n",
       "1161      -831757\n",
       "1317        -7494\n",
       "1352      -651915\n",
       "1372      -288151\n",
       "1390            0\n",
       "1604     -7067577\n",
       "1674      -271688\n",
       "1689    -11111670\n",
       "1705       -46280\n",
       "1927     -1702433\n",
       "1955        -6955\n",
       "1982       -77563\n",
       "2013      -388287\n",
       "2027       -76537\n",
       "2099        -1543\n",
       "2196      -162612\n",
       "2249        -2860\n",
       "2352      -113036\n",
       "2436          678\n",
       "2597        -7227\n",
       "2663      -456651\n",
       "           ...   \n",
       "82598    -1308477\n",
       "82683           0\n",
       "82914     -212856\n",
       "82971    -1358658\n",
       "82987      -29502\n",
       "83033    -1003245\n",
       "83098     -356419\n",
       "83115   -10947728\n",
       "83366         991\n",
       "83423      -18112\n",
       "83590     -173857\n",
       "83632       -1810\n",
       "83713    -4529558\n",
       "83746           0\n",
       "83758         251\n",
       "83841     -364815\n",
       "83873     -313582\n",
       "83898           0\n",
       "83910         300\n",
       "83911           0\n",
       "83912          56\n",
       "83913          28\n",
       "83917       -4437\n",
       "83920       -3318\n",
       "83922         263\n",
       "83925         748\n",
       "83927           0\n",
       "83928          30\n",
       "83932     -228354\n",
       "83940    -6663315\n",
       "Name: other_revenue_2011, dtype: float64"
      ]
     },
     "execution_count": 427,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['other_revenue_2011']<1000]['other_revenue_2011']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 429,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "46"
      ]
     },
     "execution_count": 429,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df['other_revenue_2011']==0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 419,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4863\n",
      "4833\n",
      "21     216503\n",
      "39    -227543\n",
      "52      21340\n",
      "63     -44041\n",
      "148      5061\n",
      "Name: other_revenue_2011, dtype: float64\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4833"
      ]
     },
     "execution_count": 419,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(df[df['2011 data']==1])\n",
    "print len(df[df['other_revenue_2011'].notnull()])\n",
    "print df[df['other_revenue_2011'].notnull()]['other_revenue_2011'][:5]\n",
    "df[df['other_revenue_2011'].notnull()]['other_revenue_2011'].value_counts().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 431,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4833\n",
      "4833\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "1    4787\n",
       "0      46\n",
       "Name: other_revenue_2011_binary, dtype: int64"
      ]
     },
     "execution_count": 431,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(df[df['other_revenue_2011'].notnull()])\n",
    "# Dummy for other revenue: 1 if the amount is nonzero (positive or negative), 0 if exactly zero.\n",
    "# Missing values stay NaN because neither condition matches them.\n",
    "df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']>0, 1, df['other_revenue_2011'] )\n",
    "df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']<0, 1, df['other_revenue_2011_binary'] )\n",
    "print len(df[df['other_revenue_2011_binary'].notnull()])\n",
    "df['other_revenue_2011_binary'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 434,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4833\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2    2501\n",
       "3    2294\n",
       "1      38\n",
       "Name: complexity_2011, dtype: int64"
      ]
     },
     "execution_count": 434,
     "metadata": {},
     "output_type": "execute_result"
    }
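   ],
   "source": [
    "# Count of revenue sources: the constant 1 presumably stands for donations,\n",
    "# plus one each for nonzero other revenue and for government support\n",
    "df['complexity_2011'] = 1 + df['other_revenue_2011_binary'] + df['govt_revenue_2011_binary']\n",
    "print len(df[df['complexity_2011'].notnull()])\n",
    "df['complexity_2011'].value_counts()"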
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "cols_2011 = [col for col in list(df) if col.endswith('_2011')]\n",
    "print cols_2011"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test Logits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1132,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.6.1\n"
     ]
    }
   ],
   "source": [
    "import statsmodels\n",
    "import statsmodels.api as sm\n",
    "import statsmodels.formula.api as smf   #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS\n",
    "print statsmodels.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 455,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "yes yes yes    3548\n",
       "yes NO NO       416\n",
       "yes yes NO      284\n",
       "yes NO yes      265\n",
       "NO NO NO        255\n",
       "NO NO yes        30\n",
       "NO yes yes       21\n",
       "NO yes NO        14\n",
       "Name: SOX_policies_2011, dtype: int64"
      ]
     },
     "execution_count": 455,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Concatenate the three yes/NO policy indicators to inspect every combination\n",
    "df['SOX_policies_2011'] = df['conflict_of_interest_policy_2011'] + ' ' + df['whistleblower_policy_2011'] + ' ' + df['records_retention_policy_2011']\n",
    "df['SOX_policies_2011'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 456,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3    3548\n",
       "2     570\n",
       "1     460\n",
       "0     255\n",
       "Name: SOX_policies_2011, dtype: int64"
      ]
     },
     "execution_count": 456,
     "metadata": {},
     "output_type": "execute_result"
    }
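   ],
   "source": [
    "# Overwrite the concatenated string with the number of policies in place (0-3);\n",
    "# str.count is case-sensitive, so only lowercase 'yes' is counted\n",
    "df['SOX_policies_2011'] = df['SOX_policies_2011'].str.count('yes')\n",
    "df['SOX_policies_2011'].value_counts()"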
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 491,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>total_revenue_2011_logged</th>\n",
       "      <td>4811</td>\n",
       "      <td>15.530981</td>\n",
       "      <td>1.282845</td>\n",
       "      <td>12.586466</td>\n",
       "      <td>14.569243</td>\n",
       "      <td>15.36338</td>\n",
       "      <td>16.27977</td>\n",
       "      <td>2.200080e+01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_revenue_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>17176106.777985</td>\n",
       "      <td>75704074.259743</td>\n",
       "      <td>-42638874.000000</td>\n",
       "      <td>2103386.000000</td>\n",
       "      <td>4673878.00000</td>\n",
       "      <td>11721565.00000</td>\n",
       "      <td>3.587775e+09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                           count             mean              std  \\\n",
       "total_revenue_2011_logged   4811        15.530981         1.282845   \n",
       "total_revenue_2011          4833  17176106.777985  75704074.259743   \n",
       "\n",
       "                                       min             25%            50%  \\\n",
       "total_revenue_2011_logged        12.586466       14.569243       15.36338   \n",
       "total_revenue_2011        -42638874.000000  2103386.000000  4673878.00000   \n",
       "\n",
       "                                      75%           max  \n",
       "total_revenue_2011_logged        16.27977  2.200080e+01  \n",
       "total_revenue_2011         11721565.00000  3.587775e+09  "
      ]
     },
     "execution_count": 491,
     "metadata": {},
     "output_type": "execute_result"
    }
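   ],
   "source": [
    "# Log only strictly positive revenue; zero or negative totals become NaN explicitly\n",
    "# instead of relying on np.log to emit warnings and return NaN/-inf\n",
    "positive_revenue = df['total_revenue_2011'].where(df['total_revenue_2011'] > 0)\n",
    "df['total_revenue_2011_logged'] = np.log(positive_revenue)\n",
    "df[['total_revenue_2011_logged', 'total_revenue_2011']].describe().T"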
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
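   },
   "outputs": [],
   "source": [
    "# TODO: the source column name is missing here; df[''] raises a KeyError,\n",
    "# so the assignment stays commented out until the column is specified\n",
    "#df['advisory'] = df['']"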
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 545,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged +  program_expense_percent_2011 + age +              complexity_2011 + C(category)\n"
     ]
    }
   ],
   "source": [
    "#IVs = '%s + ' % IV\n",
    "#DV = '%s ~ ' % DV  \n",
    "IVs = 'SOX_policies_2011 '\n",
    "#DV = 'advisory ~ '\n",
    "DV = 'donor_advisory_2016 ~ '\n",
    "controls = '+ total_revenue_2011_logged +  program_expense_percent_2011 + age +  \\\n",
    "            complexity_2011 + C(category)'\n",
    "\n",
    "#admin_expense_percent + leader_comp_percent + budget_surplus\n",
    "logit_formula = DV+IVs+controls\n",
    "print logit_formula\n",
    "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit()   \n",
    "#print globals()[\"mod%s\" % model_num].summary()\n",
    "# #print model_num.summary()\n",
    "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 532,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df.to_csv('df.csv', encoding='utf-8')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 535,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "84958\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "4808"
      ]
     },
     "execution_count": 535,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit_variables = ['donor_advisory_2016', 'SOX_policies_2011', 'total_revenue_2011_logged', \n",
    "                   'program_expense_percent_2011', 'age', 'complexity_2011', 'state_2011', 'category']\n",
    "df_2011 = df[logit_variables]\n",
    "print len(df_2011)\n",
    "len(df_2011.dropna())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 519,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged +  program_expense_percent_2011 + age +              complexity_2011\n"
     ]
    }
   ],
   "source": [
    "#IVs = '%s + ' % IV\n",
    "#DV = '%s ~ ' % DV  \n",
    "IVs = 'SOX_policies_2011 '\n",
    "#DV = 'advisory ~ '\n",
    "DV = 'donor_advisory_2016 ~ '\n",
    "controls = '+ total_revenue_2011_logged +  program_expense_percent_2011 + age +  \\\n",
    "            complexity_2011'\n",
    "\n",
    "#admin_expense_percent + leader_comp_percent + budget_surplus\n",
    "logit_formula = DV+IVs+controls\n",
    "print logit_formula\n",
    "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit()   \n",
    "#print globals()[\"mod%s\" % model_num].summary()\n",
    "# #print model_num.summary()\n",
    "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 536,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.044916\n",
      "         Iterations 10\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2016</td> <th>  No. Observations:  </th>  <td>  4808</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                <td>Logit</td>        <th>  Df Residuals:      </th>  <td>  4802</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                <td>MLE</td>         <th>  Df Model:          </th>  <td>     5</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>           <td>Wed, 31 Aug 2016</td>   <th>  Pseudo R-squ.:     </th>  <td>0.08557</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>               <td>21:43:38</td>       <th>  Log-Likelihood:    </th> <td> -215.96</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>            <td>True</td>         <th>  LL-Null:           </th> <td> -236.17</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                       <td> </td>          <th>  LLR p-value:       </th> <td>1.229e-07</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                <td></td>                  <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                    <td>   -3.0296</td> <td>    1.925</td> <td>   -1.574</td> <td> 0.115</td> <td>   -6.802     0.743</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies_2011</th>            <td>   -0.4992</td> <td>    0.145</td> <td>   -3.448</td> <td> 0.001</td> <td>   -0.783    -0.215</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_2011_logged</th>    <td>    0.3113</td> <td>    0.138</td> <td>    2.260</td> <td> 0.024</td> <td>    0.041     0.581</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_expense_percent_2011</th> <td>   -0.0285</td> <td>    0.009</td> <td>   -3.068</td> <td> 0.002</td> <td>   -0.047    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                          <td>   -0.0104</td> <td>    0.009</td> <td>   -1.130</td> <td> 0.258</td> <td>   -0.028     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>              <td>   -1.2189</td> <td>    0.358</td> <td>   -3.402</td> <td> 0.001</td> <td>   -1.921    -0.517</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            Logit Regression Results                           \n",
       "===============================================================================\n",
       "Dep. Variable:     donor_advisory_2016   No. Observations:                 4808\n",
       "Model:                           Logit   Df Residuals:                     4802\n",
       "Method:                            MLE   Df Model:                            5\n",
       "Date:                 Wed, 31 Aug 2016   Pseudo R-squ.:                 0.08557\n",
       "Time:                         21:43:38   Log-Likelihood:                -215.96\n",
       "converged:                        True   LL-Null:                       -236.17\n",
       "                                         LLR p-value:                 1.229e-07\n",
       "================================================================================================\n",
       "                                   coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------------------\n",
       "Intercept                       -3.0296      1.925     -1.574      0.115        -6.802     0.743\n",
       "SOX_policies_2011               -0.4992      0.145     -3.448      0.001        -0.783    -0.215\n",
       "total_revenue_2011_logged        0.3113      0.138      2.260      0.024         0.041     0.581\n",
       "program_expense_percent_2011    -0.0285      0.009     -3.068      0.002        -0.047    -0.010\n",
       "age                             -0.0104      0.009     -1.130      0.258        -0.028     0.008\n",
       "complexity_2011                 -1.2189      0.358     -3.402      0.001        -1.921    -0.517\n",
       "================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 536,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df_2011).fit() \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def new_logit_clustered(data, DV, columns, FE, model_num, cluster_col='firm_from_user_screen_name'):\n",
    "    # Fit a logit with cluster-robust standard errors and store it in the global name mod<model_num>.\n",
    "    # FE is appended verbatim, so it must carry its own leading '+' (e.g. '+ C(category)').\n",
    "    # cluster_col defaults to the column the earlier version hard-coded; pass the grouping column of `data`.\n",
    "    DV = '%s ~ ' % DV\n",
    "    IVs = ' + '.join(columns)\n",
    "    FE = ' %s' % FE\n",
    "    logit_formula = DV+IVs+FE\n",
    "    print logit_formula\n",
    "    # Take the groups from `data` (not the global df) so they line up with the rows being estimated;\n",
    "    # drop rows with missing values beforehand, since the formula interface drops them silently.\n",
    "    model = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',\n",
    "                                                        cov_kwds={'groups': data[cluster_col]})\n",
    "    globals()[\"mod%s\" % model_num] = model\n",
    "    print model.summary()\n",
    "    print '\\n', \"Chi-squared value:\", model.llr, '\\n' #TO GET THE CHI-SQUARED VALUE\n",
    "    #print '\\n', \"Pseudo R-squared:\", model.prsquared #TO GET THE PSEUDO-R-SQUARED\n",
    "    return model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 543,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.044916\n",
      "         Iterations 10\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2016</td> <th>  No. Observations:  </th>  <td>  4808</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                <td>Logit</td>        <th>  Df Residuals:      </th>  <td>  4802</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                <td>MLE</td>         <th>  Df Model:          </th>  <td>     5</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>           <td>Wed, 31 Aug 2016</td>   <th>  Pseudo R-squ.:     </th>  <td>0.08557</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>               <td>21:46:12</td>       <th>  Log-Likelihood:    </th> <td> -215.96</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>            <td>True</td>         <th>  LL-Null:           </th> <td> -236.17</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                       <td> </td>          <th>  LLR p-value:       </th> <td>1.229e-07</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                <td></td>                  <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                    <td>   -3.0296</td> <td>    1.925</td> <td>   -1.574</td> <td> 0.115</td> <td>   -6.802     0.743</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies_2011</th>            <td>   -0.4992</td> <td>    0.145</td> <td>   -3.448</td> <td> 0.001</td> <td>   -0.783    -0.215</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_2011_logged</th>    <td>    0.3113</td> <td>    0.138</td> <td>    2.260</td> <td> 0.024</td> <td>    0.041     0.581</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_expense_percent_2011</th> <td>   -0.0285</td> <td>    0.009</td> <td>   -3.068</td> <td> 0.002</td> <td>   -0.047    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                          <td>   -0.0104</td> <td>    0.009</td> <td>   -1.130</td> <td> 0.258</td> <td>   -0.028     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>              <td>   -1.2189</td> <td>    0.358</td> <td>   -3.402</td> <td> 0.001</td> <td>   -1.921    -0.517</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            Logit Regression Results                           \n",
       "===============================================================================\n",
       "Dep. Variable:     donor_advisory_2016   No. Observations:                 4808\n",
       "Model:                           Logit   Df Residuals:                     4802\n",
       "Method:                            MLE   Df Model:                            5\n",
       "Date:                 Wed, 31 Aug 2016   Pseudo R-squ.:                 0.08557\n",
       "Time:                         21:46:12   Log-Likelihood:                -215.96\n",
       "converged:                        True   LL-Null:                       -236.17\n",
       "                                         LLR p-value:                 1.229e-07\n",
       "================================================================================================\n",
       "                                   coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------------------\n",
       "Intercept                       -3.0296      1.925     -1.574      0.115        -6.802     0.743\n",
       "SOX_policies_2011               -0.4992      0.145     -3.448      0.001        -0.783    -0.215\n",
       "total_revenue_2011_logged        0.3113      0.138      2.260      0.024         0.041     0.581\n",
       "program_expense_percent_2011    -0.0285      0.009     -3.068      0.002        -0.047    -0.010\n",
       "age                             -0.0104      0.009     -1.130      0.258        -0.028     0.008\n",
       "complexity_2011                 -1.2189      0.358     -3.402      0.001        -1.921    -0.517\n",
       "================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 543,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df).fit() \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 524,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4863"
      ]
     },
     "execution_count": 524,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df['2011 data']==1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 523,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>SOX_policies_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>2.533416</td>\n",
       "      <td>0.869466</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.00000</td>\n",
       "      <td>3.00000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_revenue_2011_logged</th>\n",
       "      <td>4811</td>\n",
       "      <td>15.530981</td>\n",
       "      <td>1.282845</td>\n",
       "      <td>12.586466</td>\n",
       "      <td>14.569243</td>\n",
       "      <td>15.36338</td>\n",
       "      <td>16.27977</td>\n",
       "      <td>22.000798</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_expense_percent_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>80.416325</td>\n",
       "      <td>10.553429</td>\n",
       "      <td>2.200000</td>\n",
       "      <td>75.500000</td>\n",
       "      <td>81.60000</td>\n",
       "      <td>87.00000</td>\n",
       "      <td>99.700000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>4860</td>\n",
       "      <td>40.051029</td>\n",
       "      <td>19.240216</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>25.000000</td>\n",
       "      <td>35.00000</td>\n",
       "      <td>52.00000</td>\n",
       "      <td>108.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>complexity_2011</th>\n",
       "      <td>4833</td>\n",
       "      <td>2.466791</td>\n",
       "      <td>0.514468</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.00000</td>\n",
       "      <td>3.00000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              count       mean        std        min  \\\n",
       "SOX_policies_2011              4833   2.533416   0.869466   0.000000   \n",
       "total_revenue_2011_logged      4811  15.530981   1.282845  12.586466   \n",
       "program_expense_percent_2011   4833  80.416325  10.553429   2.200000   \n",
       "age                            4860  40.051029  19.240216   0.000000   \n",
       "complexity_2011                4833   2.466791   0.514468   1.000000   \n",
       "\n",
       "                                    25%       50%       75%         max  \n",
       "SOX_policies_2011              2.000000   3.00000   3.00000    3.000000  \n",
       "total_revenue_2011_logged     14.569243  15.36338  16.27977   22.000798  \n",
       "program_expense_percent_2011  75.500000  81.60000  87.00000   99.700000  \n",
       "age                           25.000000  35.00000  52.00000  108.000000  \n",
       "complexity_2011                2.000000   2.00000   3.00000    3.000000  "
      ]
     },
     "execution_count": 523,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summary statistics for the IV and controls, restricted to organizations with 2011 data.\n",
    "df[df['2011 data']==1][['SOX_policies_2011', 'total_revenue_2011_logged', 'program_expense_percent_2011',\n",
    "    'age', 'complexity_2011']].describe().T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 508,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4863"
      ]
     },
     "execution_count": 508,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[df['state_2011'].notnull()])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 509,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "NY    672\n",
       "CA    649\n",
       "DC    333\n",
       "FL    283\n",
       "TX    239\n",
       "VA    187\n",
       "MA    183\n",
       "IL    172\n",
       "PA    169\n",
       "OH    127\n",
       "CO    126\n",
       "GA    114\n",
       "WA    110\n",
       "MD    108\n",
       "MI    107\n",
       "MN     94\n",
       "NC     93\n",
       "MO     92\n",
       "NJ     86\n",
       "TN     80\n",
       "OR     79\n",
       "AZ     75\n",
       "CT     71\n",
       "WI     64\n",
       "IN     52\n",
       "KY     37\n",
       "SC     35\n",
       "NE     33\n",
       "LA     29\n",
       "OK     29\n",
       "ME     27\n",
       "UT     26\n",
       "AL     26\n",
       "KS     24\n",
       "NM     24\n",
       "IA     23\n",
       "MT     22\n",
       "HI     20\n",
       "NH     17\n",
       "NV     17\n",
       "RI     15\n",
       "MS     15\n",
       "VT     14\n",
       "AR     14\n",
       "DE     13\n",
       "WY      9\n",
       "AK      8\n",
       "ID      7\n",
       "SD      6\n",
       "WV      5\n",
       "ND      2\n",
       "PR      1\n",
       "Name: state_2011, dtype: int64"
      ]
     },
     "execution_count": 509,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['state_2011'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 516,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 516,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df[ (df['2011 data']==1) & (df['state_2011'].isnull())])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 540,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>donor_advisory_2016</th>\n",
       "      <td>4808</td>\n",
       "      <td>0.008527</td>\n",
       "      <td>0.091959</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SOX_policies_2011</th>\n",
       "      <td>4808</td>\n",
       "      <td>2.534110</td>\n",
       "      <td>0.869042</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>3.0000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>total_revenue_2011_logged</th>\n",
       "      <td>4808</td>\n",
       "      <td>15.532027</td>\n",
       "      <td>1.282551</td>\n",
       "      <td>12.586466</td>\n",
       "      <td>14.569824</td>\n",
       "      <td>15.364199</td>\n",
       "      <td>16.2804</td>\n",
       "      <td>22.000798</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>program_expense_percent_2011</th>\n",
       "      <td>4808</td>\n",
       "      <td>80.427787</td>\n",
       "      <td>10.558554</td>\n",
       "      <td>2.200000</td>\n",
       "      <td>75.500000</td>\n",
       "      <td>81.600000</td>\n",
       "      <td>87.0000</td>\n",
       "      <td>99.700000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>age</th>\n",
       "      <td>4808</td>\n",
       "      <td>40.016015</td>\n",
       "      <td>19.198956</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>25.000000</td>\n",
       "      <td>35.000000</td>\n",
       "      <td>52.0000</td>\n",
       "      <td>108.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>complexity_2011</th>\n",
       "      <td>4808</td>\n",
       "      <td>2.466722</td>\n",
       "      <td>0.514543</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>3.0000</td>\n",
       "      <td>3.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              count       mean        std        min  \\\n",
       "donor_advisory_2016            4808   0.008527   0.091959   0.000000   \n",
       "SOX_policies_2011              4808   2.534110   0.869042   0.000000   \n",
       "total_revenue_2011_logged      4808  15.532027   1.282551  12.586466   \n",
       "program_expense_percent_2011   4808  80.427787  10.558554   2.200000   \n",
       "age                            4808  40.016015  19.198956   0.000000   \n",
       "complexity_2011                4808   2.466722   0.514543   1.000000   \n",
       "\n",
       "                                    25%        50%      75%         max  \n",
       "donor_advisory_2016            0.000000   0.000000   0.0000    1.000000  \n",
       "SOX_policies_2011              2.000000   3.000000   3.0000    3.000000  \n",
       "total_revenue_2011_logged     14.569824  15.364199  16.2804   22.000798  \n",
       "program_expense_percent_2011  75.500000  81.600000  87.0000   99.700000  \n",
       "age                           25.000000  35.000000  52.0000  108.000000  \n",
       "complexity_2011                2.000000   2.000000   3.0000    3.000000  "
      ]
     },
     "execution_count": 540,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2011.dropna().describe().T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 546,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Maximum number of iterations has been exceeded.\n",
      "         Current function value: 0.043361\n",
      "         Iterations: 35\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n",
      "  \"Check mle_retvals\", ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2016</td> <th>  No. Observations:  </th>  <td>  4808</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                <td>Logit</td>        <th>  Df Residuals:      </th>  <td>  4792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                <td>MLE</td>         <th>  Df Model:          </th>  <td>    15</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>           <td>Wed, 31 Aug 2016</td>   <th>  Pseudo R-squ.:     </th>  <td>0.1172</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>               <td>21:48:06</td>       <th>  Log-Likelihood:    </th> <td> -208.48</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>            <td>False</td>        <th>  LL-Null:           </th> <td> -236.17</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                       <td> </td>          <th>  LLR p-value:       </th> <td>1.546e-06</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                      <td></td>                         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                                 <td>   -3.4848</td> <td>    1.884</td> <td>   -1.849</td> <td> 0.064</td> <td>   -7.178     0.208</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Arts, Culture, Humanities]</th>  <td>   -0.7432</td> <td>    0.965</td> <td>   -0.770</td> <td> 0.441</td> <td>   -2.635     1.149</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Community Development]</th>      <td>    0.2960</td> <td>    0.856</td> <td>    0.346</td> <td> 0.730</td> <td>   -1.382     1.974</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Education]</th>                  <td>   -0.1108</td> <td>    0.901</td> <td>   -0.123</td> <td> 0.902</td> <td>   -1.877     1.655</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Environment]</th>                <td>  -17.5510</td> <td>    0.614</td> <td>  -28.598</td> <td> 0.000</td> <td>  -18.754   -16.348</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Health]</th>                     <td>   -0.1414</td> <td>    0.744</td> <td>   -0.190</td> <td> 0.849</td> <td>   -1.600     1.317</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human Services]</th>             <td>   -0.2230</td> <td>    0.760</td> <td>   -0.293</td> <td> 0.769</td> <td>   -1.713     1.268</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human and Civil Rights]</th>     <td>   -0.0320</td> <td>    0.774</td> <td>   -0.041</td> <td> 0.967</td> <td>   -1.550     1.486</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.International]</th>              <td>   -0.0034</td> <td>    0.766</td> <td>   -0.004</td> <td> 0.996</td> <td>   -1.505     1.498</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Religion]</th>                   <td>    1.1258</td> <td>    0.708</td> <td>    1.589</td> <td> 0.112</td> <td>   -0.263     2.514</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Research and Public Policy]</th> <td>    0.5617</td> <td>    0.799</td> <td>    0.703</td> <td> 0.482</td> <td>   -1.004     2.127</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies_2011</th>                         <td>   -0.4728</td> <td>    0.124</td> <td>   -3.805</td> <td> 0.000</td> <td>   -0.716    -0.229</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_2011_logged</th>                 <td>    0.2978</td> <td>    0.126</td> <td>    2.356</td> <td> 0.018</td> <td>    0.050     0.546</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_expense_percent_2011</th>              <td>   -0.0330</td> <td>    0.011</td> <td>   -2.998</td> <td> 0.003</td> <td>   -0.055    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                                       <td>   -0.0104</td> <td>    0.007</td> <td>   -1.499</td> <td> 0.134</td> <td>   -0.024     0.003</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>                           <td>   -0.8049</td> <td>    0.465</td> <td>   -1.730</td> <td> 0.084</td> <td>   -1.717     0.107</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            Logit Regression Results                           \n",
       "===============================================================================\n",
       "Dep. Variable:     donor_advisory_2016   No. Observations:                 4808\n",
       "Model:                           Logit   Df Residuals:                     4792\n",
       "Method:                            MLE   Df Model:                           15\n",
       "Date:                 Wed, 31 Aug 2016   Pseudo R-squ.:                  0.1172\n",
       "Time:                         21:48:06   Log-Likelihood:                -208.48\n",
       "converged:                       False   LL-Null:                       -236.17\n",
       "                                         LLR p-value:                 1.546e-06\n",
       "=============================================================================================================\n",
       "                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------------------------------\n",
       "Intercept                                    -3.4848      1.884     -1.849      0.064        -7.178     0.208\n",
       "C(category)[T.Arts, Culture, Humanities]     -0.7432      0.965     -0.770      0.441        -2.635     1.149\n",
       "C(category)[T.Community Development]          0.2960      0.856      0.346      0.730        -1.382     1.974\n",
       "C(category)[T.Education]                     -0.1108      0.901     -0.123      0.902        -1.877     1.655\n",
       "C(category)[T.Environment]                  -17.5510      0.614    -28.598      0.000       -18.754   -16.348\n",
       "C(category)[T.Health]                        -0.1414      0.744     -0.190      0.849        -1.600     1.317\n",
       "C(category)[T.Human Services]                -0.2230      0.760     -0.293      0.769        -1.713     1.268\n",
       "C(category)[T.Human and Civil Rights]        -0.0320      0.774     -0.041      0.967        -1.550     1.486\n",
       "C(category)[T.International]                 -0.0034      0.766     -0.004      0.996        -1.505     1.498\n",
       "C(category)[T.Religion]                       1.1258      0.708      1.589      0.112        -0.263     2.514\n",
       "C(category)[T.Research and Public Policy]     0.5617      0.799      0.703      0.482        -1.004     2.127\n",
       "SOX_policies_2011                            -0.4728      0.124     -3.805      0.000        -0.716    -0.229\n",
       "total_revenue_2011_logged                     0.2978      0.126      2.356      0.018         0.050     0.546\n",
       "program_expense_percent_2011                 -0.0330      0.011     -2.998      0.003        -0.055    -0.011\n",
       "age                                          -0.0104      0.007     -1.499      0.134        -0.024     0.003\n",
       "complexity_2011                              -0.8049      0.465     -1.730      0.084        -1.717     0.107\n",
       "=============================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 546,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df_2011.dropna()).fit(cov_type='cluster', \n",
    "                                                    cov_kwds={'groups': df_2011.dropna()['state_2011']}) \n",
    "logit.summary()"
   ]
  },
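  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note on clustered standard errors.** The `groups` array passed through `cov_kwds` has to match the rows actually used in the fit, which is why the cell above drops missing values before both fitting and selecting `state_2011`. The sketch below illustrates that pattern on made-up data. The frame `toy` and its columns `y`, `x`, and `state` are hypothetical and not part of this project's data, and converting the labels to integer codes with `pd.factorize` is an optional precaution rather than a requirement.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "# Hypothetical toy data: binary outcome, one predictor, a cluster label.\n",
    "rng = np.random.RandomState(0)\n",
    "n = 500\n",
    "toy = pd.DataFrame({\n",
    "    'y': rng.binomial(1, 0.3, size=n),\n",
    "    'x': rng.normal(size=n),\n",
    "    'state': rng.choice(['NY', 'CA', 'TX', 'FL'], size=n),\n",
    "})\n",
    "toy.loc[::25, 'x'] = np.nan  # inject some missing values\n",
    "\n",
    "# Drop incomplete rows once, then use the same frame for the data and the groups.\n",
    "sample = toy[['y', 'x', 'state']].dropna()\n",
    "groups = pd.factorize(sample['state'])[0]  # integer cluster codes, one per fitted row\n",
    "\n",
    "result = smf.logit('y ~ x', data=sample).fit(\n",
    "    cov_type='cluster', cov_kwds={'groups': groups}, disp=False)\n",
    "\n",
    "# The number of fitted observations equals the number of group labels.\n",
    "assert int(result.nobs) == len(groups)\n",
    "```\n",
    "\n",
    "Building `sample` once and deriving both the model data and the groups from it keeps the two in step even if the missing-value rule changes later."
   ]
  },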
  {
   "cell_type": "code",
   "execution_count": 547,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning: Maximum number of iterations has been exceeded.\n",
      "         Current function value: 0.043361\n",
      "         Iterations: 35\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n",
      "  \"Check mle_retvals\", ConvergenceWarning)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th> <td>donor_advisory_2016</td> <th>  No. Observations:  </th>  <td>  4808</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                <td>Logit</td>        <th>  Df Residuals:      </th>  <td>  4792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>                <td>MLE</td>         <th>  Df Model:          </th>  <td>    15</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>           <td>Wed, 31 Aug 2016</td>   <th>  Pseudo R-squ.:     </th>  <td>0.1172</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>               <td>21:48:36</td>       <th>  Log-Likelihood:    </th> <td> -208.48</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>            <td>False</td>        <th>  LL-Null:           </th> <td> -236.17</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                       <td> </td>          <th>  LLR p-value:       </th> <td>1.546e-06</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "                      <td></td>                         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>                                 <td>   -3.4848</td> <td>    2.010</td> <td>   -1.734</td> <td> 0.083</td> <td>   -7.424     0.454</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Arts, Culture, Humanities]</th>  <td>   -0.7432</td> <td>    0.940</td> <td>   -0.790</td> <td> 0.429</td> <td>   -2.586     1.100</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Community Development]</th>      <td>    0.2960</td> <td>    0.835</td> <td>    0.354</td> <td> 0.723</td> <td>   -1.341     1.933</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Education]</th>                  <td>   -0.1108</td> <td>    0.929</td> <td>   -0.119</td> <td> 0.905</td> <td>   -1.931     1.709</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Environment]</th>                <td>  -17.5510</td> <td> 4367.028</td> <td>   -0.004</td> <td> 0.997</td> <td>-8576.769  8541.667</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Health]</th>                     <td>   -0.1414</td> <td>    0.755</td> <td>   -0.187</td> <td> 0.851</td> <td>   -1.620     1.338</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human Services]</th>             <td>   -0.2230</td> <td>    0.707</td> <td>   -0.315</td> <td> 0.753</td> <td>   -1.609     1.163</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Human and Civil Rights]</th>     <td>   -0.0320</td> <td>    0.934</td> <td>   -0.034</td> <td> 0.973</td> <td>   -1.863     1.799</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.International]</th>              <td>   -0.0034</td> <td>    0.790</td> <td>   -0.004</td> <td> 0.997</td> <td>   -1.552     1.545</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Religion]</th>                   <td>    1.1258</td> <td>    0.673</td> <td>    1.672</td> <td> 0.095</td> <td>   -0.194     2.445</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(category)[T.Research and Public Policy]</th> <td>    0.5617</td> <td>    0.930</td> <td>    0.604</td> <td> 0.546</td> <td>   -1.260     2.384</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>SOX_policies_2011</th>                         <td>   -0.4728</td> <td>    0.146</td> <td>   -3.240</td> <td> 0.001</td> <td>   -0.759    -0.187</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>total_revenue_2011_logged</th>                 <td>    0.2978</td> <td>    0.139</td> <td>    2.150</td> <td> 0.032</td> <td>    0.026     0.569</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>program_expense_percent_2011</th>              <td>   -0.0330</td> <td>    0.010</td> <td>   -3.336</td> <td> 0.001</td> <td>   -0.052    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>age</th>                                       <td>   -0.0104</td> <td>    0.010</td> <td>   -1.081</td> <td> 0.280</td> <td>   -0.029     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>complexity_2011</th>                           <td>   -0.8049</td> <td>    0.394</td> <td>   -2.045</td> <td> 0.041</td> <td>   -1.576    -0.034</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            Logit Regression Results                           \n",
       "===============================================================================\n",
       "Dep. Variable:     donor_advisory_2016   No. Observations:                 4808\n",
       "Model:                           Logit   Df Residuals:                     4792\n",
       "Method:                            MLE   Df Model:                           15\n",
       "Date:                 Wed, 31 Aug 2016   Pseudo R-squ.:                  0.1172\n",
       "Time:                         21:48:36   Log-Likelihood:                -208.48\n",
       "converged:                       False   LL-Null:                       -236.17\n",
       "                                         LLR p-value:                 1.546e-06\n",
       "=============================================================================================================\n",
       "                                                coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------------------------------\n",
       "Intercept                                    -3.4848      2.010     -1.734      0.083        -7.424     0.454\n",
       "C(category)[T.Arts, Culture, Humanities]     -0.7432      0.940     -0.790      0.429        -2.586     1.100\n",
       "C(category)[T.Community Development]          0.2960      0.835      0.354      0.723        -1.341     1.933\n",
       "C(category)[T.Education]                     -0.1108      0.929     -0.119      0.905        -1.931     1.709\n",
       "C(category)[T.Environment]                  -17.5510   4367.028     -0.004      0.997     -8576.769  8541.667\n",
       "C(category)[T.Health]                        -0.1414      0.755     -0.187      0.851        -1.620     1.338\n",
       "C(category)[T.Human Services]                -0.2230      0.707     -0.315      0.753        -1.609     1.163\n",
       "C(category)[T.Human and Civil Rights]        -0.0320      0.934     -0.034      0.973        -1.863     1.799\n",
       "C(category)[T.International]                 -0.0034      0.790     -0.004      0.997        -1.552     1.545\n",
       "C(category)[T.Religion]                       1.1258      0.673      1.672      0.095        -0.194     2.445\n",
       "C(category)[T.Research and Public Policy]     0.5617      0.930      0.604      0.546        -1.260     2.384\n",
       "SOX_policies_2011                            -0.4728      0.146     -3.240      0.001        -0.759    -0.187\n",
       "total_revenue_2011_logged                     0.2978      0.139      2.150      0.032         0.026     0.569\n",
       "program_expense_percent_2011                 -0.0330      0.010     -3.336      0.001        -0.052    -0.014\n",
       "age                                          -0.0104      0.010     -1.081      0.280        -0.029     0.008\n",
       "complexity_2011                              -0.8049      0.394     -2.045      0.041        -1.576    -0.034\n",
       "=============================================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 547,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "logit = smf.logit(formula=logit_formula, data=df).fit() \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 549,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Human Services                1188\n",
       "Arts, Culture, Humanities      670\n",
       "Health                         574\n",
       "International                  428\n",
       "Community Development          381\n",
       "Animals                        372\n",
       "Environment                    317\n",
       "Religion                       295\n",
       "Education                      280\n",
       "Human and Civil Rights         181\n",
       "Research and Public Policy     122\n",
       "Name: category, dtype: int64"
      ]
     },
     "execution_count": 549,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2011.dropna()['category'].value_counts()"
   ]
  },
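  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Checking for separation.** A coefficient near -17 with an enormous standard error, as for Environment in the unclustered fit above, is often a sign of quasi-complete separation: a category in which the outcome never occurs. One way to check is to cross-tabulate the category against the outcome. The following is a minimal sketch on made-up data; `toy`, `category`, and `advisory` are hypothetical names, not columns from this notebook.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: category 'B' never has a positive outcome.\n",
    "toy = pd.DataFrame({\n",
    "    'category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],\n",
    "    'advisory': [0, 1, 0, 0, 0, 0, 1, 0],\n",
    "})\n",
    "\n",
    "# Rows are categories, columns are outcome values (0 and 1).\n",
    "counts = pd.crosstab(toy['category'], toy['advisory'])\n",
    "\n",
    "# Categories with no positive outcomes have no finite maximum likelihood estimate\n",
    "# for their dummy coefficient in a logit model.\n",
    "no_events = counts.index[counts[1] == 0].tolist()\n",
    "print(no_events)  # ['B']\n",
    "```\n",
    "\n",
    "If a real category shows up in such a list, common options include merging it with another category or dropping it from the estimation sample; which is appropriate depends on the research design."
   ]
  },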
  {
   "cell_type": "code",
   "execution_count": 513,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.044916\n",
      "         Iterations 10\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "The weights and list don't have the same length.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-513-f1b037f5d426>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m logit = smf.logit(formula=logit_formula, data=df[df['2011 data']==1]).fit(cov_type='cluster', \n\u001b[0;32m----> 2\u001b[0;31m                                                     cov_kwds={'groups': df[df['2011 data']==1]['state_2011']}) \n\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0mlogit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msummary\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m   1374\u001b[0m         bnryfit = super(Logit, self).fit(start_params=start_params,\n\u001b[1;32m   1375\u001b[0m                 \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1376\u001b[0;31m                 disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m   1377\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1378\u001b[0m         \u001b[0mdiscretefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLogitResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbnryfit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m    201\u001b[0m         mlefit = super(DiscreteModel, self).fit(start_params=start_params,\n\u001b[1;32m    202\u001b[0m                 \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 203\u001b[0;31m                 disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m    204\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    205\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mmlefit\u001b[0m \u001b[0;31m# up to subclasses to wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)\u001b[0m\n\u001b[1;32m    455\u001b[0m         \u001b[0;31m#print('kwds inLikelihoodModel.fit', kwds)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    456\u001b[0m         \u001b[0;31m#TODO: add Hessian approximation and change the above if needed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m         \u001b[0mmlefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLikelihoodModelResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxopt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mHinv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscale\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    458\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    459\u001b[0m         \u001b[0;31m#TODO: hardcode scale?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, model, params, normalized_cov_params, scale, **kwargs)\u001b[0m\n\u001b[1;32m    940\u001b[0m                 \u001b[0;31m# TODO: we shouldn't need use_t in get_robustcov_results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    941\u001b[0m                 get_robustcov_results(self, cov_type=cov_type, use_self=True,\n\u001b[0;32m--> 942\u001b[0;31m                                            use_t=use_t, **cov_kwds)\n\u001b[0m\u001b[1;32m    943\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    944\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc\u001b[0m in \u001b[0;36mget_robustcov_results\u001b[0;34m(self, cov_type, use_t, **kwds)\u001b[0m\n\u001b[1;32m    193\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroups\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    194\u001b[0m             res.cov_params_default = sw.cov_cluster(self, groups,\n\u001b[0;32m--> 195\u001b[0;31m                                              use_correction=use_correction)\n\u001b[0m\u001b[1;32m    196\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    197\u001b[0m         \u001b[0;32melif\u001b[0m \u001b[0mgroups\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mcov_cluster\u001b[0;34m(results, group, use_correction)\u001b[0m\n\u001b[1;32m    535\u001b[0m         \u001b[0mclusters\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    536\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 537\u001b[0;31m     \u001b[0mscale\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mS_crosssection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxu\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    539\u001b[0m     \u001b[0mnobs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk_params\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxu\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mS_crosssection\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m    489\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    490\u001b[0m     '''\n\u001b[0;32m--> 491\u001b[0;31m     \u001b[0mx_group_sums\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgroup_sums\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m  \u001b[0;31m#TODO: why transposed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    492\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    493\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mS_white_simple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_group_sums\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mgroup_sums\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m    435\u001b[0m     \u001b[0;31m#TODO: transpose return in group_sum, need test coverage first\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    436\u001b[0m     return np.array([np.bincount(group, weights=x[:, col])\n\u001b[0;32m--> 437\u001b[0;31m                             for col in range(x.shape[1])])\n\u001b[0m\u001b[1;32m    438\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    439\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: The weights and list don't have the same length."
     ]
    }
   ],
   "source": [
    "# Filter once so the model data and the cluster groups come from the same rows.\n",
    "sample = df[df['2011 data']==1]\n",
    "logit = smf.logit(formula=logit_formula, data=sample).fit(cov_type='cluster', \n",
    "                                                    cov_kwds={'groups': sample['state_2011']}) \n",
    "logit.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 526,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.044916\n",
      "         Iterations 10\n"
     ]
    },
    {
     "ename": "ValueError",
     "evalue": "The weights and list don't have the same length.",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-526-17812cb557f5>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m logit = smf.logit(formula=logit_formula, data=df[df['total_revenue_2011_logged'].notnull()]).fit(cov_type='cluster', \n\u001b[0;32m----> 2\u001b[0;31m                                                     cov_kwds={'groups': df[df['total_revenue_2011_logged'].notnull()]['state_2011']}) \n\u001b[0m\u001b[1;32m      3\u001b[0m \u001b[0mlogit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msummary\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m   1374\u001b[0m         bnryfit = super(Logit, self).fit(start_params=start_params,\n\u001b[1;32m   1375\u001b[0m                 \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1376\u001b[0;31m                 disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m   1377\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1378\u001b[0m         \u001b[0mdiscretefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLogitResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbnryfit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m    201\u001b[0m         mlefit = super(DiscreteModel, self).fit(start_params=start_params,\n\u001b[1;32m    202\u001b[0m                 \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 203\u001b[0;31m                 disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m    204\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    205\u001b[0m         \u001b[0;32mreturn\u001b[0m \u001b[0mmlefit\u001b[0m \u001b[0;31m# up to subclasses to wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)\u001b[0m\n\u001b[1;32m    455\u001b[0m         \u001b[0;31m#print('kwds inLikelihoodModel.fit', kwds)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    456\u001b[0m         \u001b[0;31m#TODO: add Hessian approximation and change the above if needed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m         \u001b[0mmlefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLikelihoodModelResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxopt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mHinv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscale\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    458\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    459\u001b[0m         \u001b[0;31m#TODO: hardcode scale?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, model, params, normalized_cov_params, scale, **kwargs)\u001b[0m\n\u001b[1;32m    940\u001b[0m                 \u001b[0;31m# TODO: we shouldn't need use_t in get_robustcov_results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    941\u001b[0m                 get_robustcov_results(self, cov_type=cov_type, use_self=True,\n\u001b[0;32m--> 942\u001b[0;31m                                            use_t=use_t, **cov_kwds)\n\u001b[0m\u001b[1;32m    943\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    944\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc\u001b[0m in \u001b[0;36mget_robustcov_results\u001b[0;34m(self, cov_type, use_t, **kwds)\u001b[0m\n\u001b[1;32m    193\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroups\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    194\u001b[0m             res.cov_params_default = sw.cov_cluster(self, groups,\n\u001b[0;32m--> 195\u001b[0;31m                                              use_correction=use_correction)\n\u001b[0m\u001b[1;32m    196\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    197\u001b[0m         \u001b[0;32melif\u001b[0m \u001b[0mgroups\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mcov_cluster\u001b[0;34m(results, group, use_correction)\u001b[0m\n\u001b[1;32m    535\u001b[0m         \u001b[0mclusters\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    536\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 537\u001b[0;31m     \u001b[0mscale\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mS_crosssection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxu\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    539\u001b[0m     \u001b[0mnobs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk_params\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxu\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mS_crosssection\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m    489\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    490\u001b[0m     '''\n\u001b[0;32m--> 491\u001b[0;31m     \u001b[0mx_group_sums\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgroup_sums\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m  \u001b[0;31m#TODO: why transposed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    492\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    493\u001b[0m     \u001b[0;32mreturn\u001b[0m \u001b[0mS_white_simple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_group_sums\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mgroup_sums\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m    435\u001b[0m     \u001b[0;31m#TODO: transpose return in group_sum, need test coverage first\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    436\u001b[0m     return np.array([np.bincount(group, weights=x[:, col])\n\u001b[0;32m--> 437\u001b[0;31m                             for col in range(x.shape[1])])\n\u001b[0m\u001b[1;32m    438\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    439\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mValueError\u001b[0m: The weights and list don't have the same length."
     ]
    }
   ],
   "source": [
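    "# The ValueError from this cell suggests the groups and the rows used in the fit differ in length,\n",
    "# which can happen if the formula drops further rows with missing values.\n",
    "sample = df[df['total_revenue_2011_logged'].notnull()]\n",
    "logit = smf.logit(formula=logit_formula, data=sample).fit(cov_type='cluster', \n",
    "                                    cov_kwds={'groups': sample['state_2011']}) \n",
    "logit.summary()"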
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 500,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4863\n",
      "4816\n",
      "47\n"
     ]
    }
   ],
   "source": [
    "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016'].notnull())])\n",
    "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==0)])\n",
    "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==1)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 468,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']\n"
     ]
    }
   ],
   "source": [
    "print cols_2011"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "BMF_columns = ['NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', \n",
    "'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', \n",
    "'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', \n",
    "'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', \n",
    "'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', \n",
    "'rule_date_v1', 'taxpd']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df[df['SOX_policies_2011'].notnull()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 475,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "321\n",
      "321 321\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['16722', '14954', '16155']"
      ]
     },
     "execution_count": 475,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print len(df[df['current_donor_advisory']==1])\n",
    "DA_2016 = df[df['current_donor_advisory']==1]['org_id'].tolist()\n",
    "print len(DA_2016), len(set(DA_2016))\n",
    "DA_2016[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 486,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    84590\n",
       "1      368\n",
       "Name: donor_advisory_2016, dtype: int64"
      ]
     },
     "execution_count": 486,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['donor_advisory_2016'] = np.where(df['org_id'].isin(DA_2016), 1, 0)\n",
    "df['donor_advisory_2016'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 473,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>org_id</th>\n",
       "      <th>FYE</th>\n",
       "      <th>2011 data</th>\n",
       "      <th>current_donor_advisory</th>\n",
       "      <th>past_donor_advisory</th>\n",
       "      <th>SOX_policies_2011</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>10,000 Degrees</td>\n",
       "      <td>6466</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>100 Club of Arizona</td>\n",
       "      <td>12123</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>1000 Friends of Florida</td>\n",
       "      <td>10092</td>\n",
       "      <td>FY2008</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63</th>\n",
       "      <td>1000 Friends of Oregon</td>\n",
       "      <td>8770</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>148</th>\n",
       "      <td>4 Paws for Ability</td>\n",
       "      <td>13055</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>161</th>\n",
       "      <td>The 92nd Street Y</td>\n",
       "      <td>4792</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>185</th>\n",
       "      <td>A Better Chance</td>\n",
       "      <td>6082</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>208</th>\n",
       "      <td>A Contemporary Theatre</td>\n",
       "      <td>3634</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>228</th>\n",
       "      <td>A Kid Again</td>\n",
       "      <td>9239</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>244</th>\n",
       "      <td>A Noise Within</td>\n",
       "      <td>10176</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>255</th>\n",
       "      <td>A Place Called Home</td>\n",
       "      <td>8040</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>276</th>\n",
       "      <td>A.J. Muste Memorial Institute</td>\n",
       "      <td>6096</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>292</th>\n",
       "      <td>AAA Foundation for Traffic Safety</td>\n",
       "      <td>8302</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>312</th>\n",
       "      <td>Aaron Diamond AIDS Research Center</td>\n",
       "      <td>4991</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>328</th>\n",
       "      <td>AARP Foundation</td>\n",
       "      <td>3205</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>337</th>\n",
       "      <td>AAUW - American Association of University Women</td>\n",
       "      <td>3240</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>362</th>\n",
       "      <td>Abilities United</td>\n",
       "      <td>7940</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>373</th>\n",
       "      <td>The Ability Experience</td>\n",
       "      <td>7632</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>385</th>\n",
       "      <td>Abode Services</td>\n",
       "      <td>9182</td>\n",
       "      <td>FY2010</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>399</th>\n",
       "      <td>The Abraham Fund Initiatives</td>\n",
       "      <td>9371</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>426</th>\n",
       "      <td>Abused Deaf Women's Advocacy Services</td>\n",
       "      <td>12762</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>441</th>\n",
       "      <td>Academy of Achievement</td>\n",
       "      <td>5705</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>458</th>\n",
       "      <td>Academy of American Poets</td>\n",
       "      <td>9256</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>477</th>\n",
       "      <td>The Academy of Natural Sciences of Drexel University</td>\n",
       "      <td>3209</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     name org_id     FYE  \\\n",
       "21                                         10,000 Degrees   6466  FY2009   \n",
       "39                                    100 Club of Arizona  12123  FY2009   \n",
       "52                                1000 Friends of Florida  10092  FY2008   \n",
       "63                                 1000 Friends of Oregon   8770  FY2010   \n",
       "148                                    4 Paws for Ability  13055  FY2009   \n",
       "161                                     The 92nd Street Y   4792  FY2010   \n",
       "185                                       A Better Chance   6082  FY2010   \n",
       "208                                A Contemporary Theatre   3634  FY2009   \n",
       "228                                           A Kid Again   9239  FY2009   \n",
       "244                                        A Noise Within  10176  FY2010   \n",
       "255                                   A Place Called Home   8040  FY2010   \n",
       "276                         A.J. Muste Memorial Institute   6096  FY2009   \n",
       "292                     AAA Foundation for Traffic Safety   8302  FY2009   \n",
       "312                    Aaron Diamond AIDS Research Center   4991  FY2010   \n",
       "328                                       AARP Foundation   3205  FY2009   \n",
       "337       AAUW - American Association of University Women   3240  FY2010   \n",
       "362                                      Abilities United   7940  FY2010   \n",
       "373                                The Ability Experience   7632  FY2010   \n",
       "385                                        Abode Services   9182  FY2010   \n",
       "399                          The Abraham Fund Initiatives   9371  FY2009   \n",
       "426                 Abused Deaf Women's Advocacy Services  12762  FY2009   \n",
       "441                                Academy of Achievement   5705  FY2009   \n",
       "458                             Academy of American Poets   9256  FY2009   \n",
       "477  The Academy of Natural Sciences of Drexel University   3209  FY2009   \n",
       "\n",
       "     2011 data  current_donor_advisory  past_donor_advisory  SOX_policies_2011  \n",
       "21           1                       0                    0                  3  \n",
       "39           1                       0                    0                  2  \n",
       "52           1                       0                    0                  0  \n",
       "63           1                       0                    0                  3  \n",
       "148          1                       0                    0                  3  \n",
       "161          1                       0                    0                  3  \n",
       "185          1                       0                    0                  3  \n",
       "208          1                       0                    0                  3  \n",
       "228          1                       0                    0                  3  \n",
       "244          1                       0                    0                  0  \n",
       "255          1                       0                    0                  3  \n",
       "276          1                       0                    0                  3  \n",
       "292          1                       0                    0                  3  \n",
       "312          1                       0                    0                  3  \n",
       "328          1                       0                    0                  3  \n",
       "337          1                       0                    0                  3  \n",
       "362          1                       0                    0                  3  \n",
       "373          1                       0                    0                  3  \n",
       "385          1                       0                    0                  3  \n",
       "399          1                       0                    0                  3  \n",
       "426          1                       0                    0                  3  \n",
       "441          1                       0                    0                  1  \n",
       "458          1                       0                    0                  3  \n",
       "477          1                       0                    0                  3  "
      ]
     },
     "execution_count": 473,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df['SOX_policies_2011'].notnull())][['name', 'org_id', 'FYE', '2011 data', #'current_or_past_donor_advisory',\n",
    "                                 'current_donor_advisory', 'past_donor_advisory', 'SOX_policies_2011']][:24]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 447,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    84637\n",
       "1      321\n",
       "Name: advisory, dtype: int64"
      ]
     },
     "execution_count": 447,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#df['advisory'] = np.where(~df['advisory text - current advisory'].isnull(), 1,0)\n",
    "#df['advisory'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 458,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#df[(df['2011 data']==1) & (df['past_donor_advisory']==1)][:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 483,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid syntax (<unknown>, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  File \u001b[0;32m\"<unknown>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    2016 _donor_advisory\u001b[0m\n\u001b[0m                       ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
     ]
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "controls = ['total_revenue_2011', 'program_expense_percent_2011']\n",
    "# Count rows in the 2011 sample that are missing total revenue\n",
    "len(df[(df['2011 data']==1) & (df['total_revenue_2011'].isnull())])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 411,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>org_id</th>\n",
       "      <th>EIN</th>\n",
       "      <th>org_url</th>\n",
       "      <th>name</th>\n",
       "      <th>category</th>\n",
       "      <th>category-full</th>\n",
       "      <th>Date Published</th>\n",
       "      <th>Form 990 FYE</th>\n",
       "      <th>Form 990 FYE, v2</th>\n",
       "      <th>FYE</th>\n",
       "      <th>Earliest Rating Publication Date</th>\n",
       "      <th>ratings_system</th>\n",
       "      <th>Overall Score</th>\n",
       "      <th>Overall Rating</th>\n",
       "      <th>advisory text - current advisory</th>\n",
       "      <th>advisory text - past advisory</th>\n",
       "      <th>current_or_past_donor_advisory</th>\n",
       "      <th>current_donor_advisory</th>\n",
       "      <th>past_donor_advisory</th>\n",
       "      <th>latest_entry</th>\n",
       "      <th>current_ratings_url</th>\n",
       "      <th>ein_2016</th>\n",
       "      <th>Publication_date_and_FY_2016</th>\n",
       "      <th>Publication Date_2016</th>\n",
       "      <th>FYE_2016</th>\n",
       "      <th>donor_alert_2016</th>\n",
       "      <th>overall_rating_2016</th>\n",
       "      <th>efficiency_rating_rating_2016</th>\n",
       "      <th>AT_rating_2016</th>\n",
       "      <th>overall_rating_star_2016</th>\n",
       "      <th>financial_rating_star_2016</th>\n",
       "      <th>AT_rating_star_2016</th>\n",
       "      <th>program_expense_percent_2016</th>\n",
       "      <th>admin_expense_percent_2016</th>\n",
       "      <th>fund_expense_percent_2016</th>\n",
       "      <th>fund_efficiency_2016</th>\n",
       "      <th>working_capital_ratio_2016</th>\n",
       "      <th>program_expense_growth_2016</th>\n",
       "      <th>liabilities_to_assets_2016</th>\n",
       "      <th>independent_board_2016</th>\n",
       "      <th>no_material_division_2016</th>\n",
       "      <th>audited_financials_2016</th>\n",
       "      <th>no_loans_related_2016</th>\n",
       "      <th>documents_minutes_2016</th>\n",
       "      <th>form_990_2016</th>\n",
       "      <th>conflict_of_interest_policy_2016</th>\n",
       "      <th>whistleblower_policy_2016</th>\n",
       "      <th>records_retention_policy_2016</th>\n",
       "      <th>CEO_listed_2016</th>\n",
       "      <th>process_CEO_compensation_2016</th>\n",
       "      <th>no_board_compensation_2016</th>\n",
       "      <th>donor_privacy_policy_2016</th>\n",
       "      <th>board_listed_2016</th>\n",
       "      <th>audited_financials_web_2016</th>\n",
       "      <th>form_990_web_2016</th>\n",
       "      <th>staff_listed_2016</th>\n",
       "      <th>contributions_gifts_grants_2016</th>\n",
       "      <th>federated_campaigns_2016</th>\n",
       "      <th>membership_dues_2016</th>\n",
       "      <th>fundraising_events_2016</th>\n",
       "      <th>related_organizations_2016</th>\n",
       "      <th>government_grants_2016</th>\n",
       "      <th>total_contributions_2016</th>\n",
       "      <th>program_service_revenue_2016</th>\n",
       "      <th>total_primary_revenue_2016</th>\n",
       "      <th>other_revenue_2016</th>\n",
       "      <th>total_revenue_2016</th>\n",
       "      <th>program_expenses_2016</th>\n",
       "      <th>administrative_expenses_2016</th>\n",
       "      <th>fundraising_expenses_2016</th>\n",
       "      <th>total_functional_expenses_2016</th>\n",
       "      <th>payments_to_affiliates_2016</th>\n",
       "      <th>excess_or_deficit_2016</th>\n",
       "      <th>net_assets_2016</th>\n",
       "      <th>comp_2016</th>\n",
       "      <th>cp_2016</th>\n",
       "      <th>mission_2016</th>\n",
       "      <th>2011 data</th>\n",
       "      <th>charity_name_2011</th>\n",
       "      <th>category_2011</th>\n",
       "      <th>city_2011</th>\n",
       "      <th>state_2011</th>\n",
       "      <th>cause_2011</th>\n",
       "      <th>tag_line_2011</th>\n",
       "      <th>url_2011</th>\n",
       "      <th>ein_2011</th>\n",
       "      <th>fye_2011</th>\n",
       "      <th>overall_rating_2011</th>\n",
       "      <th>overall_rating_2011_plus_30</th>\n",
       "      <th>overall_rating_2011_plus_30_v2</th>\n",
       "      <th>overall_rating_star_2011</th>\n",
       "      <th>overall_rating_star_2011_text</th>\n",
       "      <th>efficiency_rating_2011</th>\n",
       "      <th>AT_rating_2011</th>\n",
       "      <th>financial_rating_star_2011</th>\n",
       "      <th>AT_rating_star_2011</th>\n",
       "      <th>program_expense_percent_2011</th>\n",
       "      <th>admin_expense_percent_2011</th>\n",
       "      <th>fund_expense_percent_2011</th>\n",
       "      <th>fund_efficiency_2011</th>\n",
       "      <th>primary_revenue_growth_2011</th>\n",
       "      <th>program_expense_growth_2011</th>\n",
       "      <th>working_capital_ratio_2011</th>\n",
       "      <th>independent_board_2011</th>\n",
       "      <th>no_material_division_2011</th>\n",
       "      <th>audited_financials_2011</th>\n",
       "      <th>no_loans_related_2011</th>\n",
       "      <th>documents_minutes_2011</th>\n",
       "      <th>form_990_2011</th>\n",
       "      <th>conflict_of_interest_policy_2011</th>\n",
       "      <th>whistleblower_policy_2011</th>\n",
       "      <th>records_retention_policy_2011</th>\n",
       "      <th>CEO_listed_2011</th>\n",
       "      <th>process_CEO_compensation_2011</th>\n",
       "      <th>no_board_compensation_2011</th>\n",
       "      <th>donor_privacy_policy_2011</th>\n",
       "      <th>board_listed_2011</th>\n",
       "      <th>audited_financials_web_2011</th>\n",
       "      <th>form_990_web_2011</th>\n",
       "      <th>staff_listed_2011</th>\n",
       "      <th>primary_revenue_2011</th>\n",
       "      <th>other_revenue_2011</th>\n",
       "      <th>total_revenue_2011</th>\n",
       "      <th>govt_revenue_2011</th>\n",
       "      <th>program_expense_2011</th>\n",
       "      <th>admin_expense_2011</th>\n",
       "      <th>fund_expense_2011</th>\n",
       "      <th>total_functional_expense_2011</th>\n",
       "      <th>affiliate_payments_2011</th>\n",
       "      <th>budget_surplus_2011</th>\n",
       "      <th>net_assets_2011</th>\n",
       "      <th>leader_comp_2011</th>\n",
       "      <th>leader_comp_percent_2011</th>\n",
       "      <th>email_2011</th>\n",
       "      <th>website_2011</th>\n",
       "      <th>2016 Advisory - Date Posted</th>\n",
       "      <th>2016 Advisory - Charity Name</th>\n",
       "      <th>2016 Advisory - advisory_url</th>\n",
       "      <th>2016 Advisory - advisory</th>\n",
       "      <th>_merge_v1</th>\n",
       "      <th>to_be_merged</th>\n",
       "      <th>NEW ROW</th>\n",
       "      <th>NAME_2015_BMF</th>\n",
       "      <th>STREET_2015_BMF</th>\n",
       "      <th>CITY_2015_BMF</th>\n",
       "      <th>STATE_2015_BMF</th>\n",
       "      <th>ZIP_2015_BMF</th>\n",
       "      <th>RULING_2015_BMF</th>\n",
       "      <th>ACTIVITY_2015_BMF</th>\n",
       "      <th>TAX_PERIOD_2015_BMF</th>\n",
       "      <th>ASSET_AMT_2015_BMF</th>\n",
       "      <th>INCOME_AMT_2015_BMF</th>\n",
       "      <th>REVENUE_AMT_2015_BMF</th>\n",
       "      <th>NTEE_CD_2015_BMF</th>\n",
       "      <th>2015 BMF</th>\n",
       "      <th>ruledate_2004_BMF</th>\n",
       "      <th>name_MSTRALL</th>\n",
       "      <th>state_MSTRALL</th>\n",
       "      <th>NTEE1_MSTRALL</th>\n",
       "      <th>nteecc_MSTRALL</th>\n",
       "      <th>zip_MSTRALL</th>\n",
       "      <th>fips_MSTRALL</th>\n",
       "      <th>taxper_MSTRALL</th>\n",
       "      <th>income_MSTRALL</th>\n",
       "      <th>F990REV_MSTRALL</th>\n",
       "      <th>assets_MSTRALL</th>\n",
       "      <th>ruledate_MSTRALL</th>\n",
       "      <th>deductcd_MSTRALL</th>\n",
       "      <th>accper_MSTRALL</th>\n",
       "      <th>rule_date_v1</th>\n",
       "      <th>taxpd</th>\n",
       "      <th>NAME_SOI</th>\n",
       "      <th>yr_frmtn</th>\n",
       "      <th>pt1_num_vtng_gvrn_bdy_mems</th>\n",
       "      <th>pt1_num_ind_vtng_mems</th>\n",
       "      <th>num_vtng_gvrn_bdy_mems</th>\n",
       "      <th>num_ind_vtng_mems</th>\n",
       "      <th>tot_num_empls</th>\n",
       "      <th>tot_num_vlntrs</th>\n",
       "      <th>contri_grnts_cy</th>\n",
       "      <th>prog_srvc_rev_cy</th>\n",
       "      <th>invst_incm_cy</th>\n",
       "      <th>oth_rev_cy</th>\n",
       "      <th>grnts_and_smlr_amts_cy</th>\n",
       "      <th>tot_prof_fndrsng_exp_cy</th>\n",
       "      <th>tot_fndrsng_exp_cy</th>\n",
       "      <th>pt1_tot_asts_eoy</th>\n",
       "      <th>aud_fincl_stmts</th>\n",
       "      <th>mtrl_divrsn_or_misuse</th>\n",
       "      <th>cnflct_int_plcy</th>\n",
       "      <th>whistleblower_plcy</th>\n",
       "      <th>doc_retention_plcy</th>\n",
       "      <th>federated_campaigns</th>\n",
       "      <th>memshp_dues</th>\n",
       "      <th>rltd_orgs</th>\n",
       "      <th>govt_grnts</th>\n",
       "      <th>all_oth_contri</th>\n",
       "      <th>nncsh_contri</th>\n",
       "      <th>tot_contri</th>\n",
       "      <th>psr_tot</th>\n",
       "      <th>inv_incm_tot_rev</th>\n",
       "      <th>bonds_tot_rev</th>\n",
       "      <th>roylrev_tot_rev</th>\n",
       "      <th>net_rent_tot_rev</th>\n",
       "      <th>gain_or_loss_sec</th>\n",
       "      <th>gain_or_loss_oth</th>\n",
       "      <th>oth_rev_tot</th>\n",
       "      <th>tot_rev</th>\n",
       "      <th>mgmt_srvc_fee_tot</th>\n",
       "      <th>fee_for_srvc_leg_tot</th>\n",
       "      <th>fee_for_srvc_acct_tot</th>\n",
       "      <th>fee_for_srvc_lbby_tot</th>\n",
       "      <th>fee_for_srvc_prof_tot</th>\n",
       "      <th>fee_for_srvc_invst_tot</th>\n",
       "      <th>fee_for_srvc_oth_tot</th>\n",
       "      <th>fs_audited</th>\n",
       "      <th>audit_committee</th>\n",
       "      <th>vlntr_hrs</th>\n",
       "      <th>_merge</th>\n",
       "      <th>rule_date</th>\n",
       "      <th>ruledate_2004_BMF_v2</th>\n",
       "      <th>ruledate_MSTRALL_v2</th>\n",
       "      <th>yr_frmtn_v2</th>\n",
       "      <th>age</th>\n",
       "      <th>category_Animals</th>\n",
       "      <th>category_Arts, Culture, Humanities</th>\n",
       "      <th>category_Community Development</th>\n",
       "      <th>category_Education</th>\n",
       "      <th>category_Environment</th>\n",
       "      <th>category_Health</th>\n",
       "      <th>category_Human Services</th>\n",
       "      <th>category_Human and Civil Rights</th>\n",
       "      <th>category_International</th>\n",
       "      <th>category_Religion</th>\n",
       "      <th>category_Research and Public Policy</th>\n",
       "      <th>govt_revenue_2011_binary</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>10166</td>\n",
       "      <td>043314346</td>\n",
       "      <td>http://www.charitynavigator.org/index.cfm?bay=search.summary&amp;orgid=10166</td>\n",
       "      <td>Angel Flight Northeast</td>\n",
       "      <td>Health</td>\n",
       "      <td>Health : Patient and Family Support</td>\n",
       "      <td>2011-01-05 00:00:00</td>\n",
       "      <td>2009-12</td>\n",
       "      <td>2009-12-01</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>2005-12-01</td>\n",
       "      <td>CN 1.0</td>\n",
       "      <td>--</td>\n",
       "      <td>Donor Advisory</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>Angel Flight Northeast</td>\n",
       "      <td>Health</td>\n",
       "      <td>North Andover</td>\n",
       "      <td>MA</td>\n",
       "      <td>Patient and Family Support</td>\n",
       "      <td>Providing free flights so children and adults can access medical care since 1996</td>\n",
       "      <td>http://www.charitynavigator.org/index.cfm?bay=search.summary&amp;orgid=10166</td>\n",
       "      <td>04-3314346</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>--</td>\n",
       "      <td>--</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Missing - Apparent Donor Advisory</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ANGEL FLIGHT OF NEW ENGLAND INC</td>\n",
       "      <td>LAWRENCE MUNICIPAL AIRPORT</td>\n",
       "      <td>NORTH ANDOVER</td>\n",
       "      <td>MA</td>\n",
       "      <td>01845-0000</td>\n",
       "      <td>200812</td>\n",
       "      <td>994179000</td>\n",
       "      <td>201312</td>\n",
       "      <td>869310</td>\n",
       "      <td>896259</td>\n",
       "      <td>3877845</td>\n",
       "      <td>E99</td>\n",
       "      <td>1</td>\n",
       "      <td>199608</td>\n",
       "      <td>ANGEL FLIGHT NEW ENG</td>\n",
       "      <td>MA</td>\n",
       "      <td>E</td>\n",
       "      <td>E87</td>\n",
       "      <td>01867-1110</td>\n",
       "      <td>25017</td>\n",
       "      <td>200012</td>\n",
       "      <td>539450</td>\n",
       "      <td>520862</td>\n",
       "      <td>318758</td>\n",
       "      <td>199608</td>\n",
       "      <td>1</td>\n",
       "      <td>12</td>\n",
       "      <td>1996</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "      <td>2008</td>\n",
       "      <td>1996</td>\n",
       "      <td>1996</td>\n",
       "      <td>nan</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>6466</td>\n",
       "      <td>953667812</td>\n",
       "      <td>http://www.charitynavigator.org/index.cfm?bay=search.summary&amp;orgid=6466</td>\n",
       "      <td>10,000 Degrees</td>\n",
       "      <td>Education</td>\n",
       "      <td>Education : Scholarship and Financial Support</td>\n",
       "      <td>2011-09-20 00:00:00</td>\n",
       "      <td>2009-06</td>\n",
       "      <td>2009-06-01</td>\n",
       "      <td>FY2009</td>\n",
       "      <td>2003-09-01</td>\n",
       "      <td>CN 2.0</td>\n",
       "      <td>85.33</td>\n",
       "      <td>3 stars</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>10,000 Degrees</td>\n",
       "      <td>Education</td>\n",
       "      <td>San Rafael</td>\n",
       "      <td>CA</td>\n",
       "      <td>Other Education Programs and Services</td>\n",
       "      <td>Creating College Graduates Who Change the World</td>\n",
       "      <td>http://www.charitynavigator.org/index.cfm?bay=search.summary&amp;orgid=6466</td>\n",
       "      <td>95-3667812</td>\n",
       "      <td>06/2009</td>\n",
       "      <td>55.33</td>\n",
       "      <td>85.33</td>\n",
       "      <td>85.33</td>\n",
       "      <td>3</td>\n",
       "      <td>3 stars</td>\n",
       "      <td>52.42</td>\n",
       "      <td>59</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>79.7</td>\n",
       "      <td>7.3</td>\n",
       "      <td>12.8</td>\n",
       "      <td>0.11</td>\n",
       "      <td>3.4</td>\n",
       "      <td>0</td>\n",
       "      <td>0.67</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>yes</td>\n",
       "      <td>NO</td>\n",
       "      <td>yes</td>\n",
       "      <td>NO</td>\n",
       "      <td>NO</td>\n",
       "      <td>yes</td>\n",
       "      <td>3914222</td>\n",
       "      <td>216503</td>\n",
       "      <td>4130725</td>\n",
       "      <td>Note: This organization receives $0 in government support.</td>\n",
       "      <td>2813532</td>\n",
       "      <td>260007</td>\n",
       "      <td>454629</td>\n",
       "      <td>3528168</td>\n",
       "      <td>0</td>\n",
       "      <td>602557</td>\n",
       "      <td>3389166</td>\n",
       "      <td>154300</td>\n",
       "      <td>4.37</td>\n",
       "      <td>info@10000degrees.org</td>\n",
       "      <td>http://www.10000degrees.org</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>both</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10000 DEGREES</td>\n",
       "      <td>1650 LOS GAMOS SUITE 110</td>\n",
       "      <td>SAN RAFAEL</td>\n",
       "      <td>CA</td>\n",
       "      <td>94903-1838</td>\n",
       "      <td>198105</td>\n",
       "      <td>40000000</td>\n",
       "      <td>201506</td>\n",
       "      <td>8611662</td>\n",
       "      <td>7760209</td>\n",
       "      <td>7627694</td>\n",
       "      <td>B11</td>\n",
       "      <td>1</td>\n",
       "      <td>198211</td>\n",
       "      <td>MARIN EDUC FND</td>\n",
       "      <td>CA</td>\n",
       "      <td>B</td>\n",
       "      <td>B20</td>\n",
       "      <td>94901-2920</td>\n",
       "      <td>06041</td>\n",
       "      <td>200106</td>\n",
       "      <td>3958011</td>\n",
       "      <td>3958011</td>\n",
       "      <td>1958251</td>\n",
       "      <td>198211</td>\n",
       "      <td>1</td>\n",
       "      <td>06</td>\n",
       "      <td>1982</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>left_only</td>\n",
       "      <td>1981</td>\n",
       "      <td>1982</td>\n",
       "      <td>1982</td>\n",
       "      <td>nan</td>\n",
       "      <td>35</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   org_id        EIN  \\\n",
       "9   10166  043314346   \n",
       "21   6466  953667812   \n",
       "\n",
       "                                                                     org_url  \\\n",
       "9   http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166   \n",
       "21   http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466   \n",
       "\n",
       "                      name   category  \\\n",
       "9   Angel Flight Northeast     Health   \n",
       "21          10,000 Degrees  Education   \n",
       "\n",
       "                                    category-full       Date Published  \\\n",
       "9             Health : Patient and Family Support  2011-01-05 00:00:00   \n",
       "21  Education : Scholarship and Financial Support  2011-09-20 00:00:00   \n",
       "\n",
       "   Form 990 FYE Form 990 FYE, v2     FYE Earliest Rating Publication Date  \\\n",
       "9       2009-12       2009-12-01  FY2009                       2005-12-01   \n",
       "21      2009-06       2009-06-01  FY2009                       2003-09-01   \n",
       "\n",
       "   ratings_system Overall Score  Overall Rating  \\\n",
       "9          CN 1.0            --  Donor Advisory   \n",
       "21         CN 2.0         85.33         3 stars   \n",
       "\n",
       "   advisory text - current advisory  \\\n",
       "9                               NaN   \n",
       "21                              NaN   \n",
       "\n",
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          advisory text - past advisory  \\\n",
       "9   This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...   \n",
       "21                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  NaN   \n",
       "\n",
       "    current_or_past_donor_advisory  current_donor_advisory  \\\n",
       "9                                1                       0   \n",
       "21                               0                       0   \n",
       "\n",
       "    past_donor_advisory latest_entry current_ratings_url ein_2016  \\\n",
       "9                     1        False                 NaN      NaN   \n",
       "21                    0        False                 NaN      NaN   \n",
       "\n",
       "   Publication_date_and_FY_2016 Publication Date_2016 FYE_2016  \\\n",
       "9                           NaN                   NaN      NaN   \n",
       "21                          NaN                   NaN      NaN   \n",
       "\n",
       "   donor_alert_2016 overall_rating_2016 efficiency_rating_rating_2016  \\\n",
       "9               NaN                 NaN                           NaN   \n",
       "21              NaN                 NaN                           NaN   \n",
       "\n",
       "   AT_rating_2016 overall_rating_star_2016 financial_rating_star_2016  \\\n",
       "9             NaN                      NaN                        NaN   \n",
       "21            NaN                      NaN                        NaN   \n",
       "\n",
       "   AT_rating_star_2016 program_expense_percent_2016  \\\n",
       "9                  NaN                          NaN   \n",
       "21                 NaN                          NaN   \n",
       "\n",
       "   admin_expense_percent_2016 fund_expense_percent_2016 fund_efficiency_2016  \\\n",
       "9                         NaN                       NaN                  NaN   \n",
       "21                        NaN                       NaN                  NaN   \n",
       "\n",
       "   working_capital_ratio_2016 program_expense_growth_2016  \\\n",
       "9                         NaN                         NaN   \n",
       "21                        NaN                         NaN   \n",
       "\n",
       "   liabilities_to_assets_2016 independent_board_2016  \\\n",
       "9                         NaN                    NaN   \n",
       "21                        NaN                    NaN   \n",
       "\n",
       "   no_material_division_2016 audited_financials_2016 no_loans_related_2016  \\\n",
       "9                        NaN                     NaN                   NaN   \n",
       "21                       NaN                     NaN                   NaN   \n",
       "\n",
       "   documents_minutes_2016 form_990_2016 conflict_of_interest_policy_2016  \\\n",
       "9                     NaN           NaN                              NaN   \n",
       "21                    NaN           NaN                              NaN   \n",
       "\n",
       "   whistleblower_policy_2016 records_retention_policy_2016 CEO_listed_2016  \\\n",
       "9                        NaN                           NaN             NaN   \n",
       "21                       NaN                           NaN             NaN   \n",
       "\n",
       "   process_CEO_compensation_2016 no_board_compensation_2016  \\\n",
       "9                            NaN                        NaN   \n",
       "21                           NaN                        NaN   \n",
       "\n",
       "   donor_privacy_policy_2016 board_listed_2016 audited_financials_web_2016  \\\n",
       "9                        NaN               NaN                         NaN   \n",
       "21                       NaN               NaN                         NaN   \n",
       "\n",
       "   form_990_web_2016 staff_listed_2016 contributions_gifts_grants_2016  \\\n",
       "9                NaN               NaN                             NaN   \n",
       "21               NaN               NaN                             NaN   \n",
       "\n",
       "   federated_campaigns_2016 membership_dues_2016 fundraising_events_2016  \\\n",
       "9                       NaN                  NaN                     NaN   \n",
       "21                      NaN                  NaN                     NaN   \n",
       "\n",
       "   related_organizations_2016 government_grants_2016 total_contributions_2016  \\\n",
       "9                         NaN                    NaN                      NaN   \n",
       "21                        NaN                    NaN                      NaN   \n",
       "\n",
       "   program_service_revenue_2016 total_primary_revenue_2016 other_revenue_2016  \\\n",
       "9                           NaN                        NaN                NaN   \n",
       "21                          NaN                        NaN                NaN   \n",
       "\n",
       "   total_revenue_2016 program_expenses_2016 administrative_expenses_2016  \\\n",
       "9                 NaN                   NaN                          NaN   \n",
       "21                NaN                   NaN                          NaN   \n",
       "\n",
       "   fundraising_expenses_2016 total_functional_expenses_2016  \\\n",
       "9                        NaN                            NaN   \n",
       "21                       NaN                            NaN   \n",
       "\n",
       "   payments_to_affiliates_2016 excess_or_deficit_2016 net_assets_2016  \\\n",
       "9                          NaN                    NaN             NaN   \n",
       "21                         NaN                    NaN             NaN   \n",
       "\n",
       "   comp_2016 cp_2016 mission_2016  2011 data       charity_name_2011  \\\n",
       "9        NaN     NaN          NaN          1  Angel Flight Northeast   \n",
       "21       NaN     NaN          NaN          1          10,000 Degrees   \n",
       "\n",
       "   category_2011      city_2011 state_2011  \\\n",
       "9         Health  North Andover         MA   \n",
       "21     Education     San Rafael         CA   \n",
       "\n",
       "                               cause_2011  \\\n",
       "9              Patient and Family Support   \n",
       "21  Other Education Programs and Services   \n",
       "\n",
       "                                                                       tag_line_2011  \\\n",
       "9   Providing free flights so children and adults can access medical care since 1996   \n",
       "21                                   Creating College Graduates Who Change the World   \n",
       "\n",
       "                                                                    url_2011  \\\n",
       "9   http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166   \n",
       "21   http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466   \n",
       "\n",
       "      ein_2011   fye_2011  overall_rating_2011 overall_rating_2011_plus_30  \\\n",
       "9   04-3314346        NaN                  NaN                          --   \n",
       "21  95-3667812   06/2009                 55.33                       85.33   \n",
       "\n",
       "   overall_rating_2011_plus_30_v2  overall_rating_star_2011  \\\n",
       "9                              --                       NaN   \n",
       "21                          85.33                         3   \n",
       "\n",
       "        overall_rating_star_2011_text  efficiency_rating_2011  AT_rating_2011  \\\n",
       "9   Missing - Apparent Donor Advisory                     NaN             NaN   \n",
       "21                            3 stars                   52.42              59   \n",
       "\n",
       "    financial_rating_star_2011  AT_rating_star_2011  \\\n",
       "9                          NaN                  NaN   \n",
       "21                           3                    3   \n",
       "\n",
       "    program_expense_percent_2011  admin_expense_percent_2011  \\\n",
       "9                            NaN                         NaN   \n",
       "21                          79.7                         7.3   \n",
       "\n",
       "    fund_expense_percent_2011  fund_efficiency_2011  \\\n",
       "9                         NaN                   NaN   \n",
       "21                       12.8                  0.11   \n",
       "\n",
       "    primary_revenue_growth_2011  program_expense_growth_2011  \\\n",
       "9                           NaN                          NaN   \n",
       "21                          3.4                            0   \n",
       "\n",
       "    working_capital_ratio_2011 independent_board_2011  \\\n",
       "9                          NaN                    NaN   \n",
       "21                        0.67                    yes   \n",
       "\n",
       "   no_material_division_2011 audited_financials_2011 no_loans_related_2011  \\\n",
       "9                        NaN                     NaN                   NaN   \n",
       "21                       yes                     yes                   yes   \n",
       "\n",
       "   documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011  \\\n",
       "9                     NaN           NaN                              NaN   \n",
       "21                    yes           yes                              yes   \n",
       "\n",
       "   whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011  \\\n",
       "9                        NaN                           NaN             NaN   \n",
       "21                       yes                           yes             yes   \n",
       "\n",
       "   process_CEO_compensation_2011 no_board_compensation_2011  \\\n",
       "9                            NaN                        NaN   \n",
       "21                           yes                        yes   \n",
       "\n",
       "   donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011  \\\n",
       "9                        NaN               NaN                         NaN   \n",
       "21                        NO               yes                          NO   \n",
       "\n",
       "   form_990_web_2011 staff_listed_2011  primary_revenue_2011  \\\n",
       "9                NaN               NaN                   NaN   \n",
       "21                NO               yes               3914222   \n",
       "\n",
       "    other_revenue_2011  total_revenue_2011  \\\n",
       "9                  NaN                 NaN   \n",
       "21              216503             4130725   \n",
       "\n",
       "                                             govt_revenue_2011  \\\n",
       "9                                                          NaN   \n",
       "21  Note: This organization receives $0 in government support.   \n",
       "\n",
       "    program_expense_2011  admin_expense_2011  fund_expense_2011  \\\n",
       "9                    NaN                 NaN                NaN   \n",
       "21               2813532              260007             454629   \n",
       "\n",
       "    total_functional_expense_2011  affiliate_payments_2011  \\\n",
       "9                             NaN                      NaN   \n",
       "21                        3528168                        0   \n",
       "\n",
       "    budget_surplus_2011  net_assets_2011  leader_comp_2011  \\\n",
       "9                   NaN              NaN               NaN   \n",
       "21               602557          3389166            154300   \n",
       "\n",
       "    leader_comp_percent_2011             email_2011  \\\n",
       "9                        NaN                    NaN   \n",
       "21                      4.37  info@10000degrees.org   \n",
       "\n",
       "                   website_2011 2016 Advisory - Date Posted  \\\n",
       "9                           NaN                         NaN   \n",
       "21  http://www.10000degrees.org                         NaN   \n",
       "\n",
       "   2016 Advisory - Charity Name 2016 Advisory - advisory_url  \\\n",
       "9                           NaN                          NaN   \n",
       "21                          NaN                          NaN   \n",
       "\n",
       "   2016 Advisory - advisory  _merge_v1  to_be_merged NEW ROW  \\\n",
       "9                       NaN  left_only             1     NaN   \n",
       "21                      NaN       both             0     NaN   \n",
       "\n",
       "                      NAME_2015_BMF             STREET_2015_BMF  \\\n",
       "9   ANGEL FLIGHT OF NEW ENGLAND INC  LAWRENCE MUNICIPAL AIRPORT   \n",
       "21                    10000 DEGREES    1650 LOS GAMOS SUITE 110   \n",
       "\n",
       "    CITY_2015_BMF STATE_2015_BMF ZIP_2015_BMF  RULING_2015_BMF  \\\n",
       "9   NORTH ANDOVER             MA   01845-0000           200812   \n",
       "21     SAN RAFAEL             CA   94903-1838           198105   \n",
       "\n",
       "    ACTIVITY_2015_BMF  TAX_PERIOD_2015_BMF  ASSET_AMT_2015_BMF  \\\n",
       "9           994179000               201312              869310   \n",
       "21           40000000               201506             8611662   \n",
       "\n",
       "    INCOME_AMT_2015_BMF  REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF  2015 BMF  \\\n",
       "9                896259               3877845              E99         1   \n",
       "21              7760209               7627694              B11         1   \n",
       "\n",
       "    ruledate_2004_BMF          name_MSTRALL state_MSTRALL NTEE1_MSTRALL  \\\n",
       "9              199608  ANGEL FLIGHT NEW ENG            MA             E   \n",
       "21             198211        MARIN EDUC FND            CA             B   \n",
       "\n",
       "   nteecc_MSTRALL zip_MSTRALL fips_MSTRALL taxper_MSTRALL  income_MSTRALL  \\\n",
       "9             E87  01867-1110        25017         200012          539450   \n",
       "21            B20  94901-2920        06041         200106         3958011   \n",
       "\n",
       "    F990REV_MSTRALL  assets_MSTRALL ruledate_MSTRALL deductcd_MSTRALL  \\\n",
       "9            520862          318758           199608                1   \n",
       "21          3958011         1958251           198211                1   \n",
       "\n",
       "   accper_MSTRALL rule_date_v1 taxpd NAME_SOI  yr_frmtn  \\\n",
       "9              12         1996   NaN      NaN       NaN   \n",
       "21             06         1982   NaN      NaN       NaN   \n",
       "\n",
       "    pt1_num_vtng_gvrn_bdy_mems  pt1_num_ind_vtng_mems  num_vtng_gvrn_bdy_mems  \\\n",
       "9                          NaN                    NaN                     NaN   \n",
       "21                         NaN                    NaN                     NaN   \n",
       "\n",
       "    num_ind_vtng_mems  tot_num_empls  tot_num_vlntrs  contri_grnts_cy  \\\n",
       "9                 NaN            NaN             NaN              NaN   \n",
       "21                NaN            NaN             NaN              NaN   \n",
       "\n",
       "    prog_srvc_rev_cy  invst_incm_cy  oth_rev_cy  grnts_and_smlr_amts_cy  \\\n",
       "9                NaN            NaN         NaN                     NaN   \n",
       "21               NaN            NaN         NaN                     NaN   \n",
       "\n",
       "    tot_prof_fndrsng_exp_cy  tot_fndrsng_exp_cy  pt1_tot_asts_eoy  \\\n",
       "9                       NaN                 NaN               NaN   \n",
       "21                      NaN                 NaN               NaN   \n",
       "\n",
       "   aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy whistleblower_plcy  \\\n",
       "9              NaN                   NaN             NaN                NaN   \n",
       "21             NaN                   NaN             NaN                NaN   \n",
       "\n",
       "   doc_retention_plcy  federated_campaigns  memshp_dues  rltd_orgs  \\\n",
       "9                 NaN                  NaN          NaN        NaN   \n",
       "21                NaN                  NaN          NaN        NaN   \n",
       "\n",
       "    govt_grnts  all_oth_contri  nncsh_contri  tot_contri  psr_tot  \\\n",
       "9          NaN             NaN           NaN         NaN      NaN   \n",
       "21         NaN             NaN           NaN         NaN      NaN   \n",
       "\n",
       "    inv_incm_tot_rev  bonds_tot_rev  roylrev_tot_rev  net_rent_tot_rev  \\\n",
       "9                NaN            NaN              NaN               NaN   \n",
       "21               NaN            NaN              NaN               NaN   \n",
       "\n",
       "    gain_or_loss_sec  gain_or_loss_oth  oth_rev_tot  tot_rev  \\\n",
       "9                NaN               NaN          NaN      NaN   \n",
       "21               NaN               NaN          NaN      NaN   \n",
       "\n",
       "    mgmt_srvc_fee_tot  fee_for_srvc_leg_tot  fee_for_srvc_acct_tot  \\\n",
       "9                 NaN                   NaN                    NaN   \n",
       "21                NaN                   NaN                    NaN   \n",
       "\n",
       "    fee_for_srvc_lbby_tot  fee_for_srvc_prof_tot  fee_for_srvc_invst_tot  \\\n",
       "9                     NaN                    NaN                     NaN   \n",
       "21                    NaN                    NaN                     NaN   \n",
       "\n",
       "    fee_for_srvc_oth_tot fs_audited audit_committee  vlntr_hrs     _merge  \\\n",
       "9                    NaN        NaN             NaN        NaN  left_only   \n",
       "21                   NaN        NaN             NaN        NaN  left_only   \n",
       "\n",
       "   rule_date  ruledate_2004_BMF_v2 ruledate_MSTRALL_v2 yr_frmtn_v2  age  \\\n",
       "9       2008                  1996                1996         nan    8   \n",
       "21      1981                  1982                1982         nan   35   \n",
       "\n",
       "    category_Animals  category_Arts, Culture, Humanities  \\\n",
       "9                  0                                   0   \n",
       "21                 0                                   0   \n",
       "\n",
       "    category_Community Development  category_Education  category_Environment  \\\n",
       "9                                0                   0                     0   \n",
       "21                               0                   1                     0   \n",
       "\n",
       "    category_Health  category_Human Services  category_Human and Civil Rights  \\\n",
       "9                 1                        0                                0   \n",
       "21                0                        0                                0   \n",
       "\n",
       "    category_International  category_Religion  \\\n",
       "9                        0                  0   \n",
       "21                       0                  0   \n",
       "\n",
       "    category_Research and Public Policy  govt_revenue_2011_binary  \n",
       "9                                     0                       NaN  \n",
       "21                                    0                         0  "
      ]
     },
     "execution_count": 411,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Preview the first two organizations that have 2011 ratings data\n",
    "df[df['2011 data'] == 1].head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 396,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011 data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 
'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 
'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n"
     ]
    }
   ],
   "source": [
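    "# List every column in the merged dataframe; print() with one argument behaves the same in Python 2 and 3\n",
    "print(df.columns.tolist())"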
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SOI data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Columns that come from the SOI data\n",
    "SOI_columns = ['taxpd', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems',\n",
    "               'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy',\n",
    "               'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy',\n",
    "               'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse',\n",
    "               'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues',\n",
    "               'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot',\n",
    "               'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec',\n",
    "               'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot',\n",
    "               'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot',\n",
    "               'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', 'NAME_SOI']\n",
    "\n",
    "# Number of SOI columns\n",
    "len(SOI_columns)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}