{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook Tasks\n", "\n", "
\n", "**_Possible Samples for Statistical Tests_**:\n", "- Given the above, there are a number of possible tests:

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IV: SOX PoliciesDV: Donor AdvisoryNNotesTO DO
201120164,85747 donor advisories on these organizations; associational test (we don't know when the SOX policies were added); also, DV is 'current donor advisory'ready to run
20112012-20164,85747 2016 advisories plus probably another dozen or so advisories over the 2012-2015 period; associational test as above, but adds in donor advisories that were put in place then dropped between 2012 and 2015.some minor work creating this new DV but not very burdensome
201120115,43939 donor advisories; pure cross-sectional test
Download the '2011' 990 data (SOX policies + controls) for the 39 orgs with a 2011 donor advisory; a few hours work to download and enter the data
201620168,304328 donor advisories; pure cross-sectional testready to run
change 2011-201620164,857'Divide 4,857 orgs into three groups: i) those with no SOX policies in 2011 and still no SOX policies in 2016; ii) those with SOX policies in 2011 and 2016; and iii) those with no SOX policies in 2011 but SOX policies in 2016. Create dummy variables for each group and see whether those in group iii) do better than i) or ii). This is a relatively low cost 'pre-post' test.moderate amount of work to create the new dummies but not too burdensome
change 2011-20162012-2016TBDSimilar to above option, but would need to take a sample of organizations in group iii) and go through their 990s to find out exactly when they added the SOX policiesResource-intensive 990 searches
\n", "\n", "\n", "

\n", "**_Notes from Meeting with Dan:_**\n", "- Do not do 3rd or 6th test -- benefit not worth the cost\n", "- 1st and 2nd tests can be robustness analyses\n", "- Focus on 4th and 5th tests\n", "- Control variables:\n", " - Size: total revenues best (probably logged)\n", " - will need 2011 and 2016 versions for the 4th and 5th tests\n", " - efficiency ratio\n", " - age (from BMF)\n", " - complexity (could be a good control from Erica's paper)\n", " - fixed effects:\n", " - state\n", " - category\n", " - I need to scrape the category dummies for the new orgs in the 2016 database\n", " - CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area\n", " - The focus of our paper is on SOC policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:\n", " - *independent board* --> related to Erica's *independence of key actors\" concept\n", " - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n", " - we could include other governance variables as needed.\n", "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n", "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.\n", "\n", "\n", "\n", "

\n", "**_To Do (beyond notes listed in table above):_**\n", "- For all above tests, we need to decide on controls, then find/merge/create any not currently in dataset\n", "- Run a selection model?\n", "- Code the *type* of advisory? Maybe save for future study\n", "- There are 53 orgs on the CN 'Watchlist' -- we probably don't need to look at these but it's a possible future move.\n", "\n", "
\n", "**_Notes on 2011 data:_**\n", "- Only 47 of 329 current donor advisories are on orgs that were rated in 2011\n", "- Number of 2011 orgs (n=5,349) missing from 2016 ratings: 582\n", "- Number of 2016 orgs (n=8,304) not in 2011 ratings: 3,447\n", "- In 2011 when I scraped the current ratings there are 39 blank rows. Specifically, I checked the following spreadsheet: *Charity Navigator - current ratings, October 18, 2011 (WITH UPDATES FOR DONOR ADVISORY ORGS).xlsx* -- 39 rows were blank for all ratings information, so I checked against the historical ratings on the CN website. (So far) all rows were either 1) dropped from CN, 2) had a donor advisory, or 3) still have a donor advisory. I have 5,439 orgs in the 2011 database. 39 seem to have had donor advisories on them at that time. So, the 2011 sample is the 5,400 orgs that did not have an advisory on them at the time. This conforms with the *n* of 5,400 in the above logit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Import Packages\n", "First, we will import several necessary Python packages. We will be using the Python Data Analysis Library, or PANDAS, extensively for our data manipulations. It is invaluable for analyzing datasets. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import of basic elements of PANDAS and numpy" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from pandas import DataFrame\n", "from pandas import Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "We can check which version of various packages we're using. You can see I'm running PANDAS 0.17 here." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.18.1\n" ] } ], "source": [ "print pd.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#http://pandas.pydata.org/pandas-docs/stable/options.html\n", "pd.set_option('display.max_columns', None)\n", "pd.set_option('max_colwidth', 500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in Data\n", "Let's read in the merged historical/current/2011 dataset we created in the last notebook. First we'll change the working directory." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/gregorysaxton/Google Drive/SOX\n" ] } ], "source": [ "cd '/Users/gregorysaxton/Google Drive/SOX'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logit Tests" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "35\n", "4863\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
donor_advisorydonor_advisory_2016donor_advisory_2011_to_2016org_idEINFYEForm 990 FYEratings_system2011_data2016_dataconflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binarySOX_policies_all_binaryprogram_efficiencycomplexitycomplexity_2011agetotal_revenue_loggedcategorystatetot_revcategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policy
507150.00.00.05954010202467FY20092009-12CN 2.01.00.01.01.01.03.01.01.00.7888950.03.062.015.947563Research and Public PolicyME8432154.00.00.00.00.00.00.00.00.00.00.01.0
\n", "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50715 0.0 0.0 0.0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50715 5954 010202467 FY2009 2009-12 CN 2.0 1.0 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "50715 0.0 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "50715 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "50715 1.0 0.788895 0.0 \n", "\n", " complexity_2011 age total_revenue_logged \\\n", "50715 3.0 62.0 15.947563 \n", "\n", " category state tot_rev category_Animals \\\n", "50715 Research and Public Policy ME 8432154.0 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50715 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50715 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50715 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "50715 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "50715 1.0 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011 = pd.read_pickle('Tests 1-2 data.pkl')\n", "print len(df_2011.columns)\n", "print len(df_2011)\n", "df_2011.head(1)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df_2011.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 4816\n", "1.0 47\n", "Name: donor_advisory_2016, dtype: int64\n", "0.0 4755\n", "1.0 108\n", "Name: donor_advisory_2011_to_2016, dtype: int64\n" ] } ], "source": [ "print df_2011['donor_advisory_2016'].value_counts()\n", "print df_2011['donor_advisory_2011_to_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "#DVs = ['donor_advisory', \n", "DVs = ['donor_advisory_2016', 'donor_advisory_2011_to_2016']\n", "indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data']\n", "IVs = ['conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2',\n", " 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n", "controls = ['program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']\n", "fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy']\n", "SOI_check = ['tot_rev']\n", "\n", "merge_cols = ['_merge_v1', '_merge_v2', '_merge_v3', '_merge_v4', '_merge_47', '_merge_efile']\n", "\n", "#+ SOI_check\n", "logit_cols = DVs + indicators + IVs + controls + fixed_effects\n", "print logit_cols" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
donor_advisorydonor_advisory_2016donor_advisory_2011_to_2016org_idEINFYEForm 990 FYEratings_system2011_data2016_dataconflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binarySOX_policies_all_binaryprogram_efficiencycomplexitycomplexity_2011agetotal_revenue_loggedcategorystatetot_revcategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policy
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [donor_advisory, donor_advisory_2016, donor_advisory_2011_to_2016, org_id, EIN, FYE, Form 990 FYE, ratings_system, 2011_data, 2016_data, conflict_of_interest_policy_v2, records_retention_policy_v2, whistleblower_policy_v2, SOX_policies, SOX_policies_binary, SOX_policies_all_binary, program_efficiency, complexity, complexity_2011, age, total_revenue_logged, category, state, tot_rev, category_Animals, category_Arts, Culture, Humanities, category_Community Development, category_Education, category_Environment, category_Health, category_Human Services, category_Human and Civil Rights, category_International, category_Religion, category_Research and Public Policy]\n", "Index: []" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011[df_2011.duplicated()]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df_2011.to_excel('df_2011.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test Logit" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6.1\n" ] } ], "source": [ "import statsmodels\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS\n", "print statsmodels.__version__" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)\n" ] } ], "source": [ "#IVs = '%s + ' % IV\n", "#DV = '%s ~ ' % DV \n", "IVs = 'SOX_policies '\n", "#IVs = 'SOX_policies_binary'\n", "#DV = 'advisory ~ '\n", "#DV = 'donor_advisory_2016 ~ '\n", "DV = 'donor_advisory_2011_to_2016 ~ '\n", "controls = '+ total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)'\n", "\n", "#admin_expense_percent + leader_comp_percent + budget_surplus\n", "logit_formula = DV+IVs+controls\n", "print logit_formula\n", "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit() \n", "#print globals()[\"mod%s\" % model_num].summary()\n", "# #print model_num.summary()\n", "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df_2011.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4838" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit_cols1 = ['donor_advisory_2016', 'donor_advisory_2011_to_2016', \n", " 'SOX_policies', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']\n", "len(df_2011[logit_cols1].dropna())" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1242" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df_2011.dropna())" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.075203\n", " Iterations: 35\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n", " \"Check mle_retvals\", ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4833
Model: Logit Df Residuals: 4817
Method: MLE Df Model: 15
Date: Tue, 06 Sep 2016 Pseudo R-squ.: 0.08962
Time: 12:35:26 Log-Likelihood: -363.45
converged: False LL-Null: -399.24
LLR p-value: 2.351e-09
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -2.6828 1.438 -1.865 0.062 -5.502 0.136
C(category)[T.Arts, Culture, Humanities] -0.5078 0.515 -0.986 0.324 -1.517 0.502
C(category)[T.Community Development] -0.3612 0.575 -0.629 0.530 -1.487 0.765
C(category)[T.Education] -1.2152 0.794 -1.531 0.126 -2.771 0.340
C(category)[T.Environment] -18.8719 4856.123 -0.004 0.997 -9536.698 9498.954
C(category)[T.Health] -1.0195 0.548 -1.862 0.063 -2.093 0.054
C(category)[T.Human Services] -0.1790 0.415 -0.431 0.666 -0.993 0.635
C(category)[T.Human and Civil Rights] -0.6823 0.687 -0.993 0.321 -2.029 0.664
C(category)[T.International] -0.3676 0.511 -0.719 0.472 -1.369 0.634
C(category)[T.Religion] 0.3141 0.449 0.700 0.484 -0.566 1.194
C(category)[T.Research and Public Policy] -0.5298 0.796 -0.665 0.506 -2.090 1.031
SOX_policies -0.4256 0.110 -3.876 0.000 -0.641 -0.210
total_revenue_logged 0.2806 0.101 2.785 0.005 0.083 0.478
program_efficiency -3.0563 0.767 -3.982 0.000 -4.561 -1.552
age -0.0065 0.007 -0.981 0.327 -0.019 0.006
complexity_2011 -0.7391 0.274 -2.700 0.007 -1.275 -0.203
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "=======================================================================================\n", "Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4833\n", "Model: Logit Df Residuals: 4817\n", "Method: MLE Df Model: 15\n", "Date: Tue, 06 Sep 2016 Pseudo R-squ.: 0.08962\n", "Time: 12:35:26 Log-Likelihood: -363.45\n", "converged: False LL-Null: -399.24\n", " LLR p-value: 2.351e-09\n", "=============================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------------------------------\n", "Intercept -2.6828 1.438 -1.865 0.062 -5.502 0.136\n", "C(category)[T.Arts, Culture, Humanities] -0.5078 0.515 -0.986 0.324 -1.517 0.502\n", "C(category)[T.Community Development] -0.3612 0.575 -0.629 0.530 -1.487 0.765\n", "C(category)[T.Education] -1.2152 0.794 -1.531 0.126 -2.771 0.340\n", "C(category)[T.Environment] -18.8719 4856.123 -0.004 0.997 -9536.698 9498.954\n", "C(category)[T.Health] -1.0195 0.548 -1.862 0.063 -2.093 0.054\n", "C(category)[T.Human Services] -0.1790 0.415 -0.431 0.666 -0.993 0.635\n", "C(category)[T.Human and Civil Rights] -0.6823 0.687 -0.993 0.321 -2.029 0.664\n", "C(category)[T.International] -0.3676 0.511 -0.719 0.472 -1.369 0.634\n", "C(category)[T.Religion] 0.3141 0.449 0.700 0.484 -0.566 1.194\n", "C(category)[T.Research and Public Policy] -0.5298 0.796 -0.665 0.506 -2.090 1.031\n", "SOX_policies -0.4256 0.110 -3.876 0.000 -0.641 -0.210\n", "total_revenue_logged 0.2806 0.101 2.785 0.005 0.083 0.478\n", "program_efficiency -3.0563 0.767 -3.982 0.000 -4.561 -1.552\n", "age -0.0065 0.007 -0.981 0.327 -0.019 0.006\n", "complexity_2011 -0.7391 0.274 -2.700 0.007 -1.275 -0.203\n", "=============================================================================================================\n", "\"\"\"" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df_2011).fit() \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_2011['donor_advisory_2016'] = df_2011['donor_advisory_2016'].astype('int')\n", "df_2011['donor_advisory_2011_to_2016'] = df_2011['donor_advisory_2011_to_2016'].astype('int')" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "donor_advisory float64\n", "donor_advisory_2016 int64\n", "donor_advisory_2011_to_2016 int64\n", "org_id object\n", "EIN object\n", "FYE object\n", "Form 990 FYE object\n", "ratings_system object\n", "2011_data float64\n", "2016_data float64\n", "conflict_of_interest_policy_v2 float64\n", "records_retention_policy_v2 float64\n", "whistleblower_policy_v2 float64\n", "SOX_policies float64\n", "SOX_policies_binary float64\n", "SOX_policies_all_binary float64\n", "program_efficiency float64\n", "complexity float64\n", "age float64\n", "total_revenue_logged float64\n", "category object\n", "state object\n", "tot_rev float64\n", "category_Animals float64\n", "category_Arts, Culture, Humanities float64\n", "category_Community Development float64\n", "category_Education float64\n", "category_Environment float64\n", "category_Health float64\n", "category_Human Services float64\n", "category_Human and Civil Rights float64\n", "category_International float64\n", "category_Religion float64\n", "category_Research and Public Policy float64\n", "dtype: object" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Write Function for Logits" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#def new_logit(IV,model_num):\n", "def new_logit_clustered(data, DV, columns, FE, model_num):\n", " #IVs = '%s + ' % IV\n", " #DV = 'RTs_binary ~ '\n", " DV = '%s ~ ' % DV \n", " #controls = 'from_user_followers_count + time_on_twitter_days + CSR_sustainability + \\\n", " # URLs_binary + photo'\n", " IVs = ' + '.join(columns)\n", " FE = '%s ' % FE\n", " logit_formula = DV+IVs+FE\n", " print logit_formula\n", " globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',\n", " cov_kwds={'groups': df['firm_from_user_screen_name']}) \n", " print globals()[\"mod%s\" % model_num].summary()\n", " #print model_num.summary()\n", " print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE\n", " #print '\\n', \"Pseudo R-squared:\", globals()[\"mod%s\" % model_num].prsquared #TO GET THE PSEUDO-R-SQUARED" ] }, { "cell_type": "code", "execution_count": 1157, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "donor_advisory_2011_to_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)\n" ] } ], "source": [] }, { "cell_type": "code", "execution_count": 1089, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n" ] }, { "data": { "text/plain": [ "4813" ] }, "execution_count": 1089, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit_variables = ['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged', \n", " 'program_efficiency', 'age', 'complexity', 'state', 'category']\n", "df_2011 = df[logit_variables]\n", "df_2011 = df_2011[df_2011['2011_data']==1]\n", "print len(df_2011)\n", "len(df_2011.dropna())" ] }, { "cell_type": "code", "execution_count": 1100, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
2011_data849580.0572400.2323020.0000000.0000000.0000000.0000001.000000
donor_advisory_2016849580.0043320.0656720.0000000.0000000.0000000.0000001.000000
SOX_policies218942.7245820.6898670.0000003.0000003.0000003.0000003.000000
total_revenue_logged2182515.9114701.45855211.61612314.78163615.70660816.86440922.042788
program_efficiency218940.8054000.1036350.0000000.7565680.8177580.8711051.010186
age8383039.50814719.3101750.00000024.00000035.00000052.000000108.000000
complexity849580.3730311.2209450.0000000.0000000.0000000.0000008.000000
complexity_201148332.4667910.5144681.0000002.0000002.0000003.0000003.000000
\n", "
" ], "text/plain": [ " count mean std min 25% \\\n", "2011_data 84958 0.057240 0.232302 0.000000 0.000000 \n", "donor_advisory_2016 84958 0.004332 0.065672 0.000000 0.000000 \n", "SOX_policies 21894 2.724582 0.689867 0.000000 3.000000 \n", "total_revenue_logged 21825 15.911470 1.458552 11.616123 14.781636 \n", "program_efficiency 21894 0.805400 0.103635 0.000000 0.756568 \n", "age 83830 39.508147 19.310175 0.000000 24.000000 \n", "complexity 84958 0.373031 1.220945 0.000000 0.000000 \n", "complexity_2011 4833 2.466791 0.514468 1.000000 2.000000 \n", "\n", " 50% 75% max \n", "2011_data 0.000000 0.000000 1.000000 \n", "donor_advisory_2016 0.000000 0.000000 1.000000 \n", "SOX_policies 3.000000 3.000000 3.000000 \n", "total_revenue_logged 15.706608 16.864409 22.042788 \n", "program_efficiency 0.817758 0.871105 1.010186 \n", "age 35.000000 52.000000 108.000000 \n", "complexity 0.000000 0.000000 8.000000 \n", "complexity_2011 2.000000 3.000000 3.000000 " ] }, "execution_count": 1100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['2011_data', 'donor_advisory_2016', 'SOX_policies', 'total_revenue_logged', \n", " 'program_efficiency', 'age', 'complexity', 'complexity_2011', 'state', 'category']].describe().T" ] }, { "cell_type": "code", "execution_count": 1158, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.075575\n", " Iterations: 35\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n", " \"Check mle_retvals\", ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4808
Model: Logit Df Residuals: 4792
Method: MLE Df Model: 15
Date: Fri, 02 Sep 2016 Pseudo R-squ.: 0.08892
Time: 12:11:05 Log-Likelihood: -363.36
converged: False LL-Null: -398.83
LLR p-value: 3.049e-09
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -2.6649 1.444 -1.845 0.065 -5.495 0.166
C(category)[T.Arts, Culture, Humanities] -0.5071 0.515 -0.985 0.325 -1.516 0.502
C(category)[T.Community Development] -0.3595 0.575 -0.626 0.532 -1.486 0.767
C(category)[T.Education] -1.2142 0.794 -1.530 0.126 -2.770 0.341
C(category)[T.Environment] -20.2785 9810.033 -0.002 0.998 -1.92e+04 1.92e+04
C(category)[T.Health] -1.0181 0.548 -1.859 0.063 -2.092 0.055
C(category)[T.Human Services] -0.1780 0.415 -0.429 0.668 -0.992 0.636
C(category)[T.Human and Civil Rights] -0.6780 0.687 -0.987 0.324 -2.025 0.669
C(category)[T.International] -0.3664 0.511 -0.717 0.474 -1.368 0.636
C(category)[T.Religion] 0.3200 0.449 0.712 0.476 -0.560 1.200
C(category)[T.Research and Public Policy] -0.5291 0.796 -0.665 0.506 -2.090 1.031
SOX_policies -0.4264 0.110 -3.882 0.000 -0.642 -0.211
total_revenue_logged 0.2790 0.101 2.753 0.006 0.080 0.478
program_efficiency -3.0518 0.768 -3.976 0.000 -4.556 -1.547
age -0.0064 0.007 -0.977 0.329 -0.019 0.006
complexity_2011 -0.7374 0.274 -2.695 0.007 -1.274 -0.201
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "=======================================================================================\n", "Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4808\n", "Model: Logit Df Residuals: 4792\n", "Method: MLE Df Model: 15\n", "Date: Fri, 02 Sep 2016 Pseudo R-squ.: 0.08892\n", "Time: 12:11:05 Log-Likelihood: -363.36\n", "converged: False LL-Null: -398.83\n", " LLR p-value: 3.049e-09\n", "=============================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------------------------------\n", "Intercept -2.6649 1.444 -1.845 0.065 -5.495 0.166\n", "C(category)[T.Arts, Culture, Humanities] -0.5071 0.515 -0.985 0.325 -1.516 0.502\n", "C(category)[T.Community Development] -0.3595 0.575 -0.626 0.532 -1.486 0.767\n", "C(category)[T.Education] -1.2142 0.794 -1.530 0.126 -2.770 0.341\n", "C(category)[T.Environment] -20.2785 9810.033 -0.002 0.998 -1.92e+04 1.92e+04\n", "C(category)[T.Health] -1.0181 0.548 -1.859 0.063 -2.092 0.055\n", "C(category)[T.Human Services] -0.1780 0.415 -0.429 0.668 -0.992 0.636\n", "C(category)[T.Human and Civil Rights] -0.6780 0.687 -0.987 0.324 -2.025 0.669\n", "C(category)[T.International] -0.3664 0.511 -0.717 0.474 -1.368 0.636\n", "C(category)[T.Religion] 0.3200 0.449 0.712 0.476 -0.560 1.200\n", "C(category)[T.Research and Public Policy] -0.5291 0.796 -0.665 0.506 -2.090 1.031\n", "SOX_policies -0.4264 0.110 -3.882 0.000 -0.642 -0.211\n", "total_revenue_logged 0.2790 0.101 2.753 0.006 0.080 0.478\n", "program_efficiency -3.0518 0.768 -3.976 0.000 -4.556 -1.547\n", "age -0.0064 0.007 -0.977 0.329 -0.019 0.006\n", "complexity_2011 -0.7374 0.274 -2.695 0.007 -1.274 -0.201\n", "=============================================================================================================\n", "\"\"\"" ] }, "execution_count": 1158, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df[df['2011_data']==1]).fit() \n", "logit.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test with standard errors clustered on state" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'donor_advisory_2016 ~ SOX_policies + total_revenue_logged + program_efficiency + age + complexity_2011 + C(category)'" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit_formula" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cols1 = ['donor_advisory_2011_to_2016', 'SOX_policies', 'total_revenue_logged', 'program_efficiency', 'age',\n", " 'complexity_2011', 'category', 'state']" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.075203\n", " Iterations: 35\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n", " \"Check mle_retvals\", ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4833
Model: Logit Df Residuals: 4817
Method: MLE Df Model: 15
Date: Tue, 06 Sep 2016 Pseudo R-squ.: 0.08962
Time: 12:35:43 Log-Likelihood: -363.45
converged: False LL-Null: -399.24
LLR p-value: 2.351e-09
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -2.6828 1.038 -2.585 0.010 -4.717 -0.649
C(category)[T.Arts, Culture, Humanities] -0.5078 0.465 -1.092 0.275 -1.419 0.404
C(category)[T.Community Development] -0.3612 0.700 -0.516 0.606 -1.733 1.010
C(category)[T.Education] -1.2152 0.683 -1.779 0.075 -2.554 0.123
C(category)[T.Environment] -18.8719 0.433 -43.552 0.000 -19.721 -18.023
C(category)[T.Health] -1.0195 0.520 -1.959 0.050 -2.039 0.000
C(category)[T.Human Services] -0.1790 0.370 -0.484 0.628 -0.904 0.546
C(category)[T.Human and Civil Rights] -0.6823 0.483 -1.412 0.158 -1.629 0.265
C(category)[T.International] -0.3676 0.487 -0.754 0.451 -1.323 0.588
C(category)[T.Religion] 0.3141 0.470 0.668 0.504 -0.607 1.235
C(category)[T.Research and Public Policy] -0.5298 0.526 -1.007 0.314 -1.561 0.501
SOX_policies -0.4256 0.093 -4.588 0.000 -0.607 -0.244
total_revenue_logged 0.2806 0.074 3.776 0.000 0.135 0.426
program_efficiency -3.0563 0.816 -3.744 0.000 -4.656 -1.456
age -0.0065 0.005 -1.199 0.231 -0.017 0.004
complexity_2011 -0.7391 0.263 -2.814 0.005 -1.254 -0.224
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "=======================================================================================\n", "Dep. Variable: donor_advisory_2011_to_2016 No. Observations: 4833\n", "Model: Logit Df Residuals: 4817\n", "Method: MLE Df Model: 15\n", "Date: Tue, 06 Sep 2016 Pseudo R-squ.: 0.08962\n", "Time: 12:35:43 Log-Likelihood: -363.45\n", "converged: False LL-Null: -399.24\n", " LLR p-value: 2.351e-09\n", "=============================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------------------------------\n", "Intercept -2.6828 1.038 -2.585 0.010 -4.717 -0.649\n", "C(category)[T.Arts, Culture, Humanities] -0.5078 0.465 -1.092 0.275 -1.419 0.404\n", "C(category)[T.Community Development] -0.3612 0.700 -0.516 0.606 -1.733 1.010\n", "C(category)[T.Education] -1.2152 0.683 -1.779 0.075 -2.554 0.123\n", "C(category)[T.Environment] -18.8719 0.433 -43.552 0.000 -19.721 -18.023\n", "C(category)[T.Health] -1.0195 0.520 -1.959 0.050 -2.039 0.000\n", "C(category)[T.Human Services] -0.1790 0.370 -0.484 0.628 -0.904 0.546\n", "C(category)[T.Human and Civil Rights] -0.6823 0.483 -1.412 0.158 -1.629 0.265\n", "C(category)[T.International] -0.3676 0.487 -0.754 0.451 -1.323 0.588\n", "C(category)[T.Religion] 0.3141 0.470 0.668 0.504 -0.607 1.235\n", "C(category)[T.Research and Public Policy] -0.5298 0.526 -1.007 0.314 -1.561 0.501\n", "SOX_policies -0.4256 0.093 -4.588 0.000 -0.607 -0.244\n", "total_revenue_logged 0.2806 0.074 3.776 0.000 0.135 0.426\n", "program_efficiency -3.0563 0.816 -3.744 0.000 -4.656 -1.456\n", "age -0.0065 0.005 -1.199 0.231 -0.017 0.004\n", "complexity_2011 -0.7391 0.263 -2.814 0.005 -1.254 -0.224\n", "=============================================================================================================\n", "\"\"\"" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df_2011[cols1].dropna()).fit(cov_type='cluster', \n", " cov_kwds={'groups': df_2011[cols1].dropna()['state']}) \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Below this I've played around with creating a '2011' dataset --> and run a couple of logits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Here are the two variables that serve as indicators of '2016' and '2011' CN data. **_These 8,304 and 4,863 rows will serve as the base for conducting the logit regressions_**. Design variable creation solutions are these subsets of data." ] }, { "cell_type": "code", "execution_count": 373, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8304\n", "4863\n" ] } ], "source": [ "print len(df[df['latest_entry']=='True'])\n", "print len(df[df['2011 data']==1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "- Control variables:\n", " - Size: total revenues best (probably logged)\n", " - will need 2011 and 2016 versions for the 4th and 5th tests\n", " - efficiency ratio\n", " - complexity (could be a good control from Erica's paper)\n", " - fixed effects:\n", " - state\n", " - category\n", " - I need to scrape the category dummies for the new orgs in the 2016 database\n", " - CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area\n", " - The focus of our paper is on SOX policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:\n", " - *independent board* --> related to Erica's *independence of key actors\" concept\n", " - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n", " - we could include other governance variables as needed.\n", "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n", "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.\n" ] }, { "cell_type": "code", "execution_count": 397, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']\n" ] } ], "source": [ "cols_2011 = [col for col in list(df) if col.endswith('_2011')]\n", "print cols_2011" ] }, { "cell_type": "code", "execution_count": 398, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "charity_name_2011 object\n", "category_2011 object\n", "city_2011 object\n", "state_2011 object\n", "cause_2011 object\n", "tag_line_2011 object\n", "url_2011 object\n", "ein_2011 object\n", "fye_2011 object\n", "overall_rating_2011 float64\n", "overall_rating_star_2011 float64\n", "efficiency_rating_2011 float64\n", "AT_rating_2011 float64\n", "financial_rating_star_2011 float64\n", "AT_rating_star_2011 float64\n", "program_expense_percent_2011 float64\n", "admin_expense_percent_2011 float64\n", "fund_expense_percent_2011 float64\n", "fund_efficiency_2011 float64\n", "primary_revenue_growth_2011 float64\n", "program_expense_growth_2011 float64\n", "working_capital_ratio_2011 float64\n", "independent_board_2011 object\n", "no_material_division_2011 object\n", "audited_financials_2011 object\n", "no_loans_related_2011 object\n", "documents_minutes_2011 object\n", "form_990_2011 object\n", "conflict_of_interest_policy_2011 object\n", "whistleblower_policy_2011 object\n", "records_retention_policy_2011 object\n", "CEO_listed_2011 object\n", "process_CEO_compensation_2011 object\n", "no_board_compensation_2011 object\n", "donor_privacy_policy_2011 object\n", "board_listed_2011 object\n", "audited_financials_web_2011 object\n", "form_990_web_2011 object\n", "staff_listed_2011 object\n", "primary_revenue_2011 float64\n", "other_revenue_2011 float64\n", "total_revenue_2011 float64\n", "govt_revenue_2011 object\n", "program_expense_2011 float64\n", "admin_expense_2011 float64\n", "fund_expense_2011 float64\n", "total_functional_expense_2011 float64\n", "affiliate_payments_2011 float64\n", "budget_surplus_2011 float64\n", "net_assets_2011 float64\n", "leader_comp_2011 float64\n", "leader_comp_percent_2011 float64\n", "email_2011 object\n", "website_2011 object\n", "dtype: object" ] }, "execution_count": 398, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[cols_2011].dtypes" ] }, { "cell_type": "code", "execution_count": 379, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
overall_rating_2011483353.5474558.933975e+008.040000e+0048.7354.6159.976.996000e+01
overall_rating_star_201148332.8696468.916785e-010.000000e+002.003.003.004.000000e+00
efficiency_rating_2011483353.4307801.048261e+010.000000e+0046.5354.8861.736.997000e+01
AT_rating_2011483356.4461001.152050e+01-2.000000e+0052.0059.0063.007.000000e+01
financial_rating_star_201148332.8504031.007304e+000.000000e+002.003.004.004.000000e+00
AT_rating_star_201148333.1280781.001005e+000.000000e+003.003.004.004.000000e+00
program_expense_percent_2011483380.4163251.055343e+012.200000e+0075.5081.6087.009.970000e+01
admin_expense_percent_2011483310.3206086.665639e+00-5.000000e-015.809.2013.206.790000e+01
fund_expense_percent_201148339.1101808.056593e+000.000000e+004.107.5011.909.070000e+01
fund_efficiency_201148330.1296795.190376e-010.000000e+000.050.100.163.533000e+01
primary_revenue_growth_201148334.4035801.554988e+01-7.960000e+01-3.502.6010.202.452000e+02
program_expense_growth_201148336.3149182.073699e+01-5.320000e+01-1.104.1011.101.007400e+03
working_capital_ratio_201148331.7884382.455707e+00-3.260000e+000.441.012.205.842000e+01
primary_revenue_2011483316525465.9925517.313324e+071.498700e+041989838.004430280.0011346855.003.502077e+09
other_revenue_20114833651129.4630666.235616e+06-6.612359e+075922.0099129.00443689.002.392543e+08
total_revenue_2011483317176106.7779857.570407e+07-4.263887e+072103386.004673878.0011721565.003.587775e+09
program_expense_2011483314784010.6163876.451777e+072.848300e+041694422.003808132.009557716.003.091879e+09
admin_expense_201148331477281.4769297.805211e+06-8.585600e+04188863.00412603.001010869.004.323913e+08
fund_expense_201148331133717.6885995.382822e+060.000000e+00160486.00359811.00830250.002.231224e+08
total_functional_expense_2011483317395155.3689227.292829e+071.507310e+052188637.004769351.0011766482.003.354177e+09
affiliate_payments_2011483360756.6267331.812179e+06-4.059500e+050.000.000.001.235951e+08
budget_surplus_20114833-219048.1593219.895040e+06-3.008552e+08-477470.001652.00452604.002.335980e+08
net_assets_2011483333372071.9143391.651812e+08-1.691832e+072013666.006004379.0019206628.007.002755e+09
leader_comp_20114657153519.5357531.313950e+050.000000e+0080808.00126250.00191203.002.257910e+06
leader_comp_percent_201146573.1661333.015173e+000.000000e+001.002.334.443.090000e+01
\n", "
" ], "text/plain": [ " count mean std \\\n", "overall_rating_2011 4833 53.547455 8.933975e+00 \n", "overall_rating_star_2011 4833 2.869646 8.916785e-01 \n", "efficiency_rating_2011 4833 53.430780 1.048261e+01 \n", "AT_rating_2011 4833 56.446100 1.152050e+01 \n", "financial_rating_star_2011 4833 2.850403 1.007304e+00 \n", "AT_rating_star_2011 4833 3.128078 1.001005e+00 \n", "program_expense_percent_2011 4833 80.416325 1.055343e+01 \n", "admin_expense_percent_2011 4833 10.320608 6.665639e+00 \n", "fund_expense_percent_2011 4833 9.110180 8.056593e+00 \n", "fund_efficiency_2011 4833 0.129679 5.190376e-01 \n", "primary_revenue_growth_2011 4833 4.403580 1.554988e+01 \n", "program_expense_growth_2011 4833 6.314918 2.073699e+01 \n", "working_capital_ratio_2011 4833 1.788438 2.455707e+00 \n", "primary_revenue_2011 4833 16525465.992551 7.313324e+07 \n", "other_revenue_2011 4833 651129.463066 6.235616e+06 \n", "total_revenue_2011 4833 17176106.777985 7.570407e+07 \n", "program_expense_2011 4833 14784010.616387 6.451777e+07 \n", "admin_expense_2011 4833 1477281.476929 7.805211e+06 \n", "fund_expense_2011 4833 1133717.688599 5.382822e+06 \n", "total_functional_expense_2011 4833 17395155.368922 7.292829e+07 \n", "affiliate_payments_2011 4833 60756.626733 1.812179e+06 \n", "budget_surplus_2011 4833 -219048.159321 9.895040e+06 \n", "net_assets_2011 4833 33372071.914339 1.651812e+08 \n", "leader_comp_2011 4657 153519.535753 1.313950e+05 \n", "leader_comp_percent_2011 4657 3.166133 3.015173e+00 \n", "\n", " min 25% 50% \\\n", "overall_rating_2011 8.040000e+00 48.73 54.61 \n", "overall_rating_star_2011 0.000000e+00 2.00 3.00 \n", "efficiency_rating_2011 0.000000e+00 46.53 54.88 \n", "AT_rating_2011 -2.000000e+00 52.00 59.00 \n", "financial_rating_star_2011 0.000000e+00 2.00 3.00 \n", "AT_rating_star_2011 0.000000e+00 3.00 3.00 \n", "program_expense_percent_2011 2.200000e+00 75.50 81.60 \n", "admin_expense_percent_2011 -5.000000e-01 5.80 9.20 \n", "fund_expense_percent_2011 0.000000e+00 4.10 7.50 \n", "fund_efficiency_2011 0.000000e+00 0.05 0.10 \n", "primary_revenue_growth_2011 -7.960000e+01 -3.50 2.60 \n", "program_expense_growth_2011 -5.320000e+01 -1.10 4.10 \n", "working_capital_ratio_2011 -3.260000e+00 0.44 1.01 \n", "primary_revenue_2011 1.498700e+04 1989838.00 4430280.00 \n", "other_revenue_2011 -6.612359e+07 5922.00 99129.00 \n", "total_revenue_2011 -4.263887e+07 2103386.00 4673878.00 \n", "program_expense_2011 2.848300e+04 1694422.00 3808132.00 \n", "admin_expense_2011 -8.585600e+04 188863.00 412603.00 \n", "fund_expense_2011 0.000000e+00 160486.00 359811.00 \n", "total_functional_expense_2011 1.507310e+05 2188637.00 4769351.00 \n", "affiliate_payments_2011 -4.059500e+05 0.00 0.00 \n", "budget_surplus_2011 -3.008552e+08 -477470.00 1652.00 \n", "net_assets_2011 -1.691832e+07 2013666.00 6004379.00 \n", "leader_comp_2011 0.000000e+00 80808.00 126250.00 \n", "leader_comp_percent_2011 0.000000e+00 1.00 2.33 \n", "\n", " 75% max \n", "overall_rating_2011 59.97 6.996000e+01 \n", "overall_rating_star_2011 3.00 4.000000e+00 \n", "efficiency_rating_2011 61.73 6.997000e+01 \n", "AT_rating_2011 63.00 7.000000e+01 \n", "financial_rating_star_2011 4.00 4.000000e+00 \n", "AT_rating_star_2011 4.00 4.000000e+00 \n", "program_expense_percent_2011 87.00 9.970000e+01 \n", "admin_expense_percent_2011 13.20 6.790000e+01 \n", "fund_expense_percent_2011 11.90 9.070000e+01 \n", "fund_efficiency_2011 0.16 3.533000e+01 \n", "primary_revenue_growth_2011 10.20 2.452000e+02 \n", "program_expense_growth_2011 11.10 1.007400e+03 \n", "working_capital_ratio_2011 2.20 5.842000e+01 \n", "primary_revenue_2011 11346855.00 3.502077e+09 \n", "other_revenue_2011 443689.00 2.392543e+08 \n", "total_revenue_2011 11721565.00 3.587775e+09 \n", "program_expense_2011 9557716.00 3.091879e+09 \n", "admin_expense_2011 1010869.00 4.323913e+08 \n", "fund_expense_2011 830250.00 2.231224e+08 \n", "total_functional_expense_2011 11766482.00 3.354177e+09 \n", "affiliate_payments_2011 0.00 1.235951e+08 \n", "budget_surplus_2011 452604.00 2.335980e+08 \n", "net_assets_2011 19206628.00 7.002755e+09 \n", "leader_comp_2011 191203.00 2.257910e+06 \n", "leader_comp_percent_2011 4.44 3.090000e+01 " ] }, "execution_count": 379, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[cols_2011].describe().T" ] }, { "cell_type": "code", "execution_count": 380, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Program Efficiency Ratio\n", "efficiency = ProgExp/TotExp" ] }, { "cell_type": "code", "execution_count": 386, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "34 NaN\n", "35 NaN\n", "36 NaN\n", "37 NaN\n", "38 NaN\n", "39 0.824939\n", "40 NaN\n", "41 NaN\n", "42 NaN\n", "dtype: float64" ] }, "execution_count": 386, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['org_id']=='12123']['program_expense_2011']/df[df['org_id']=='12123']['total_functional_expense_2011']" ] }, { "cell_type": "code", "execution_count": 388, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
charity_name_2011category_2011city_2011state_2011cause_2011tag_line_2011url_2011ein_2011fye_2011overall_rating_2011overall_rating_star_2011efficiency_rating_2011AT_rating_2011financial_rating_star_2011AT_rating_star_2011program_expense_percent_2011admin_expense_percent_2011fund_expense_percent_2011fund_efficiency_2011primary_revenue_growth_2011program_expense_growth_2011working_capital_ratio_2011independent_board_2011no_material_division_2011audited_financials_2011no_loans_related_2011documents_minutes_2011form_990_2011conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011CEO_listed_2011process_CEO_compensation_2011no_board_compensation_2011donor_privacy_policy_2011board_listed_2011audited_financials_web_2011form_990_web_2011staff_listed_2011primary_revenue_2011other_revenue_2011total_revenue_2011govt_revenue_2011program_expense_2011admin_expense_2011fund_expense_2011total_functional_expense_2011affiliate_payments_2011budget_surplus_2011net_assets_2011leader_comp_2011leader_comp_percent_2011email_2011website_2011
34NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
35NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
36NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
37NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
38NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
39100 Club of ArizonaHuman ServicesPhoenixAZMultipurpose Human Service OrganizationsSupporting families of public safetyhttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=1212323-717207712/200963.84466.58624482.414.430.036.26.31.06yesyesyesyesyesyesyesyesNOyesyesyesyesyesNOyesyes1212051-227543984508Note: This organization receives $0 in government support.10191911783853789912354750-25096713167811226239.92info@100club.orghttp://www.100club.org
40NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
41NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
42NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " charity_name_2011 category_2011 city_2011 state_2011 \\\n", "34 NaN NaN NaN NaN \n", "35 NaN NaN NaN NaN \n", "36 NaN NaN NaN NaN \n", "37 NaN NaN NaN NaN \n", "38 NaN NaN NaN NaN \n", "39 100 Club of Arizona Human Services Phoenix AZ \n", "40 NaN NaN NaN NaN \n", "41 NaN NaN NaN NaN \n", "42 NaN NaN NaN NaN \n", "\n", " cause_2011 \\\n", "34 NaN \n", "35 NaN \n", "36 NaN \n", "37 NaN \n", "38 NaN \n", "39 Multipurpose Human Service Organizations \n", "40 NaN \n", "41 NaN \n", "42 NaN \n", "\n", " tag_line_2011 \\\n", "34 NaN \n", "35 NaN \n", "36 NaN \n", "37 NaN \n", "38 NaN \n", "39 Supporting families of public safety \n", "40 NaN \n", "41 NaN \n", "42 NaN \n", "\n", " url_2011 \\\n", "34 NaN \n", "35 NaN \n", "36 NaN \n", "37 NaN \n", "38 NaN \n", "39 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=12123 \n", "40 NaN \n", "41 NaN \n", "42 NaN \n", "\n", " ein_2011 fye_2011 overall_rating_2011 overall_rating_star_2011 \\\n", "34 NaN NaN NaN NaN \n", "35 NaN NaN NaN NaN \n", "36 NaN NaN NaN NaN \n", "37 NaN NaN NaN NaN \n", "38 NaN NaN NaN NaN \n", "39 23-7172077 12/2009 63.84 4 \n", "40 NaN NaN NaN NaN \n", "41 NaN NaN NaN NaN \n", "42 NaN NaN NaN NaN \n", "\n", " efficiency_rating_2011 AT_rating_2011 financial_rating_star_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 66.58 62 4 \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " AT_rating_star_2011 program_expense_percent_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 4 82.4 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " admin_expense_percent_2011 fund_expense_percent_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 14.4 3 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " fund_efficiency_2011 primary_revenue_growth_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 0.03 6.2 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " program_expense_growth_2011 working_capital_ratio_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 6.3 1.06 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " independent_board_2011 no_material_division_2011 audited_financials_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 yes yes yes \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " no_loans_related_2011 documents_minutes_2011 form_990_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 yes yes yes \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " conflict_of_interest_policy_2011 whistleblower_policy_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 yes yes \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " records_retention_policy_2011 CEO_listed_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 NO yes \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 yes yes \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 yes yes NO \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 yes yes 1212051 \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 -227543 984508 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " govt_revenue_2011 \\\n", "34 NaN \n", "35 NaN \n", "36 NaN \n", "37 NaN \n", "38 NaN \n", "39 Note: This organization receives $0 in government support. \n", "40 NaN \n", "41 NaN \n", "42 NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 1019191 178385 37899 \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "34 NaN NaN \n", "35 NaN NaN \n", "36 NaN NaN \n", "37 NaN NaN \n", "38 NaN NaN \n", "39 1235475 0 \n", "40 NaN NaN \n", "41 NaN NaN \n", "42 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 -250967 1316781 122623 \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \n", "34 NaN NaN NaN \n", "35 NaN NaN NaN \n", "36 NaN NaN NaN \n", "37 NaN NaN NaN \n", "38 NaN NaN NaN \n", "39 9.92 info@100club.org http://www.100club.org \n", "40 NaN NaN NaN \n", "41 NaN NaN NaN \n", "42 NaN NaN NaN " ] }, "execution_count": 388, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['org_id']=='12123'][cols_2011]" ] }, { "cell_type": "code", "execution_count": 395, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "9 NaN\n", "21 0.797448\n", "39 0.824939\n", "52 0.854655\n", "63 0.786945\n", "dtype: float64" ] }, "execution_count": 395, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011 data']==1]['program_expense_2011'][:5]/df[df['2011 data']==1]['total_functional_expense_2011'][:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Complexity\n", "Number of revenue sources (Donations, Government Grants, Program Service Revenues)" ] }, { "cell_type": "code", "execution_count": 407, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4833\n" ] }, { "data": { "text/plain": [ "Note: This organization receives $0 in government support. 2531\n", "GOVERNMENT SUPPORT MUST BE RECEIVED 2302\n", "Name: govt_revenue_2011, dtype: int64" ] }, "execution_count": 407, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2011 data']==1])\n", "print len(df[df['govt_revenue_2011'].notnull()])\n", "df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'].value_counts()" ] }, { "cell_type": "code", "execution_count": 409, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "161 GOVERNMENT SUPPORT MUST BE RECEIVED\n", "185 Note: This organization receives $0 in government support.\n", "208 GOVERNMENT SUPPORT MUST BE RECEIVED\n", "228 Note: This organization receives $0 in government support.\n", "244 Note: This organization receives $0 in government support.\n", "255 Note: This organization receives $0 in government support.\n", "276 Note: This organization receives $0 in government support.\n", "Name: govt_revenue_2011, dtype: object" ] }, "execution_count": 409, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['govt_revenue_2011'].notnull()]['govt_revenue_2011'][5:12]" ] }, { "cell_type": "code", "execution_count": 410, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 2531\n", "1 2302\n", "Name: govt_revenue_2011_binary, dtype: int64" ] }, "execution_count": 410, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['govt_revenue_2011_binary'] = np.nan\n", "df['govt_revenue_2011_binary'] = np.where(\n", " df['govt_revenue_2011'] == 'Note: This organization receives $0 in government support.', 0,\n", " df['govt_revenue_2011_binary'])\n", "df['govt_revenue_2011_binary'] = np.where(\n", " df['govt_revenue_2011'] == 'GOVERNMENT SUPPORT MUST BE RECEIVED', 1, \n", " df['govt_revenue_2011_binary'])\n", "df['govt_revenue_2011_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 420, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4833\n", "21 3914222\n", "39 1212051\n", "52 762512\n", "63 1140158\n", "148 1375169\n", "Name: primary_revenue_2011, dtype: float64\n" ] }, { "data": { "text/plain": [ "4833" ] }, "execution_count": 420, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2011 data']==1])\n", "print len(df[df['primary_revenue_2011'].notnull()])\n", "print df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'][:5]\n", "df[df['primary_revenue_2011'].notnull()]['primary_revenue_2011'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 427, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "39 -227543\n", "63 -44041\n", "312 -572243\n", "362 -2666\n", "385 -7531\n", "477 -4653961\n", "558 -176202\n", "907 -3543041\n", "942 -1726139\n", "1161 -831757\n", "1317 -7494\n", "1352 -651915\n", "1372 -288151\n", "1390 0\n", "1604 -7067577\n", "1674 -271688\n", "1689 -11111670\n", "1705 -46280\n", "1927 -1702433\n", "1955 -6955\n", "1982 -77563\n", "2013 -388287\n", "2027 -76537\n", "2099 -1543\n", "2196 -162612\n", "2249 -2860\n", "2352 -113036\n", "2436 678\n", "2597 -7227\n", "2663 -456651\n", " ... \n", "82598 -1308477\n", "82683 0\n", "82914 -212856\n", "82971 -1358658\n", "82987 -29502\n", "83033 -1003245\n", "83098 -356419\n", "83115 -10947728\n", "83366 991\n", "83423 -18112\n", "83590 -173857\n", "83632 -1810\n", "83713 -4529558\n", "83746 0\n", "83758 251\n", "83841 -364815\n", "83873 -313582\n", "83898 0\n", "83910 300\n", "83911 0\n", "83912 56\n", "83913 28\n", "83917 -4437\n", "83920 -3318\n", "83922 263\n", "83925 748\n", "83927 0\n", "83928 30\n", "83932 -228354\n", "83940 -6663315\n", "Name: other_revenue_2011, dtype: float64" ] }, "execution_count": 427, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['other_revenue_2011']<1000]['other_revenue_2011']" ] }, { "cell_type": "code", "execution_count": 429, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "46" ] }, "execution_count": 429, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['other_revenue_2011']==0])" ] }, { "cell_type": "code", "execution_count": 419, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4833\n", "21 216503\n", "39 -227543\n", "52 21340\n", "63 -44041\n", "148 5061\n", "Name: other_revenue_2011, dtype: float64\n" ] }, { "data": { "text/plain": [ "4833" ] }, "execution_count": 419, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2011 data']==1])\n", "print len(df[df['other_revenue_2011'].notnull()])\n", "print df[df['other_revenue_2011'].notnull()]['other_revenue_2011'][:5]\n", "df[df['other_revenue_2011'].notnull()]['other_revenue_2011'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 431, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4833\n", "4833\n" ] }, { "data": { "text/plain": [ "1 4787\n", "0 46\n", "Name: other_revenue_2011_binary, dtype: int64" ] }, "execution_count": 431, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['other_revenue_2011'].notnull()])\n", "df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']>0, 1, df['other_revenue_2011'] )\n", "df['other_revenue_2011_binary'] = np.where(df['other_revenue_2011']<0, 1, df['other_revenue_2011_binary'] )\n", "print len(df[df['other_revenue_2011_binary'].notnull()])\n", "df['other_revenue_2011_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 434, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4833\n" ] }, { "data": { "text/plain": [ "2 2501\n", "3 2294\n", "1 38\n", "Name: complexity_2011, dtype: int64" ] }, "execution_count": 434, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['complexity_2011'] = 1 + df['other_revenue_2011_binary'] + df['govt_revenue_2011_binary']\n", "print len(df[df['complexity_2011'].notnull()])\n", "df['complexity_2011'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cols_2011 = [col for col in list(df) if col.endswith('_2011')]\n", "print cols_2011" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test Logits" ] }, { "cell_type": "code", "execution_count": 1132, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.6.1\n" ] } ], "source": [ "import statsmodels\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf #FOR USING 'R'-STYLE FORMULAS FOR REGRESSIONS\n", "print statsmodels.__version__" ] }, { "cell_type": "code", "execution_count": 455, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "yes yes yes 3548\n", "yes NO NO 416\n", "yes yes NO 284\n", "yes NO yes 265\n", "NO NO NO 255\n", "NO NO yes 30\n", "NO yes yes 21\n", "NO yes NO 14\n", "Name: SOX_policies_2011, dtype: int64" ] }, "execution_count": 455, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['SOX_policies_2011'] = df['conflict_of_interest_policy_2011'] + ' ' + df['whistleblower_policy_2011'] + ' ' + df['records_retention_policy_2011']\n", "df['SOX_policies_2011'].value_counts()" ] }, { "cell_type": "code", "execution_count": 456, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "3 3548\n", "2 570\n", "1 460\n", "0 255\n", "Name: SOX_policies_2011, dtype: int64" ] }, "execution_count": 456, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['SOX_policies_2011'] = df['SOX_policies_2011'].str.count('yes')\n", "df['SOX_policies_2011'].value_counts()" ] }, { "cell_type": "code", "execution_count": 491, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
total_revenue_2011_logged481115.5309811.28284512.58646614.56924315.3633816.279772.200080e+01
total_revenue_2011483317176106.77798575704074.259743-42638874.0000002103386.0000004673878.0000011721565.000003.587775e+09
\n", "
" ], "text/plain": [ " count mean std \\\n", "total_revenue_2011_logged 4811 15.530981 1.282845 \n", "total_revenue_2011 4833 17176106.777985 75704074.259743 \n", "\n", " min 25% 50% \\\n", "total_revenue_2011_logged 12.586466 14.569243 15.36338 \n", "total_revenue_2011 -42638874.000000 2103386.000000 4673878.00000 \n", "\n", " 75% max \n", "total_revenue_2011_logged 16.27977 2.200080e+01 \n", "total_revenue_2011 11721565.00000 3.587775e+09 " ] }, "execution_count": 491, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_revenue_2011_logged'] = np.log(df['total_revenue_2011'])\n", "df[['total_revenue_2011_logged', 'total_revenue_2011']].describe().T" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['advisory'] = df['']" ] }, { "cell_type": "code", "execution_count": 545, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged + program_expense_percent_2011 + age + complexity_2011 + C(category)\n" ] } ], "source": [ "#IVs = '%s + ' % IV\n", "#DV = '%s ~ ' % DV \n", "IVs = 'SOX_policies_2011 '\n", "#DV = 'advisory ~ '\n", "DV = 'donor_advisory_2016 ~ '\n", "controls = '+ total_revenue_2011_logged + program_expense_percent_2011 + age + \\\n", " complexity_2011 + C(category)'\n", "\n", "#admin_expense_percent + leader_comp_percent + budget_surplus\n", "logit_formula = DV+IVs+controls\n", "print logit_formula\n", "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit() \n", "#print globals()[\"mod%s\" % model_num].summary()\n", "# #print model_num.summary()\n", "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE" ] }, { "cell_type": "code", "execution_count": 532, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.to_csv('df.csv', encoding='utf-8')" ] }, { "cell_type": "code", "execution_count": 535, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] }, { "data": { "text/plain": [ "4808" ] }, "execution_count": 535, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit_variables = ['donor_advisory_2016', 'SOX_policies_2011', 'total_revenue_2011_logged', \n", " 'program_expense_percent_2011', 'age', 'complexity_2011', 'state_2011', 'category']\n", "df_2011 = df[logit_variables]\n", "print len(df_2011)\n", "len(df_2011.dropna())" ] }, { "cell_type": "code", "execution_count": 519, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "donor_advisory_2016 ~ SOX_policies_2011 + total_revenue_2011_logged + program_expense_percent_2011 + age + complexity_2011\n" ] } ], "source": [ "#IVs = '%s + ' % IV\n", "#DV = '%s ~ ' % DV \n", "IVs = 'SOX_policies_2011 '\n", "#DV = 'advisory ~ '\n", "DV = 'donor_advisory_2016 ~ '\n", "controls = '+ total_revenue_2011_logged + program_expense_percent_2011 + age + \\\n", " complexity_2011'\n", "\n", "#admin_expense_percent + leader_comp_percent + budget_surplus\n", "logit_formula = DV+IVs+controls\n", "print logit_formula\n", "#globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=df).fit() \n", "#print globals()[\"mod%s\" % model_num].summary()\n", "# #print model_num.summary()\n", "#print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE" ] }, { "cell_type": "code", "execution_count": 536, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.044916\n", " Iterations 10\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2016 No. Observations: 4808
Model: Logit Df Residuals: 4802
Method: MLE Df Model: 5
Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.08557
Time: 21:43:38 Log-Likelihood: -215.96
converged: True LL-Null: -236.17
LLR p-value: 1.229e-07
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -3.0296 1.925 -1.574 0.115 -6.802 0.743
SOX_policies_2011 -0.4992 0.145 -3.448 0.001 -0.783 -0.215
total_revenue_2011_logged 0.3113 0.138 2.260 0.024 0.041 0.581
program_expense_percent_2011 -0.0285 0.009 -3.068 0.002 -0.047 -0.010
age -0.0104 0.009 -1.130 0.258 -0.028 0.008
complexity_2011 -1.2189 0.358 -3.402 0.001 -1.921 -0.517
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "===============================================================================\n", "Dep. Variable: donor_advisory_2016 No. Observations: 4808\n", "Model: Logit Df Residuals: 4802\n", "Method: MLE Df Model: 5\n", "Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.08557\n", "Time: 21:43:38 Log-Likelihood: -215.96\n", "converged: True LL-Null: -236.17\n", " LLR p-value: 1.229e-07\n", "================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------------------\n", "Intercept -3.0296 1.925 -1.574 0.115 -6.802 0.743\n", "SOX_policies_2011 -0.4992 0.145 -3.448 0.001 -0.783 -0.215\n", "total_revenue_2011_logged 0.3113 0.138 2.260 0.024 0.041 0.581\n", "program_expense_percent_2011 -0.0285 0.009 -3.068 0.002 -0.047 -0.010\n", "age -0.0104 0.009 -1.130 0.258 -0.028 0.008\n", "complexity_2011 -1.2189 0.358 -3.402 0.001 -1.921 -0.517\n", "================================================================================================\n", "\"\"\"" ] }, "execution_count": 536, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df_2011).fit() \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#def new_logit(IV,model_num):\n", "def new_logit_clustered(data, DV, columns, FE, model_num):\n", " #IVs = '%s + ' % IV\n", " #DV = 'RTs_binary ~ '\n", " DV = '%s ~ ' % DV \n", " #controls = 'from_user_followers_count + time_on_twitter_days + CSR_sustainability + \\\n", " # URLs_binary + photo'\n", " IVs = ' + '.join(columns)\n", " FE = '%s ' % FE\n", " logit_formula = DV+IVs+FE\n", " print logit_formula\n", " globals()[\"mod%s\" % model_num] = smf.logit(formula=logit_formula, data=data).fit(cov_type='cluster',\n", " cov_kwds={'groups': df['firm_from_user_screen_name']}) \n", " print globals()[\"mod%s\" % model_num].summary()\n", " #print model_num.summary()\n", " print '\\n', \"Chi-squared value:\", globals()[\"mod%s\" % model_num].llr, '\\n' #TO GET THE CHI-SQUARED VALUE\n", " #print '\\n', \"Pseudo R-squared:\", globals()[\"mod%s\" % model_num].prsquared #TO GET THE PSEUDO-R-SQUARED" ] }, { "cell_type": "code", "execution_count": 543, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.044916\n", " Iterations 10\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2016 No. Observations: 4808
Model: Logit Df Residuals: 4802
Method: MLE Df Model: 5
Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.08557
Time: 21:46:12 Log-Likelihood: -215.96
converged: True LL-Null: -236.17
LLR p-value: 1.229e-07
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -3.0296 1.925 -1.574 0.115 -6.802 0.743
SOX_policies_2011 -0.4992 0.145 -3.448 0.001 -0.783 -0.215
total_revenue_2011_logged 0.3113 0.138 2.260 0.024 0.041 0.581
program_expense_percent_2011 -0.0285 0.009 -3.068 0.002 -0.047 -0.010
age -0.0104 0.009 -1.130 0.258 -0.028 0.008
complexity_2011 -1.2189 0.358 -3.402 0.001 -1.921 -0.517
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "===============================================================================\n", "Dep. Variable: donor_advisory_2016 No. Observations: 4808\n", "Model: Logit Df Residuals: 4802\n", "Method: MLE Df Model: 5\n", "Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.08557\n", "Time: 21:46:12 Log-Likelihood: -215.96\n", "converged: True LL-Null: -236.17\n", " LLR p-value: 1.229e-07\n", "================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------------------\n", "Intercept -3.0296 1.925 -1.574 0.115 -6.802 0.743\n", "SOX_policies_2011 -0.4992 0.145 -3.448 0.001 -0.783 -0.215\n", "total_revenue_2011_logged 0.3113 0.138 2.260 0.024 0.041 0.581\n", "program_expense_percent_2011 -0.0285 0.009 -3.068 0.002 -0.047 -0.010\n", "age -0.0104 0.009 -1.130 0.258 -0.028 0.008\n", "complexity_2011 -1.2189 0.358 -3.402 0.001 -1.921 -0.517\n", "================================================================================================\n", "\"\"\"" ] }, "execution_count": 543, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df).fit() \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 524, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4863" ] }, "execution_count": 524, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['2011 data']==1])" ] }, { "cell_type": "code", "execution_count": 523, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
SOX_policies_201148332.5334160.8694660.0000002.0000003.000003.000003.000000
total_revenue_2011_logged481115.5309811.28284512.58646614.56924315.3633816.2797722.000798
program_expense_percent_2011483380.41632510.5534292.20000075.50000081.6000087.0000099.700000
age486040.05102919.2402160.00000025.00000035.0000052.00000108.000000
complexity_201148332.4667910.5144681.0000002.0000002.000003.000003.000000
\n", "
" ], "text/plain": [ " count mean std min \\\n", "SOX_policies_2011 4833 2.533416 0.869466 0.000000 \n", "total_revenue_2011_logged 4811 15.530981 1.282845 12.586466 \n", "program_expense_percent_2011 4833 80.416325 10.553429 2.200000 \n", "age 4860 40.051029 19.240216 0.000000 \n", "complexity_2011 4833 2.466791 0.514468 1.000000 \n", "\n", " 25% 50% 75% max \n", "SOX_policies_2011 2.000000 3.00000 3.00000 3.000000 \n", "total_revenue_2011_logged 14.569243 15.36338 16.27977 22.000798 \n", "program_expense_percent_2011 75.500000 81.60000 87.00000 99.700000 \n", "age 25.000000 35.00000 52.00000 108.000000 \n", "complexity_2011 2.000000 2.00000 3.00000 3.000000 " ] }, "execution_count": 523, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011 data']==1][['SOX_policies_2011', 'total_revenue_2011_logged', 'program_expense_percent_2011', \n", " 'age', 'complexity_2011']].describe().T" ] }, { "cell_type": "code", "execution_count": 508, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4863" ] }, "execution_count": 508, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['state_2011'].notnull()])" ] }, { "cell_type": "code", "execution_count": 509, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "NY 672\n", "CA 649\n", "DC 333\n", "FL 283\n", "TX 239\n", "VA 187\n", "MA 183\n", "IL 172\n", "PA 169\n", "OH 127\n", "CO 126\n", "GA 114\n", "WA 110\n", "MD 108\n", "MI 107\n", "MN 94\n", "NC 93\n", "MO 92\n", "NJ 86\n", "TN 80\n", "OR 79\n", "AZ 75\n", "CT 71\n", "WI 64\n", "IN 52\n", "KY 37\n", "SC 35\n", "NE 33\n", "LA 29\n", "OK 29\n", "ME 27\n", "UT 26\n", "AL 26\n", "KS 24\n", "NM 24\n", "IA 23\n", "MT 22\n", "HI 20\n", "NH 17\n", "NV 17\n", "RI 15\n", "MS 15\n", "VT 14\n", "AR 14\n", "DE 13\n", "WY 9\n", "AK 8\n", "ID 7\n", "SD 6\n", "WV 5\n", "ND 2\n", "PR 1\n", "Name: state_2011, dtype: int64" ] }, "execution_count": 509, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['state_2011'].value_counts()" ] }, { "cell_type": "code", "execution_count": 516, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 516, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[ (df['2011 data']==1) & (df['state_2011'].isnull())])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 540, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
donor_advisory_201648080.0085270.0919590.0000000.0000000.0000000.00001.000000
SOX_policies_201148082.5341100.8690420.0000002.0000003.0000003.00003.000000
total_revenue_2011_logged480815.5320271.28255112.58646614.56982415.36419916.280422.000798
program_expense_percent_2011480880.42778710.5585542.20000075.50000081.60000087.000099.700000
age480840.01601519.1989560.00000025.00000035.00000052.0000108.000000
complexity_201148082.4667220.5145431.0000002.0000002.0000003.00003.000000
\n", "
" ], "text/plain": [ " count mean std min \\\n", "donor_advisory_2016 4808 0.008527 0.091959 0.000000 \n", "SOX_policies_2011 4808 2.534110 0.869042 0.000000 \n", "total_revenue_2011_logged 4808 15.532027 1.282551 12.586466 \n", "program_expense_percent_2011 4808 80.427787 10.558554 2.200000 \n", "age 4808 40.016015 19.198956 0.000000 \n", "complexity_2011 4808 2.466722 0.514543 1.000000 \n", "\n", " 25% 50% 75% max \n", "donor_advisory_2016 0.000000 0.000000 0.0000 1.000000 \n", "SOX_policies_2011 2.000000 3.000000 3.0000 3.000000 \n", "total_revenue_2011_logged 14.569824 15.364199 16.2804 22.000798 \n", "program_expense_percent_2011 75.500000 81.600000 87.0000 99.700000 \n", "age 25.000000 35.000000 52.0000 108.000000 \n", "complexity_2011 2.000000 2.000000 3.0000 3.000000 " ] }, "execution_count": 540, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011.dropna().describe().T" ] }, { "cell_type": "code", "execution_count": 546, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.043361\n", " Iterations: 35\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n", " \"Check mle_retvals\", ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2016 No. Observations: 4808
Model: Logit Df Residuals: 4792
Method: MLE Df Model: 15
Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.1172
Time: 21:48:06 Log-Likelihood: -208.48
converged: False LL-Null: -236.17
LLR p-value: 1.546e-06
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -3.4848 1.884 -1.849 0.064 -7.178 0.208
C(category)[T.Arts, Culture, Humanities] -0.7432 0.965 -0.770 0.441 -2.635 1.149
C(category)[T.Community Development] 0.2960 0.856 0.346 0.730 -1.382 1.974
C(category)[T.Education] -0.1108 0.901 -0.123 0.902 -1.877 1.655
C(category)[T.Environment] -17.5510 0.614 -28.598 0.000 -18.754 -16.348
C(category)[T.Health] -0.1414 0.744 -0.190 0.849 -1.600 1.317
C(category)[T.Human Services] -0.2230 0.760 -0.293 0.769 -1.713 1.268
C(category)[T.Human and Civil Rights] -0.0320 0.774 -0.041 0.967 -1.550 1.486
C(category)[T.International] -0.0034 0.766 -0.004 0.996 -1.505 1.498
C(category)[T.Religion] 1.1258 0.708 1.589 0.112 -0.263 2.514
C(category)[T.Research and Public Policy] 0.5617 0.799 0.703 0.482 -1.004 2.127
SOX_policies_2011 -0.4728 0.124 -3.805 0.000 -0.716 -0.229
total_revenue_2011_logged 0.2978 0.126 2.356 0.018 0.050 0.546
program_expense_percent_2011 -0.0330 0.011 -2.998 0.003 -0.055 -0.011
age -0.0104 0.007 -1.499 0.134 -0.024 0.003
complexity_2011 -0.8049 0.465 -1.730 0.084 -1.717 0.107
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "===============================================================================\n", "Dep. Variable: donor_advisory_2016 No. Observations: 4808\n", "Model: Logit Df Residuals: 4792\n", "Method: MLE Df Model: 15\n", "Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.1172\n", "Time: 21:48:06 Log-Likelihood: -208.48\n", "converged: False LL-Null: -236.17\n", " LLR p-value: 1.546e-06\n", "=============================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------------------------------\n", "Intercept -3.4848 1.884 -1.849 0.064 -7.178 0.208\n", "C(category)[T.Arts, Culture, Humanities] -0.7432 0.965 -0.770 0.441 -2.635 1.149\n", "C(category)[T.Community Development] 0.2960 0.856 0.346 0.730 -1.382 1.974\n", "C(category)[T.Education] -0.1108 0.901 -0.123 0.902 -1.877 1.655\n", "C(category)[T.Environment] -17.5510 0.614 -28.598 0.000 -18.754 -16.348\n", "C(category)[T.Health] -0.1414 0.744 -0.190 0.849 -1.600 1.317\n", "C(category)[T.Human Services] -0.2230 0.760 -0.293 0.769 -1.713 1.268\n", "C(category)[T.Human and Civil Rights] -0.0320 0.774 -0.041 0.967 -1.550 1.486\n", "C(category)[T.International] -0.0034 0.766 -0.004 0.996 -1.505 1.498\n", "C(category)[T.Religion] 1.1258 0.708 1.589 0.112 -0.263 2.514\n", "C(category)[T.Research and Public Policy] 0.5617 0.799 0.703 0.482 -1.004 2.127\n", "SOX_policies_2011 -0.4728 0.124 -3.805 0.000 -0.716 -0.229\n", "total_revenue_2011_logged 0.2978 0.126 2.356 0.018 0.050 0.546\n", "program_expense_percent_2011 -0.0330 0.011 -2.998 0.003 -0.055 -0.011\n", "age -0.0104 0.007 -1.499 0.134 -0.024 0.003\n", "complexity_2011 -0.8049 0.465 -1.730 0.084 -1.717 0.107\n", "=============================================================================================================\n", "\"\"\"" ] }, "execution_count": 546, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df_2011.dropna()).fit(cov_type='cluster', \n", " cov_kwds={'groups': df_2011.dropna()['state_2011']}) \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 547, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: Maximum number of iterations has been exceeded.\n", " Current function value: 0.043361\n", " Iterations: 35\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/statsmodels/base/model.py:466: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals\n", " \"Check mle_retvals\", ConvergenceWarning)\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: donor_advisory_2016 No. Observations: 4808
Model: Logit Df Residuals: 4792
Method: MLE Df Model: 15
Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.1172
Time: 21:48:36 Log-Likelihood: -208.48
converged: False LL-Null: -236.17
LLR p-value: 1.546e-06
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -3.4848 2.010 -1.734 0.083 -7.424 0.454
C(category)[T.Arts, Culture, Humanities] -0.7432 0.940 -0.790 0.429 -2.586 1.100
C(category)[T.Community Development] 0.2960 0.835 0.354 0.723 -1.341 1.933
C(category)[T.Education] -0.1108 0.929 -0.119 0.905 -1.931 1.709
C(category)[T.Environment] -17.5510 4367.028 -0.004 0.997 -8576.769 8541.667
C(category)[T.Health] -0.1414 0.755 -0.187 0.851 -1.620 1.338
C(category)[T.Human Services] -0.2230 0.707 -0.315 0.753 -1.609 1.163
C(category)[T.Human and Civil Rights] -0.0320 0.934 -0.034 0.973 -1.863 1.799
C(category)[T.International] -0.0034 0.790 -0.004 0.997 -1.552 1.545
C(category)[T.Religion] 1.1258 0.673 1.672 0.095 -0.194 2.445
C(category)[T.Research and Public Policy] 0.5617 0.930 0.604 0.546 -1.260 2.384
SOX_policies_2011 -0.4728 0.146 -3.240 0.001 -0.759 -0.187
total_revenue_2011_logged 0.2978 0.139 2.150 0.032 0.026 0.569
program_expense_percent_2011 -0.0330 0.010 -3.336 0.001 -0.052 -0.014
age -0.0104 0.010 -1.081 0.280 -0.029 0.008
complexity_2011 -0.8049 0.394 -2.045 0.041 -1.576 -0.034
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "===============================================================================\n", "Dep. Variable: donor_advisory_2016 No. Observations: 4808\n", "Model: Logit Df Residuals: 4792\n", "Method: MLE Df Model: 15\n", "Date: Wed, 31 Aug 2016 Pseudo R-squ.: 0.1172\n", "Time: 21:48:36 Log-Likelihood: -208.48\n", "converged: False LL-Null: -236.17\n", " LLR p-value: 1.546e-06\n", "=============================================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------------------------------\n", "Intercept -3.4848 2.010 -1.734 0.083 -7.424 0.454\n", "C(category)[T.Arts, Culture, Humanities] -0.7432 0.940 -0.790 0.429 -2.586 1.100\n", "C(category)[T.Community Development] 0.2960 0.835 0.354 0.723 -1.341 1.933\n", "C(category)[T.Education] -0.1108 0.929 -0.119 0.905 -1.931 1.709\n", "C(category)[T.Environment] -17.5510 4367.028 -0.004 0.997 -8576.769 8541.667\n", "C(category)[T.Health] -0.1414 0.755 -0.187 0.851 -1.620 1.338\n", "C(category)[T.Human Services] -0.2230 0.707 -0.315 0.753 -1.609 1.163\n", "C(category)[T.Human and Civil Rights] -0.0320 0.934 -0.034 0.973 -1.863 1.799\n", "C(category)[T.International] -0.0034 0.790 -0.004 0.997 -1.552 1.545\n", "C(category)[T.Religion] 1.1258 0.673 1.672 0.095 -0.194 2.445\n", "C(category)[T.Research and Public Policy] 0.5617 0.930 0.604 0.546 -1.260 2.384\n", "SOX_policies_2011 -0.4728 0.146 -3.240 0.001 -0.759 -0.187\n", "total_revenue_2011_logged 0.2978 0.139 2.150 0.032 0.026 0.569\n", "program_expense_percent_2011 -0.0330 0.010 -3.336 0.001 -0.052 -0.014\n", "age -0.0104 0.010 -1.081 0.280 -0.029 0.008\n", "complexity_2011 -0.8049 0.394 -2.045 0.041 -1.576 -0.034\n", "=============================================================================================================\n", "\"\"\"" ] }, "execution_count": 547, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df).fit() \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 549, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Human Services 1188\n", "Arts, Culture, Humanities 670\n", "Health 574\n", "International 428\n", "Community Development 381\n", "Animals 372\n", "Environment 317\n", "Religion 295\n", "Education 280\n", "Human and Civil Rights 181\n", "Research and Public Policy 122\n", "Name: category, dtype: int64" ] }, "execution_count": 549, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011.dropna()['category'].value_counts()" ] }, { "cell_type": "code", "execution_count": 513, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.044916\n", " Iterations 10\n" ] }, { "ename": "ValueError", "evalue": "The weights and list don't have the same length.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m logit = smf.logit(formula=logit_formula, data=df[df['2011 data']==1]).fit(cov_type='cluster', \n\u001b[0;32m----> 2\u001b[0;31m cov_kwds={'groups': df[df['2011 data']==1]['state_2011']}) \n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mlogit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msummary\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 1374\u001b[0m bnryfit = super(Logit, self).fit(start_params=start_params,\n\u001b[1;32m 1375\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1376\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 1377\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1378\u001b[0m \u001b[0mdiscretefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLogitResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbnryfit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 201\u001b[0m mlefit = super(DiscreteModel, self).fit(start_params=start_params,\n\u001b[1;32m 202\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 203\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 204\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 205\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmlefit\u001b[0m \u001b[0;31m# up to subclasses to wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)\u001b[0m\n\u001b[1;32m 455\u001b[0m \u001b[0;31m#print('kwds inLikelihoodModel.fit', kwds)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 456\u001b[0m \u001b[0;31m#TODO: add Hessian approximation and change the above if needed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m \u001b[0mmlefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLikelihoodModelResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxopt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mHinv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscale\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 458\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 459\u001b[0m \u001b[0;31m#TODO: hardcode scale?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, model, params, normalized_cov_params, scale, **kwargs)\u001b[0m\n\u001b[1;32m 940\u001b[0m \u001b[0;31m# TODO: we shouldn't need use_t in get_robustcov_results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 941\u001b[0m get_robustcov_results(self, cov_type=cov_type, use_self=True,\n\u001b[0;32m--> 942\u001b[0;31m use_t=use_t, **cov_kwds)\n\u001b[0m\u001b[1;32m 943\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 944\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc\u001b[0m in \u001b[0;36mget_robustcov_results\u001b[0;34m(self, cov_type, use_t, **kwds)\u001b[0m\n\u001b[1;32m 193\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroups\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 194\u001b[0m res.cov_params_default = sw.cov_cluster(self, groups,\n\u001b[0;32m--> 195\u001b[0;31m use_correction=use_correction)\n\u001b[0m\u001b[1;32m 196\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 197\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mgroups\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mcov_cluster\u001b[0;34m(results, group, use_correction)\u001b[0m\n\u001b[1;32m 535\u001b[0m \u001b[0mclusters\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 536\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 537\u001b[0;31m \u001b[0mscale\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mS_crosssection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxu\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 539\u001b[0m \u001b[0mnobs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk_params\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxu\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mS_crosssection\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m 489\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 490\u001b[0m '''\n\u001b[0;32m--> 491\u001b[0;31m \u001b[0mx_group_sums\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgroup_sums\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m \u001b[0;31m#TODO: why transposed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 492\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 493\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mS_white_simple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_group_sums\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mgroup_sums\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m 435\u001b[0m \u001b[0;31m#TODO: transpose return in group_sum, need test coverage first\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 436\u001b[0m return np.array([np.bincount(group, weights=x[:, col])\n\u001b[0;32m--> 437\u001b[0;31m for col in range(x.shape[1])])\n\u001b[0m\u001b[1;32m 438\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 439\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: The weights and list don't have the same length." ] } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df[df['2011 data']==1]).fit(cov_type='cluster', \n", " cov_kwds={'groups': df[df['2011 data']]['state_2011']}) \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 526, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.044916\n", " Iterations 10\n" ] }, { "ename": "ValueError", "evalue": "The weights and list don't have the same length.", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m logit = smf.logit(formula=logit_formula, data=df[df['total_revenue_2011_logged'].notnull()]).fit(cov_type='cluster', \n\u001b[0;32m----> 2\u001b[0;31m cov_kwds={'groups': df[df['total_revenue_2011_logged'].notnull()]['state_2011']}) \n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mlogit\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msummary\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 1374\u001b[0m bnryfit = super(Logit, self).fit(start_params=start_params,\n\u001b[1;32m 1375\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1376\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 1377\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1378\u001b[0m \u001b[0mdiscretefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLogitResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbnryfit\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/discrete/discrete_model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)\u001b[0m\n\u001b[1;32m 201\u001b[0m mlefit = super(DiscreteModel, self).fit(start_params=start_params,\n\u001b[1;32m 202\u001b[0m \u001b[0mmethod\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmethod\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmaxiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmaxiter\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfull_output\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfull_output\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 203\u001b[0;31m disp=disp, callback=callback, **kwargs)\n\u001b[0m\u001b[1;32m 204\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 205\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmlefit\u001b[0m \u001b[0;31m# up to subclasses to wrap results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)\u001b[0m\n\u001b[1;32m 455\u001b[0m \u001b[0;31m#print('kwds inLikelihoodModel.fit', kwds)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 456\u001b[0m \u001b[0;31m#TODO: add Hessian approximation and change the above if needed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 457\u001b[0;31m \u001b[0mmlefit\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mLikelihoodModelResults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mxopt\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mHinv\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mscale\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 458\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 459\u001b[0m \u001b[0;31m#TODO: hardcode scale?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/model.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, model, params, normalized_cov_params, scale, **kwargs)\u001b[0m\n\u001b[1;32m 940\u001b[0m \u001b[0;31m# TODO: we shouldn't need use_t in get_robustcov_results\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 941\u001b[0m get_robustcov_results(self, cov_type=cov_type, use_self=True,\n\u001b[0;32m--> 942\u001b[0;31m use_t=use_t, **cov_kwds)\n\u001b[0m\u001b[1;32m 943\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 944\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/base/covtype.pyc\u001b[0m in \u001b[0;36mget_robustcov_results\u001b[0;34m(self, cov_type, use_t, **kwds)\u001b[0m\n\u001b[1;32m 193\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mn_groups\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroups\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 194\u001b[0m res.cov_params_default = sw.cov_cluster(self, groups,\n\u001b[0;32m--> 195\u001b[0;31m use_correction=use_correction)\n\u001b[0m\u001b[1;32m 196\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 197\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mgroups\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mcov_cluster\u001b[0;34m(results, group, use_correction)\u001b[0m\n\u001b[1;32m 535\u001b[0m \u001b[0mclusters\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 536\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 537\u001b[0;31m \u001b[0mscale\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mS_crosssection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mxu\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 538\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 539\u001b[0m \u001b[0mnobs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mk_params\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mxu\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mS_crosssection\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m 489\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 490\u001b[0m '''\n\u001b[0;32m--> 491\u001b[0;31m \u001b[0mx_group_sums\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgroup_sums\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mT\u001b[0m \u001b[0;31m#TODO: why transposed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 492\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 493\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mS_white_simple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_group_sums\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/statsmodels/stats/sandwich_covariance.pyc\u001b[0m in \u001b[0;36mgroup_sums\u001b[0;34m(x, group)\u001b[0m\n\u001b[1;32m 435\u001b[0m \u001b[0;31m#TODO: transpose return in group_sum, need test coverage first\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 436\u001b[0m return np.array([np.bincount(group, weights=x[:, col])\n\u001b[0;32m--> 437\u001b[0;31m for col in range(x.shape[1])])\n\u001b[0m\u001b[1;32m 438\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 439\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: The weights and list don't have the same length." ] } ], "source": [ "logit = smf.logit(formula=logit_formula, data=df[df['total_revenue_2011_logged'].notnull()]).fit(cov_type='cluster', \n", " cov_kwds={'groups': df[df['total_revenue_2011_logged'].notnull()]['state_2011']}) \n", "logit.summary()" ] }, { "cell_type": "code", "execution_count": 500, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4816\n", "47\n" ] } ], "source": [ "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016'].notnull())])\n", "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==0)])\n", "print len(df[(df['2011 data']==1) & (df['donor_advisory_2016']==1)])" ] }, { "cell_type": "code", "execution_count": 468, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_star_2011', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011']\n" ] } ], "source": [ "print cols_2011" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "BMF_columns = ['NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', \n", "'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', \n", "'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', \n", "'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', \n", "'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', \n", "'rule_date_v1', 'taxpd']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['']df[(df['SOX_policies_2011'].notnull())]" ] }, { "cell_type": "code", "execution_count": 475, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n", "321 321\n" ] }, { "data": { "text/plain": [ "['16722', '14954', '16155']" ] }, "execution_count": 475, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['current_donor_advisory']==1])\n", "DA_2016 = df[df['current_donor_advisory']==1]['org_id'].tolist()\n", "print len(DA_2016), len(set(DA_2016))\n", "DA_2016[:3]" ] }, { "cell_type": "code", "execution_count": 486, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 84590\n", "1 368\n", "Name: donor_advisory_2016, dtype: int64" ] }, "execution_count": 486, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['donor_advisory_2016'] = np.nan\n", "df['donor_advisory_2016'] = np.where( df['org_id'].isin(DA_2016), 1,0\n", " )\n", "df['donor_advisory_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 473, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameorg_idFYE2011 datacurrent_donor_advisorypast_donor_advisorySOX_policies_2011
2110,000 Degrees6466FY20091003
39100 Club of Arizona12123FY20091002
521000 Friends of Florida10092FY20081000
631000 Friends of Oregon8770FY20101003
1484 Paws for Ability13055FY20091003
161The 92nd Street Y4792FY20101003
185A Better Chance6082FY20101003
208A Contemporary Theatre3634FY20091003
228A Kid Again9239FY20091003
244A Noise Within10176FY20101000
255A Place Called Home8040FY20101003
276A.J. Muste Memorial Institute6096FY20091003
292AAA Foundation for Traffic Safety8302FY20091003
312Aaron Diamond AIDS Research Center4991FY20101003
328AARP Foundation3205FY20091003
337AAUW - American Association of University Women3240FY20101003
362Abilities United7940FY20101003
373The Ability Experience7632FY20101003
385Abode Services9182FY20101003
399The Abraham Fund Initiatives9371FY20091003
426Abused Deaf Women's Advocacy Services12762FY20091003
441Academy of Achievement5705FY20091001
458Academy of American Poets9256FY20091003
477The Academy of Natural Sciences of Drexel University3209FY20091003
\n", "
" ], "text/plain": [ " name org_id FYE \\\n", "21 10,000 Degrees 6466 FY2009 \n", "39 100 Club of Arizona 12123 FY2009 \n", "52 1000 Friends of Florida 10092 FY2008 \n", "63 1000 Friends of Oregon 8770 FY2010 \n", "148 4 Paws for Ability 13055 FY2009 \n", "161 The 92nd Street Y 4792 FY2010 \n", "185 A Better Chance 6082 FY2010 \n", "208 A Contemporary Theatre 3634 FY2009 \n", "228 A Kid Again 9239 FY2009 \n", "244 A Noise Within 10176 FY2010 \n", "255 A Place Called Home 8040 FY2010 \n", "276 A.J. Muste Memorial Institute 6096 FY2009 \n", "292 AAA Foundation for Traffic Safety 8302 FY2009 \n", "312 Aaron Diamond AIDS Research Center 4991 FY2010 \n", "328 AARP Foundation 3205 FY2009 \n", "337 AAUW - American Association of University Women 3240 FY2010 \n", "362 Abilities United 7940 FY2010 \n", "373 The Ability Experience 7632 FY2010 \n", "385 Abode Services 9182 FY2010 \n", "399 The Abraham Fund Initiatives 9371 FY2009 \n", "426 Abused Deaf Women's Advocacy Services 12762 FY2009 \n", "441 Academy of Achievement 5705 FY2009 \n", "458 Academy of American Poets 9256 FY2009 \n", "477 The Academy of Natural Sciences of Drexel University 3209 FY2009 \n", "\n", " 2011 data current_donor_advisory past_donor_advisory SOX_policies_2011 \n", "21 1 0 0 3 \n", "39 1 0 0 2 \n", "52 1 0 0 0 \n", "63 1 0 0 3 \n", "148 1 0 0 3 \n", "161 1 0 0 3 \n", "185 1 0 0 3 \n", "208 1 0 0 3 \n", "228 1 0 0 3 \n", "244 1 0 0 0 \n", "255 1 0 0 3 \n", "276 1 0 0 3 \n", "292 1 0 0 3 \n", "312 1 0 0 3 \n", "328 1 0 0 3 \n", "337 1 0 0 3 \n", "362 1 0 0 3 \n", "373 1 0 0 3 \n", "385 1 0 0 3 \n", "399 1 0 0 3 \n", "426 1 0 0 3 \n", "441 1 0 0 1 \n", "458 1 0 0 3 \n", "477 1 0 0 3 " ] }, "execution_count": 473, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['SOX_policies_2011'].notnull())][['name', 'org_id', 'FYE', '2011 data', #'current_or_past_donor_advisory',\n", " 'current_donor_advisory', 'past_donor_advisory', 'SOX_policies_2011']][:24]" ] }, { "cell_type": "code", "execution_count": 447, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 84637\n", "1 321\n", "Name: advisory, dtype: int64" ] }, "execution_count": 447, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df['advisory'] = np.where(~df['advisory text - current advisory'].isnull(), 1,0)\n", "#df['advisory'].value_counts()" ] }, { "cell_type": "code", "execution_count": 458, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df[(df['2011 data']==1) & (df['past_donor_advisory']==1)][:5]" ] }, { "cell_type": "code", "execution_count": 483, "metadata": { "collapsed": false }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 2016 _donor_advisory\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "controls = ['total_revenue_2011', 'program_expense_percent_2011', \n", " ]\n", "len(df[(df['2011 data']==1) & (df['total_revenue_2011'].isnull())])#[controls]" ] }, { "cell_type": "code", "execution_count": 411, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINorg_urlnamecategorycategory-fullDate PublishedForm 990 FYEForm 990 FYE, v2FYEEarliest Rating Publication Dateratings_systemOverall ScoreOverall Ratingadvisory text - current advisoryadvisory text - past advisorycurrent_or_past_donor_advisorycurrent_donor_advisorypast_donor_advisorylatest_entrycurrent_ratings_urlein_2016Publication_date_and_FY_2016Publication Date_2016FYE_2016donor_alert_2016overall_rating_2016efficiency_rating_rating_2016AT_rating_2016overall_rating_star_2016financial_rating_star_2016AT_rating_star_2016program_expense_percent_2016admin_expense_percent_2016fund_expense_percent_2016fund_efficiency_2016working_capital_ratio_2016program_expense_growth_2016liabilities_to_assets_2016independent_board_2016no_material_division_2016audited_financials_2016no_loans_related_2016documents_minutes_2016form_990_2016conflict_of_interest_policy_2016whistleblower_policy_2016records_retention_policy_2016CEO_listed_2016process_CEO_compensation_2016no_board_compensation_2016donor_privacy_policy_2016board_listed_2016audited_financials_web_2016form_990_web_2016staff_listed_2016contributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016total_contributions_2016program_service_revenue_2016total_primary_revenue_2016other_revenue_2016total_revenue_2016program_expenses_2016administrative_expenses_2016fundraising_expenses_2016total_functional_expenses_2016payments_to_affiliates_2016excess_or_deficit_2016net_assets_2016comp_2016cp_2016mission_20162011 datacharity_name_2011category_2011city_2011state_2011cause_2011tag_line_2011url_2011ein_2011fye_2011overall_rating_2011overall_rating_2011_plus_30overall_rating_2011_plus_30_v2overall_rating_star_2011overall_rating_star_2011_textefficiency_rating_2011AT_rating_2011financial_rating_star_2011AT_rating_star_2011program_expense_percent_2011admin_expense_percent_2011fund_expense_percent_2011fund_efficiency_2011primary_revenue_growth_2011program_expense_growth_2011working_capital_ratio_2011independent_board_2011no_material_division_2011audited_financials_2011no_loans_related_2011documents_minutes_2011form_990_2011conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011CEO_listed_2011process_CEO_compensation_2011no_board_compensation_2011donor_privacy_policy_2011board_listed_2011audited_financials_web_2011form_990_web_2011staff_listed_2011primary_revenue_2011other_revenue_2011total_revenue_2011govt_revenue_2011program_expense_2011admin_expense_2011fund_expense_2011total_functional_expense_2011affiliate_payments_2011budget_surplus_2011net_assets_2011leader_comp_2011leader_comp_percent_2011email_2011website_20112016 Advisory - Date Posted2016 Advisory - Charity Name2016 Advisory - advisory_url2016 Advisory - advisory_merge_v1to_be_mergedNEW ROWNAME_2015_BMFSTREET_2015_BMFCITY_2015_BMFSTATE_2015_BMFZIP_2015_BMFRULING_2015_BMFACTIVITY_2015_BMFTAX_PERIOD_2015_BMFASSET_AMT_2015_BMFINCOME_AMT_2015_BMFREVENUE_AMT_2015_BMFNTEE_CD_2015_BMF2015 BMFruledate_2004_BMFname_MSTRALLstate_MSTRALLNTEE1_MSTRALLnteecc_MSTRALLzip_MSTRALLfips_MSTRALLtaxper_MSTRALLincome_MSTRALLF990REV_MSTRALLassets_MSTRALLruledate_MSTRALLdeductcd_MSTRALLaccper_MSTRALLrule_date_v1taxpdNAME_SOIyr_frmtnpt1_num_vtng_gvrn_bdy_memspt1_num_ind_vtng_memsnum_vtng_gvrn_bdy_memsnum_ind_vtng_memstot_num_emplstot_num_vlntrscontri_grnts_cyprog_srvc_rev_cyinvst_incm_cyoth_rev_cygrnts_and_smlr_amts_cytot_prof_fndrsng_exp_cytot_fndrsng_exp_cypt1_tot_asts_eoyaud_fincl_stmtsmtrl_divrsn_or_misusecnflct_int_plcywhistleblower_plcydoc_retention_plcyfederated_campaignsmemshp_duesrltd_orgsgovt_grntsall_oth_contrinncsh_contritot_contripsr_totinv_incm_tot_revbonds_tot_revroylrev_tot_revnet_rent_tot_revgain_or_loss_secgain_or_loss_othoth_rev_tottot_revmgmt_srvc_fee_totfee_for_srvc_leg_totfee_for_srvc_acct_totfee_for_srvc_lbby_totfee_for_srvc_prof_totfee_for_srvc_invst_totfee_for_srvc_oth_totfs_auditedaudit_committeevlntr_hrs_mergerule_dateruledate_2004_BMF_v2ruledate_MSTRALL_v2yr_frmtn_v2agecategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policygovt_revenue_2011_binary
910166043314346http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166Angel Flight NortheastHealthHealth : Patient and Family Support2011-01-05 00:00:002009-122009-12-01FY20092005-12-01CN 1.0--Donor AdvisoryNaNThis donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...101FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN1Angel Flight NortheastHealthNorth AndoverMAPatient and Family SupportProviding free flights so children and adults can access medical care since 1996http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=1016604-3314346NaNNaN----NaNMissing - Apparent Donor AdvisoryNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only1NaNANGEL FLIGHT OF NEW ENGLAND INCLAWRENCE MUNICIPAL AIRPORTNORTH ANDOVERMA01845-00002008129941790002013128693108962593877845E991199608ANGEL FLIGHT NEW ENGMAEE8701867-1110250172000125394505208623187581996081121996NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only200819961996nan800000100000NaN
216466953667812http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=646610,000 DegreesEducationEducation : Scholarship and Financial Support2011-09-20 00:00:002009-062009-06-01FY20092003-09-01CN 2.085.333 starsNaNNaN000FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN110,000 DegreesEducationSan RafaelCAOther Education Programs and ServicesCreating College Graduates Who Change the Worldhttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=646695-366781206/200955.3385.3385.3333 stars52.42593379.77.312.80.113.400.67yesyesyesyesyesyesyesyesyesyesyesyesNOyesNONOyes39142222165034130725Note: This organization receives $0 in government support.28135322600074546293528168060255733891661543004.37info@10000degrees.orghttp://www.10000degrees.orgNaNNaNNaNNaNboth0NaN10000 DEGREES1650 LOS GAMOS SUITE 110SAN RAFAELCA94903-183819810540000000201506861166277602097627694B111198211MARIN EDUC FNDCABB2094901-2920060412001063958011395801119582511982111061982NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only198119821982nan35000100000000
\n", "
" ], "text/plain": [ " org_id EIN \\\n", "9 10166 043314346 \n", "21 6466 953667812 \n", "\n", " org_url \\\n", "9 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166 \n", "21 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466 \n", "\n", " name category \\\n", "9 Angel Flight Northeast Health \n", "21 10,000 Degrees Education \n", "\n", " category-full Date Published \\\n", "9 Health : Patient and Family Support 2011-01-05 00:00:00 \n", "21 Education : Scholarship and Financial Support 2011-09-20 00:00:00 \n", "\n", " Form 990 FYE Form 990 FYE, v2 FYE Earliest Rating Publication Date \\\n", "9 2009-12 2009-12-01 FY2009 2005-12-01 \n", "21 2009-06 2009-06-01 FY2009 2003-09-01 \n", "\n", " ratings_system Overall Score Overall Rating \\\n", "9 CN 1.0 -- Donor Advisory \n", "21 CN 2.0 85.33 3 stars \n", "\n", " advisory text - current advisory \\\n", "9 NaN \n", "21 NaN \n", "\n", " advisory text - past advisory \\\n", "9 This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati... \n", "21 NaN \n", "\n", " current_or_past_donor_advisory current_donor_advisory \\\n", "9 1 0 \n", "21 0 0 \n", "\n", " past_donor_advisory latest_entry current_ratings_url ein_2016 \\\n", "9 1 False NaN NaN \n", "21 0 False NaN NaN \n", "\n", " Publication_date_and_FY_2016 Publication Date_2016 FYE_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " donor_alert_2016 overall_rating_2016 efficiency_rating_rating_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " AT_rating_2016 overall_rating_star_2016 financial_rating_star_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " AT_rating_star_2016 program_expense_percent_2016 \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " admin_expense_percent_2016 fund_expense_percent_2016 fund_efficiency_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " working_capital_ratio_2016 program_expense_growth_2016 \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " liabilities_to_assets_2016 independent_board_2016 \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " no_material_division_2016 audited_financials_2016 no_loans_related_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " documents_minutes_2016 form_990_2016 conflict_of_interest_policy_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " whistleblower_policy_2016 records_retention_policy_2016 CEO_listed_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " process_CEO_compensation_2016 no_board_compensation_2016 \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " donor_privacy_policy_2016 board_listed_2016 audited_financials_web_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " form_990_web_2016 staff_listed_2016 contributions_gifts_grants_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " federated_campaigns_2016 membership_dues_2016 fundraising_events_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " related_organizations_2016 government_grants_2016 total_contributions_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " program_service_revenue_2016 total_primary_revenue_2016 other_revenue_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " total_revenue_2016 program_expenses_2016 administrative_expenses_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " fundraising_expenses_2016 total_functional_expenses_2016 \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " payments_to_affiliates_2016 excess_or_deficit_2016 net_assets_2016 \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " comp_2016 cp_2016 mission_2016 2011 data charity_name_2011 \\\n", "9 NaN NaN NaN 1 Angel Flight Northeast \n", "21 NaN NaN NaN 1 10,000 Degrees \n", "\n", " category_2011 city_2011 state_2011 \\\n", "9 Health North Andover MA \n", "21 Education San Rafael CA \n", "\n", " cause_2011 \\\n", "9 Patient and Family Support \n", "21 Other Education Programs and Services \n", "\n", " tag_line_2011 \\\n", "9 Providing free flights so children and adults can access medical care since 1996 \n", "21 Creating College Graduates Who Change the World \n", "\n", " url_2011 \\\n", "9 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166 \n", "21 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=6466 \n", "\n", " ein_2011 fye_2011 overall_rating_2011 overall_rating_2011_plus_30 \\\n", "9 04-3314346 NaN NaN -- \n", "21 95-3667812 06/2009 55.33 85.33 \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "9 -- NaN \n", "21 85.33 3 \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "9 Missing - Apparent Donor Advisory NaN NaN \n", "21 3 stars 52.42 59 \n", "\n", " financial_rating_star_2011 AT_rating_star_2011 \\\n", "9 NaN NaN \n", "21 3 3 \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "9 NaN NaN \n", "21 79.7 7.3 \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "9 NaN NaN \n", "21 12.8 0.11 \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "9 NaN NaN \n", "21 3.4 0 \n", "\n", " working_capital_ratio_2011 independent_board_2011 \\\n", "9 NaN NaN \n", "21 0.67 yes \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "9 NaN NaN NaN \n", "21 yes yes yes \n", "\n", " documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011 \\\n", "9 NaN NaN NaN \n", "21 yes yes yes \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "9 NaN NaN NaN \n", "21 yes yes yes \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "9 NaN NaN \n", "21 yes yes \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "9 NaN NaN NaN \n", "21 NO yes NO \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "9 NaN NaN NaN \n", "21 NO yes 3914222 \n", "\n", " other_revenue_2011 total_revenue_2011 \\\n", "9 NaN NaN \n", "21 216503 4130725 \n", "\n", " govt_revenue_2011 \\\n", "9 NaN \n", "21 Note: This organization receives $0 in government support. \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "9 NaN NaN NaN \n", "21 2813532 260007 454629 \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "9 NaN NaN \n", "21 3528168 0 \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "9 NaN NaN NaN \n", "21 602557 3389166 154300 \n", "\n", " leader_comp_percent_2011 email_2011 \\\n", "9 NaN NaN \n", "21 4.37 info@10000degrees.org \n", "\n", " website_2011 2016 Advisory - Date Posted \\\n", "9 NaN NaN \n", "21 http://www.10000degrees.org NaN \n", "\n", " 2016 Advisory - Charity Name 2016 Advisory - advisory_url \\\n", "9 NaN NaN \n", "21 NaN NaN \n", "\n", " 2016 Advisory - advisory _merge_v1 to_be_merged NEW ROW \\\n", "9 NaN left_only 1 NaN \n", "21 NaN both 0 NaN \n", "\n", " NAME_2015_BMF STREET_2015_BMF \\\n", "9 ANGEL FLIGHT OF NEW ENGLAND INC LAWRENCE MUNICIPAL AIRPORT \n", "21 10000 DEGREES 1650 LOS GAMOS SUITE 110 \n", "\n", " CITY_2015_BMF STATE_2015_BMF ZIP_2015_BMF RULING_2015_BMF \\\n", "9 NORTH ANDOVER MA 01845-0000 200812 \n", "21 SAN RAFAEL CA 94903-1838 198105 \n", "\n", " ACTIVITY_2015_BMF TAX_PERIOD_2015_BMF ASSET_AMT_2015_BMF \\\n", "9 994179000 201312 869310 \n", "21 40000000 201506 8611662 \n", "\n", " INCOME_AMT_2015_BMF REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF 2015 BMF \\\n", "9 896259 3877845 E99 1 \n", "21 7760209 7627694 B11 1 \n", "\n", " ruledate_2004_BMF name_MSTRALL state_MSTRALL NTEE1_MSTRALL \\\n", "9 199608 ANGEL FLIGHT NEW ENG MA E \n", "21 198211 MARIN EDUC FND CA B \n", "\n", " nteecc_MSTRALL zip_MSTRALL fips_MSTRALL taxper_MSTRALL income_MSTRALL \\\n", "9 E87 01867-1110 25017 200012 539450 \n", "21 B20 94901-2920 06041 200106 3958011 \n", "\n", " F990REV_MSTRALL assets_MSTRALL ruledate_MSTRALL deductcd_MSTRALL \\\n", "9 520862 318758 199608 1 \n", "21 3958011 1958251 198211 1 \n", "\n", " accper_MSTRALL rule_date_v1 taxpd NAME_SOI yr_frmtn \\\n", "9 12 1996 NaN NaN NaN \n", "21 06 1982 NaN NaN NaN \n", "\n", " pt1_num_vtng_gvrn_bdy_mems pt1_num_ind_vtng_mems num_vtng_gvrn_bdy_mems \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " num_ind_vtng_mems tot_num_empls tot_num_vlntrs contri_grnts_cy \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " prog_srvc_rev_cy invst_incm_cy oth_rev_cy grnts_and_smlr_amts_cy \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " tot_prof_fndrsng_exp_cy tot_fndrsng_exp_cy pt1_tot_asts_eoy \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy whistleblower_plcy \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " doc_retention_plcy federated_campaigns memshp_dues rltd_orgs \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " govt_grnts all_oth_contri nncsh_contri tot_contri psr_tot \\\n", "9 NaN NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN NaN \n", "\n", " inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev net_rent_tot_rev \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " gain_or_loss_sec gain_or_loss_oth oth_rev_tot tot_rev \\\n", "9 NaN NaN NaN NaN \n", "21 NaN NaN NaN NaN \n", "\n", " mgmt_srvc_fee_tot fee_for_srvc_leg_tot fee_for_srvc_acct_tot \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " fee_for_srvc_lbby_tot fee_for_srvc_prof_tot fee_for_srvc_invst_tot \\\n", "9 NaN NaN NaN \n", "21 NaN NaN NaN \n", "\n", " fee_for_srvc_oth_tot fs_audited audit_committee vlntr_hrs _merge \\\n", "9 NaN NaN NaN NaN left_only \n", "21 NaN NaN NaN NaN left_only \n", "\n", " rule_date ruledate_2004_BMF_v2 ruledate_MSTRALL_v2 yr_frmtn_v2 age \\\n", "9 2008 1996 1996 nan 8 \n", "21 1981 1982 1982 nan 35 \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "9 0 0 \n", "21 0 0 \n", "\n", " category_Community Development category_Education category_Environment \\\n", "9 0 0 0 \n", "21 0 1 0 \n", "\n", " category_Health category_Human Services category_Human and Civil Rights \\\n", "9 1 0 0 \n", "21 0 0 0 \n", "\n", " category_International category_Religion \\\n", "9 0 0 \n", "21 0 0 \n", "\n", " category_Research and Public Policy govt_revenue_2011_binary \n", "9 0 NaN \n", "21 0 0 " ] }, "execution_count": 411, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011 data']==1][:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 396, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011 data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SOI data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "SOI_columns = ['taxpd', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', \n", " 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', \n", " 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', \n", " 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', \n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', \n", " 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', \n", " 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', \n", " 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', \n", " 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', \n", " 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', 'NAME_SOI']\n", "len(SOI_columns)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }