{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Notebook Tasks\n", "In this notebook you'll see code used to conduct the following steps:\n", "- Read in merged current/historical/2011 dataset with all BMF and SOI data: \n", " - **_merged data with EIN clean-up, SOI data, and 2015, 2008, and 2004 BMF data.pkl_**\n", " - There are 84,958 rows in this dataset, including some years for which only SOI data are included. \n", "\n", "\n", "**_Notes:_**\n", "- Of the 8,304 organizations in the dataset, 4,857 are those that were rated in 2011.\n", " - 582 of the 5,439 2011 organizations have been dropped by CN, leaving 4,857\n", "- The dataset is organized in an **_org/FY/ratings system_** format \n", " - i.e., one row per organization for each fiscal year, with multiple rows per org/FY when there has been a ratings system change from *CN 1.0* to *CN 2.0* to *CN 2.1.* \n", " - Some organizations will have more than one entry per fiscal year -- even for the same ratings system (CN2.1, CN2.0, or CN1.0) -- because multiple ratings per year can be triggered by amended 990s, etc.; e.g., https://www.charitynavigator.org/index.cfm?bay=search.history&orgid=10166\n", "- The baseline for the dataset is data gathered from each organization's *Historical Ratings* page -- you'll see the same number of rows -- and same data for each row -- as seen on those pages; e.g., https://www.charitynavigator.org/index.cfm?bay=search.history&orgid=10166\n", "- What I've done is to merge each organization's current *Rating Profile* into this *org/FY/ratings system* dataset, as seen here: https://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=10166\n", "- I have merged/appended in two rows of data for each organization:\n", " - **_current ratings_** -- data scraped from the *Rating Profile* in August 2016.\n", " - **_2011 data_** -- data scraped from the *Rating Profile* in October 2011.\n", "- So, for *SOX policy data*, we have two years of data -- 2011 and 2016. 
For all years, we have what is on the *Historical Ratings* page -- namely, the overall numerical score, the overall numerical star rating, and whether the organization was under a donor advisory that year. \n", "\n", "\n", "
\n", "**_Possible Samples for Statistical Tests_**:\n", "- Given the above, there are a number of possible tests:

\n", "\n", "| IV: SOX Policies | DV: Donor Advisory | N | Notes | TO DO |\n", "|---|---|---|---|---|\n", "| 2011 | 2016 | 4,857 | 47 donor advisories on these organizations; associational test (we don't know when the SOX policies were added); also, DV is 'current donor advisory' | ready to run |\n", "| 2011 | 2012-2016 | 4,857 | 47 2016 advisories plus probably another dozen or so advisories over the 2012-2015 period; associational test as above, but adds in donor advisories that were put in place then dropped between 2012 and 2015 | some minor work creating this new DV but not very burdensome |\n", "| 2011 | 2011 | 5,439 | 39 donor advisories; pure cross-sectional test | Download the '2011' 990 data (SOX policies + controls) for the 39 orgs with a 2011 donor advisory; a few hours' work to download and enter the data |\n", "| 2016 | 2016 | 8,304 | 328 donor advisories; pure cross-sectional test | ready to run |\n", "| change 2011-2016 | 2016 | 4,857 | Divide 4,857 orgs into three groups: i) those with no SOX policies in 2011 and still no SOX policies in 2016; ii) those with SOX policies in 2011 and 2016; and iii) those with no SOX policies in 2011 but SOX policies in 2016. Create dummy variables for each group and see whether those in group iii) do better than i) or ii). This is a relatively low cost 'pre-post' test. | moderate amount of work to create the new dummies but not too burdensome |\n", "| change 2011-2016 | 2012-2016 | TBD | Similar to above option, but would need to take a sample of organizations in group iii) and go through their 990s to find out exactly when they added the SOX policies | Resource-intensive 990 searches |
\n", "\n", "\n", "
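A minimal sketch of how the three groups for the 'change 2011-2016' test could be coded. It assumes a 1/0 SOX indicator has already been built for each year; `sox_2011` and `sox_2016` are hypothetical names (they are not columns in the dataset), and the toy values are made up for illustration. Organizations that had policies in 2011 but not in 2016 fall outside the three groups and are labeled `other` here.

```python
import numpy as np
import pandas as pd

# Toy data. sox_2011 / sox_2016 are hypothetical 1/0 indicators, not real columns.
toy = pd.DataFrame({
    'org_id': [1, 2, 3, 4],
    'sox_2011': [0, 1, 0, 1],
    'sox_2016': [0, 1, 1, 0],
})

# One condition per group described in the table: i), ii), iii).
conditions = [
    (toy['sox_2011'] == 0) & (toy['sox_2016'] == 0),  # i) never had policies
    (toy['sox_2011'] == 1) & (toy['sox_2016'] == 1),  # ii) had policies in both years
    (toy['sox_2011'] == 0) & (toy['sox_2016'] == 1),  # iii) added policies after 2011
]
labels = ['never_sox', 'always_sox', 'added_sox']

# Anything matching none of the three conditions (e.g., dropped policies) is 'other'.
toy['sox_group'] = np.select(conditions, labels, default='other')

# One dummy column per group, appended to the toy frame.
group_dummies = pd.get_dummies(toy['sox_group'], prefix='grp')
toy = pd.concat([toy, group_dummies], axis=1)
```

When the dummies enter a regression, one group has to be left out as the reference category; which one is a modeling choice that the notes above leave open.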

\n", "**_Notes from Meeting with Dan:_**\n", "- Do not do 3rd or 6th test -- benefit not worth the cost\n", "- 1st and 2nd tests can be robustness analyses\n", "- Focus on 4th and 5th tests\n", "- Control variables:\n", " - Size: total revenues best (probably logged)\n", " - will need 2011 and 2016 versions for the 4th and 5th tests\n", " - efficiency ratio\n", " - age (from BMF)\n", " - complexity (could be a good control from Erica's paper)\n", " - fixed effects:\n", " - state\n", " - category\n", " - I need to scrape the category dummies for the new orgs in the 2016 database\n", " - CN does not include that information in the ratings area, but it is included on the webpage in the 'breadcrumbs' area\n", " - The focus of our paper is on SOX policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. The best candidates are:\n", " - *independent board* --> related to Erica's *independence of key actors* concept\n", " - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n", " - we could include other governance variables as needed.\n", "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n", "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories are because of 'non-material diversions'.\n", "\n", "\n", "\n", "
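The size control and fixed effects listed above could be built along these lines. This is a sketch on made-up values: the column names `total_revenue_2016`, `STATE_2015_BMF`, and `category` come from the dataset, but using log(1 + revenue) rather than a plain log, and dropping the first level of each fixed effect, are my assumptions rather than decisions from the meeting.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values; only the column names mirror the real dataset.
toy = pd.DataFrame({
    'total_revenue_2016': [250000.0, 1200000.0, 0.0, 5600000.0],
    'STATE_2015_BMF': ['NH', 'NY', 'NY', 'CA'],
    'category': ['Human Services', 'Environment', 'Human Services', 'Education'],
})

# Size control: log(1 + revenue) stays defined for zero-revenue rows (assumption).
toy['log_revenue_2016'] = np.log1p(toy['total_revenue_2016'])

# Fixed effects: one dummy per level, dropping the first level as the reference.
state_fe = pd.get_dummies(toy['STATE_2015_BMF'], prefix='state', drop_first=True)
category_fe = pd.get_dummies(toy['category'], prefix='cat', drop_first=True)

# Design matrix with the size control plus both sets of fixed effects.
X = pd.concat([toy[['log_revenue_2016']], state_fe, category_fe], axis=1)
```

The same pattern would apply to the 2011 versions of the variables (for instance `total_revenue_2011`) for the tests that need both years.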

\n", "**_To Do (beyond notes listed in table above):_**\n", "- For all above tests, we need to decide on controls, then find/merge/create any not currently in dataset\n", "- Run a selection model?\n", "- Code the *type* of advisory? Maybe save for future study\n", "- There are 53 orgs on the CN 'Watchlist' -- we probably don't need to look at these but it's a possible future move.\n", "\n", "
\n", "**_Notes on 2011 data:_**\n", "- Only 47 of 329 current donor advisories are on orgs that were rated in 2011\n", "- Number of 2011 orgs (n=5,439) missing from 2016 ratings: 582\n", "- Number of 2016 orgs (n=8,304) not in 2011 ratings: 3,447\n", "- In 2011 when I scraped the current ratings there were 39 blank rows. Specifically, I checked the following spreadsheet: *Charity Navigator - current ratings, October 18, 2011 (WITH UPDATES FOR DONOR ADVISORY ORGS).xlsx* -- 39 rows were blank for all ratings information, so I checked against the historical ratings on the CN website. (So far) all rows were either 1) dropped from CN, 2) had a donor advisory, or 3) still have a donor advisory. I have 5,439 orgs in the 2011 database. 39 seem to have had donor advisories on them at that time. So, the 2011 sample is the 5,400 orgs that did not have an advisory on them at the time. This conforms with the *n* of 5,400 in the above logit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Import Packages\n", "First, we will import several necessary Python packages. We will be using the Python Data Analysis Library, or PANDAS, extensively for our data manipulations. It is invaluable for analyzing datasets. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import of basic elements of PANDAS and numpy" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from pandas import DataFrame\n", "from pandas import Series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "We can check which version of various packages we're using. You can see I'm running PANDAS 0.18.1 here." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.18.1\n" ] } ], "source": [ "print pd.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#http://pandas.pydata.org/pandas-docs/stable/options.html\n", "pd.set_option('display.max_columns', None)\n", "pd.set_option('max_colwidth', 500)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read in Data\n", "Let's read in the merged historical/current/2011 dataset we created in the last notebook. First we'll change the working directory." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/gregorysaxton/Google Drive/SOX\n" ] } ], "source": [ "cd '/Users/gregorysaxton/Google Drive/SOX'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Historical Ratings" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of columns: 219\n", "Number of observations: 84958\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN \\\n", "0 16722 020503776 \n", "\n", " org_url \\\n", "0 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722 \n", "\n", " name category \\\n", "0 Portsmouth Girls Softball Association Human Services \n", "\n", " category-full \\\n", "0 Human Services : Multipurpose Human Service Organizations \n", "\n", " Date Published Form 990 FYE Form 990 FYE, v2 FYE \\\n", "0 2016-08-12 00:00:00 current 2015-01-01 current \n", "\n", " Earliest Rating Publication Date ratings_system Overall Score \\\n", "0 NaN current NaN \n", "\n", " Overall Rating \\\n", "0 current (2016) donor advisory \n", "\n", " advisory text - current advisory \\\n", "0 \\r\\n\\t\\tOn August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "\n", " advisory text - past advisory current_or_past_donor_advisory \\\n", "0 NaN 1.0 \n", "\n", " current_donor_advisory past_donor_advisory latest_entry \\\n", "0 1.0 0.0 True \n", "\n", " current_ratings_url \\\n", "0 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722 \n", "\n", " ein_2016 Publication_date_and_FY_2016 Publication Date_2016 FYE_2016 \\\n", "0 NaN NaN NaN current \n", "\n", " donor_alert_2016 overall_rating_2016 \\\n", "0 current donor advisory 2016 NaN \n", "\n", " efficiency_rating_rating_2016 AT_rating_2016 overall_rating_star_2016 \\\n", "0 NaN NaN NaN \n", "\n", " financial_rating_star_2016 AT_rating_star_2016 program_expense_percent_2016 \\\n", "0 NaN NaN NaN \n", "\n", " admin_expense_percent_2016 fund_expense_percent_2016 fund_efficiency_2016 \\\n", "0 NaN NaN NaN \n", "\n", " working_capital_ratio_2016 program_expense_growth_2016 \\\n", "0 NaN NaN \n", "\n", " liabilities_to_assets_2016 independent_board_2016 no_material_division_2016 \\\n", "0 NaN NaN NaN \n", "\n", " audited_financials_2016 no_loans_related_2016 documents_minutes_2016 \\\n", "0 NaN 
NaN NaN \n", "\n", " form_990_2016 conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "0 NaN NaN NaN \n", "\n", " records_retention_policy_2016 CEO_listed_2016 process_CEO_compensation_2016 \\\n", "0 NaN NaN NaN \n", "\n", " no_board_compensation_2016 donor_privacy_policy_2016 board_listed_2016 \\\n", "0 NaN NaN NaN \n", "\n", " audited_financials_web_2016 form_990_web_2016 staff_listed_2016 \\\n", "0 NaN NaN NaN \n", "\n", " contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "0 NaN NaN \n", "\n", " membership_dues_2016 fundraising_events_2016 related_organizations_2016 \\\n", "0 NaN NaN NaN \n", "\n", " government_grants_2016 total_contributions_2016 \\\n", "0 NaN NaN \n", "\n", " program_service_revenue_2016 total_primary_revenue_2016 other_revenue_2016 \\\n", "0 NaN NaN NaN \n", "\n", " total_revenue_2016 program_expenses_2016 administrative_expenses_2016 \\\n", "0 NaN NaN NaN \n", "\n", " fundraising_expenses_2016 total_functional_expenses_2016 \\\n", "0 NaN NaN \n", "\n", " payments_to_affiliates_2016 excess_or_deficit_2016 net_assets_2016 \\\n", "0 NaN NaN NaN \n", "\n", " comp_2016 cp_2016 mission_2016 2011 data charity_name_2011 category_2011 \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "\n", " city_2011 state_2011 cause_2011 tag_line_2011 url_2011 ein_2011 fye_2011 \\\n", "0 NaN NaN NaN NaN NaN NaN NaN \n", "\n", " overall_rating_2011 overall_rating_2011_plus_30 \\\n", "0 NaN NaN \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "0 NaN NaN \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "0 NaN NaN NaN \n", "\n", " financial_rating_star_2011 AT_rating_star_2011 \\\n", "0 NaN NaN \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "0 NaN NaN \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "0 NaN NaN \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "0 NaN NaN \n", "\n", " working_capital_ratio_2011 
independent_board_2011 \\\n", "0 NaN NaN \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "0 NaN NaN NaN \n", "\n", " documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011 \\\n", "0 NaN NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "0 NaN NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "0 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "0 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "0 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 govt_revenue_2011 \\\n", "0 NaN NaN NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "0 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "0 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "0 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \\\n", "0 NaN NaN NaN \n", "\n", " 2016 Advisory - Date Posted 2016 Advisory - Charity Name \\\n", "0 NaN NaN \n", "\n", " 2016 Advisory - advisory_url 2016 Advisory - advisory _merge_v1 \\\n", "0 NaN NaN left_only \n", "\n", " to_be_merged NEW ROW NAME_2015_BMF \\\n", "0 0.0 NaN PORTSMOUTH GIRLS SOFTBALL ASSOCIATION \n", "\n", " STREET_2015_BMF CITY_2015_BMF STATE_2015_BMF ZIP_2015_BMF RULING_2015_BMF \\\n", "0 PO BOX 8092 PORTSMOUTH NH 03802-8092 201104.0 \n", "\n", " ACTIVITY_2015_BMF TAX_PERIOD_2015_BMF ASSET_AMT_2015_BMF \\\n", "0 0.0 201309.0 0.0 \n", "\n", " INCOME_AMT_2015_BMF REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF 2015 BMF \\\n", "0 0.0 0.0 N63 1.0 \n", "\n", " ruledate_2004_BMF name_MSTRALL state_MSTRALL NTEE1_MSTRALL nteecc_MSTRALL \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " zip_MSTRALL fips_MSTRALL taxper_MSTRALL income_MSTRALL F990REV_MSTRALL \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " assets_MSTRALL 
ruledate_MSTRALL deductcd_MSTRALL accper_MSTRALL rule_date \\\n", "0 NaN NaN NaN NaN 2011 \n", "\n", " taxpd NAME_SOI yr_frmtn pt1_num_vtng_gvrn_bdy_mems pt1_num_ind_vtng_mems \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " num_vtng_gvrn_bdy_mems num_ind_vtng_mems tot_num_empls tot_num_vlntrs \\\n", "0 NaN NaN NaN NaN \n", "\n", " contri_grnts_cy prog_srvc_rev_cy invst_incm_cy oth_rev_cy \\\n", "0 NaN NaN NaN NaN \n", "\n", " grnts_and_smlr_amts_cy tot_prof_fndrsng_exp_cy tot_fndrsng_exp_cy \\\n", "0 NaN NaN NaN \n", "\n", " pt1_tot_asts_eoy aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy \\\n", "0 NaN NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy federated_campaigns memshp_dues \\\n", "0 NaN NaN NaN NaN \n", "\n", " rltd_orgs govt_grnts all_oth_contri nncsh_contri tot_contri psr_tot \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "\n", " inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev net_rent_tot_rev \\\n", "0 NaN NaN NaN NaN \n", "\n", " gain_or_loss_sec gain_or_loss_oth oth_rev_tot tot_rev \\\n", "0 NaN NaN NaN NaN \n", "\n", " mgmt_srvc_fee_tot fee_for_srvc_leg_tot fee_for_srvc_acct_tot \\\n", "0 NaN NaN NaN \n", "\n", " fee_for_srvc_lbby_tot fee_for_srvc_prof_tot fee_for_srvc_invst_tot \\\n", "0 NaN NaN NaN \n", "\n", " fee_for_srvc_oth_tot fs_audited audit_committee vlntr_hrs _merge \n", "0 NaN NaN NaN NaN left_only " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_pickle('merged data with EIN clean-up, SOI data, and 2015, 2008, and 2004 BMF data.pkl')\n", "print \"Number of columns:\", len(df.columns)\n", "print \"Number of observations:\", len(df)\n", "df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Show columns\n", "The variables are organized as follows. First come the organization identifiers -- *org_id* and *EIN*. These are followed by variables indicating the FY and date the ratings were posted. 
Then there are variables indicating the existence of a donor advisory, then all of the '2016' *Rating Profile* variables (variable names followed by '_2016') and then all the '2011 variables. After that comes the BMF data and the SOI data." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 
'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011 data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 
'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge']\n" ] } ], "source": [ "print df.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
**_Note_**: Only 9,951 of the original rows have SOI data; another 1,013 rows are *SOI data only*, but these could be useful for additional tests as well as filling in the blanks with, for instance, SOX data for orgs with current donor advisories." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "left_only 73994\n", "both 9951\n", "right_only 1013\n", "Name: _merge, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['_merge'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Check how many rows have 2011 data\n", "#Yes, there are 4,857 unique *org_ids* with 2011 data." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4857\n" ] } ], "source": [ "#print len(df[df['2011 data']==1])\n", "#print len(set(df[df['2011 data']==1]['org_id'].tolist()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### FYE\n", "- We have *BMF* data for 2004, 2008, and 2015\n", "- We have *SOI* data for 2008 through 2013\n", "- We have *CN* data for 2002 through 2016\n", "\n", "These observations span a broad range of fiscal years, going back as early as FY2000.\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "FY2014 15044\n", "FY2013 8572\n", "FY2009 7865\n", "FY2012 7703\n", "FY2010 7092\n", "FY2011 6807\n", "FY2008 4770\n", "FY2007 4441\n", "FY2006 4315\n", "FY2005 4117\n", "FY2004 3887\n", "FY2003 3178\n", "FY2015 2367\n", "FY2002 2190\n", "FY2001 1698\n", "FY2000 591\n", "current 321\n", "Name: FYE, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['FYE'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time-Invariant 
Controls\n", "Age, State, Category" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Age\n", "The ruling date values have already been incorporated into *rule_date* in the prior notebook. After that we were missing the *rule_date* information for 73 observations (that number is higher now given the 1,013 rows with only SOI data). All of those 73 are organizations with a current donor advisory. Let's double check that there are no more BMF ruling date values to incorporate and then merge in SOI *yr_frmtn* values." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83686\n" ] }, { "data": { "text/plain": [ "0 201104\n", "1 200812\n", "2 200812\n", "Name: RULING_2015_BMF, dtype: float64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['RULING_2015_BMF'].value_counts().sum()\n", "df['RULING_2015_BMF'][:3]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "80768\n" ] }, { "data": { "text/plain": [ "0 NaN\n", "1 199608\n", "2 199608\n", "Name: ruledate_2004_BMF, dtype: float32" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['ruledate_2004_BMF'].value_counts().sum()\n", "df['ruledate_2004_BMF'][:3]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 199608\n", "2 199608\n", "Name: ruledate_MSTRALL, dtype: object" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['ruledate_MSTRALL'].value_counts().sum()\n", "df['ruledate_MSTRALL'][:3]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10840\n" ] 
}, { "data": { "text/plain": [ "156 1874\n", "157 1874\n", "158 1874\n", "Name: yr_frmtn, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['yr_frmtn'].value_counts().sum()\n", "df[df['yr_frmtn'].notnull()]['yr_frmtn'][:3]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83872\n" ] }, { "data": { "text/plain": [ "0 2011\n", "1 1996\n", "2 1996\n", "Name: rule_date, dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
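The counts printed in the cells above come from `value_counts().sum()`, which works because `value_counts()` skips missing values by default. The sketch below uses a made-up series (the name `toy` and its values are hypothetical, not taken from the dataset) to suggest that `count()` should report the same number more directly, and that `isnull().sum()` gives the complement.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a ruling-date column; the values are invented for illustration.
toy = pd.Series(['1996', '1996', np.nan, '2011'])

# value_counts() drops NaN by default, so its sum is the number of non-missing rows.
n_via_value_counts = int(toy.value_counts().sum())

# count() returns the number of non-missing rows directly.
n_via_count = int(toy.count())

# isnull().sum() returns the number of missing rows.
n_missing = int(toy.isnull().sum())
```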
Check whether the SOI and BMF columns can fill in missing *rule_date* values. Of the 1,086 rows missing *rule_date*, 989 have an SOI *yr_frmtn* value; none have a value in *ruledate_MSTRALL*, *ruledate_2004_BMF*, or *RULING_2015_BMF*. Filling from *yr_frmtn* leaves 97 rows without a *rule_date*." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1086\n", "989\n", "0\n", "0\n", "0\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['yr_frmtn'].notnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['ruledate_MSTRALL'].notnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['ruledate_2004_BMF'].notnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['RULING_2015_BMF'].notnull()])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n", "object\n" ] } ], "source": [ "print df['yr_frmtn'].dtype\n", "print df['rule_date'].dtype" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1086\n", "97\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", "# Where rule_date is missing, fall back to the SOI formation year (cast to str)\n", "df['rule_date'] = np.where( ( df['rule_date'].isnull() & df['yr_frmtn'].notnull() ), \n", " df['yr_frmtn'].astype('str'), df['rule_date']\n", " )\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 2011\n", "1 1996\n", "Name: rule_date, dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['rule_date'].notnull()]['rule_date'][:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
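The fill in the cell above casts the float column *yr_frmtn* to `str`, so any value it contributes carries a decimal suffix. The sketch below repeats the same `np.where` pattern on a made-up frame (the name `toy` and its three rows are hypothetical) to show what the filled values look like. It is offered as an illustration of the pattern, not as a record of what happened to any particular row in the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the column names mirror the notebook.
toy = pd.DataFrame({
    'rule_date': ['1996', np.nan, np.nan],
    'yr_frmtn': [np.nan, 1874.0, np.nan],
})

# Casting a float column to str produces text such as '1874.0'.
as_text = toy['yr_frmtn'].astype('str')

# Only rows that lack rule_date but have yr_frmtn take the fallback value.
needs_fill = toy['rule_date'].isnull() & toy['yr_frmtn'].notnull()
toy['rule_date'] = np.where(needs_fill, as_text, toy['rule_date'])

# First row keeps its value, second row is filled, third row stays missing.
filled = toy['rule_date'].tolist()
```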
67 rows have a *rule_date* of '0.0', which is not a valid year. Set those values to NaN." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "67\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_datename
17900.0Alaska Wilderness League
17910.0Alaska Wilderness League
\n", "
" ], "text/plain": [ " rule_date name\n", "1790 0.0 Alaska Wilderness League\n", "1791 0.0 Alaska Wilderness League" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['rule_date']=='0.0'])\n", "#df[df['rule_date']=='0.0'][['age', 'rule_date', 'name']][:2]\n", "df[df['rule_date']=='0.0'][['rule_date', 'name']][:2]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84861\n", "0\n", "84794\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'] = np.where(df['rule_date']=='0.0', np.nan, df['rule_date'])\n", "print len(df[df['rule_date']=='0.0'])\n", "print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Likewise, 5 rows have an empty string ('') as their *rule_date*; set those to NaN as well." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84794\n", "0\n", "84789\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'] = np.where(df['rule_date']=='', np.nan, df['rule_date'])\n", "print len(df[df['rule_date']==''])\n", "print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
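The two clean-up cells above each replace one placeholder value with NaN. As a possible alternative, the sketch below handles both placeholders in one pass with `isin` and `mask`. The series `toy` and its values are made up for illustration, and this variant is an assumption about how the step could be written, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy ruling-date strings, including the two placeholder values seen above.
toy = pd.Series(['1996', '0.0', '', '2011.0'])

# Flag every row whose value is one of the placeholders.
is_placeholder = toy.isin(['0.0', ''])

# mask() replaces the flagged rows with NaN and leaves the rest untouched.
cleaned = toy.mask(is_placeholder)

n_missing = int(cleaned.isnull().sum())
```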
Some values carry a decimal suffix (e.g., '1996.0'), so let's keep only the first four characters, which hold the year." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df['rule_date'] = df['rule_date'].str[:4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
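The truncation in the cell above keeps *rule_date* as text. The sketch below applies the same slice to a made-up series (the name `toy` and its values are hypothetical) and then converts the result with `pd.to_numeric`. That conversion is not used in this notebook; it is shown only as one way the cleaned strings could be turned into numeric years while missing values stay missing.

```python
import numpy as np
import pandas as pd

# Toy ruling-date strings: one clean, one with a decimal suffix, one missing.
toy = pd.Series(['1996', '1874.0', np.nan])

# Keep the first four characters; missing values remain missing.
years_text = toy.str[:4]

# Convert the year strings to numbers; missing values are preserved.
years = pd.to_numeric(years_text)
```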
The deletions above leave 169 rows without a *rule_date*. Let's see if SOI *yr_frmtn* can supply a few more ruling dates." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "169\n", "161\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", "# Truncate the fallback to four characters so it matches the cleaned rule_date format\n", "df['rule_date'] = np.where( ( df['rule_date'].isnull() & df['yr_frmtn'].notnull() ), \n", " df['yr_frmtn'].astype('str').str[:4], df['rule_date']\n", " )\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "zip_MSTRALL object\n", "fips_MSTRALL object\n", "taxper_MSTRALL object\n", "income_MSTRALL float64\n", "F990REV_MSTRALL float64\n", "assets_MSTRALL float64\n", "ruledate_MSTRALL object\n", "deductcd_MSTRALL object\n", "accper_MSTRALL object\n", "rule_date object\n", "taxpd object\n", "NAME_SOI object\n", "yr_frmtn float64\n", "pt1_num_vtng_gvrn_bdy_mems float64\n", "pt1_num_ind_vtng_mems float64\n", "num_vtng_gvrn_bdy_mems float64\n", "num_ind_vtng_mems float64\n", "tot_num_empls float64\n", "tot_num_vlntrs float64\n", "contri_grnts_cy float64\n", "dtype: object" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes[160:180]" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rule_date
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [rule_date]\n", "Index: []" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df[df['rule_date']=='1996.0'][['rule_date']]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84797\n", "12\n", "0\n", "84785\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "print len(df[df['rule_date']=='0000'])\n", "df['rule_date'] = np.where(df['rule_date']=='0000', np.nan, df['rule_date'])\n", "print len(df[df['rule_date']=='0000'])\n", "print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for index, row in df.iterrows():\n", " if pd.notnull(row['rule_date']):\n", " df.ix[index, 'age'] = 2016 - int(row['rule_date'])\n", " else:\n", " pass" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 84785.000000\n", "mean 41.056932\n", "std 20.262034\n", "min 0.000000\n", "25% 25.000000\n", "50% 36.000000\n", "75% 53.000000\n", "max 162.000000\n", "Name: age, dtype: float64" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['age'].describe()" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84785\n", "84785\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "print df['age'].value_counts().sum()\n", "#df['age'].value_counts()" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZIAAAECCAYAAADU5FG5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGZhJREFUeJzt3X+cXXV95/FXCAESMwmNTtitCFlS+WDdLgqtLUoBfYAK\ndqXtoxUf1Iq4hi0bWWxXq2DxsbWN2KK20O3SCigIrfXHrhZLAdmllURKCyraKH6IiYH1UQtDZphk\nSAQyM/vHuRNOJjOZM3PmzL135vV8PPLIvd977p3P3Hvnvu/5fr/nexaNjo4iSdJMHdLuAiRJ3c0g\nkSTVYpBIkmoxSCRJtRgkkqRaDBJJUi2HNv0DIuJrwGDr6veBDwE3AiPA5sxc39puHXAR8CywITNv\ni4gjgFuA1cBO4ILM3NF0zZKk6hY1eRxJRBwO3JuZJ5fa/hr4SGZujIhrgTuA+4C7gJOAZcAm4GTg\nnUBPZn4wIs4DTsnMdzVWsCRp2preIzkReF5E3AksBt4PnJSZG1u33w68lmLvZFNm7gV2RsSW1n1P\nBf6gtO0VDdcrSZqmpsdIdgNXZebrgIuBvwAWlW7fBawAeniu+wtgCFg5rn1sW0lSB2k6SB6mCA8y\ncwuwAziqdHsP8CTF+MeKce0DrfaecdtKkjpI011bbwd+ClgfET9OERZfjojTM/MrwNnA3cD9wIaI\nOAxYCpwAbAbuBc4BHmj9v/HAH7G/0dHR0UWLFk21mSRpfzP+4Gx6sH0J8EngWIpxkN+m2Cu5HlgC\nPASsy8zRiPhPwH+m+GU2ZOYXI2IpcBPwb4GngfMz8/EpfuxoX9+uRn6fmert7cGaptaJNUFn1mVN\n1VhTdb29PTMOkkb3SDLzWeAtE9x0xgTb3gDcMK5tD/CmRoqTJM0KD0iUJNVikEiSajFIJEm1GCSS\npFoMEklSLQaJJKkWg0SSVItBIkmqpfHzkehAw8PDbN++bd/1NWuOY/HixW2sSJJmziBpg+3bt3Hp\nVbeybOVqdg8+ztXveSNr17643WVJ0owYJG2ybOVqlv/YC9tdhiTV5hiJJKkWg0SSVItBIkmqxSCR\nJNXiYHsXcxqxpE5gkHQxpxFL6gQGSZdzGrGkdnOMRJJUi0EiSarFIJEk1eIYSZcpz9R69NFH2lyN\nJBkkXac8U2vHDx7i+Ue/pN0lSVrg7NrqQmMztZb2rGp3KZJkkEiS6jFIJEm1GCSSpFoMEklSLQaJ\nJKkWg0SSVItBIkmqxSCRJNVikEiSajFIJEm1GCSSpFoMEklSLQaJJKmWxpeRj4jVwAPAmcAwcCMw\nAmzOzPWtbdYBFwHPAhsy87aIOAK4BVgN7AQuyMwdTdcrSZqeRvdIIuJQ4M+A3a2mjwGXZ+bpwCER\ncW5EHAVcApwCvB64MiKWABcD38rM04CbgSuarFWSNDNNd219BLgW+BdgEXBSZm5s3XY7cBbwCmBT\nZu7NzJ3AFuBE4FTgjtK2ZzZca0cZHh5m69Yt+/4NDw+3uyRJmlBjQRIRbwMez8y7KEJk/M/bBawA\neoDBUvsQsHJc+9i2C8bYmRAv+/h9XHrVrftOrytJnabJMZILgZGIOItiD+NTQG/p9h7gSYrxjxXj\n2gda7T3jtq2kt7dn6o3mWLmmgYHl+922atXyA2oeGFi+70yI5W3G33eqx6laU6foxJqgM+uypmqs\nqXmNBUlrHASAiLgb+A3gqog4LTPvAc4G7gbuBzZExGHAUuAEYDNwL3AOxUD9OcBGKurr2zVbv8as\n6O3t2a+m/v6h/W7v7x86oObJthnfPtXjVK2pE3RiTdCZdVlTNdZUXZ1wm+vpv+8GPhgRXwWWAJ/P\nzMeAa4BNwP+hGIx/hmJs5d9HxEbgHcDvznGtkqQKGp/+C5CZryldPWOC228AbhjXtgd4U7OVSZLq\n8oBESVItBokkqRaDRJJUi0EiSarFIJEk1WKQSJJqMUgkSbUYJ
JKkWgwSSVItBokkqRaDRJJUi0Ei\nSaplThZtXKiGh4fZvn0bAwPL6e8fYs2a41i8eHG7y5KkWWWQNGjsLIfLVq5m9+DjXP2eN7J27Yvb\nXZYkzSqDpGHlsxxK0nzkGIkkqRaDRJJUi0EiSarFIJEk1WKQSJJqMUgkSbUYJJKkWgwSSVItBokk\nqRaDRJJUi0EiSarFIJEk1WKQSJJqMUgkSbUYJJKkWgwSSVItBokkqRaDRJJUi0EiSarFIJEk1WKQ\nSJJqObTJB4+IQ4DrgABGgN8AngZubF3fnJnrW9uuAy4CngU2ZOZtEXEEcAuwGtgJXJCZO5qseT4Y\nHh5m+/Zt+66vWXMcixcvbmNFkuazpvdI/iMwmpmnAlcAHwI+BlyemacDh0TEuRFxFHAJcArweuDK\niFgCXAx8KzNPA25uPYamsH37Ni696lYu+/h9XHrVrfuFiiTNtkaDJDP/mmIvA+BYYAA4KTM3ttpu\nB84CXgFsysy9mbkT2AKcCJwK3FHa9swm651Plq1czfIfeyHLVq5udymS5rlKQRIRfxsRv9raS5iW\nzByJiBuBa4C/BBaVbt4FrAB6gMFS+xCwclz72LaSpA5SdY/kwxRdTlsi4k8j4mem80My823A8cD1\nwNLSTT3AkxTjHyvGtQ+02nvGbStJ6iCVBtsz8x7gnohYCvwK8L8iYidFMFybmU9PdL+IeAtwdGZ+\nGPgRMAw8EBGnZ+ZXgLOBu4H7gQ0RcRhF0JwAbAbuBc4BHmj9v/HAn3Kg3t6eqTeaAwMDy/e7vmrV\ncnp7eyZtn8l9Z/r40DnPU1kn1gSdWZc1VWNNzas8aysizgB+HXgtxXjFZyjGN24FXjfJ3f438MmI\n+ErrZ/1X4LvA9a1usoeAz2fmaERcA2yi6Pq6PDOfiYhrgZsiYiPFbK/zq9Ta17er6q/VqP7+oQOu\n9/XtmrR9Jved6eP39vZ0zPM0phNrgs6sy5qqsabq6oRbpSCJiEeAbcAngXdm5p5W+99T7E1MKDN3\nA+dNcNMZE2x7A3DDuLY9wJuq1ChJao+qeySvAXZl5uMRsTQifiIzv5eZw8BJDda3oJSP/3j00Ufa\nXI0kVVM1SN4AvI0iNFYDX4qIP8rMjzdV2EI0dvzHspWr2fGDh3j+0S9pd0mSNKWqs7YuAn4eIDMf\nAU6mOIBQs2zs+I+lPavaXYokVVI1SJZQDHaPeQYYnf1yJEndpmrX1heBuyPis63rv0wxW0uStMBV\n2iPJzPdSHJkewHHANZn5O00WJknqDtNZ/fch4DFaS5xExGmtAxXVJcqzwgYGlrNixWpXBZZUW9Xj\nSP6UYiXfraXmUYppwSrp5Cm85Vlhuwcf5+r3vJG1a1+83zYuQS9puqrukbwWiLEDETW5Tp/COzYr\nbDJVwkaSyqoGyTb2X7VXBzH2Yb178LF2lzIjU4WNJJVVDZJ+4DsRcS/F4osAZObbG6lKktQ1qgbJ\nHTx3gilJkvapuoz8TRGxBngpcCfwosz8fpOFSZK6Q9UzJJ4HfAm4GlgF/EPrXCOSpAWu6hIp7wVe\nSWsFYODlwGWNVSVJ6hpVg2Q4M/ediSUzfwiMNFOSJKmbVB1s/3ZEvBNYEhEvA/4L8GBzZUmSukXV\nPZL1wAuBPcAngJ0UYSJJWuCqztp6imJMxHERSdJ+qq61NcKB5x/5YWYePfslSZK6SdU9kn1dYBGx\nBPhF4JSmipIkdY+qYyT7ZOazmfk5XPlXkkT1rq23lq4uojjC/ZlGKpIkdZWq039fXbo8CjwBnDf7\n5UiSuk3VMZILmy5EktSdqnZtfZ8DZ21B0c01mpnHzWpVatzoyMh+Z3D0TIiSZqpq19ZfAk8D1wHP\nAr8G/Azw/obqUsP27Orjo595gmUrf+iZECXVUjVIXpeZP126fnVEfC0zO+uk5JoWz4QoaTZUnf67\nKCLOHLsSEb9AsUyKJGmBq
7pHchHwqYj4NxRjJd8FLmisKklS16g6a+trwEsj4gXAjzJzqNmyJEnd\nouoZEo+NiLuAfwCWR8TdrVPvSpIWuKpjJH8OXAUMAY8BnwY+1VRR2t/YVN2tW7fsN2VXkjpB1TGS\nF2TmlyPiDzJzFLguItY3Wdh8Uz5uY7phUJ6qu+MHD/H8o1/SRImSNCNVg2RPRBxN66DEiDiV4rgS\nVVQ3DMam6u4efKyhCiVpZqoGyW8CfwOsjYgHgVXArzZW1TxlGEiaj6oGyVEUR7IfDywGvpuZrv4r\nSaocJH+YmbcB3676wBFxKMX53dcAhwEbgO8ANwIjwObMXN/adh3FsSrPAhsy87aIOAK4BVhNcfDj\nBZm5o+rPlyTNjapBsjUiPgH8I7BnrDEzDzZz6y3AE5n51og4Evgm8CBweWZujIhrI+Jc4D7gEuAk\nYBmwKSK+DFwMfCszPxgR5wFXAO+a5u8nSWrYQaf/RsTYQkw7KFb6/TmKc5O8Gjhjisf+LMWHPxTd\nYXuBkzJzY6vtduAs4BXApszcm5k7gS3AicCpwB2lbfct0SJJ6hxT7ZF8ieLD/8KI+G+Z+dGqD5yZ\nuwEiogf4HMVKwR8pbbILWAH0AIOl9iFg5bj2sW0lSR1mqgMSF5Uu/9p0HzwiXgTcDdyUmX9FMTYy\npgd4kmL8Y8W49oFWe8+4bTvG8PAwW7du2fdveHi43SVJUltMtUdSPpnVokm3mkBEHAXcCazPzL9r\nNX8jIk7LzHuAsylC5n5gQ0QcBiwFTgA2A/cC5wAPtP7fSEW9vT1Tb1TTww8/zKVX3cqylavZPfg4\nN195PscffzwDA8un/VirVi2nt7dnRvet8jhVHn+q+1ZRhOvWfdfXrl077ZNlzcVrNxOdWJc1VWNN\nzas62A4TnyHxYC4DjgSuiIgPtO5/KfAnEbEEeAj4fGaORsQ1wCaKsLo8M5+JiGuBmyJiI8XBj+dX\n/cF9fbumWer09fcP7Xc+j/7+Ifr6dtHfP731LEdHRnjwwW/T3z9Ue/mTyWqoUttU961i69Yt+4Xr\ndE+W1dvbMyev3XR1Yl3WVI01VVcn3KYKkpdGxLbW5ReWLk95it3MfBcTz7I6Y4JtbwBuGNe2B3jT\nFPV1vW5f/mR4eJjt24u3xaOPPuLJsqQFaKogOX5OqljguvmI9+3bt+3bC+nGIJRU30GDxFPpqopu\nCMLynhPAmjXHTXv8RtLEpjNGInWt8p7TTMZvJE3OING8crA9D8dvpGYYJJpX3POQ5p5BonnHPQ9p\nbhkkqsTBakmTMUhUSbnL6Kkn/5V3v/nlHHPMsZ5DXpJBourK03w/+plvzuggSvdspPnHINGMzPTY\nEQfDpfnHINGcczBcml8MknlidGRk33iF4xaS5pJBMk90++KPkrqXQTKPdMOaV1U4IC91F4NkFnRy\nt1In1zYZB+Sl7mKQzIJO7lbq5NoOxgF5qXsYJLOkk7uVOrm2mbL7S+ocBom6kt1fUucwSNS17P6S\nOsMh7S5AktTdDBJJUi0GiSSpFoNEklSLg+3TUJ5y2i0H93WL4eFhHn74Yfr7h3xupS5jkExDecpp\nNx3c1w18bqXuZdfWNI1NOV3as6rdpcw7PrdSdzJIJEm12LWlrlzYUVLnMEjUNQs7OtlB6kwGiYDu\nWNjRAXmpMzlGoq7igLzUedwjkVpcml6aGYNEjSgP4EN3fCi7NL00MwaJGlEewO+mD2WXppemzyBR\nY/xQlhYGB9slSbUYJJKkWhrv2oqInwU+nJmvjoi1wI3ACLA5M9e3tlkHXAQ8C2zIzNsi4gjgFmA1\nsBO4IDN3NF2vJGl6Gt0jiYj3ANcBh7eaPgZcnpmnA4dExLkRcRRwCXAK8HrgyohYAlwMfCszTwNu\nBq5oslZJ0sw03bX1PeCXStdPzsyNrcu3A2cBrwA2ZebezNwJbAFOBE4F7ihte2bDtUqSZqD
RIMnM\nLwB7S02LSpd3ASuAHmCw1D4ErBzXPratJKnDzPX035HS5R7gSYrxjxXj2gda7T3jtq2kt7dn6o1m\nYGBgeSOP26lGR0YYHOxjYGA5g4N9tR5r1arl9Pb2HPAcTtY+W9tM1V52sG2aek/VYU3VWFPz5jpI\nvh4Rp2XmPcDZwN3A/cCGiDgMWAqcAGwG7gXOAR5o/b9x4oc8UF/frtmuG4D+/qFGHrdT7dnVxwc+\n/gTLVm6tvUhif/8QfX27DngOJ2ufrW2mah/fNtE2vb09jb2nZsqaqrGm6uqE21wHybuB61qD6Q8B\nn8/M0Yi4BthE0fV1eWY+ExHXAjdFxEbgaeD8Oa5VdMeqwE0bHh5m69Yt+653w3Iv0lxqPEgy8xHg\nla3LW4AzJtjmBuCGcW17gDc1XZ80la1bt7oGl3QQLpEiVeByL9LkPLJdklSLeyRacLpxiXupkxkk\nWnC6dYl7qVMZJFqQHPOQZo9jJJKkWtwjmYDn7p4b5bGK8piFpO5ikEzAc3fPjfJYRd0j5yW1j0Ey\nCfvQ54ZHzkvdzzESSVItBokkqRaDRJJUi2Mkkg7KWYyaikGijuYU4fZzFqOmYpCoo1WZImzYNM9Z\njDoYg0SNq/tBP9UUYY9HkdrLIFHj5uKD3uNRpPZx1pbmxNgH/dKeVe0uRdIsM0gkSbUYJJKkWhwj\nmYIzgrqXr500NwySKTgjqHv52klzw66tChwo7l6+dlLzDBJJUi12bUnTUB53AdedksAgkaalPO7i\nulNSwSCRpsl1p6T9GSRa0JwiLNVnkGhBc4qwVJ9BogVvthd89ERQWmgMEmmWTXYiKANG85VBIk2g\nPHYyONg37ftPNCDvmQY1XxkkLeVviw66arpjJ5O9f8YP5jvjS/ORQdJS/rbooKtgemMnk71/HMzX\nQuASKSWuy6Q6Jnv/+L7SfOceidRmDsKr23V0kETEIuB/AicCPwLekZnbDn6vgyv/0Q4PDwOLWLz4\nEMdF1DblbrGnnvxX3v3ml3PMMccChoq6Q0cHCfCLwOGZ+cqI+FngY622GRvfl7205/mOi2hG6hwV\nP9kg/O7Bx/joZ77pWl7qKp0eJKcCdwBk5j9GxE9XvePBugvKf7SzfTCaFo46A+kHu68zu9RtOj1I\nVgCDpet7I+KQzByZ7A7XXv9pnnrqaZ4aGuBvvz7onH01qs4Xkenet/zl6IknltLf/xSLFxfzZca+\nKE3WdVveRpptnR4kO4Ge0vWDhgjA7fd9n+G9Izzd/z1YsmZfe7kbYffg4wDs2dUPLPKylzvy8u7B\nx/frMnv00Uf4/evu4ojlqxh8bBuHP+9Ijli+ih8N9fM7687imGOOrbTNdJX/ZsbXVDYwsJz+/qFp\nP36TuqGm+fAFd9Ho6Gi7a5hURPwy8AuZ+faI+Dngisx8Q7vrkiQ9p9P3SL4AnBURX21dv7CdxUiS\nDtTReySSpM7nke2SpFoMEklSLQaJJKkWg0SSVEunz9qqpIk1uWrUcijwCWANcBiwAfgOcCMwAmzO\nzPVtqm018ABwJjDc7poi4n3AG4ElFK/fPe2sqfXa3UTx2u0F1tHG56m1LNCHM/PVEbF2ojoiYh1w\nEfAssCEzb5vjul4GXEPxfD0NvDUz++a6rnJNpbbzgXdm5itb19tWU0T0AtcBRwKLKZ6n77e5ppcB\n17Z+9sOZ+Y7WNtOuab7skexbkwu4jGJNrnZ5C/BEZp4GvB74H616Ls/M04FDIuLcuS6q9SH5Z8Du\nVlNba4qI04FTWq/ZGcAx7a4JOAdYnJmvAn4P+FC7aoqI91B88Bzeajqgjog4CrgEOIXivXZlRCyZ\n47r+GFifma+hmK7/3rmua4KaiIiXA28vXW93TX8I3JKZZwBXACd0QE0fAP5767PqiIh4w0xrmi9B\nst+aXEDlNbka8FmKNwoU3zz2Aidl5sZW2+0UewRz7SM
U3z7+heLQ6XbX9Dpgc0R8EbgV+JsOqOlh\n4NDWHu5Kim9k7arpe8Avla6fPK6Os4BXAJsyc29m7gS2AP9hjus6LzP/uXX5UIoegbmua7+aIuL5\nwO8Dl5a2aWtNwKuAoyPiLuB84O87oKZvAC9ovd97KN7vM6ppvgTJhGtytaOQzNydmU9FRA/wOeD9\njK15UdhF8SE1ZyLibcDjmXlXqZby8zPnNQEvAE4GfgW4GPiLDqhpCPh3wHeBP6fosmnLa5eZX6D4\nEjJmfB0rKP74y+/7IRqub3xdmfkYQES8ElgP/BEH/j02Wle5ptbf/fXAbwFPlTZrW00ta4D+zDwL\n+H/A+zqgpi0U7/FvA6spwm1GNc2XIJn2mlxNiogXAXcDN2XmX1H0a4/pAZ6c45IupFgh4O8oxpE+\nBfS2uaYdwJ2tbz4PU3yTLb9h21HTbwJ3ZGbw3PN0WJtrGjPRe2gnxR/++PY5FRHnUYxxnZOZO9pc\n10nAT1DsfX8a+MmI+Fiba4Li/f6l1uUvUfSaDLa5pquBV2XmTwI3U3Sfzqim+RIkX6Xo36a1Jtc/\nH3zz5rT6GO8Efjszb2o1fyMiTmtdPhvYOOGdG5KZp2fmq1sDkQ8Cvw7c3s6agE0UfbBExI8DzwP+\nb2vspF019fPct7EnKbpqvtHmmsZ8fYLX637g1Ig4LCJWAicAm+eyqIh4C8WeyBmZObaa4z+1qa5F\nmflAZv5Ua8zmzcB3MvO32ljTmI20PqOA01o/u92v3w6KvVsouryPnGlN82LWFp21JtdlFC/IFRHx\nAWCUoq/2T1qDVg8Bn29jfWPeDVzXrpoy87aI+PmI+CeKbpuLge3A9W18nv4Y+ERE3EMxk+x9wNfa\nXNOYA16vzByNiGsoQnkRxWD8M3NVUKsb6WrgEeALETEKfCUzf7dNdU263lNmPtbO54ri9bs+Ii6m\n+LJyfmYOtrmmdcBnIuJZ4Blg3UyfJ9fakiTVMl+6tiRJbWKQSJJqMUgkSbUYJJKkWgwSSVItBokk\nqRaDRJJUi0EiSarl/wOInz7bbzi5dAAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from pylab import*\n", "%matplotlib inline\n", "#df['age'].plot(kind='bar')\n", "df[df['age'].notnull()]['age'].plot.hist(by=None, bins=100)" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 84797.000000\n", "mean 41.336415\n", "std 31.022303\n", "min 0.000000\n", "25% 25.000000\n", "50% 36.000000\n", "75% 53.000000\n", "max 2016.000000\n", "Name: age, dtype: float64" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['age'].describe()" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_date
84113520715244NaN1854FY20081621854
84377350868211NaN1854FY20091621854
84433520715244NaN1854FY20091621854
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age rule_date\n", "84113 520715244 NaN 1854 FY2008 162 1854\n", "84377 350868211 NaN 1854 FY2009 162 1854\n", "84433 520715244 NaN 1854 FY2009 162 1854" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['age']>160][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date']]" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_date
45332520715244Little Sisters of the Poor at St. Martin's BaltimoreNaNFY2014701946
45333520715244Little Sisters of the Poor at St. Martin's Baltimore1854FY2013701946
45334520715244Little Sisters of the Poor at St. Martin's BaltimoreNaNFY2012701946
45335520715244Little Sisters of the Poor at St. Martin's BaltimoreNaNFY2011701946
45336520715244Little Sisters of the Poor at St. Martin's BaltimoreNaNFY2010701946
84113520715244NaN1854FY2008701946
84433520715244NaN1854FY2009701946
\n", "
" ], "text/plain": [ " EIN name \\\n", "45332 520715244 Little Sisters of the Poor at St. Martin's Baltimore \n", "45333 520715244 Little Sisters of the Poor at St. Martin's Baltimore \n", "45334 520715244 Little Sisters of the Poor at St. Martin's Baltimore \n", "45335 520715244 Little Sisters of the Poor at St. Martin's Baltimore \n", "45336 520715244 Little Sisters of the Poor at St. Martin's Baltimore \n", "84113 520715244 NaN \n", "84433 520715244 NaN \n", "\n", " yr_frmtn FYE age rule_date \n", "45332 NaN FY2014 70 1946 \n", "45333 1854 FY2013 70 1946 \n", "45334 NaN FY2012 70 1946 \n", "45335 NaN FY2011 70 1946 \n", "45336 NaN FY2010 70 1946 \n", "84113 1854 FY2008 70 1946 \n", "84433 1854 FY2009 70 1946 " ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_value(84113, 'rule_date', 1946)\n", "df.set_value(84433, 'rule_date', 1946)\n", "df.set_value(84113, 'age', 2016-1946)\n", "df.set_value(84433, 'age', 2016-1946)\n", "df[df['EIN']=='520715244'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date']]" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_date
82938350868211YMCA of Greater IndianapolisNaNFY2014721944
82939350868211YMCA of Greater IndianapolisNaNFY2014721944
82940350868211YMCA of Greater Indianapolis1854FY2013721944
82941350868211YMCA of Greater Indianapolis1854FY2012721944
82942350868211YMCA of Greater Indianapolis1854FY2011721944
82943350868211YMCA of Greater Indianapolis1854FY2010721944
82944350868211YMCA of Greater Indianapolis1854FY2008721944
82945350868211YMCA of Greater Indianapolis1854FY2008721944
82946350868211YMCA of Greater IndianapolisNaNFY2007721944
82947350868211YMCA of Greater IndianapolisNaNFY2006721944
82948350868211YMCA of Greater IndianapolisNaNFY2005721944
82949350868211YMCA of Greater IndianapolisNaNFY2004721944
82950350868211YMCA of Greater IndianapolisNaNFY2003721944
82951350868211YMCA of Greater IndianapolisNaNFY2002721944
84377350868211NaN1854FY2009721944
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age \\\n", "82938 350868211 YMCA of Greater Indianapolis NaN FY2014 72 \n", "82939 350868211 YMCA of Greater Indianapolis NaN FY2014 72 \n", "82940 350868211 YMCA of Greater Indianapolis 1854 FY2013 72 \n", "82941 350868211 YMCA of Greater Indianapolis 1854 FY2012 72 \n", "82942 350868211 YMCA of Greater Indianapolis 1854 FY2011 72 \n", "82943 350868211 YMCA of Greater Indianapolis 1854 FY2010 72 \n", "82944 350868211 YMCA of Greater Indianapolis 1854 FY2008 72 \n", "82945 350868211 YMCA of Greater Indianapolis 1854 FY2008 72 \n", "82946 350868211 YMCA of Greater Indianapolis NaN FY2007 72 \n", "82947 350868211 YMCA of Greater Indianapolis NaN FY2006 72 \n", "82948 350868211 YMCA of Greater Indianapolis NaN FY2005 72 \n", "82949 350868211 YMCA of Greater Indianapolis NaN FY2004 72 \n", "82950 350868211 YMCA of Greater Indianapolis NaN FY2003 72 \n", "82951 350868211 YMCA of Greater Indianapolis NaN FY2002 72 \n", "84377 350868211 NaN 1854 FY2009 72 \n", "\n", " rule_date \n", "82938 1944 \n", "82939 1944 \n", "82940 1944 \n", "82941 1944 \n", "82942 1944 \n", "82943 1944 \n", "82944 1944 \n", "82945 1944 \n", "82946 1944 \n", "82947 1944 \n", "82948 1944 \n", "82949 1944 \n", "82950 1944 \n", "82951 1944 \n", "84377 1944 " ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.set_value(84377, 'rule_date', 1944)\n", "df.set_value(84377, 'age', 2016-1944)\n", "df[df['EIN']=='350868211'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.set_value(84113, 'rule_date', 1946)\n", "df.set_value(84433, 'rule_date', 1946)\n", "df.set_value(84113, 'age', 2016-1946)\n", "df.set_value(84433, 'age', 2016-1946)\n", "df[df['EIN']=='520715244'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date']]" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { 
"collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAECCAYAAADU5FG5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGclJREFUeJzt3X+UZGV95/H3MAwy4/SMDvagEYFI9IvrZomQmGAIPzyi\nQrJichLxEBV1Axt2dDFZjILBc+I6/ghqFrJZEgEFJWtQdzUQFpRdEumRkIDxRybi13GGgXVFaKab\nnh5mhJnu3j9uVVNT09NdXbeq61b3+3XOnKn71K2qb3d196ee+zz3ucumpqaQJKldh/S6AElSfzNI\nJEmlGCSSpFIMEklSKQaJJKkUg0SSVMqh3X6BiPgGMFbbfAD4EHA9MAlszswNtf0uAC4E9gIbM/PW\niDgcuBFYD+wEzs/MHd2uWZLUumXdPI8kIp4B3J2ZJzW0/TXwscwcioirgduBe4A7gBOBVcAm4CTg\nHcBAZn4gIs4FTs7Md3WtYEnSvHW7R3IC8MyI+AqwHHgfcGJmDtXuvw14NUXvZFNm7gN2RsSW2mNP\nAT7asO/lXa5XkjRP3R4j2Q1ckZmvAS4C/hJY1nD/OLAGGODpw18Au4C1Te31fSVJFdLtIPk+RXiQ\nmVuAHcCRDfcPAI9TjH+saWofrbUPNO0rSaqQbh/aejvws8CGiPgpirD4akSclplfA84C7gTuBTZG\nxGHASuB4YDNwN3A2cF/t/6EDX2J/U1NTU8uWLZtrN0nS/tr+w9ntwfYVwKeBYyjGQf6AoldyLbAC\nuB+4IDOnIuLfAf+e4ovZmJlfjoiVwA3A84AngfMy89E5XnZqeHi8K19PJw0ODlD1OvuhRrDOTrPO\nzuqjOtsOkq72SDJzL/CmGe46fYZ9rwOua2rbA7yhK8VJkjrCExIlSaUYJJKkUgwSSVIpBokkqRSD\nRJJUikEiSSrFIJEklWKQSJJKMUgkSaUYJJKkUgwSSVIpBokkqRSDRJJUikEiSSrFIJEkldLtKySq\nSyYmJti+fdt+bcce+0KWL1/eo4okLVUGSZ/avn0bF19xM6vWrgdg99ijXPnu13HccS/qcWWSlhqD\npI+tWrue1c9+fq/LkLTEOUYiSSrFIJEklWKQSJJKMUgkSaUYJJKkUgwSSVIpBokkqRSDRJJUiick\nLkLNy6e4dIqkbjJIKqYTa2g1Lp/i0imSus0gqZhOraHl8imSFopBUkGGgKR+4mC7JKkUg0SSVIpB\nIkkqxTGSHpmYmGDr1i3T207RldSvDJIe2bp1q1N0JS0KBkkPOTtL0mLgGIkkqZSu90giYj1wH/Aq\nYAK4HpgENmfmhto+FwAXAnuBjZl5a0QcDtwIrAd2Audn5o5u1ytJmp+u9kgi4lDgz4HdtaZPAJdl\n5mnAIRFxTkQcCbwTOBl4LfDhiFgBXAR8JzNPBT4LXN7NWiVJ7en2oa2PAVcDPwKWASdm5lDtvtuA\nM4GXA5syc19m7gS2ACcApwC3N+z7qi7XKklqQ9eCJCLeCjyamXdQhEjz640Da4ABYKyhfRewtqm9\nvq8kqWK6OUbyNmAyIs6k6GF8BhhsuH8AeJxi/GNNU/torX2gad+WDA4OzL1Tj42OPrzf9rp1qxkc\nHGB0dPUB+9bv2//xB9+v+b6ZHt+qfvhegnV2mnV2Vr/U2a6uBUltHASAiLgT+F3giog4NTPvAs4C\n7gTuBTZGxGHASuB4YDNwN3A2xUD92cAQLRoeHu/Ul7FgRkZ2MTw8zsjIroPe19zW6nPM9PhWDA4O\n9MX30jo7yzo7q5/qbNdCT/+9BPhARHwdWA
F8MTMfAa4CNgH/m2Iw/imKsZV/HRFDwO8Af7TAtUqS\nWrAgJyRm5isbNk+f4f7rgOua2vYAb+huZZKksjwhUZJUikEiSSrFIJEklWKQSJJKMUgkSaUYJJKk\nUgwSSVIpBokkqRSDRJJUikEiSSrFIJEklWKQSJJKMUgkSaUYJJKkUgwSSVIpBokkqRSDRJJUikEi\nSSrFIJEklWKQSJJKMUgkSaUYJJKkUgwSSVIpBokkqRSDRJJUikEiSSrFIJEklWKQSJJKMUgkSaUc\n2usCtHAmJibYvn3bfm3HHvtCli9f3qOKJC0GBskSsn37Ni6+4mZWrV0PwO6xR7ny3a/juONe1OPK\nJPUzg2SJWbV2Pauf/fwD2pt7K8ce+8KFLEtSHzNIBOzfW6n3VJ773BN7XZakPmCQaNrBeiuSNBuD\npI80Hn566KEHe1yNJBUMkj7SePhpxw/v54ijXtLrkiSpu0ESEYcA1wABTAK/CzwJXF/b3pyZG2r7\nXgBcCOwFNmbmrRFxOHAjsB7YCZyfmTu6WXPV1Q8/7R57pNelSBLQ/RMS/y0wlZmnAJcDHwI+AVyW\nmacBh0TEORFxJPBO4GTgtcCHI2IFcBHwncw8Ffhs7TkkSRXS1R5JZv51RNxS2zwGGAVelZlDtbbb\ngFdT9E42ZeY+YGdEbAFOAE4BPtqwr0HS5zwpUlp8WgqSiPhfwKeBL2fm3vm8QGZORsT1wOuB3wLO\nbLh7HFgDDABjDe27gLVN7fV91cc8KVJafFo9tPURikNOWyLizyLiF+bzIpn5VuDFwLXAyoa7BoDH\nKcY/1jS1j9baB5r2VZ+rj/OsfvbzpwNFUv9qqUeSmXcBd0XESuA3gf8RETspguHqzHxypsdFxJuA\nozLzI8BPgAngvog4LTO/BpwF3AncC2yMiMMoguZ4YDNwN3A2cF/t/6EDX+VAg4MDc+/UY6OjD++3\nvW7dagYHBxgdXX3AvrPd18p+7Tz3unXFdqe/l7PVUEY/vOdgnZ1mndXQ8hhJRJwOvJliTOM24CaK\nw1Q3A685yMP+J/DpiPha7bX+I/A94NraYPr9wBczcyoirgI2AcsoBuOfioirgRsiYohittd5rdQ6\nPDze6pdVGSMjuxgeHmdkZNe87mtlv3aeu77d6e/lbDW0a3BwoC/ec+vsLOvsrDJh1+oYyYPANopx\nkndk5p5a+99R9CZmlJm7gXNnuOv0Gfa9DriuqW0P8IZWapQk9UarPZJXAuOZ+WhErIyIn8nMH2Tm\nBOCCTC1onq00Njbcw2okqXNaDZJfBd5KERrrgVsi4k8y85PdKmyxaZ6t5JnpkhaLVmdtXQj8CkBm\nPgicRHECoeahcbbSyoF1vS5Hkjqi1SBZQTHYXfcUMNX5ciRJ/abVQ1tfBu6MiM/Xtn+DYraWJGmJ\na6lHkpnvAa6iWHzxhcBVmfmH3SxMktQf5rPW1v3AIxTneRARp9ZOVNQSMtMleV0nS1raWj2P5M8o\nVvLd2tA8RTEtWEvITJfkdZ0saWlrtUfyaiDqJyJqafOSvJIatTpraxu1Q1qSJDVqtUcyAnw3Iu6m\nWHwRgMx8e1eqkiT1jVaD5PbaP0mS9tPqMvI3RMSxwEuBrwAvyMwHulmYJKk/tDRGEhHnArcAVwLr\ngL+vXWtETSYmJti6dcv0v4mJiV6XJEld1epg+3uAV1BbARh4GXBp16rqY/XpsZd+8h4uvuLmA65P\nLkmLTatBMpGZ01dmycyHgcnulNT/6tNjvYyspKWg1cH2f4mIdwArIuLngP8AfKt7ZUmS+kWrPZIN\nwPOBPcCngJ0UYSJJWuJanbX1BMWYiOMikqT9tLrW1iQHXn/k4cw8qvMlSZL6Sas9kulDYBGxAng9\ncHK3ipIk9Y9Wx0imZebezPwCrvwrSaL1Q1tvadhcRnGG+1NdqUgdNTU5yUMPPQgw/b8kdVKr03/P\naLg9BT
wGnNv5ctRpe8aH+fhNj7Fq7cPs+OH9HHHUS3pdkqRFptUxkrd1u5ClrNu9hvoJkrvHHplX\nPevWrWZkZBfglRAlHVyrh7Ye4MBZW1Ac5prKzBd2tKolpmq9hul6bn8YwCshSppVq4e2/jvwJHAN\nsBf4beAXgPd1qa4lZ769hm7zKoiSWtVqkLwmM3++YfvKiPhGZjp6K0lLXKvTf5dFxKvqGxHxaxTL\npKgi6uMaW7ducXaWpAXVao/kQuAzEfFcirGS7wHnd60qzVvVxlkkLR2tztr6BvDSiHgO8JPM3NXd\nstSOqo2zSFoaWr1C4jERcQfw98DqiLizduldSdIS1+oYyV8AVwC7gEeAzwGf6VZRkqT+0WqQPCcz\nvwqQmVOZeQ2wpntlSZL6RatBsicijqJ2UmJEnEJxXokkaYlrddbW7wF/AxwXEd8C1gG/1bWqJEl9\no9UgOZLiTPYXA8uB72Wmq/9KkloOkj/OzFuBf2n1iSPiUIrrux8LHAZsBL4LXA9MApszc0Nt3wso\nzlXZC2zMzFsj4nDgRmA9xcmP52fmjlZfX5K0MFoNkq0R8SngH4A99cbMnG3m1puAxzLzLRHxLODb\nwLeAyzJzKCKujohzgHuAdwInAquATRHxVeAi4DuZ+YGIOBe4HHjXPL8+SVKXzTrYHhH1Vft2UKz0\n+0sU1yY5Azh9juf+PMUffygOh+0DTszMoVrbbcCZwMuBTZm5LzN3AluAE4BTgNsb9p1eokWSVB1z\n9Uhuofjj/7aI+E+Z+fFWnzgzdwNExADwBYqVgj/WsMs4xRTiAWCsoX0XsLapvb6vKmxiYoLt27dN\nb3sNE2lpmGv677KG27893yePiBcAdwI3ZOZfUYyN1A0Aj1OMf6xpah+ttQ807asK2759GxdfcTOX\nfvIeLr7i5v1CRdLiNVePpPFiVssOutcMIuJI4CvAhsz821rzNyPi1My8CziLImTuBTZGxGHASuB4\nYDNwN3A2cF/t/yFaNDg4MPdOXTI6unq/7XXrVjM4OHBA+3x04jnKPvfB9qu3Q/G1N17HpPG+uple\nZ6b95quX7/l8WGdnWWc1tDrYDjNfIXE2lwLPAi6PiPfXHn8x8KcRsQK4H/hiZk5FxFXAJoqwuiwz\nn4qIq4EbImKI4uTH81p94eHh8XmW2jn1S9M2bg8Pjx/QPt/nLPscZZ/7YPvV2+u3D3ZfY9vBnrtd\ng4MDPX3PW2WdnWWdnVUm7OYKkpdGRP34xPMbbs95id3MfBczz7I6fYZ9rwOua2rbA7xhjvokST02\nV5C8eEGqkCT1rVmDxEvpSk9zVpo0s/mMkUhLWn1W2qq169k99ihXvvt1HHfci3pdltRzBonUYK5e\nR+OsNEkFg0RqYK9Dmj+DRGpir0Oan1YvbCVJ0ozskajrGscdHnrIiYDSYmOQqOsaxx12/PB+jjjq\nJfN+DqfeStVlkGhB1Mcddo890tbjHQSXqssgUd9wEFyqJgfbJUmlGCSSpFI8tKVFxUF5aeEZJFpU\nHJSXFp5BokXHQXlpYTlGIkkqxSCRJJXioa0lbGpycnrJEpcukdQug2QJ2zM+zMdveoxVax9ue+kS\nSTJIuqgfPvGXXbpEkgySLvITv6SlwCDpsrKf+PuhVyNpaTNI2rCQZ0/bq5FUdQZJGxb67GnHMfbX\n2EsDWLfuhB5WI8kgaZNnT/dOYy9t99ijfPbDq3n2s5/X67KkJcsgUV8yyKXq8Mx2SVIp9kjUtuax\niirPKmueIAEuMS91ikGitjWOVQCVnlXWOEECcIl5qYMMEpXSOFZR9VlljqtI3eEYiSSpFHskqgwv\nkyv1J4NEleFlcqX+ZJBoTgu53pfjGFL/MUg0J9f7kjQbB9vVknpPYeXAul6XIqliut4jiYhfBD6S\nmWdExHHA9cAksDkzN9T2uQC4ENgLbMzMWyPicOBGYD2wEzg/M3d0u15J
0vx0NUgi4t3Am4FdtaZP\nAJdl5lBEXB0R5wD3AO8ETgRWAZsi4qvARcB3MvMDEXEucDnwrm7Wq6WjcdxnYmICWMby5YdU+ux8\nqaq63SP5AfDrwGdr2ydl5lDt9m3Aqyl6J5sycx+wMyK2ACcApwAfbdj38i7XqiWkedxn5cARrFq7\n3jEgqQ1dHSPJzC8B+xqaljXcHgfWAAPAWEP7LmBtU3t9X6ljGsd9HAOS2rfQs7YmG24PAI9TjH+s\naWofrbUPNO3bksHBgbl3KmF0dPX07anJScbGhqfbxsaGu/ra/WTdutUMDg7s9/2az371dmDO5zjY\nfq3WMFdtsz33fHT7Z7NTrLOz+qXOdi10kPxTRJyamXcBZwF3AvcCGyPiMGAlcDywGbgbOBu4r/b/\n0MxPeaDh4fFO172fkZFd07f3jA/z/k8+xqq1W4FqL1y40EZGdjE8PL7f92s++9Xb67dnc7D9Wq1h\nrtpme+5WDQ4OdP1nsxOss7P6qc52LXSQXAJcExErgPuBL2bmVERcBWyiOPR1WWY+FRFXAzdExBDw\nJHDeAtfasn5auFC9MzExwdatW/ZrcxkYLQZdD5LMfBB4Re32FuD0Gfa5DriuqW0P8IZu1yctlK1b\nt7qUvRYlz2yXFpBLwGgx8sx2SVIpBokkqRQPbamnFnJlYUndYZCop1xZWOp/Bol6rj4A3c7U6anJ\nSR544IHpczzs1UgLzyCZhZd+rb6nTwgtptTaq5EWnkEyCy/92r5eXVXRE0KlhWeQzMF5/+1ZymMf\n9mS11Bgk6poyYx/9rLEn+8TjP+aSN76Mo48+xgU9tWgZJFIXNIbox2/69pLsmWnp8IREqcu81okW\nO3skkirD8aX+ZJBo0VqomWPNf/w8l6V9zpTsTwaJFq2FmjnW+McPPJelLGdK9h+DRJXU2JuA9j/l\nL9TMMc9l0VJmkKiSGnsTUL1P+S42KT3NIFFlVflT/lI+4VJq5vRfqU1O65UKBokkqRQPbbXIY+JL\nT6cG/KXFziBpkcfEl56qD/hLVWGQzMNSXYRwKavygL9UFY6RSJJKMUgkSaUYJJKkUgwSSVIpBokk\nqRRnbUl9xOt1qIoMkgZeV0ILqfmEx1ZCwet1qIoMkgZeV0ILqfGEx/mEgtfrUNUYJE08AU0LqZVQ\naOwp20tWFRkkUgXMdpirsafc2Etu59CY1A0GiVQBcx3mmml5nnYPjUmdZpBIFdHO2IfjJaoCg0Sq\nmHaWr29+DHioSwun0kESEcuA/wacAPwE+J3M3Db7o+bmXHxVWTvL1zc/5onHf8wlb3wZRx99DBMT\nE8Ayli8vzj/2512dVukgAV4PPCMzXxERvwh8otZWinPxVXXtzB5sfszHb/r29PVzVg4c4c+7uqbq\nQXIKcDtAZv5DRPx8qw+cq9fhsWUtdo0D9PXbHgJTN1Q9SNYAYw3b+yLikMycPNgDrr72czzxxJP8\n+Mf/j6Hv753zU5iX0NVS0nwIrPF3o/7ha3R0NcPDYzQeDms8PHaw23UG09JT9SDZCQw0bM8aIgC3\n3fMAE/smeexH22Dg6On25sHL3WOPAjDyo+SD13yXw1evY+yRbTzreS+e3m/P+AiwrOXbVd/PGqx1\nz/gIKweOoFHjB6kPXnPH9O/CM575LA5fvQ5gv+2D3Qb4ya4R/vCCMzn66GNoR+Pv5u6xR2f9cDc6\nupqRkV1tvc5CqmKdnT60uWxqaqqjT9hJEfEbwK9l5tsj4peAyzPzV3tdlyTpaVXvkXwJODMivl7b\nflsvi5EkHajSPRJJUvV5YStJUikGiSSpFINEklSKQSJJKqXqs7Za0q01uTohIg4FPgUcCxwGbAS+\nC1wPTAKbM3NDr+prFhHrgfuAVwETVLDOiHgv8DpgBcX7fhcVq7P2vt9A8b7vAy6gYt/P2rJDH8nM\nMyLiuJlqi4gLgAuBvcDGzLy1hzX+
HHAVxffzSeAtmTnc6xqb62xoOw94R2a+orZdqTojYhC4BngW\nsJzi+/lAO3Uulh7J9JpcwKUUa3JVxZuAxzLzVOC1wH+lqO+yzDwNOCQizullgXW1P35/DuyuNVWu\nzog4DTi59l6fDhxNBesEzgaWZ+YvA/8Z+BAVqjMi3k3xR+QZtaYDaouII4F3AidT/Ox+OCJW9LDG\n/wJsyMxXUpwa8J5e13iQOomIlwFvb9iuYp1/DNyYmacDlwPHt1vnYgmS/dbkAlpek2sBfJ7iTYIi\n9fcBJ2bmUK3tNopP/1XwMeBq4EcUp0JXsc7XAJsj4svAzcDfUM06vw8cWustr6X4dFelOn8A/HrD\n9klNtZ0JvBzYlJn7MnMnsAX4Nz2s8dzM/Ofa7UMpjj70ukZoqjMijgA+CFzcsE/l6gR+GTgqIu4A\nzgP+rt06F0uQzLgmV6+KaZSZuzPziYgYAL4AvI/GtSxgnOIPTU9FxFuBRzPzDp6ur/F7WIk6gecA\nJwG/CVwE/CXVrHMX8NPA94C/oDgkU5n3PTO/RPGhpq65tjUUyxM1/l7tYgFrbq4xMx8BiIhXABuA\nP+HA3/0FrbFW13Sdtb871wK/DzzRsFul6qw5FhjJzDOB/wu8lzbrrMQf2w6Y95pcCykiXgDcCdyQ\nmX9FcRy6bgB4vCeF7e9tFKsI/C3FWNNngMGG+6tS5w7gK7VPTN+n+FTa+INelTp/D7g9M4Onv5+H\nNdxflTrrZvqZ3Enxh6W5vWci4lyKcbGzM3MH1avxROBnKHr2nwP+VUR8gurVCcXv0i2127dQHMkZ\no406F0uQfJ3imDS1Nbn+efbdF07tmONXgD/IzBtqzd+MiFNrt88ChmZ88ALKzNMy84zaYOG3gDcD\nt1WtTmATxbFbIuKngGcC/6c2dgLVqXOEpz/ZPU5xKOabFayz7p9meK/vBU6JiMMiYi1wPLC5VwVG\nxJsoeiKnZ2Z9Ncd/rFCNyzLzvsz82do4zhuB72bm71eszrohan83gVMp6mnrPV8Us7ao9ppcl1LM\nirg8It4PTFEcO/3T2iDW/cAXe1jfbC4BrqlSnZl5a0T8SkT8I8XhmIuA7cC1VaqTYmD4UxFxF8Xs\nsvcC36B6ddYd8F5n5lREXEUR3ssoBuOf6kVxtUNGVwIPAl+KiCnga5n5R1WpkeJ3e0aZ+UiF6qy7\nhOLn8SKKDz3nZeZYO3W61pYkqZTFcmhLktQjBokkqRSDRJJUikEiSSrFIJEklWKQSJJKMUgkSaUY\nJJKkUv4/qjMawAyuYnMAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[df['age'].notnull()]['age'].plot.hist(by=None, bins=100)" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_date
195731116884A Chance to Change FoundationNaNFY20151161900
196731116884A Chance to Change FoundationNaNFY20141161900
197731116884A Chance to Change FoundationNaNFY20141161900
5610742684333Any Baby Can of San AntonioNaNFY20151161900
5611742684333Any Baby Can of San AntonioNaNFY20141161900
5612742684333Any Baby Can of San AntonioNaNFY20141161900
5613742684333Any Baby Can of San AntonioNaNFY20131161900
8651480891418Bill of Rights InstituteNaNFY20141161900
8652480891418Bill of Rights InstituteNaNFY20141161900
8653480891418Bill of Rights InstituteNaNFY20131161900
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age \\\n", "195 731116884 A Chance to Change Foundation NaN FY2015 116 \n", "196 731116884 A Chance to Change Foundation NaN FY2014 116 \n", "197 731116884 A Chance to Change Foundation NaN FY2014 116 \n", "5610 742684333 Any Baby Can of San Antonio NaN FY2015 116 \n", "5611 742684333 Any Baby Can of San Antonio NaN FY2014 116 \n", "5612 742684333 Any Baby Can of San Antonio NaN FY2014 116 \n", "5613 742684333 Any Baby Can of San Antonio NaN FY2013 116 \n", "8651 480891418 Bill of Rights Institute NaN FY2014 116 \n", "8652 480891418 Bill of Rights Institute NaN FY2014 116 \n", "8653 480891418 Bill of Rights Institute NaN FY2013 116 \n", "\n", " rule_date \n", "195 1900 \n", "196 1900 \n", "197 1900 \n", "5610 1900 \n", "5611 1900 \n", "5612 1900 \n", "5613 1900 \n", "8651 1900 \n", "8652 1900 \n", "8653 1900 " ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['age']>100][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date']][:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
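The problem is with the 2004 BMF ruling dates: for these organizations `ruledate_2004_BMF` is `190000` and `ruledate_MSTRALL` is `000000`, so `rule_date` resolves to 1900 and `age` comes out as 116, even though `RULING_2015_BMF` holds a plausible ruling date (e.g., `198111`)." ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "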
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_dateRULING_2015_BMFruledate_2004_BMFruledate_MSTRALL
195731116884A Chance to Change FoundationNaNFY20151161900198111190000000000
196731116884A Chance to Change FoundationNaNFY20141161900198111190000000000
197731116884A Chance to Change FoundationNaNFY20141161900198111190000000000
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age \\\n", "195 731116884 A Chance to Change Foundation NaN FY2015 116 \n", "196 731116884 A Chance to Change Foundation NaN FY2014 116 \n", "197 731116884 A Chance to Change Foundation NaN FY2014 116 \n", "\n", " rule_date RULING_2015_BMF ruledate_2004_BMF ruledate_MSTRALL \n", "195 1900 198111 190000 000000 \n", "196 1900 198111 190000 000000 \n", "197 1900 198111 190000 000000 " ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['EIN']=='731116884'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF', 'ruledate_MSTRALL']]" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_dateRULING_2015_BMFruledate_2004_BMFruledate_MSTRALL
5610742684333Any Baby Can of San AntonioNaNFY20151161900201107190000000000
5611742684333Any Baby Can of San AntonioNaNFY20141161900201107190000000000
5612742684333Any Baby Can of San AntonioNaNFY20141161900201107190000000000
5613742684333Any Baby Can of San AntonioNaNFY20131161900201107190000000000
84203742684333NaN1993FY2009231993NaNNaNNaN
84525742684333NaN1993FY2010231993NaNNaNNaN
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age \\\n", "5610 742684333 Any Baby Can of San Antonio NaN FY2015 116 \n", "5611 742684333 Any Baby Can of San Antonio NaN FY2014 116 \n", "5612 742684333 Any Baby Can of San Antonio NaN FY2014 116 \n", "5613 742684333 Any Baby Can of San Antonio NaN FY2013 116 \n", "84203 742684333 NaN 1993 FY2009 23 \n", "84525 742684333 NaN 1993 FY2010 23 \n", "\n", " rule_date RULING_2015_BMF ruledate_2004_BMF ruledate_MSTRALL \n", "5610 1900 201107 190000 000000 \n", "5611 1900 201107 190000 000000 \n", "5612 1900 201107 190000 000000 \n", "5613 1900 201107 190000 000000 \n", "84203 1993 NaN NaN NaN \n", "84525 1993 NaN NaN NaN " ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['EIN']=='742684333'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF', 'ruledate_MSTRALL']]" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINnameyr_frmtnFYEagerule_dateRULING_2015_BMFruledate_2004_BMFruledate_MSTRALL
8651480891418Bill of Rights InstituteNaNFY20141161900198006190000000000
8652480891418Bill of Rights InstituteNaNFY20141161900198006190000000000
8653480891418Bill of Rights InstituteNaNFY20131161900198006190000000000
8654480891418Bill of Rights InstituteNaNFY20121161900198006190000000000
8655480891418Bill of Rights InstituteNaNFY20111161900198006190000000000
8656480891418Bill of Rights InstituteNaNFY20101161900198006190000000000
8657480891418Bill of Rights InstituteNaNFY20091161900198006190000000000
8658480891418Bill of Rights InstituteNaNFY20081161900198006190000000000
\n", "
" ], "text/plain": [ " EIN name yr_frmtn FYE age rule_date \\\n", "8651 480891418 Bill of Rights Institute NaN FY2014 116 1900 \n", "8652 480891418 Bill of Rights Institute NaN FY2014 116 1900 \n", "8653 480891418 Bill of Rights Institute NaN FY2013 116 1900 \n", "8654 480891418 Bill of Rights Institute NaN FY2012 116 1900 \n", "8655 480891418 Bill of Rights Institute NaN FY2011 116 1900 \n", "8656 480891418 Bill of Rights Institute NaN FY2010 116 1900 \n", "8657 480891418 Bill of Rights Institute NaN FY2009 116 1900 \n", "8658 480891418 Bill of Rights Institute NaN FY2008 116 1900 \n", "\n", " RULING_2015_BMF ruledate_2004_BMF ruledate_MSTRALL \n", "8651 198006 190000 000000 \n", "8652 198006 190000 000000 \n", "8653 198006 190000 000000 \n", "8654 198006 190000 000000 \n", "8655 198006 190000 000000 \n", "8656 198006 190000 000000 \n", "8657 198006 190000 000000 \n", "8658 198006 190000 000000 " ] }, "execution_count": 156, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['EIN']=='480891418'][['EIN', 'name', 'yr_frmtn', 'FYE', 'age', 'rule_date',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF', 'ruledate_MSTRALL']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Start Over - BMF 2004 and MSTRALL values for ruling date were wrong in above examples" ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df['rule_date_v2'] = df['rule_date']" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of columns: 219\n", "Number of observations: 84958\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINorg_urlnamecategorycategory-fullDate PublishedForm 990 FYEForm 990 FYE, v2FYEEarliest Rating Publication Dateratings_systemOverall ScoreOverall Ratingadvisory text - current advisoryadvisory text - past advisorycurrent_or_past_donor_advisorycurrent_donor_advisorypast_donor_advisorylatest_entrycurrent_ratings_urlein_2016Publication_date_and_FY_2016Publication Date_2016FYE_2016donor_alert_2016overall_rating_2016efficiency_rating_rating_2016AT_rating_2016overall_rating_star_2016financial_rating_star_2016AT_rating_star_2016program_expense_percent_2016admin_expense_percent_2016fund_expense_percent_2016fund_efficiency_2016working_capital_ratio_2016program_expense_growth_2016liabilities_to_assets_2016independent_board_2016no_material_division_2016audited_financials_2016no_loans_related_2016documents_minutes_2016form_990_2016conflict_of_interest_policy_2016whistleblower_policy_2016records_retention_policy_2016CEO_listed_2016process_CEO_compensation_2016no_board_compensation_2016donor_privacy_policy_2016board_listed_2016audited_financials_web_2016form_990_web_2016staff_listed_2016contributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016total_contributions_2016program_service_revenue_2016total_primary_revenue_2016other_revenue_2016total_revenue_2016program_expenses_2016administrative_expenses_2016fundraising_expenses_2016total_functional_expenses_2016payments_to_affiliates_2016excess_or_deficit_2016net_assets_2016comp_2016cp_2016mission_20162011 
datacharity_name_2011category_2011city_2011state_2011cause_2011tag_line_2011url_2011ein_2011fye_2011overall_rating_2011overall_rating_2011_plus_30overall_rating_2011_plus_30_v2overall_rating_star_2011overall_rating_star_2011_textefficiency_rating_2011AT_rating_2011financial_rating_star_2011AT_rating_star_2011program_expense_percent_2011admin_expense_percent_2011fund_expense_percent_2011fund_efficiency_2011primary_revenue_growth_2011program_expense_growth_2011working_capital_ratio_2011independent_board_2011no_material_division_2011audited_financials_2011no_loans_related_2011documents_minutes_2011form_990_2011conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011CEO_listed_2011process_CEO_compensation_2011no_board_compensation_2011donor_privacy_policy_2011board_listed_2011audited_financials_web_2011form_990_web_2011staff_listed_2011primary_revenue_2011other_revenue_2011total_revenue_2011govt_revenue_2011program_expense_2011admin_expense_2011fund_expense_2011total_functional_expense_2011affiliate_payments_2011budget_surplus_2011net_assets_2011leader_comp_2011leader_comp_percent_2011email_2011website_20112016 Advisory - Date Posted2016 Advisory - Charity Name2016 Advisory - advisory_url2016 Advisory - advisory_merge_v1to_be_mergedNEW ROWNAME_2015_BMFSTREET_2015_BMFCITY_2015_BMFSTATE_2015_BMFZIP_2015_BMFRULING_2015_BMFACTIVITY_2015_BMFTAX_PERIOD_2015_BMFASSET_AMT_2015_BMFINCOME_AMT_2015_BMFREVENUE_AMT_2015_BMFNTEE_CD_2015_BMF2015 
BMFruledate_2004_BMFname_MSTRALLstate_MSTRALLNTEE1_MSTRALLnteecc_MSTRALLzip_MSTRALLfips_MSTRALLtaxper_MSTRALLincome_MSTRALLF990REV_MSTRALLassets_MSTRALLruledate_MSTRALLdeductcd_MSTRALLaccper_MSTRALLrule_datetaxpdNAME_SOIyr_frmtnpt1_num_vtng_gvrn_bdy_memspt1_num_ind_vtng_memsnum_vtng_gvrn_bdy_memsnum_ind_vtng_memstot_num_emplstot_num_vlntrscontri_grnts_cyprog_srvc_rev_cyinvst_incm_cyoth_rev_cygrnts_and_smlr_amts_cytot_prof_fndrsng_exp_cytot_fndrsng_exp_cypt1_tot_asts_eoyaud_fincl_stmtsmtrl_divrsn_or_misusecnflct_int_plcywhistleblower_plcydoc_retention_plcyfederated_campaignsmemshp_duesrltd_orgsgovt_grntsall_oth_contrinncsh_contritot_contripsr_totinv_incm_tot_revbonds_tot_revroylrev_tot_revnet_rent_tot_revgain_or_loss_secgain_or_loss_othoth_rev_tottot_revmgmt_srvc_fee_totfee_for_srvc_leg_totfee_for_srvc_acct_totfee_for_srvc_lbby_totfee_for_srvc_prof_totfee_for_srvc_invst_totfee_for_srvc_oth_totfs_auditedaudit_committeevlntr_hrs_merge
016722020503776http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722Portsmouth Girls Softball AssociationHuman ServicesHuman Services : Multipurpose Human Service Organizations2016-08-12 00:00:00current2015-01-01currentNaNcurrentNaNcurrent (2016) donor advisory\\r\\n\\t\\tOn August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"NaN1.01.00.0Truehttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722NaNNaNNaNcurrentcurrent donor advisory 2016NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only0.0NaNPORTSMOUTH GIRLS SOFTBALL ASSOCIATIONPO BOX 8092PORTSMOUTHNH03802-8092201104.00.0201309.00.00.00.0N631.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN2011NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only
\n", "
" ], "text/plain": [ " org_id EIN \\\n", "0 16722 020503776 \n", "\n", " org_url \\\n", "0 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722 \n", "\n", " name category \\\n", "0 Portsmouth Girls Softball Association Human Services \n", "\n", " category-full \\\n", "0 Human Services : Multipurpose Human Service Organizations \n", "\n", " Date Published Form 990 FYE Form 990 FYE, v2 FYE \\\n", "0 2016-08-12 00:00:00 current 2015-01-01 current \n", "\n", " Earliest Rating Publication Date ratings_system Overall Score \\\n", "0 NaN current NaN \n", "\n", " Overall Rating \\\n", "0 current (2016) donor advisory \n", "\n", " advisory text - current advisory \\\n", "0 \\r\\n\\t\\tOn August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "\n", " advisory text - past advisory current_or_past_donor_advisory \\\n", "0 NaN 1.0 \n", "\n", " current_donor_advisory past_donor_advisory latest_entry \\\n", "0 1.0 0.0 True \n", "\n", " current_ratings_url \\\n", "0 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16722 \n", "\n", " ein_2016 Publication_date_and_FY_2016 Publication Date_2016 FYE_2016 \\\n", "0 NaN NaN NaN current \n", "\n", " donor_alert_2016 overall_rating_2016 \\\n", "0 current donor advisory 2016 NaN \n", "\n", " efficiency_rating_rating_2016 AT_rating_2016 overall_rating_star_2016 \\\n", "0 NaN NaN NaN \n", "\n", " financial_rating_star_2016 AT_rating_star_2016 program_expense_percent_2016 \\\n", "0 NaN NaN NaN \n", "\n", " admin_expense_percent_2016 fund_expense_percent_2016 fund_efficiency_2016 \\\n", "0 NaN NaN NaN \n", "\n", " working_capital_ratio_2016 program_expense_growth_2016 \\\n", "0 NaN NaN \n", "\n", " liabilities_to_assets_2016 independent_board_2016 no_material_division_2016 \\\n", "0 NaN NaN NaN \n", "\n", " audited_financials_2016 no_loans_related_2016 documents_minutes_2016 \\\n", "0 NaN 
NaN NaN \n", "\n", " form_990_2016 conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "0 NaN NaN NaN \n", "\n", " records_retention_policy_2016 CEO_listed_2016 process_CEO_compensation_2016 \\\n", "0 NaN NaN NaN \n", "\n", " no_board_compensation_2016 donor_privacy_policy_2016 board_listed_2016 \\\n", "0 NaN NaN NaN \n", "\n", " audited_financials_web_2016 form_990_web_2016 staff_listed_2016 \\\n", "0 NaN NaN NaN \n", "\n", " contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "0 NaN NaN \n", "\n", " membership_dues_2016 fundraising_events_2016 related_organizations_2016 \\\n", "0 NaN NaN NaN \n", "\n", " government_grants_2016 total_contributions_2016 \\\n", "0 NaN NaN \n", "\n", " program_service_revenue_2016 total_primary_revenue_2016 other_revenue_2016 \\\n", "0 NaN NaN NaN \n", "\n", " total_revenue_2016 program_expenses_2016 administrative_expenses_2016 \\\n", "0 NaN NaN NaN \n", "\n", " fundraising_expenses_2016 total_functional_expenses_2016 \\\n", "0 NaN NaN \n", "\n", " payments_to_affiliates_2016 excess_or_deficit_2016 net_assets_2016 \\\n", "0 NaN NaN NaN \n", "\n", " comp_2016 cp_2016 mission_2016 2011 data charity_name_2011 category_2011 \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "\n", " city_2011 state_2011 cause_2011 tag_line_2011 url_2011 ein_2011 fye_2011 \\\n", "0 NaN NaN NaN NaN NaN NaN NaN \n", "\n", " overall_rating_2011 overall_rating_2011_plus_30 \\\n", "0 NaN NaN \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "0 NaN NaN \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "0 NaN NaN NaN \n", "\n", " financial_rating_star_2011 AT_rating_star_2011 \\\n", "0 NaN NaN \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "0 NaN NaN \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "0 NaN NaN \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "0 NaN NaN \n", "\n", " working_capital_ratio_2011 
independent_board_2011 \\\n", "0 NaN NaN \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "0 NaN NaN NaN \n", "\n", " documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011 \\\n", "0 NaN NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "0 NaN NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "0 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "0 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "0 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 govt_revenue_2011 \\\n", "0 NaN NaN NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "0 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "0 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "0 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \\\n", "0 NaN NaN NaN \n", "\n", " 2016 Advisory - Date Posted 2016 Advisory - Charity Name \\\n", "0 NaN NaN \n", "\n", " 2016 Advisory - advisory_url 2016 Advisory - advisory _merge_v1 \\\n", "0 NaN NaN left_only \n", "\n", " to_be_merged NEW ROW NAME_2015_BMF \\\n", "0 0.0 NaN PORTSMOUTH GIRLS SOFTBALL ASSOCIATION \n", "\n", " STREET_2015_BMF CITY_2015_BMF STATE_2015_BMF ZIP_2015_BMF RULING_2015_BMF \\\n", "0 PO BOX 8092 PORTSMOUTH NH 03802-8092 201104.0 \n", "\n", " ACTIVITY_2015_BMF TAX_PERIOD_2015_BMF ASSET_AMT_2015_BMF \\\n", "0 0.0 201309.0 0.0 \n", "\n", " INCOME_AMT_2015_BMF REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF 2015 BMF \\\n", "0 0.0 0.0 N63 1.0 \n", "\n", " ruledate_2004_BMF name_MSTRALL state_MSTRALL NTEE1_MSTRALL nteecc_MSTRALL \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " zip_MSTRALL fips_MSTRALL taxper_MSTRALL income_MSTRALL F990REV_MSTRALL \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " assets_MSTRALL 
ruledate_MSTRALL deductcd_MSTRALL accper_MSTRALL rule_date \\\n", "0 NaN NaN NaN NaN 2011 \n", "\n", " taxpd NAME_SOI yr_frmtn pt1_num_vtng_gvrn_bdy_mems pt1_num_ind_vtng_mems \\\n", "0 NaN NaN NaN NaN NaN \n", "\n", " num_vtng_gvrn_bdy_mems num_ind_vtng_mems tot_num_empls tot_num_vlntrs \\\n", "0 NaN NaN NaN NaN \n", "\n", " contri_grnts_cy prog_srvc_rev_cy invst_incm_cy oth_rev_cy \\\n", "0 NaN NaN NaN NaN \n", "\n", " grnts_and_smlr_amts_cy tot_prof_fndrsng_exp_cy tot_fndrsng_exp_cy \\\n", "0 NaN NaN NaN \n", "\n", " pt1_tot_asts_eoy aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy \\\n", "0 NaN NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy federated_campaigns memshp_dues \\\n", "0 NaN NaN NaN NaN \n", "\n", " rltd_orgs govt_grnts all_oth_contri nncsh_contri tot_contri psr_tot \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "\n", " inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev net_rent_tot_rev \\\n", "0 NaN NaN NaN NaN \n", "\n", " gain_or_loss_sec gain_or_loss_oth oth_rev_tot tot_rev \\\n", "0 NaN NaN NaN NaN \n", "\n", " mgmt_srvc_fee_tot fee_for_srvc_leg_tot fee_for_srvc_acct_tot \\\n", "0 NaN NaN NaN \n", "\n", " fee_for_srvc_lbby_tot fee_for_srvc_prof_tot fee_for_srvc_invst_tot \\\n", "0 NaN NaN NaN \n", "\n", " fee_for_srvc_oth_tot fs_audited audit_committee vlntr_hrs _merge \n", "0 NaN NaN NaN NaN left_only " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_pickle('merged data with EIN clean-up, SOI data, and 2015, 2008, and 2004 BMF data.pkl')\n", "print \"Number of columns:\", len(df.columns)\n", "print \"Number of observations:\", len(df)\n", "df.head(1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "91\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINnamecategoryorg_urlGuidestar URLRuling Year
014954NaN26.4.26 FoundationHuman Serviceshttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=14954NaNNaN
116191200309308.0The Affordable Housing Coalition of San Diego CountyCommunity Developmenthttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16191NaNNaN
215442261786633.0All Day FoundationHuman Serviceshttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=15442https://www.guidestar.org/profile/26-17866332008.0
\n", "
" ], "text/plain": [ " org_id EIN name \\\n", "0 14954 NaN 26.4.26 Foundation \n", "1 16191 200309308.0 The Affordable Housing Coalition of San Diego County \n", "2 15442 261786633.0 All Day Foundation \n", "\n", " category \\\n", "0 Human Services \n", "1 Community Development \n", "2 Human Services \n", "\n", " org_url \\\n", "0 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=14954 \n", "1 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=16191 \n", "2 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=15442 \n", "\n", " Guidestar URL Ruling Year \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 https://www.guidestar.org/profile/26-1786633 2008.0 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "updated_ruledate = pd.read_excel('91 orgs missing BMF data_DGN.xls')\n", "print len(updated_ruledate)\n", "updated_ruledate[:3]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18\n", "18\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>org_id</th><th>rule_date</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>15442</td><td>2008.0</td></tr>\n",
"    <tr><th>7</th><td>16157</td><td>2001.0</td></tr>\n",
"    <tr><th>19</th><td>13233</td><td>2006.0</td></tr>\n",
"    <tr><th>21</th><td>16178</td><td>2008.0</td></tr>\n",
"    <tr><th>23</th><td>13944</td><td>2009.0</td></tr>\n",
"    <tr><th>35</th><td>16204</td><td>2015.0</td></tr>\n",
"    <tr><th>40</th><td>16207</td><td>2005.0</td></tr>\n",
"    <tr><th>51</th><td>15512</td><td>2010.0</td></tr>\n",
"    <tr><th>53</th><td>13772</td><td>2006.0</td></tr>\n",
"    <tr><th>58</th><td>16648</td><td>1970.0</td></tr>\n",
"    <tr><th>61</th><td>16185</td><td>2011.0</td></tr>\n",
"    <tr><th>63</th><td>16154</td><td>2008.0</td></tr>\n",
"    <tr><th>68</th><td>16236</td><td>2012.0</td></tr>\n",
"    <tr><th>70</th><td>16188</td><td>2015.0</td></tr>\n",
"    <tr><th>72</th><td>16189</td><td>2005.0</td></tr>\n",
"    <tr><th>84</th><td>16217</td><td>2011.0</td></tr>\n",
"    <tr><th>89</th><td>16136</td><td>2007.0</td></tr>\n",
"    <tr><th>90</th><td>13627</td><td>2010.0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " org_id rule_date\n", "2 15442 2008.0\n", "7 16157 2001.0\n", "19 13233 2006.0\n", "21 16178 2008.0\n", "23 13944 2009.0\n", "35 16204 2015.0\n", "40 16207 2005.0\n", "51 15512 2010.0\n", "53 13772 2006.0\n", "58 16648 1970.0\n", "61 16185 2011.0\n", "63 16154 2008.0\n", "68 16236 2012.0\n", "70 16188 2015.0\n", "72 16189 2005.0\n", "84 16217 2011.0\n", "89 16136 2007.0\n", "90 13627 2010.0" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(updated_ruledate[updated_ruledate['Ruling Year'].notnull()])\n", "updated_ruledate = updated_ruledate[updated_ruledate['Ruling Year'].notnull()]\n", "updated_ruledate = updated_ruledate[['org_id', 'Ruling Year']]\n", "updated_ruledate.columns = ['org_id', 'rule_date']\n", "updated_ruledate['org_id'] = updated_ruledate['org_id'].astype('str')\n", "print len(updated_ruledate)\n", "updated_ruledate" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "219\n", "84958\n", "84958\n", "220\n", "84958\n" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, updated_ruledate, left_on='org_id', right_on='org_id', how='left'))\n", "df = pd.merge(df, updated_ruledate, left_on='org_id', right_on='org_id', how='left')\n", "print len(df.columns)\n", "print len(df)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.rename(columns={'rule_date_x':'rule_date_v1'}, inplace=True)\n", "df.rename(columns={'rule_date_y':'rule_date'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1086\n", "84940\n" ] } ], "source": [ "print len(df[df['rule_date_v1'].isnull()])\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "code", "execution_count": 26, 
"metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>org_id</th><th>name</th><th>EIN</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>43256</th><td>13110</td><td>Kids Charity of Tampa Bay</td><td>550900271</td></tr>\n",
"    <tr><th>43257</th><td>13110</td><td>Kids Charity of Tampa Bay</td><td>550900271</td></tr>\n",
"    <tr><th>43258</th><td>13110</td><td>Kids Charity of Tampa Bay</td><td>550900271</td></tr>\n",
"    <tr><th>43259</th><td>13110</td><td>Kids Charity of Tampa Bay</td><td>550900271</td></tr>\n",
"    <tr><th>43260</th><td>13110</td><td>Kids Charity of Tampa Bay</td><td>550900271</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " org_id name EIN\n", "43256 13110 Kids Charity of Tampa Bay 550900271\n", "43257 13110 Kids Charity of Tampa Bay 550900271\n", "43258 13110 Kids Charity of Tampa Bay 550900271\n", "43259 13110 Kids Charity of Tampa Bay 550900271\n", "43260 13110 Kids Charity of Tampa Bay 550900271" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['name']=='Kids Charity of Tampa Bay'][['org_id', 'name', 'EIN']]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18\n" ] }, { "data": { "text/plain": [ "nan 84935\n", "2006 5\n", "2008.0 3\n", "2010.0 2\n", "2006.0 2\n", "2005.0 2\n", "2015.0 2\n", "2011.0 2\n", "2007.0 1\n", "2009.0 1\n", "2001.0 1\n", "1970.0 1\n", "2012.0 1\n", "Name: rule_date, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'] = np.where(df['EIN']=='550900271', '2006', df['rule_date'])\n", "df['rule_date'].value_counts()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84935 0\n", "23\n", "84935\n" ] } ], "source": [ "print len(df[df['rule_date']=='nan']), len(df[df['rule_date'].isnull()])\n", "df['rule_date'] = np.where(df['rule_date']=='nan', np.nan, df['rule_date']\n", " )\n", "print df['rule_date'].value_counts().sum()\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "object\n" ] }, { "data": { "text/plain": [ "2006 5\n", "2008.0 3\n", "2010.0 2\n", "2006.0 2\n", "2005.0 2\n", "2015.0 2\n", "2011.0 2\n", "2007.0 1\n", "2009.0 1\n", "2001.0 1\n", "1970.0 1\n", "2012.0 1\n", "Name: rule_date, dtype: int64" ] }, 
"execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['rule_date'].dtype\n", "df['rule_date'].value_counts()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2006 7\n", "2008 3\n", "2010 2\n", "2011 2\n", "2015 2\n", "2005 2\n", "1970 1\n", "2012 1\n", "2009 1\n", "2007 1\n", "2001 1\n", "Name: rule_date, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['rule_date'] = df['rule_date'].str[:4]\n", "df['rule_date'].value_counts()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84935\n", "23\n", "1249\n", "83709\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", "print len(df[df['rule_date'].notnull()])\n", "df['rule_date'] = np.where( ( df['rule_date'].isnull() & df['RULING_2015_BMF'].notnull() ), \n", " df['RULING_2015_BMF'].astype('str').str[:4], df['rule_date']\n", " )\n", "print len(df[df['rule_date'].isnull()])\n", "print len(df[df['rule_date'].notnull()])" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "83709" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "EIN object\n", "name object\n", "yr_frmtn float64\n", "FYE object\n", "rule_date object\n", "RULING_2015_BMF float64\n", "ruledate_2004_BMF float32\n", "ruledate_MSTRALL object\n", "dtype: object" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['EIN', 'name', 'yr_frmtn', 'FYE', 'rule_date',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF', 'ruledate_MSTRALL']].dtypes" ] 
}, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df['ruledate_2004_BMF_v2'] = df['ruledate_2004_BMF'].astype('str').str[:4]\n", "df['ruledate_MSTRALL_v2'] = df['ruledate_MSTRALL'].str[:4]\n", "df['yr_frmtn_v2'] = df['yr_frmtn'].astype('str').str[:4]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df['ruledate_2004_BMF_v2'] = np.where(df['ruledate_2004_BMF_v2']=='nan', np.nan, df['ruledate_2004_BMF_v2']\n", " )" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>EIN</th><th>name</th><th>rule_date</th><th>yr_frmtn</th><th>yr_frmtn_v2</th><th>RULING_2015_BMF</th><th>ruledate_2004_BMF_v2</th><th>ruledate_MSTRALL_v2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>3475</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3476</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3477</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3478</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3479</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3480</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3481</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3482</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>3483</th><td>133636844</td><td>American Foundation for Disabled Children</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1997</td><td>1997</td></tr>\n",
"    <tr><th>6395</th><td>942923077</td><td>ASCEND: a Humanitarian Alliance</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1984</td><td>1984</td></tr>\n",
"    <tr><th>7339</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7340</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7341</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7342</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7343</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7344</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7345</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"    <tr><th>7346</th><td>351965051</td><td>Backstreet Missions</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1996</td><td>1996</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " EIN name rule_date \\\n", "3475 133636844 American Foundation for Disabled Children NaN \n", "3476 133636844 American Foundation for Disabled Children NaN \n", "3477 133636844 American Foundation for Disabled Children NaN \n", "3478 133636844 American Foundation for Disabled Children NaN \n", "3479 133636844 American Foundation for Disabled Children NaN \n", "3480 133636844 American Foundation for Disabled Children NaN \n", "3481 133636844 American Foundation for Disabled Children NaN \n", "3482 133636844 American Foundation for Disabled Children NaN \n", "3483 133636844 American Foundation for Disabled Children NaN \n", "6395 942923077 ASCEND: a Humanitarian Alliance NaN \n", "7339 351965051 Backstreet Missions NaN \n", "7340 351965051 Backstreet Missions NaN \n", "7341 351965051 Backstreet Missions NaN \n", "7342 351965051 Backstreet Missions NaN \n", "7343 351965051 Backstreet Missions NaN \n", "7344 351965051 Backstreet Missions NaN \n", "7345 351965051 Backstreet Missions NaN \n", "7346 351965051 Backstreet Missions NaN \n", "\n", " yr_frmtn yr_frmtn_v2 RULING_2015_BMF ruledate_2004_BMF_v2 \\\n", "3475 NaN nan NaN 1997 \n", "3476 NaN nan NaN 1997 \n", "3477 NaN nan NaN 1997 \n", "3478 NaN nan NaN 1997 \n", "3479 NaN nan NaN 1997 \n", "3480 NaN nan NaN 1997 \n", "3481 NaN nan NaN 1997 \n", "3482 NaN nan NaN 1997 \n", "3483 NaN nan NaN 1997 \n", "6395 NaN nan NaN 1984 \n", "7339 NaN nan NaN 1996 \n", "7340 NaN nan NaN 1996 \n", "7341 NaN nan NaN 1996 \n", "7342 NaN nan NaN 1996 \n", "7343 NaN nan NaN 1996 \n", "7344 NaN nan NaN 1996 \n", "7345 NaN nan NaN 1996 \n", "7346 NaN nan NaN 1996 \n", "\n", " ruledate_MSTRALL_v2 \n", "3475 1997 \n", "3476 1997 \n", "3477 1997 \n", "3478 1997 \n", "3479 1997 \n", "3480 1997 \n", "3481 1997 \n", "3482 1997 \n", "3483 1997 \n", "6395 1984 \n", "7339 1996 \n", "7340 1996 \n", "7341 1996 \n", "7342 1996 \n", "7343 1996 \n", "7344 1996 \n", "7345 1996 \n", "7346 1996 " ] }, "execution_count": 37, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['rule_date'].isnull() & df['ruledate_2004_BMF'].notnull()][['EIN', 'name', \n", " 'rule_date', 'yr_frmtn', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2']][4:22]" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df['ruledate_2004_BMF_v2'] = df['ruledate_2004_BMF_v2'].astype('float')#.dtype" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>EIN</th><th>name</th><th>rule_date</th><th>yr_frmtn</th><th>yr_frmtn_v2</th><th>RULING_2015_BMF</th><th>ruledate_2004_BMF_v2</th><th>ruledate_MSTRALL_v2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>16998</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>16999</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>17000</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>20070</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20071</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20072</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20073</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20074</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20075</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20076</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20077</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20078</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20079</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20080</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20081</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20082</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20083</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20084</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>43176</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43177</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43178</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43179</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43180</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43181</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43182</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43183</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43184</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43185</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43186</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43187</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>53168</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53169</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53170</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53171</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53172</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53173</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53174</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53175</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53176</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53177</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53178</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53179</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53180</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53181</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>82725</th><td>942719901</td><td>Yavneh Day School</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1900.0</td><td>0000</td></tr>\n",
"    <tr><th>83903</th><td>942719901</td><td>Yavneh Day School</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1900.0</td><td>0000</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " EIN name rule_date \\\n", "16998 440616374 Children's TLC NaN \n", "16999 440616374 Children's TLC NaN \n", "17000 440616374 Children's TLC NaN \n", "20070 066079596 Community Health Charities of New England NaN \n", "20071 066079596 Community Health Charities of New England NaN \n", "20072 066079596 Community Health Charities of New England NaN \n", "20073 066079596 Community Health Charities of New England NaN \n", "20074 066079596 Community Health Charities of New England NaN \n", "20075 066079596 Community Health Charities of New England NaN \n", "20076 066079596 Community Health Charities of New England NaN \n", "20077 066079596 Community Health Charities of New England NaN \n", "20078 066079596 Community Health Charities of New England NaN \n", "20079 066079596 Community Health Charities of New England NaN \n", "20080 066079596 Community Health Charities of New England NaN \n", "20081 066079596 Community Health Charities of New England NaN \n", "20082 066079596 Community Health Charities of New England NaN \n", "20083 066079596 Community Health Charities of New England NaN \n", "20084 066079596 Community Health Charities of New England NaN \n", "43176 131777413 Kidney & Urology Foundation of America NaN \n", "43177 131777413 Kidney & Urology Foundation of America NaN \n", "43178 131777413 Kidney & Urology Foundation of America NaN \n", "43179 131777413 Kidney & Urology Foundation of America NaN \n", "43180 131777413 Kidney & Urology Foundation of America NaN \n", "43181 131777413 Kidney & Urology Foundation of America NaN \n", "43182 131777413 Kidney & Urology Foundation of America NaN \n", "43183 131777413 Kidney & Urology Foundation of America NaN \n", "43184 131777413 Kidney & Urology Foundation of America NaN \n", "43185 131777413 Kidney & Urology Foundation of America NaN \n", "43186 131777413 Kidney & Urology Foundation of America NaN \n", "43187 131777413 Kidney & Urology Foundation of America NaN \n", "53168 990266733 National 
Kidney Foundation of Hawaii NaN \n", "53169 990266733 National Kidney Foundation of Hawaii NaN \n", "53170 990266733 National Kidney Foundation of Hawaii NaN \n", "53171 990266733 National Kidney Foundation of Hawaii NaN \n", "53172 990266733 National Kidney Foundation of Hawaii NaN \n", "53173 990266733 National Kidney Foundation of Hawaii NaN \n", "53174 990266733 National Kidney Foundation of Hawaii NaN \n", "53175 990266733 National Kidney Foundation of Hawaii NaN \n", "53176 990266733 National Kidney Foundation of Hawaii NaN \n", "53177 990266733 National Kidney Foundation of Hawaii NaN \n", "53178 990266733 National Kidney Foundation of Hawaii NaN \n", "53179 990266733 National Kidney Foundation of Hawaii NaN \n", "53180 990266733 National Kidney Foundation of Hawaii NaN \n", "53181 990266733 National Kidney Foundation of Hawaii NaN \n", "82725 942719901 Yavneh Day School NaN \n", "83903 942719901 Yavneh Day School NaN \n", "\n", " yr_frmtn yr_frmtn_v2 RULING_2015_BMF ruledate_2004_BMF_v2 \\\n", "16998 NaN nan NaN 1958.0 \n", "16999 NaN nan NaN 1958.0 \n", "17000 NaN nan NaN 1958.0 \n", "20070 NaN nan NaN 1966.0 \n", "20071 NaN nan NaN 1966.0 \n", "20072 NaN nan NaN 1966.0 \n", "20073 NaN nan NaN 1966.0 \n", "20074 NaN nan NaN 1966.0 \n", "20075 NaN nan NaN 1966.0 \n", "20076 NaN nan NaN 1966.0 \n", "20077 NaN nan NaN 1966.0 \n", "20078 NaN nan NaN 1966.0 \n", "20079 NaN nan NaN 1966.0 \n", "20080 NaN nan NaN 1966.0 \n", "20081 NaN nan NaN 1966.0 \n", "20082 NaN nan NaN 1966.0 \n", "20083 NaN nan NaN 1966.0 \n", "20084 NaN nan NaN 1966.0 \n", "43176 NaN nan NaN 1969.0 \n", "43177 NaN nan NaN 1969.0 \n", "43178 NaN nan NaN 1969.0 \n", "43179 NaN nan NaN 1969.0 \n", "43180 NaN nan NaN 1969.0 \n", "43181 NaN nan NaN 1969.0 \n", "43182 NaN nan NaN 1969.0 \n", "43183 NaN nan NaN 1969.0 \n", "43184 NaN nan NaN 1969.0 \n", "43185 NaN nan NaN 1969.0 \n", "43186 NaN nan NaN 1969.0 \n", "43187 NaN nan NaN 1969.0 \n", "53168 NaN nan NaN 1969.0 \n", "53169 NaN nan NaN 
1969.0 \n", "53170 NaN nan NaN 1969.0 \n", "53171 NaN nan NaN 1969.0 \n", "53172 NaN nan NaN 1969.0 \n", "53173 NaN nan NaN 1969.0 \n", "53174 NaN nan NaN 1969.0 \n", "53175 NaN nan NaN 1969.0 \n", "53176 NaN nan NaN 1969.0 \n", "53177 NaN nan NaN 1969.0 \n", "53178 NaN nan NaN 1969.0 \n", "53179 NaN nan NaN 1969.0 \n", "53180 NaN nan NaN 1969.0 \n", "53181 NaN nan NaN 1969.0 \n", "82725 NaN nan NaN 1900.0 \n", "83903 NaN nan NaN 1900.0 \n", "\n", " ruledate_MSTRALL_v2 \n", "16998 1958 \n", "16999 1958 \n", "17000 1958 \n", "20070 1955 \n", "20071 1955 \n", "20072 1955 \n", "20073 1955 \n", "20074 1955 \n", "20075 1955 \n", "20076 1955 \n", "20077 1955 \n", "20078 1955 \n", "20079 1955 \n", "20080 1955 \n", "20081 1955 \n", "20082 1955 \n", "20083 1955 \n", "20084 1955 \n", "43176 \n", "43177 \n", "43178 \n", "43179 \n", "43180 \n", "43181 \n", "43182 \n", "43183 \n", "43184 \n", "43185 \n", "43186 \n", "43187 \n", "53168 1969 \n", "53169 1969 \n", "53170 1969 \n", "53171 1969 \n", "53172 1969 \n", "53173 1969 \n", "53174 1969 \n", "53175 1969 \n", "53176 1969 \n", "53177 1969 \n", "53178 1969 \n", "53179 1969 \n", "53180 1969 \n", "53181 1969 \n", "82725 0000 \n", "83903 0000 " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['rule_date'].isnull()) & (df['ruledate_2004_BMF'].notnull()) & (df['ruledate_2004_BMF_v2'].notnull())\n", " & (df['ruledate_2004_BMF_v2']<1970)][['EIN', 'name', \n", " 'rule_date', 'yr_frmtn', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2']]" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "80768\n", "80768\n" ] } ], "source": [ "print df['ruledate_2004_BMF_v2'].value_counts().sum()\n", "df['ruledate_2004_BMF_v2'] = np.where(df['EIN']=='942719901', 1980, df['ruledate_2004_BMF_v2'])\n", "print df['ruledate_2004_BMF_v2'].value_counts().sum()" ] }, { 
"cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>EIN</th><th>name</th><th>rule_date</th><th>yr_frmtn</th><th>yr_frmtn_v2</th><th>RULING_2015_BMF</th><th>ruledate_2004_BMF_v2</th><th>ruledate_MSTRALL_v2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>16998</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>16999</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>17000</th><td>440616374</td><td>Children's TLC</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1958.0</td><td>1958</td></tr>\n",
"    <tr><th>20070</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20071</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20072</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20073</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20074</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20075</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20076</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20077</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20078</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20079</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20080</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20081</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20082</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20083</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>20084</th><td>066079596</td><td>Community Health Charities of New England</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1966.0</td><td>1955</td></tr>\n",
"    <tr><th>43176</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43177</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43178</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43179</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43180</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43181</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43182</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43183</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43184</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43185</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43186</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>43187</th><td>131777413</td><td>Kidney &amp; Urology Foundation of America</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td></td></tr>\n",
"    <tr><th>53168</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53169</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53170</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53171</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53172</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53173</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53174</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53175</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53176</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53177</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53178</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53179</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53180</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"    <tr><th>53181</th><td>990266733</td><td>National Kidney Foundation of Hawaii</td><td>NaN</td><td>NaN</td><td>nan</td><td>NaN</td><td>1969.0</td><td>1969</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " EIN name rule_date \\\n", "16998 440616374 Children's TLC NaN \n", "16999 440616374 Children's TLC NaN \n", "17000 440616374 Children's TLC NaN \n", "20070 066079596 Community Health Charities of New England NaN \n", "20071 066079596 Community Health Charities of New England NaN \n", "20072 066079596 Community Health Charities of New England NaN \n", "20073 066079596 Community Health Charities of New England NaN \n", "20074 066079596 Community Health Charities of New England NaN \n", "20075 066079596 Community Health Charities of New England NaN \n", "20076 066079596 Community Health Charities of New England NaN \n", "20077 066079596 Community Health Charities of New England NaN \n", "20078 066079596 Community Health Charities of New England NaN \n", "20079 066079596 Community Health Charities of New England NaN \n", "20080 066079596 Community Health Charities of New England NaN \n", "20081 066079596 Community Health Charities of New England NaN \n", "20082 066079596 Community Health Charities of New England NaN \n", "20083 066079596 Community Health Charities of New England NaN \n", "20084 066079596 Community Health Charities of New England NaN \n", "43176 131777413 Kidney & Urology Foundation of America NaN \n", "43177 131777413 Kidney & Urology Foundation of America NaN \n", "43178 131777413 Kidney & Urology Foundation of America NaN \n", "43179 131777413 Kidney & Urology Foundation of America NaN \n", "43180 131777413 Kidney & Urology Foundation of America NaN \n", "43181 131777413 Kidney & Urology Foundation of America NaN \n", "43182 131777413 Kidney & Urology Foundation of America NaN \n", "43183 131777413 Kidney & Urology Foundation of America NaN \n", "43184 131777413 Kidney & Urology Foundation of America NaN \n", "43185 131777413 Kidney & Urology Foundation of America NaN \n", "43186 131777413 Kidney & Urology Foundation of America NaN \n", "43187 131777413 Kidney & Urology Foundation of America NaN \n", "53168 990266733 National 
Kidney Foundation of Hawaii NaN \n", "53169 990266733 National Kidney Foundation of Hawaii NaN \n", "53170 990266733 National Kidney Foundation of Hawaii NaN \n", "53171 990266733 National Kidney Foundation of Hawaii NaN \n", "53172 990266733 National Kidney Foundation of Hawaii NaN \n", "53173 990266733 National Kidney Foundation of Hawaii NaN \n", "53174 990266733 National Kidney Foundation of Hawaii NaN \n", "53175 990266733 National Kidney Foundation of Hawaii NaN \n", "53176 990266733 National Kidney Foundation of Hawaii NaN \n", "53177 990266733 National Kidney Foundation of Hawaii NaN \n", "53178 990266733 National Kidney Foundation of Hawaii NaN \n", "53179 990266733 National Kidney Foundation of Hawaii NaN \n", "53180 990266733 National Kidney Foundation of Hawaii NaN \n", "53181 990266733 National Kidney Foundation of Hawaii NaN \n", "\n", " yr_frmtn yr_frmtn_v2 RULING_2015_BMF ruledate_2004_BMF_v2 \\\n", "16998 NaN nan NaN 1958.0 \n", "16999 NaN nan NaN 1958.0 \n", "17000 NaN nan NaN 1958.0 \n", "20070 NaN nan NaN 1966.0 \n", "20071 NaN nan NaN 1966.0 \n", "20072 NaN nan NaN 1966.0 \n", "20073 NaN nan NaN 1966.0 \n", "20074 NaN nan NaN 1966.0 \n", "20075 NaN nan NaN 1966.0 \n", "20076 NaN nan NaN 1966.0 \n", "20077 NaN nan NaN 1966.0 \n", "20078 NaN nan NaN 1966.0 \n", "20079 NaN nan NaN 1966.0 \n", "20080 NaN nan NaN 1966.0 \n", "20081 NaN nan NaN 1966.0 \n", "20082 NaN nan NaN 1966.0 \n", "20083 NaN nan NaN 1966.0 \n", "20084 NaN nan NaN 1966.0 \n", "43176 NaN nan NaN 1969.0 \n", "43177 NaN nan NaN 1969.0 \n", "43178 NaN nan NaN 1969.0 \n", "43179 NaN nan NaN 1969.0 \n", "43180 NaN nan NaN 1969.0 \n", "43181 NaN nan NaN 1969.0 \n", "43182 NaN nan NaN 1969.0 \n", "43183 NaN nan NaN 1969.0 \n", "43184 NaN nan NaN 1969.0 \n", "43185 NaN nan NaN 1969.0 \n", "43186 NaN nan NaN 1969.0 \n", "43187 NaN nan NaN 1969.0 \n", "53168 NaN nan NaN 1969.0 \n", "53169 NaN nan NaN 1969.0 \n", "53170 NaN nan NaN 1969.0 \n", "53171 NaN nan NaN 1969.0 \n", "53172 NaN nan 
NaN 1969.0 \n", "53173 NaN nan NaN 1969.0 \n", "53174 NaN nan NaN 1969.0 \n", "53175 NaN nan NaN 1969.0 \n", "53176 NaN nan NaN 1969.0 \n", "53177 NaN nan NaN 1969.0 \n", "53178 NaN nan NaN 1969.0 \n", "53179 NaN nan NaN 1969.0 \n", "53180 NaN nan NaN 1969.0 \n", "53181 NaN nan NaN 1969.0 \n", "\n", " ruledate_MSTRALL_v2 \n", "16998 1958 \n", "16999 1958 \n", "17000 1958 \n", "20070 1955 \n", "20071 1955 \n", "20072 1955 \n", "20073 1955 \n", "20074 1955 \n", "20075 1955 \n", "20076 1955 \n", "20077 1955 \n", "20078 1955 \n", "20079 1955 \n", "20080 1955 \n", "20081 1955 \n", "20082 1955 \n", "20083 1955 \n", "20084 1955 \n", "43176 \n", "43177 \n", "43178 \n", "43179 \n", "43180 \n", "43181 \n", "43182 \n", "43183 \n", "43184 \n", "43185 \n", "43186 \n", "43187 \n", "53168 1969 \n", "53169 1969 \n", "53170 1969 \n", "53171 1969 \n", "53172 1969 \n", "53173 1969 \n", "53174 1969 \n", "53175 1969 \n", "53176 1969 \n", "53177 1969 \n", "53178 1969 \n", "53179 1969 \n", "53180 1969 \n", "53181 1969 " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['rule_date'].isnull()) & (df['ruledate_2004_BMF'].notnull()) & (df['ruledate_2004_BMF_v2'].notnull())\n", " & (df['ruledate_2004_BMF_v2']<1970)][['EIN', 'name', \n", " 'rule_date', 'yr_frmtn', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Quick save " ] }, { "cell_type": "code", "execution_count": 286, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.to_pickle('quick save - merged with age fixes.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now merge in 2004 BMF rule date values" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1249\n", "1091\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", 
"df['rule_date'] = np.where( ( df['rule_date'].isnull() & df['ruledate_2004_BMF_v2'].notnull() ), \n", " df['ruledate_2004_BMF_v2'].astype('str').str[:4], df['rule_date']\n", " )\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1091\n", "83867\n", "989\n", "5\n" ] } ], "source": [ "print len(df[(df['rule_date'].isnull())])\n", "print len(df[df['rule_date'].notnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['yr_frmtn'].notnull()])\n", "print len(df[(df['rule_date'].isnull()) & df['ruledate_MSTRALL'].notnull()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Merge in MSTRALL values" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " EIN name rule_date \\\n", "2299 591951577 Allied Veterans of the World Inc. and Affiliates NaN \n", "7107 251892177 August Wilson Center for African American Culture NaN \n", "11932 050604703 C & O Conservation Inc. NaN \n", "20124 300256973 Community Rehabilitation Center Foundation NaN \n", "56127 030498214 Newark Now NaN \n", "\n", " yr_frmtn yr_frmtn_v2 RULING_2015_BMF ruledate_2004_BMF_v2 \\\n", "2299 NaN nan NaN NaN \n", "7107 NaN nan NaN NaN \n", "11932 NaN nan NaN NaN \n", "20124 NaN nan NaN NaN \n", "56127 NaN nan NaN NaN \n", "\n", " ruledate_MSTRALL_v2 \n", "2299 1996 \n", "7107 2002 \n", "11932 2004 \n", "20124 2004 \n", "56127 2003 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['rule_date'].isnull() & df['ruledate_MSTRALL_v2'].notnull()][['EIN', 'name', \n", " 'rule_date',\n", " 'yr_frmtn', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', \n", " 'ruledate_MSTRALL_v2']]" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1091\n", "1086\n" ] } ], "source": [ "print len(df[df['rule_date'].isnull()])\n", "df['rule_date'] = np.where( ( df['rule_date'].isnull() & df['ruledate_MSTRALL_v2'].notnull() ), \n", " df['ruledate_MSTRALL_v2'].str[:4], df['rule_date']\n", " )\n", "print len(df[df['rule_date'].isnull()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Quick save" ] }, { "cell_type": "code", "execution_count": 317, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.to_pickle('quick save - merged with age fixes.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### No need to try with SOI variable\n", "NOTE: All 989 cases where *yr_frmtn* is available but *rule_date* is missing are 'right_only' merges ('SOI only' data)."
] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "989\n", "278\n", "989\n", "278\n" ] } ], "source": [ "print len(df[(df['rule_date'].isnull()) & (df['yr_frmtn'].notnull())])\n", "print len(df[(df['rule_date'].isnull()) & (df['yr_frmtn'].notnull()) & (df['yr_frmtn']<1955)])\n", "print len(df[(df['rule_date'].isnull()) & (df['yr_frmtn'].notnull()) & (df['_merge']=='right_only')])\n", "print len(df[(df['rule_date'].isnull()) & (df['yr_frmtn'].notnull()) & (df['yr_frmtn']<1955)\n", " & (df['_merge']=='right_only')])" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " EIN name rule_date yr_frmtn yr_frmtn_v2 RULING_2015_BMF \\\n", "83945 010211478 NaN NaN 1937.0 1937 NaN \n", "83946 010212541 NaN NaN 1911.0 1911 NaN \n", "\n", " ruledate_2004_BMF_v2 ruledate_MSTRALL_v2 _merge \n", "83945 NaN NaN right_only \n", "83946 NaN NaN right_only " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['rule_date'].isnull()) & (df['yr_frmtn'].notnull()) & (df['yr_frmtn']<1940)\n", " ][['EIN', 'name', \n", " 'rule_date', 'yr_frmtn', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', '_merge']][:2]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83872\n", "83876\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'] = np.where(df['EIN']=='010211478', '1959', df['rule_date'])\n", "df['rule_date'] = np.where(df['EIN']=='010212541', '1942', df['rule_date'])\n", "print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df[df['EIN']=='010211478'][['EIN', 'name', \n", "# 'rule_date', 'yr_frmtn', 'yr_frmtn_v2',\n", "# 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create *Age* variable " ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83876\n", "46\n", "0\n", "0\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "print len(df[df['rule_date']=='0.0'])\n", "print len(df[df['rule_date']==''])\n", "print len(df[df['rule_date']=='0000'])" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83876\n", 
"0\n", "83830\n" ] } ], "source": [ "print df['rule_date'].value_counts().sum()\n", "df['rule_date'] = np.where(df['rule_date']=='0.0', np.nan, df['rule_date'])\n", "print len(df[df['rule_date']=='0.0'])\n", "print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#print df['rule_date'].value_counts().sum()\n", "#df['rule_date'] = np.where(df['rule_date']=='', np.nan, df['rule_date'])\n", "#print len(df[df['rule_date']=='0.0'])\n", "#print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#print df['rule_date'].value_counts().sum()\n", "#print len(df[df['rule_date']=='0000'])\n", "#df['rule_date'] = np.where(df['rule_date']=='0000', np.nan, df['rule_date'])\n", "#print len(df[df['rule_date']=='0000'])\n", "#print df['rule_date'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 348, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df['age'] = np.nan" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for index, row in df.iterrows():\n", " if pd.notnull(row['rule_date']):\n", " df.ix[index, 'age'] = 2016 - int(row['rule_date'])\n", " else:\n", " pass" ] }, { "cell_type": "code", "execution_count": 350, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83830\n" ] } ], "source": [ "print df['age'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 351, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 351, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZIAAAECCAYAAADU5FG5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHEdJREFUeJzt3X+cXHV97/HXJhDIupOVxQ21IOaaq5/04eMWDS0WbkpA\nQQWraB9K+qAoaE1uuYEKjwvXEht81N5VLI1I5N7cGoIEaa8IVyKYG8BH8cJuaTWhKI3GT8KGYNvL\nj012sz9IIMnu3j/O2XAyO7t7Zs6cmXNm3s/HI4/MnDln5vvdnZ33fH+c72kZHx9HRESkUrPqXQAR\nEck3BYmIiCSiIBERkUQUJCIikoiCREREElGQiIhIIsel+eRmNgtYDxgwBvwxMAf4AbAz3G2du99n\nZsuBFcBhoMvdN5vZicA9wHxgCLjC3felWWYRESlPS5rnkZjZJcCH3f2zZrYUuA54CJjn7rdG9jsF\n+CGwGGgFeoAzgauBgrt/ycyWAWe7+7WpFVhERMqWaovE3b9vZg+FdxcAAwQBYWb2UYJWyXXAWUCP\nux8BhsxsF3AGsAT4anj8FmB1muUVEZHypT5G4u5jZnYXcBvwN8CPgevdfSmwG/giMA8YjBw2ArQD\nhcj24XA/ERHJkJoMtrv7lcA7gDuAR9396fChTcC7CMIiGhIFgtbLUHh7Ytv+WpRXRETiS3uw/XLg\nNHe/GXiVYMD9e2b2J+6+FXgf8BSwFegysznAXGARsB14ErgY2Bb+3z3Ta46Pj4+3tLSkUR0RkUZW\n8Qdn2oPtrcC3gF8jCK2bgX8BbgcOAS8CK9x9xMz+CPhPBJXpcvdNZjYX2Ai8GXgNuMzdX57hZcf7\n+oZTqU8WdHYWaNT6NXLdQPXLuyaoXzaDpE4UJDnVyHUD1S/vmqB+FQeJTkgUEZFEFCQiIpKIgkRE\nRBJRkIiISCIKEhERSURBIiIiiShIREQkEQWJiIgkoiAREZFEFCQiIpKIgkRERBJRkIiISCIKEhER\nSURBIiIiiShIREQkEQWJiIgkkuqldiW7RkdH2bNn99H7Cxa8jdmzZ9exRCKSVwqSJrVnz24+d8uD\ntLbP58Dgy9x2w0dYuPDt9S6WiOSQgqSJtbbPp+2kU+tdDBHJOY2RiIhIIgoSERFJREEiIiKJKEhE\nRCQRBYmIiCSS6qwtM5sFrAcMGAP+GHgNuCu8v93dV4b7LgdWAIeBLnffbGYnAvcA84Eh4Ap335dm\nmUVEpDxpt0g+DIy7+xJgNfBl4GvAKndfCswys0vM7BTgGuBs4IPAV8zseOAq4Bl3Pxf4dvgcIiKS\nIakGibt/n6CVAfBWYABY7O7d4bYtwIXAWUCPux9x9yFgF3AGsAR4OLLvBWmWV0REypf6GIm7j5nZ\nXcBa4G+BlsjDw8A8oAAMRraPAO1F2yf2FRGRDKnJme3ufqWZzQe2AnMjDxWA/QTjH/OKtg+E2wtF\n+86os7Mw8045Vo36DQy0HXO/o6MtEz+3LJQhTapfvjV6/SqV9mD75cBp7n4z8CowCmwzs6Xu/jhw\nEfAYQcB0mdkcgqBZBGwHngQuBraF/3dPfpXJ+vqGq12VzOjsLFSlfv39I5Pu1/vnVq26ZZXql2/N\nUL9Kpd0i+R7wLTN7PHytPwF+CdwRDqbvAO5393EzWwv0EHR9rXL3Q2a2DthoZt0Es70uS7m8IiJS\nplSDxN0PAMtKPHReiX03ABuKth0ELk2lcCIiUhU6IVFERBJRkIiISCIKEhERSURBIiIiiShIREQk\nEQWJiIgkoiAREZFEFCQiIpKIgkRERBJRkIiISCIKEhERSURBIiIiiShIREQkEQWJiIgkoiAREZFE\nFCQiIpKIgkRERBJRkIiISCIKEhERSURBIiIiiShIREQkEQWJiIgkoiAREZFEFCQiIpLIcWk9sZkd\nB9wJLADmAF3AvwA/AHaGu61z9/vMbDmwAjgMdLn7ZjM7EbgHm
A8MAVe4+760yisiIpVJLUiAy4G9\n7v4pMzsJ+Cnw58Aad791YiczOwW4BlgMtAI9ZvYocBXwjLt/ycyWAauBa1Msr4iIVCDNIPkucF94\nexZBa+NMYJGZfZSgVXIdcBbQ4+5HgCEz2wWcASwBvhoev4UgSEREJGNSGyNx9wPu/oqZFQgC5c+A\nnwDXu/tSYDfwRWAeMBg5dARoBwqR7cPhfiIikjFptkgws7cA3wNud/fvmFm7u0+EwyZgLfA4x4ZE\nARggGBcpRLbtj/u6nZ2FmXfKsWrUb2Cg7Zj7HR1tmfi5ZaEMaVL98q3R61epNAfbTwEeAVa6+4/C\nzY+Y2dXuvg14H/AUsBXoMrM5wFxgEbAdeBK4GNgW/t8d97X7+oarVo+s6ewsVKV+/f0jk+7X++dW\nrbplleqXb81Qv0ql2SK5EXgjsNrMbgLGCcZEvm5mh4AXgRXuPmJma4EeoAVY5e6HzGwdsNHMuoHX\ngMtSLKuIiFQotSBx92spPctqSYl9NwAbirYdBC5Np3QSNT42xq9+9fzR+wsWvI3Zs2fHOnZ0dJQ9\ne3ZXdKyINIZUx0gkHw4O97Hm3r20tr/AgcGXue2Gj7Bw4dtjHbtnz24+d8uDtLbPL/tYEWkMChIB\noLV9Pm0nnVrzY0Uk/7REioiIJKIgERGRRBQkIiKSiIJEREQSUZCIiEgiChIREUlE038lE0ZHR9m5\nc+fRpVt0YqNIfihIJBN0YqNIfilIJDN0YqNIPilIGoTWvBKRelGQNAh1DYlIvShIGoi6hkSkHjT9\nV0REElGQiIhIIurakilpAF9E4lCQyJQ0gC8icShIcizaYoheKreaNIAvIjNRkORYtMWw7193cPJp\nv1HvIolIE4oVJGb2f4BvAZvc/XC6RZLpjI6O0tu7CwhaIRMthgODL9W5ZCLSrOK2SG4GrgBuMbPN\nwF3uvjW9YslUent71QoRkUyJFSTu/gTwhJnNBT4O/G8zGwLuANa5+2spllGKzNQKGR8bO2bMpNqz\nrWoxNiMi+RF7jMTMzgM+Cbwf2ALcC1wIPAh8II3CSWUODvex5t69tLa/kMpsK43NiEhU3DGS54Hd\nBOMkV7v7wXD7/wVKdnGZ2XHAncACYA7QBfwCuAsYA7a7+8pw3+XACuAw0OXum83sROAeYD4wBFzh\n7vsqqWQzSnu2lcZmRGRC3DPb3wssc/e7Aczs3wO4+6i7L57imMuBve5+LvBB4Hbga8Aqd18KzDKz\nS8zsFOAa4Oxwv6+Y2fHAVcAz4fHfBlZXVEMpy0S3WG/vrmO6rabaLiISt2vrQ8CVwGKCFsJDZnar\nu39zmmO+C9wX3p4NHAEWu3t3uG0LQTfZGNDj7keAITPbBZwBLAG+GtlXQVID0W6xaLfVVNtFROIG\nyQrgPQDu/ryZnQn8GJgySNz9AICZFQgC5QvAX0V2GQbmAQVgMLJ9BGgv2j6xb1OKDm4PDval/npT\ndVupO0tESokbJMcD0ZlZh4DxmQ4ys7cA3wNud/fvmNlfRh4uAPsJxj/mFW0fCLcXivaNpbOzMPNO\nObJz585Eg9sdHW2TfiYDA23VLOK0rxVHcXkqfZ6sa8Q6Ral+zSlukGwCHjOz74b3f59gttaUwrGP\nR4CV7v6jcPPTZnZuOJ34IuAxgsH6LjObA8wFFgHbgSeBi4Ft4f/dxNTXNxx311zo7x9J1Bro7x+Z\n9DPp7x+pVvFmfK24x1XjebKss7PQcHWKUv3yLUlIxj2P5PNm9nFgKcHMqrXuvmmGw24E3gisNrOb\nCFownwO+EQ6m7wDud/dxM1sL9AAtBIPxh8xsHbDRzLoJWkOXVVA/ERFJWTlrbe0AXiL4sCfSsijJ\n3a8Fri3x0Hkl9t0AbCjadhC4tIzyiYhIHcQ9j+S/Ax8GeiObxwmmBYuISBOL2yJ5P2ATJyKKNApd\nvEskubhBspuwS0ukkejiX
SLJxQ2SfuAXZvYk8OrERnf/TCqlEqkhXbxLJJm4QfJw+E9EROQYcaf/\nbjSzBcA7Cc4NeYu7P5dmwUREJB9iLdpoZsuAh4DbgA7gH8zs8jQLJiIi+RB39d/PA+cAw+7+MvBu\nghMORUSkycUNklF3P7o2gLu/QLBqr4iINLm4g+0/N7OrgePN7F3AfwZ+ml6xREQkL+K2SFYCpwIH\nCa56OEQQJiIi0uTiztp6hWBMROMiIiJyjLhrbY0x+fojL7j7adUvkoiI5EncFsnRLrBwCfiPElxj\nXUREmlw5y8gD4O6HgfvM7AsplKfpaNFAEcm7uF1bn4rcbSE4w/1QKiVqMmkvGjg+NsavfvX80ftp\nBlUtX0tEsiNui+T8yO1xYC+wrPrFaU5pLhp4cLiPNffupbX9hdRXt63la4lIdsQdI/l02gWR9L7R\n13J1W62kK9J84nZtPcfkWVsQdHONu/vbqlqqJqVv9CKSR3G7tv4WeA1YDxwG/hD4bUAD7lWmb/Qi\nkjdxg+QD7v5bkfu3mdlT7v78lEeIiEhTiLtESouZXTBxx8x+j2CZFBERaXJxWyQrgLvN7NcIxkp+\nCVyRWqlERCQ34s7aegp4p5m9CXjV3UfSLZbk3VQz0HQCpkjjiTtr663AHcAC4HfN7EHgM+6+J8ax\n7wFudvfzwyXofwDsDB9e5+73mdlyglbPYaDL3Teb2YnAPcB8gm60K9x9X1m1k7qZagZa9ATMV/a/\nyPV/8G5OP/2tx4SOiORL3K6tvwZuAb4KvAT8L+Bu4NzpDjKzG4BPAhMtmDOBNe5+a2SfU4BrgMVA\nK9BjZo8CVwHPuPuXwkv9rgaujVleyYCpZqBNbD8w+BJr7v0Zre0vsO9fd3Dyab9Rh1KKSFJxB9vf\n5O6PArj7uLuvB+bFOO5Z4GOR+2cCHzKzx81svZm1AWcBPe5+xN2HgF3AGcAS4OHwuC3ABTSRia6h\n3t5dDf1tfSJU5hY66l0UEalQ3BbJQTM7jfCkRDNbQnBeybTc/YGwW2zCj4H17v60md0IfJHgSouD\nkX1GgHagENk+TLzgahjRriF9WxeRLIsbJNcRjG0sNLOfAh3AJyp4vU3uPhEOm4C1wOMcGxIFYIBg\nXKQQ2bY/7ot0dhZm3ikjBgbapnws2gVULR0dbXR2FqZ93WobHxtjcLCPgYE2Bgf7Yh0zUc60Ff8c\n0n7dPL03K6H6Nae4QXIKwZns7wBmA79090pW/33EzK52923A+4CngK1Al5nNAeYCi4DtwJPAxcC2\n8P/uuC/S1zdcQdGqL84Mpf7+2k6A6+8foa9vuKave3C4j5u+uZfW9t7YrauJcqat+OeQ5ut2dhaq\n9txZnP1WzfplUTPUr1Jxg+Qv3X0z8POKXylwFfANMzsEvAiscPcRM1sL9BCs3bXK3Q+Z2Tpgo5l1\nE3SjXZbwtWsu7SXi8ySN1lUz03tLsiRukPSa2Z0EYxwHJza6+90zHRguo3JOePtpgkH04n02ABuK\nth0ELo1ZvszS2lmSFr23JCumDRIzO9Xd/w3YR9Ba+J3Iw+MEU4AlJ6InCTbyTDARqa2ZWiQPAYvd\n/dNm9l/cfU0tCiXpyPtMsCyOC4jIzEHSErn9h4CCJOfyPFahcQGRbJopSKIXs2qZci+RGtG4gEj2\nxD2zHUpfIVFERJrcTC2Sd5rZRKf0qZHbusSuiIgAMwfJO2pSChERya1pg0SX0hURkZnEPSFRpC6i\nU3517otINilIJNOiU37zeO5LrelcG6kHBYlkXp7Pfak1nWsj9aAgEWkwOtdGak1BIrkUXTcM1IUj\nUk8KEsml6Lph6sIRqS8FieSWunBEskFBItLENMtLqkFBItLENMtLqkFBIpmjC3DVlroIJSkFiWRO\n1i7Ape4fkekpSGpE01XLU4+TEKf6Han7R2R6CpIa0XTV7Jvud6TuH5GpKUhqSB9G2affkUj
5yrlC\nooiIyCQKEhERSST1ri0zew9ws7ufb2YLgbuAMWC7u68M91kOrAAOA13uvtnMTgTuAeYDQ8AV7r4v\n7fLWgqa3ikgjSTVIzOwG4JPASLjpa8Aqd+82s3Vmdgnwj8A1wGKgFegxs0eBq4Bn3P1LZrYMWA1c\nm2Z5ayVr01tFRJJIu2vrWeBjkftnunt3eHsLcCFwFtDj7kfcfQjYBZwBLAEejux7QcplramJQd25\nhY56F0VEJJFUg8TdHwCORDa1RG4PA/OAAjAY2T4CtBdtn9hXREQyptbTf8citwvAfoLxj3lF2wfC\n7YWifWPp7CzMvFMNDAy01bsITaOjo43OzgKjo6P09vYe3b5w4cJpT/yc7nc08ZzF+0xsr0S13ptT\nlancslazbpCdv720NHr9KlXrIPknMzvX3Z8ALgIeA7YCXWY2B5gLLAK2A08CFwPbwv+7Sz/lZH19\nw9Uud0X6+0dm3kmqor9/hL6+YXp7d5V1Fvp0v6OJ5yzeZ2J7uTo7C1V7b05VpnLLWq26QXXrl0XN\nUL9K1TpIrgfWm9nxwA7gfncfN7O1QA9B19cqdz9kZuuAjWbWDbwGXFbjskpO6aRCkdpKPUjc/Xng\nnPD2LuC8EvtsADYUbTsIXJp2+UREJBktkSJSIa0KLBJQkIhUSKsCiwQUJJJ75a4UEG1JJF1ZQOMx\nk6ml1nwUJJJ75a4UEG1JaGWB6lNLrfkoSKQhlHshrHpcOKuZqKXWXLT6r4iIJKIgERGRRBQkIiKS\niMZIRErQNWNE4lOQiJSga8aIxKcgEZlCtWd2BSsT7zp6G1qYPXvWMbdB511I/ihIpClU8yTEUqJd\nYVMFQ29v7zHnr8wtnDzpts67kDxSkEjDKh7nWHPvz1I7CbG4K2yqYIi2ckrdFskjBYk0rFLjHGme\nhKhgkGalIJGGpjPYq0vraEkpChKRjCr+0Ib6f3BrHS0pRUEiklHRD20gMx/c6raTYgoSkQzTh7bk\ngZZIERGRRBQkIiKSiLq2RDKkFmt8pX1ypjQfBYlIiqLBMDjYN+P+tVjjS1eIlGpTkFSZvu1JVCXB\nUItzX3R+jVSTgqTK9G1PiulDWxpdXYLEzJ4CBsO7zwFfBu4CxoDt7r4y3G85sAI4DHS5++bal7Z8\n+uAQkWZS8yAxsxMA3P29kW3fB1a5e7eZrTOzS4B/BK4BFgOtQI+ZPeruh2tdZpEs00W4pN7q0SI5\nA3iDmT0CzAa+ACx29+7w8S3A+wlaJz3ufgQYMrNdwG8CT9WhzCJANj+0qzVAH60b1H85FsmPegTJ\nAeAWd99gZm8nCI6WyOPDwDygwOvdXwAjQHvNSilSQlavnFiN7tRo3bKyHIvkQz2CZCfwLIC77zKz\nfQTdVxMKwH5giCBQirfPqLOzUJ2SVmBgoK1ury21Uc8xsI6ONjo7C7HfZxP7R013bHRJlpmOHR8b\nY3Cw7+i2jo7WkmUr9Tx51Sj1qLZ6BMlngP8ArDSzXycIi0fNbKm7Pw5cBDwGbAW6zGwOMBdYBGyP\n8wJ9fcOpFHwqmvIrtdLfP0Jf3zD9/SNl7V+8rRrHHhzu46Zv7qW1vZcDgy/z7a9cxkknvXnS85d6\nnjzq7Cw0RD2mkiQk6xEkG4BvmVk3wTjIlcA+4A4zOx7YAdzv7uNmthboIej6WuXuh+pQ3hlpyq80\nKy0qKVCHIAlnXV1e4qHzSuy7gSB4Mk9TfiVtWRzoFwGdkCiSG0kG+tX9KmlSkFRIf5hSD5W2fNX9\nKmlSkFRIf5iSN+p+lbQoSBLQH6ZkmcZUpFYUJCINKqsnT0rj0RUSRRrYRKt5bqGj3kWRBqYgERGR\nRNS1JSKTaAFHKYeCREQm0QKOUg4FiYiUpOVPJC6NkYiISCIN1yL5+rpvc+DAId4w9wQu+8Ql9S6O\nSO4lOR9FYy3NoeGC5O+eDS5hcsIrO7jsE9V9bi2LIs0
oyfkoGmtpDg0XJGnSsijSrJKs4qCxlsan\nMZIy6QQvEZFjqUUiIomNj43x3HPP0d8/om7fJqQgEZHEXr/srrp9m5GCRESqolarYUcnvYyOjgIt\nzJ4d9NJrVlh9KEhmoJlaIvUR/duD10OieNLL3MLJtLbP16ywOmrqIJnqjRqlmVoi9RH92ysOiWjr\nR7PC6q9hg2R8bIze3l1H788UEtN9m9EFrESSq+TkxGqERJwvjJJMwwbJyNC+skJCZ+CKpGuqkxPj\nfNAnObs+7hdGqVzDBgmU920m+iZ/Zf+LXP8H7+b009+qcRGRKir1Nxnngz7p1R7V/ZWuTAeJmbUA\n/wM4A3gV+Ky7757+qMpFu7DW3PszXaJUpEZK9QwUf4krp4tZk2RqK9NBAnwUOMHdzzGz9wBfC7el\nTuMiIrWXpOVRHELBl8FjJ8moCzsdWQ+SJcDDAO7+YzP7rUqeRG8ekWypVssjqlQIFT/PVF3YoM+F\nJLIeJPOAwcj9I2Y2y93HynkSjX+IZEvSMY+pxAmhUl3Y5Q7CJ5kJ1oizyLIeJENAIXJ/xhBpGfw5\no0fGGD2wjwOHg0MPDvczt3AyAK+ODPDf1v+QE9s6GHxpN2988zuO7gMtuq3bmbydlXJU8/bE3yTA\ngcGX616Gmb5YDgy00d8/cnTfic+RV0f6+bPlFx5t2cyk+Nhv/sVncz+LrGV8fLzeZZiSmf0+8Hvu\n/hkz+x1gtbt/qN7lEhGR12W9RfIAcKGZ/X14/9P1LIyIiEyW6RaJiIhkny5sJSIiiShIREQkEQWJ\niIgkoiAREZFEsj5rK5Zar8lVC2Z2HHAnsACYA3QBvwDuAsaA7e6+sl7lqxYzmw9sAy4ARmmg+pnZ\nnwIfAY4neH8+QYPUL3x/biR4fx4BltMAv79wKaab3f18M1tIifqY2XJgBXAY6HL3zfUqb7mK6vcu\nYC3B7+814FPu3ldJ/RqlRXJ0TS7gRoI1ufLucmCvu58LfBC4naBeq9x9KTDLzC6pZwGTCj+M/idw\nINzUMPUzs6XA2eF78jzgdBqofsDFwGx3/4/AXwBfJuf1M7MbgPXACeGmSfUxs1OAa4CzCf4uv2Jm\nx9elwGUqUb+vAyvd/b0Ep1p8vtL6NUqQHLMmF1DRmlwZ811gdXh7NsG3hsXu3h1u20LwLT7P/gpY\nB/w/gtONG6l+HwC2m9km4EHgBzRW/XYCx4W9Ae0E317zXr9ngY9F7p9ZVJ8LgbOAHnc/4u5DwC7g\nN2tbzIoV12+Zu/9zePs4gt6ciurXKEFSck2uehWmGtz9gLu/YmYF4D7gC0TXy4Bhgj/gXDKzK4GX\n3f2HvF6v6O8s1/UD3gScCXwcuAr4GxqrfiPAvwN+Cfw1QRdJrt+f7v4AwRe2CcX1mUewZFP0s2aE\nnNSzuH7u/hKAmZ0DrARuZfJnaaz65frDNqLsNbnywMzeAjwGbHT37xD01U4oAPvrUrDq+DTBqgU/\nIhjbuhvojDye9/rtAx4Jv9ntJPi2F/2DzHv9rgMednfj9d/fnMjjea8flP57GyL4sC3enktmtoxg\n/O5id99HhfVrlCD5e4I+W8I1uf55+t2zL+yrfAT4r+6+Mdz8tJmdG96+COgueXAOuPtSdz/f3c8H\nfgp8EtjSKPUDegj6mDGzXwfeAPxdOHYC+a9fP69/c91P0DXydAPVD+CfSrwftwJLzGyOmbUDi4Dt\n9SpgEmZ2OUFL5Dx3n1ix8idUUL+GmLVFY67JdSPwRmC1md0EjAOfA74RDn7tAO6vY/nScD2wvhHq\n5+6bzex3zewnBF0kVwF7gDsaoX4EA7V3mtkTBLPS/hR4isapH5R4P7r7uJmtJfii0EIwGH+onoWs\nRNj1fxvwPPCAmY0Dj7v7n1dSP621JSIiiTRK15aIiNSJgkRERBJRkIiISCIKEhERSURBIiIiiShI\nREQkEQWJiIgkoiA
REZFE/j8LU2KJaXGLxgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[df['age'].notnull()]['age'].plot.hist(by=None, bins=100)" ] }, { "cell_type": "code", "execution_count": 355, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "32\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " EIN name age rule_date yr_frmtn_v2 \\\n", "13112 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13113 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13114 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13115 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13116 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13117 530196523 Carnegie Institution for Science 102 1914 1904 \n", "13118 530196523 Carnegie Institution for Science 102 1914 1902 \n", "13119 530196523 Carnegie Institution for Science 102 1914 1902 \n", "13120 530196523 Carnegie Institution for Science 102 1914 nan \n", "13121 530196523 Carnegie Institution for Science 102 1914 nan \n", "13122 530196523 Carnegie Institution for Science 102 1914 nan \n", "13123 530196523 Carnegie Institution for Science 102 1914 nan \n", "13124 530196523 Carnegie Institution for Science 102 1914 nan \n", "13125 530196523 Carnegie Institution for Science 102 1914 nan \n", "13126 530196523 Carnegie Institution for Science 102 1914 nan \n", "13127 530196523 Carnegie Institution for Science 102 1914 nan \n", "13128 530196523 Carnegie Institution for Science 102 1914 nan \n", "64485 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64486 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64487 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64488 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64489 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64490 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64491 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64492 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64493 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64494 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "64495 351054670 The Rescue Mission, Fort Wayne 108 1908 nan \n", "77310 750838777 United Way of Odessa 103 
1913 nan \n", "77311 750838777 United Way of Odessa 103 1913 nan \n", "77312 750838777 United Way of Odessa 103 1913 nan \n", "77313 750838777 United Way of Odessa 103 1913 nan \n", "\n", " RULING_2015_BMF ruledate_2004_BMF_v2 ruledate_MSTRALL_v2 _merge \n", "13112 191403 1914 1914 both \n", "13113 191403 1914 1914 both \n", "13114 191403 1914 1914 both \n", "13115 191403 1914 1914 both \n", "13116 191403 1914 1914 both \n", "13117 191403 1914 1914 both \n", "13118 191403 1914 1914 both \n", "13119 191403 1914 1914 both \n", "13120 191403 1914 1914 left_only \n", "13121 191403 1914 1914 left_only \n", "13122 191403 1914 1914 left_only \n", "13123 191403 1914 1914 left_only \n", "13124 191403 1914 1914 left_only \n", "13125 191403 1914 1914 left_only \n", "13126 191403 1914 1914 left_only \n", "13127 191403 1914 1914 left_only \n", "13128 191403 1914 1914 left_only \n", "64485 190810 1935 1935 left_only \n", "64486 190810 1935 1935 left_only \n", "64487 190810 1935 1935 left_only \n", "64488 190810 1935 1935 left_only \n", "64489 190810 1935 1935 left_only \n", "64490 190810 1935 1935 left_only \n", "64491 190810 1935 1935 left_only \n", "64492 190810 1935 1935 left_only \n", "64493 190810 1935 1935 left_only \n", "64494 190810 1935 1935 left_only \n", "64495 190810 1935 1935 left_only \n", "77310 191308 1913 1913 left_only \n", "77311 191308 1913 1913 left_only \n", "77312 191308 1913 1913 left_only \n", "77313 191308 1913 1913 left_only " ] }, "execution_count": 355, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['age']>100])\n", "df[df['age']>100][['EIN', 'name', 'age',\n", " 'rule_date', 'yr_frmtn_v2',\n", " 'RULING_2015_BMF', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', '_merge']]" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df.to_pickle('quick save - merged with age fixes.pkl')" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { 
"collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1009\n", "119\n", "119\n", "0\n" ] } ], "source": [ "print len(df[(df['age'].isnull() & (df['_merge']=='right_only'))])\n", "print len(df[(df['age'].isnull() & (df['_merge']!='right_only'))])\n", "print len(df[(df['age'].isnull() & (df['_merge']=='left_only'))])\n", "print len(df[(df['age'].isnull() & (df['_merge']=='both'))])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 361, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.to_pickle('quick save - merged with age fixes.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Category \n", "There are 11 categories here." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83945\n" ] }, { "data": { "text/plain": [ "Human Services 21132\n", "Arts, Culture, Humanities 11519\n", "Health 9803\n", "Community Development 7454\n", "International 7220\n", "Animals 6165\n", "Education 5198\n", "Environment 5086\n", "Religion 5062\n", "Human and Civil Rights 3244\n", "Research and Public Policy 2062\n", "Name: category, dtype: int64" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['category'].value_counts().sum()\n", "df['category'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create category dummy variables" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
category_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policy
00.00.00.00.00.00.01.00.00.00.00.0
10.00.00.00.00.01.00.00.00.00.00.0
20.00.00.00.00.01.00.00.00.00.00.0
30.00.00.00.00.01.00.00.00.00.00.0
40.00.00.00.00.01.00.00.00.00.00.0
\n", "
" ], "text/plain": [ " category_Animals category_Arts, Culture, Humanities \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 \n", "\n", " category_Community Development category_Education category_Environment \\\n", "0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "4 0.0 0.0 0.0 \n", "\n", " category_Health category_Human Services category_Human and Civil Rights \\\n", "0 0.0 1.0 0.0 \n", "1 1.0 0.0 0.0 \n", "2 1.0 0.0 0.0 \n", "3 1.0 0.0 0.0 \n", "4 1.0 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "0 0.0 \n", "1 0.0 \n", "2 0.0 \n", "3 0.0 \n", "4 0.0 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.get_dummies(df['category'], prefix='category').head(5)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.concat([df, pd.get_dummies(df['category'], prefix='category')], axis=1)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 
'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011 data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 
'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 
'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time_Variant Controls\n", "- Control variables:\n", " - Size: total revenues best (probably logged)\n", " - will need 2011 and 2016 versions for the 4th and 5th tests\n", " - efficiency ratio\n", " - complexity (could be a good control from Erica's paper)\n", " - The focus of our paper is on SOX policies; if an org has SOX policies it probably has other governance policies, and these would be highly correlated. So, we will leave the other governance variables out of one version of the 4th and 5th tests, and then try to include them in another set. 
The best candidates are:\n", " - *independent board* --> related to Erica's *independence of key actors* concept\n", " - *board review of 990* and *audited financials* --> both related to Erica's *board monitoring* concept\n", " - we could include other governance variables as needed.\n", "- We are focusing on non-health, non-university organizations; by focusing more on a donor-focused sample (CN), we are differentiating the work from previous studies.\n", "- To differentiate from Erica's *JBE* paper, we should use the SOI data to see how many of the donor advisories stem from 'non-material diversions'.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Size - Logged Total Revenues" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYElatest_entry2011 dataOverall Ratingtotal_revenue_2016total_revenue_2011tot_revTAX_PERIOD_2015_BMFINCOME_AMT_2015_BMFREVENUE_AMT_2015_BMFtaxper_MSTRALLF990REV_MSTRALL
016722020503776currentTrueNaNcurrent (2016) donor advisoryNaNNaNNaN201309.00.00.0NaNNaN
110166043314346FY2013TrueNaN3 stars$766,123NaNNaN201312.0896259.03877845.0200012520862.0
\n", "
" ], "text/plain": [ " org_id EIN FYE latest_entry 2011 data \\\n", "0 16722 020503776 current True NaN \n", "1 10166 043314346 FY2013 True NaN \n", "\n", " Overall Rating total_revenue_2016 total_revenue_2011 \\\n", "0 current (2016) donor advisory NaN NaN \n", "1 3 stars $766,123 NaN \n", "\n", " tot_rev TAX_PERIOD_2015_BMF INCOME_AMT_2015_BMF REVENUE_AMT_2015_BMF \\\n", "0 NaN 201309.0 0.0 0.0 \n", "1 NaN 201312.0 896259.0 3877845.0 \n", "\n", " taxper_MSTRALL F990REV_MSTRALL \n", "0 NaN NaN \n", "1 200012 520862.0 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total_revenue_columns = ['org_id', 'EIN', 'FYE', 'latest_entry', '2011 data', 'Overall Rating',\n", " 'total_revenue_2016', \n", " 'total_revenue_2011', 'tot_rev', 'TAX_PERIOD_2015_BMF',\n", " 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'taxper_MSTRALL', 'F990REV_MSTRALL']\n", "df[total_revenue_columns][:2]\n", "#df[df['EIN']=='020503776'][total_revenue_columns]\n", "#df[df['EIN']=='020503776'][total_revenue_columns]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Create a combined *total_revenue* column. First I will make the 2016 variable a float variable. " ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 766123.0\n", "2 NaN\n", "3 NaN\n", "4 NaN\n", "5 NaN\n", "6 NaN\n", "7 NaN\n", "8 NaN\n", "9 NaN\n", "10 NaN\n", "11 NaN\n", "12 NaN\n", "13 NaN\n", "14 NaN\n", "15 6569428.0\n", "16 NaN\n", "17 NaN\n", "18 NaN\n", "19 NaN\n", "20 NaN\n", "21 NaN\n", "22 NaN\n", "23 NaN\n", "24 NaN\n", "Name: total_revenue_2016, dtype: float64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import re\n", "df['total_revenue_2016'] = df['total_revenue_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['total_revenue_2016'][:25]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
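The conversion in the cell above strips dollar signs, commas, and closing parentheses, and turns an opening parenthesis into a minus sign, so an accounting-style negative such as `($1,234)` becomes `-1234.0`. Below is a minimal plain-Python sketch of the same parsing rule. The helper name `parse_currency` is hypothetical and not part of this notebook, and unlike the pandas version it handles only strings, not missing values.

```python
import re


def parse_currency(text):
    # Hypothetical helper for illustration only.
    # Drop dollar signs, commas, and closing parentheses.
    cleaned = re.sub('[$,)]', '', text)
    # An opening parenthesis marks a negative amount in accounting notation.
    cleaned = cleaned.replace('(', '-')
    return float(cleaned)


positive = parse_currency('$766,123')
negative = parse_currency('($1,234)')
```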
Make the combined variable *total_revenue*. Start by setting it equal to the 2016 value, then use the 2011 value wherever one exists; where the variable is still missing for a given FY and the SOI data are available, fill in the value of the SOI variable *tot_rev*." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['total_revenue'] = np.nan\n", "print len(df[df['total_revenue'].notnull()])\n", "df['total_revenue'] = df['total_revenue_2016']\n", "print len(df[df['total_revenue'].notnull()])\n", "df['total_revenue'] = np.where(df['total_revenue_2011'].notnull(), df['total_revenue_2011'], df['total_revenue'])\n", "print len(df[df['total_revenue'].notnull()])\n", "df['total_revenue'] = np.where( ( df['total_revenue'].isnull() & df['tot_rev'].notnull()),\n", " df['tot_rev'], df['total_revenue'])\n", "print len(df[df['total_revenue'].notnull()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
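The fill order in the cell above gives priority to the 2011 value, then the 2016 value, then the SOI variable *tot_rev*. The same precedence can be written as a chain of `combine_first` calls. This is a minimal sketch on a made-up frame named `toy`; the values are hypothetical and only the three column names come from this notebook.

```python
import numpy as np
import pandas as pd

# Toy data: each row has a value in at most one source column.
toy = pd.DataFrame({
    'total_revenue_2016': [100.0, np.nan, np.nan, np.nan],
    'total_revenue_2011': [np.nan, 200.0, np.nan, np.nan],
    'tot_rev': [np.nan, np.nan, 300.0, np.nan],
})

# combine_first keeps the caller's value and fills its gaps from the argument,
# so the chain reads: 2011 first, then 2016, then SOI tot_rev.
toy['total_revenue'] = (toy['total_revenue_2011']
                        .combine_first(toy['total_revenue_2016'])
                        .combine_first(toy['tot_rev']))
```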
Let's take a look and check that the variable is correct. First I will **_sort the dataframe._**" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYEratings_systemlatest_entry2011 dataOverall Ratingtotal_revenue_2016total_revenue_2011total_revenuetot_revTAX_PERIOD_2015_BMFREVENUE_AMT_2015_BMFtaxper_MSTRALLF990REV_MSTRALL
5527610002030179306FY2004CN 1.0FalseNaN3 starsNaNNaNNaNNaN201506.04607676.02001065184243.0
7608810003042104017FY2014CN 2.1TrueNaN3 stars7190604.0NaN7190604.0NaN201506.05612448.02001065981270.0
7608910003042104017FY2014CN 2.0FalseNaN3 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609010003042104017FY2013CN 2.0FalseNaN3 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609110003042104017FY2012CN 2.0FalseNaN3 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609210003042104017FY2011CN 2.0FalseNaN3 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609310003042104017FY2010CN 2.0False1.03 starsNaN7196242.07196242.0NaN201506.05612448.02001065981270.0
7609410003042104017FY2009CN 1.0FalseNaN2 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609510003042104017FY2008CN 1.0FalseNaN2 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
7609610003042104017FY2007CN 1.0FalseNaN1 starsNaNNaNNaNNaN201506.05612448.02001065981270.0
\n", "
" ], "text/plain": [ " org_id EIN FYE ratings_system latest_entry 2011 data \\\n", "55276 10002 030179306 FY2004 CN 1.0 False NaN \n", "76088 10003 042104017 FY2014 CN 2.1 True NaN \n", "76089 10003 042104017 FY2014 CN 2.0 False NaN \n", "76090 10003 042104017 FY2013 CN 2.0 False NaN \n", "76091 10003 042104017 FY2012 CN 2.0 False NaN \n", "76092 10003 042104017 FY2011 CN 2.0 False NaN \n", "76093 10003 042104017 FY2010 CN 2.0 False 1.0 \n", "76094 10003 042104017 FY2009 CN 1.0 False NaN \n", "76095 10003 042104017 FY2008 CN 1.0 False NaN \n", "76096 10003 042104017 FY2007 CN 1.0 False NaN \n", "\n", " Overall Rating total_revenue_2016 total_revenue_2011 total_revenue \\\n", "55276 3 stars NaN NaN NaN \n", "76088 3 stars 7190604.0 NaN 7190604.0 \n", "76089 3 stars NaN NaN NaN \n", "76090 3 stars NaN NaN NaN \n", "76091 3 stars NaN NaN NaN \n", "76092 3 stars NaN NaN NaN \n", "76093 3 stars NaN 7196242.0 7196242.0 \n", "76094 2 stars NaN NaN NaN \n", "76095 2 stars NaN NaN NaN \n", "76096 1 stars NaN NaN NaN \n", "\n", " tot_rev TAX_PERIOD_2015_BMF REVENUE_AMT_2015_BMF taxper_MSTRALL \\\n", "55276 NaN 201506.0 4607676.0 200106 \n", "76088 NaN 201506.0 5612448.0 200106 \n", "76089 NaN 201506.0 5612448.0 200106 \n", "76090 NaN 201506.0 5612448.0 200106 \n", "76091 NaN 201506.0 5612448.0 200106 \n", "76092 NaN 201506.0 5612448.0 200106 \n", "76093 NaN 201506.0 5612448.0 200106 \n", "76094 NaN 201506.0 5612448.0 200106 \n", "76095 NaN 201506.0 5612448.0 200106 \n", "76096 NaN 201506.0 5612448.0 200106 \n", "\n", " F990REV_MSTRALL \n", "55276 5184243.0 \n", "76088 5981270.0 \n", "76089 5981270.0 \n", "76090 5981270.0 \n", "76091 5981270.0 \n", "76092 5981270.0 \n", "76093 5981270.0 \n", "76094 5981270.0 \n", "76095 5981270.0 \n", "76096 5981270.0 " ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by=['org_id', 'FYE', 'ratings_system', 'latest_entry'], ascending=[1, 0, 0, 0])[['org_id', \n", " 'EIN', 'FYE', 
'ratings_system', 'latest_entry', '2011 data', 'Overall Rating',\n", " 'total_revenue_2016', \n", " 'total_revenue_2011', 'total_revenue', 'tot_rev', 'TAX_PERIOD_2015_BMF',\n", " 'REVENUE_AMT_2015_BMF', 'taxper_MSTRALL', 'F990REV_MSTRALL']][45:55]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create logged version" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "69\n", "0\n", "0\n" ] } ], "source": [ "print len(df[df['total_revenue']==0])\n", "print len(df[df['total_revenue']<0])\n", "df['total_revenue_no_neg'] = df['total_revenue']\n", "df['total_revenue_no_neg'] = np.where(df['total_revenue_no_neg']<=0, 1, df['total_revenue_no_neg'])\n", "print len(df[df['total_revenue_no_neg']==0])\n", "print len(df[df['total_revenue_no_neg']<0])" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYEratings_systemlatest_entryOverall Ratingtotal_revenuetotal_revenue_loggedtotal_revenue_2016total_revenue_2011tot_revTAX_PERIOD_2015_BMFREVENUE_AMT_2015_BMFtaxper_MSTRALLF990REV_MSTRALL
4343410000364395095FY2014CN 2.1True4 stars4413156.015.3001014413156.0NaNNaN201506.04881425.0200106123179.0
4343510000364395095FY2014CN 2.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4343610000364395095FY2014CN 2.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4343710000364395095FY2014CN 2.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4343810000364395095FY2013CN 2.0False3 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4343910000364395095FY2012CN 2.0False2 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344010000364395095FY2011CN 2.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344110000364395095FY2010CN 2.0False4 stars3787334.015.147173NaN3787334.0NaN201506.04881425.0200106123179.0
4344210000364395095FY2010CN 2.0False3 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344310000364395095FY2009CN 1.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344410000364395095FY2008CN 1.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344510000364395095FY2007CN 1.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344610000364395095FY2006CN 1.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
4344710000364395095FY2005CN 1.0False4 starsNaNNaNNaNNaNNaN201506.04881425.0200106123179.0
7426510001222392881FY2015CN 2.1True4 stars3627051.015.1039303627051.0NaNNaN201506.03627051.0200012620272.0
7426610001222392881FY2015CN 2.1False4 starsNaNNaNNaNNaNNaN201506.03627051.0200012620272.0
7426710001222392881FY2015CN 2.0False4 starsNaNNaNNaNNaNNaN201506.03627051.0200012620272.0
7426810001222392881FY2014CN 2.0False4 starsNaNNaNNaNNaNNaN201506.03627051.0200012620272.0
7426910001222392881FY2014CN 2.0False3 starsNaNNaNNaNNaNNaN201506.03627051.0200012620272.0
7427010001222392881FY2013CN 2.0False4 starsNaNNaNNaNNaNNaN201506.03627051.0200012620272.0
\n", "
" ], "text/plain": [ " org_id EIN FYE ratings_system latest_entry Overall Rating \\\n", "43434 10000 364395095 FY2014 CN 2.1 True 4 stars \n", "43435 10000 364395095 FY2014 CN 2.0 False 4 stars \n", "43436 10000 364395095 FY2014 CN 2.0 False 4 stars \n", "43437 10000 364395095 FY2014 CN 2.0 False 4 stars \n", "43438 10000 364395095 FY2013 CN 2.0 False 3 stars \n", "43439 10000 364395095 FY2012 CN 2.0 False 2 stars \n", "43440 10000 364395095 FY2011 CN 2.0 False 4 stars \n", "43441 10000 364395095 FY2010 CN 2.0 False 4 stars \n", "43442 10000 364395095 FY2010 CN 2.0 False 3 stars \n", "43443 10000 364395095 FY2009 CN 1.0 False 4 stars \n", "43444 10000 364395095 FY2008 CN 1.0 False 4 stars \n", "43445 10000 364395095 FY2007 CN 1.0 False 4 stars \n", "43446 10000 364395095 FY2006 CN 1.0 False 4 stars \n", "43447 10000 364395095 FY2005 CN 1.0 False 4 stars \n", "74265 10001 222392881 FY2015 CN 2.1 True 4 stars \n", "74266 10001 222392881 FY2015 CN 2.1 False 4 stars \n", "74267 10001 222392881 FY2015 CN 2.0 False 4 stars \n", "74268 10001 222392881 FY2014 CN 2.0 False 4 stars \n", "74269 10001 222392881 FY2014 CN 2.0 False 3 stars \n", "74270 10001 222392881 FY2013 CN 2.0 False 4 stars \n", "\n", " total_revenue total_revenue_logged total_revenue_2016 \\\n", "43434 4413156.0 15.300101 4413156.0 \n", "43435 NaN NaN NaN \n", "43436 NaN NaN NaN \n", "43437 NaN NaN NaN \n", "43438 NaN NaN NaN \n", "43439 NaN NaN NaN \n", "43440 NaN NaN NaN \n", "43441 3787334.0 15.147173 NaN \n", "43442 NaN NaN NaN \n", "43443 NaN NaN NaN \n", "43444 NaN NaN NaN \n", "43445 NaN NaN NaN \n", "43446 NaN NaN NaN \n", "43447 NaN NaN NaN \n", "74265 3627051.0 15.103930 3627051.0 \n", "74266 NaN NaN NaN \n", "74267 NaN NaN NaN \n", "74268 NaN NaN NaN \n", "74269 NaN NaN NaN \n", "74270 NaN NaN NaN \n", "\n", " total_revenue_2011 tot_rev TAX_PERIOD_2015_BMF REVENUE_AMT_2015_BMF \\\n", "43434 NaN NaN 201506.0 4881425.0 \n", "43435 NaN NaN 201506.0 4881425.0 \n", "43436 NaN NaN 201506.0 4881425.0 
\n", "43437 NaN NaN 201506.0 4881425.0 \n", "43438 NaN NaN 201506.0 4881425.0 \n", "43439 NaN NaN 201506.0 4881425.0 \n", "43440 NaN NaN 201506.0 4881425.0 \n", "43441 3787334.0 NaN 201506.0 4881425.0 \n", "43442 NaN NaN 201506.0 4881425.0 \n", "43443 NaN NaN 201506.0 4881425.0 \n", "43444 NaN NaN 201506.0 4881425.0 \n", "43445 NaN NaN 201506.0 4881425.0 \n", "43446 NaN NaN 201506.0 4881425.0 \n", "43447 NaN NaN 201506.0 4881425.0 \n", "74265 NaN NaN 201506.0 3627051.0 \n", "74266 NaN NaN 201506.0 3627051.0 \n", "74267 NaN NaN 201506.0 3627051.0 \n", "74268 NaN NaN 201506.0 3627051.0 \n", "74269 NaN NaN 201506.0 3627051.0 \n", "74270 NaN NaN 201506.0 3627051.0 \n", "\n", " taxper_MSTRALL F990REV_MSTRALL \n", "43434 200106 123179.0 \n", "43435 200106 123179.0 \n", "43436 200106 123179.0 \n", "43437 200106 123179.0 \n", "43438 200106 123179.0 \n", "43439 200106 123179.0 \n", "43440 200106 123179.0 \n", "43441 200106 123179.0 \n", "43442 200106 123179.0 \n", "43443 200106 123179.0 \n", "43444 200106 123179.0 \n", "43445 200106 123179.0 \n", "43446 200106 123179.0 \n", "43447 200106 123179.0 \n", "74265 200012 620272.0 \n", "74266 200012 620272.0 \n", "74267 200012 620272.0 \n", "74268 200012 620272.0 \n", "74269 200012 620272.0 \n", "74270 200012 620272.0 " ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['total_revenue_logged'] = np.log(df['total_revenue_no_neg'])\n", "df.sort_values(by=['org_id', 'FYE', 'ratings_system', 'latest_entry'], ascending=[1, 0, 0, 0])[['org_id', \n", " 'EIN', 'FYE', 'ratings_system', 'latest_entry', #'2011 data', \n", " 'Overall Rating',\n", " 'total_revenue', 'total_revenue_logged', 'total_revenue_2016', \n", " 'total_revenue_2011', 'tot_rev', 'TAX_PERIOD_2015_BMF',\n", " 'REVENUE_AMT_2015_BMF', 'taxper_MSTRALL', 'F990REV_MSTRALL']][:20]" ] }, { "cell_type": "code", "execution_count": 1235, "metadata": { "collapsed": false }, "outputs": [ { "ename": "KeyError", "evalue": "\"['2011 data'] not 
in index\"", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'total_revenue'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m<\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mtotal_revenue_columns\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1961\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_set_as_cached\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlabel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1962\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1963\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1964\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1965\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0miget_value\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mi\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc\u001b[0m in \u001b[0;36m_getitem_array\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 2005\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2006\u001b[0m \u001b[0;31m# duplicate columns & possible reduce 
dimensionality\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2007\u001b[0;31m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_constructor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_data\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2008\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_unique\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2009\u001b[0m \u001b[0mresult\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/core/indexing.pyc\u001b[0m in \u001b[0;36m_convert_to_indexer\u001b[0;34m(self, obj, axis, is_setter)\u001b[0m\n\u001b[1;32m 1148\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1149\u001b[0m \u001b[0;31m# a positional\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1150\u001b[0;31m \u001b[0;32mif\u001b[0m \u001b[0mis_int_positional\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1151\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1152\u001b[0m \u001b[0;31m# if we are setting and its not a valid location\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mKeyError\u001b[0m: \"['2011 data'] not in index\"" ] } ], "source": [ "#df[df['total_revenue']<0][total_revenue_columns][:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NOTE: The *total_revenue* variable is now all set. If we need to fill in additional values, we could add the BMF data to the column. 
But let's only do that if necessary.\n", "\n", "### NOTE: The 69 observations with negative *total_revenue* are missing from the logged version *total_revenue_logged*" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "//anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile\n", " RuntimeWarning)\n" ] }, { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2011 data</th><td>4863.0</td><td>1.000000e+00</td><td>0.000000e+00</td><td>1.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>1.000000e+00</td></tr>\n",
"    <tr><th>total_revenue</th><td>21894.0</td><td>3.046526e+07</td><td>1.204029e+08</td><td>-218265025.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.741635e+09</td></tr>\n",
"    <tr><th>total_revenue_logged</th><td>21894.0</td><td>1.586132e+01</td><td>1.707654e+00</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2.204279e+01</td></tr>\n",
"    <tr><th>total_revenue_2016</th><td>7983.0</td><td>1.612795e+07</td><td>7.898075e+07</td><td>-718326.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.471552e+09</td></tr>\n",
"    <tr><th>total_revenue_2011</th><td>4833.0</td><td>1.717611e+07</td><td>7.570407e+07</td><td>-42638874.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.587775e+09</td></tr>\n",
"    <tr><th>tot_rev</th><td>10964.0</td><td>4.989796e+07</td><td>1.583081e+08</td><td>-218265025.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>3.741635e+09</td></tr>\n",
"    <tr><th>TAX_PERIOD_2015_BMF</th><td>83668.0</td><td>2.014741e+05</td><td>5.179052e+01</td><td>200412.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2.016040e+05</td></tr>\n",
"    <tr><th>REVENUE_AMT_2015_BMF</th><td>83405.0</td><td>2.186730e+07</td><td>8.406647e+07</td><td>-204684.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>4.025714e+09</td></tr>\n",
"    <tr><th>F990REV_MSTRALL</th><td>83137.0</td><td>1.092375e+07</td><td>5.056853e+07</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2.711607e+09</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " count mean std min 25% \\\n", "2011 data 4863.0 1.000000e+00 0.000000e+00 1.0 NaN \n", "total_revenue 21894.0 3.046526e+07 1.204029e+08 -218265025.0 NaN \n", "total_revenue_logged 21894.0 1.586132e+01 1.707654e+00 0.0 NaN \n", "total_revenue_2016 7983.0 1.612795e+07 7.898075e+07 -718326.0 NaN \n", "total_revenue_2011 4833.0 1.717611e+07 7.570407e+07 -42638874.0 NaN \n", "tot_rev 10964.0 4.989796e+07 1.583081e+08 -218265025.0 NaN \n", "TAX_PERIOD_2015_BMF 83668.0 2.014741e+05 5.179052e+01 200412.0 NaN \n", "REVENUE_AMT_2015_BMF 83405.0 2.186730e+07 8.406647e+07 -204684.0 NaN \n", "F990REV_MSTRALL 83137.0 1.092375e+07 5.056853e+07 0.0 NaN \n", "\n", " 50% 75% max \n", "2011 data NaN NaN 1.000000e+00 \n", "total_revenue NaN NaN 3.741635e+09 \n", "total_revenue_logged NaN NaN 2.204279e+01 \n", "total_revenue_2016 NaN NaN 3.471552e+09 \n", "total_revenue_2011 NaN NaN 3.587775e+09 \n", "tot_rev NaN NaN 3.741635e+09 \n", "TAX_PERIOD_2015_BMF NaN NaN 2.016040e+05 \n", "REVENUE_AMT_2015_BMF NaN NaN 4.025714e+09 \n", "F990REV_MSTRALL NaN NaN 2.711607e+09 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "total_revenue_columns = ['org_id', \n", " 'EIN', 'FYE', 'ratings_system', 'latest_entry', '2011 data', 'Overall Rating',\n", " 'total_revenue', 'total_revenue_logged', 'total_revenue_2016', \n", " 'total_revenue_2011', 'tot_rev', 'TAX_PERIOD_2015_BMF',\n", " 'REVENUE_AMT_2015_BMF', 'taxper_MSTRALL', 'F990REV_MSTRALL']\n", "df[total_revenue_columns].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 725, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with with Age, Category dummies, and Total Revenues.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STATE \n", "Create 
new *state* variable then add in values for the three variables below successively, as I did with *total_revenue* above. " ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['state_2011', 'STATE_2015_BMF', 'state_MSTRALL']" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[col for col in list(df) if 'state' in col.lower()]" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "80095\n", "1272\n", "1821\n" ] } ], "source": [ "print len(df[df['state_2011'].isnull()])\n", "print len(df[df['STATE_2015_BMF'].isnull()])\n", "print len(df[df['state_MSTRALL'].isnull()])" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "9 MA\n", "21 CA\n", "Name: state_2011, dtype: object" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['state_2011'].notnull()]['state_2011'][:2]" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NH\n", "1 MA\n", "Name: STATE_2015_BMF, dtype: object" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['STATE_2015_BMF'].notnull()]['STATE_2015_BMF'][:2]" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 MA\n", "2 MA\n", "Name: state_MSTRALL, dtype: object" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['state_MSTRALL'].notnull()]['state_MSTRALL'][:2]" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "4863\n", "83697\n", "83849\n" ] } ], 
"source": [ "df['state'] = np.nan\n", "print len(df[df['state'].notnull()])\n", "df['state'] = df['state_2011']\n", "print len(df[df['state'].notnull()])\n", "df['state'] = np.where( ( df['state'].isnull() & df['STATE_2015_BMF'].notnull()),\n", " df['STATE_2015_BMF'], df['state'])\n", "print len(df[df['state'].notnull()])\n", "df['state'] = np.where( ( df['state'].isnull() & df['state_MSTRALL'].notnull()),\n", " df['state_MSTRALL'], df['state'])\n", "print len(df[df['state'].notnull()])" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "NY 10840\n", "CA 10459\n", "DC 5237\n", "FL 4509\n", "TX 4250\n", "VA 3430\n", "MA 3183\n", "IL 2984\n", "PA 2911\n", "CO 2393\n", "OH 2377\n", "GA 2003\n", "MD 1963\n", "WA 1945\n", "MI 1935\n", "MN 1809\n", "NJ 1638\n", "TN 1611\n", "NC 1582\n", "MO 1541\n", "AZ 1344\n", "OR 1292\n", "CT 1274\n", "WI 1140\n", "IN 977\n", "SC 658\n", "OK 610\n", "KY 608\n", "NE 569\n", "AL 524\n", "LA 520\n", "KS 518\n", "ME 483\n", "UT 463\n", "IA 406\n", "MT 383\n", "NM 371\n", "NH 320\n", "MS 316\n", "HI 316\n", "RI 303\n", "NV 297\n", "VT 280\n", "AR 273\n", "DE 228\n", "AK 158\n", "ID 149\n", "WV 135\n", "SD 128\n", "WY 123\n", "ND 64\n", "PR 19\n", "Name: state, dtype: int64" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['state'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Save DF" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with with Age, Category dummies, and Total Revenues.pkl')" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df = pd.read_pickle('Merged dataset with with Age, Category dummies, and Total Revenues.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## EFFICIENCY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
The existing *program_expense_percent_2016* variable is an average over 3 years." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 88.8\n", "2 NaN\n", "3 NaN\n", "Name: program_expense_percent_2016, dtype: object" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['org_id']=='10166']['program_expense_percent_2016'][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Let's build our own efficiency measure from the underlying expense columns. First, make sure those columns are stored as floats." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "program_expenses_2016 float64\n", "total_functional_expenses_2016 float64\n", "dtype: object" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['program_expenses_2016', 'total_functional_expenses_2016']].dtypes" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 659342.0\n", "2 NaN\n", "Name: program_expenses_2016, dtype: float64" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['org_id']=='10166']['program_expenses_2016'][:2]" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 757112.0\n", "2 NaN\n", "Name: total_functional_expenses_2016, dtype: float64" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['org_id']=='10166']['total_functional_expenses_2016'][:2]" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 NaN\n", "1 659342.0\n", "Name: program_expenses_2016, dtype: float64" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['program_expenses_2016'] = df['program_expenses_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['program_expenses_2016'][:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Read in and re-merge SOI data\n", "I did not keep the two SOI columns needed for efficiency (*tot_func_expns_prg_srvcs* and *tot_func_expns_tot*), so I'll merge them back in." 
] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "52\n", "8563\n", "4\n", "8563\n" ] }, { "data": { "text/html": [ "
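<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>EIN</th><th>FYE</th><th>tot_func_expns_prg_srvcs</th><th>tot_func_expns_tot</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>010202467</td><td>2008</td><td>7142568</td><td>8729757</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>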
" ], "text/plain": [ " EIN FYE tot_func_expns_prg_srvcs tot_func_expns_tot\n", "0 010202467 2008 7142568 8729757" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_data_valid_EINS = pd.read_pickle('combined SOI file 2008 to 2013 for CN EINs, v2.pkl')\n", "print len(SOI_data_valid_EINS.columns)\n", "print len(SOI_data_valid_EINS)\n", "SOI_data_valid_EINS = SOI_data_valid_EINS[['EIN', 'FYE', 'tot_func_expns_prg_srvcs', 'tot_func_expns_tot']]\n", "print len(SOI_data_valid_EINS.columns)\n", "print len(SOI_data_valid_EINS)\n", "SOI_data_valid_EINS.head(1)" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "247\n", "84958\n", "249\n", "84958\n" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left').columns)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left'))" ] }, { "cell_type": "code", "execution_count": 759, "metadata": { "collapsed": true }, "outputs": [], "source": [ " " ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 FY2008\n", "1 FY2009\n", "Name: FYE, dtype: object" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_data_valid_EINS['FYE'] = 'FY' + SOI_data_valid_EINS['FYE']\n", "SOI_data_valid_EINS['FYE'][:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "247\n", "84958\n", "249\n" ] }, { "ename": "ValueError", "evalue": "Cannot use name of an existing column for indicator column", "output_type": 
"error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mprint\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mprint\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmerge\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mSOI_data_valid_EINS\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_on\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'EIN'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'FYE'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_on\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'EIN'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'FYE'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'left'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmerge\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mSOI_data_valid_EINS\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_on\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'EIN'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'FYE'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_on\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'EIN'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m'FYE'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'left'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindicator\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mprint\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;32mprint\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc\u001b[0m in \u001b[0;36mmerge\u001b[0;34m(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)\u001b[0m\n\u001b[1;32m 37\u001b[0m \u001b[0mright_index\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mright_index\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msuffixes\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msuffixes\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 38\u001b[0m copy=copy, indicator=indicator)\n\u001b[0;32m---> 39\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 40\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0m__debug__\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0mmerge\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_merge_doc\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0;34m'\\nleft : DataFrame'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc\u001b[0m in \u001b[0;36mget_result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 213\u001b[0m 
\u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindicator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 214\u001b[0m self.left, self.right = self._indicator_pre_merge(\n\u001b[0;32m--> 215\u001b[0;31m self.left, self.right)\n\u001b[0m\u001b[1;32m 216\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 217\u001b[0m \u001b[0mjoin_index\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_indexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_indexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_join_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc\u001b[0m in \u001b[0;36m_indicator_pre_merge\u001b[0;34m(self, left, right)\u001b[0m\n\u001b[1;32m 251\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindicator_name\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 252\u001b[0m raise ValueError(\n\u001b[0;32m--> 253\u001b[0;31m \"Cannot use name of an existing column for indicator column\")\n\u001b[0m\u001b[1;32m 254\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 255\u001b[0m \u001b[0mleft\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mleft\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Cannot use name of an existing column for indicator column" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left').columns)\n", "df = pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left', indicator=True)\n", "print len(df.columns)\n", "print len(df)" ] }, { "cell_type": "code", "execution_count": 762, 
"metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "left_only 73994\n", "both 10964\n", "right_only 0\n", "dtype: int64" ] }, "execution_count": 762, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['_merge'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create combined efficiency variables" ] }, { "cell_type": "code", "execution_count": 777, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>org_id</th><th>FYE</th><th>program_expense_percent_2016</th><th>program_expenses_2016</th><th>total_functional_expenses_2016</th><th>program_expense_2011</th><th>total_functional_expense_2011</th><th>tot_func_expns_prg_srvcs</th><th>tot_func_expns_tot</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>15</th><td>6466</td><td>FY2014</td><td>77.7</td><td>4570946</td><td>5984968</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>16</th><td>6466</td><td>FY2014</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>17</th><td>6466</td><td>FY2013</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>18</th><td>6466</td><td>FY2012</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>19</th><td>6466</td><td>FY2011</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>20</th><td>6466</td><td>FY2010</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>21</th><td>6466</td><td>FY2009</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2813532</td><td>3528168</td><td>NaN</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " org_id FYE program_expense_percent_2016 program_expenses_2016 \\\n", "15 6466 FY2014 77.7 4570946 \n", "16 6466 FY2014 NaN NaN \n", "17 6466 FY2013 NaN NaN \n", "18 6466 FY2012 NaN NaN \n", "19 6466 FY2011 NaN NaN \n", "20 6466 FY2010 NaN NaN \n", "21 6466 FY2009 NaN NaN \n", "\n", " total_functional_expenses_2016 program_expense_2011 \\\n", "15 5984968 NaN \n", "16 NaN NaN \n", "17 NaN NaN \n", "18 NaN NaN \n", "19 NaN NaN \n", "20 NaN NaN \n", "21 NaN 2813532 \n", "\n", " total_functional_expense_2011 tot_func_expns_prg_srvcs \\\n", "15 NaN NaN \n", "16 NaN NaN \n", "17 NaN NaN \n", "18 NaN NaN \n", "19 NaN NaN \n", "20 NaN NaN \n", "21 3528168 NaN \n", "\n", " tot_func_expns_tot \n", "15 NaN \n", "16 NaN \n", "17 NaN \n", "18 NaN \n", "19 NaN \n", "20 NaN \n", "21 NaN " ] }, "execution_count": 777, "metadata": {}, "output_type": "execute_result" } ], "source": [ "efficiency_columns = ['org_id', 'FYE', 'program_expense_percent_2016', \n", " 'program_expenses_2016', 'total_functional_expenses_2016',\n", " 'program_expense_2011', 'total_functional_expense_2011',\n", " 'tot_func_expns_prg_srvcs', 'tot_func_expns_tot']\n", "df[efficiency_columns][15:22]" ] }, { "cell_type": "code", "execution_count": 779, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['program_expenses'] = np.nan\n", "print len(df[df['program_expenses'].notnull()])\n", "df['program_expenses'] = df['program_expenses_2016']\n", "print len(df[df['program_expenses'].notnull()])\n", "df['program_expenses'] = np.where( (df['program_expenses'].isnull() & df['program_expense_2011'].notnull()),\n", " df['program_expense_2011'], df['program_expenses'])\n", "print len(df[df['program_expenses'].notnull()])\n", "df['program_expenses'] = np.where( ( df['program_expenses'].isnull() & df['tot_func_expns_prg_srvcs'].notnull()),\n", " df['tot_func_expns_prg_srvcs'], 
df['program_expenses'])\n", "print len(df[df['program_expenses'].notnull()])" ] }, { "cell_type": "code", "execution_count": 792, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>org_id</th><th>FYE</th><th>latest_entry</th><th>2011 data</th><th>program_expenses</th><th>program_expenses_2016</th><th>tot_func_expns_prg_srvcs</th><th>program_expense_2011</th><th>total_functional_expense_2011</th><th>total_functional_expenses_2016</th><th>tot_func_expns_tot</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>148</th><td>13055</td><td>FY2009</td><td>False</td><td>1</td><td>1129807</td><td>NaN</td><td>NaN</td><td>1129807</td><td>1542843</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>149</th><td>13055</td><td>FY2009</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>150</th><td>13546</td><td>FY2015</td><td>True</td><td>NaN</td><td>1492006</td><td>1492006</td><td>NaN</td><td>NaN</td><td>NaN</td><td>1984182</td><td>NaN</td></tr>\n",
"    <tr><th>151</th><td>13546</td><td>FY2014</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>152</th><td>13546</td><td>FY2014</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>153</th><td>13546</td><td>FY2013</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>154</th><td>13546</td><td>FY2012</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>155</th><td>16635</td><td>current</td><td>True</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>156</th><td>4792</td><td>FY2014</td><td>True</td><td>NaN</td><td>54767505</td><td>54767505</td><td>54767505</td><td>NaN</td><td>NaN</td><td>63315385</td><td>63315385</td></tr>\n",
"    <tr><th>157</th><td>4792</td><td>FY2014</td><td>False</td><td>NaN</td><td>54767505</td><td>NaN</td><td>54767505</td><td>NaN</td><td>NaN</td><td>NaN</td><td>63315385</td></tr>\n",
"    <tr><th>158</th><td>4792</td><td>FY2013</td><td>False</td><td>NaN</td><td>61046694</td><td>NaN</td><td>61046694</td><td>NaN</td><td>NaN</td><td>NaN</td><td>69754948</td></tr>\n",
"    <tr><th>159</th><td>4792</td><td>FY2012</td><td>False</td><td>NaN</td><td>58820915</td><td>NaN</td><td>58820915</td><td>NaN</td><td>NaN</td><td>NaN</td><td>67426560</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " org_id FYE latest_entry 2011 data program_expenses \\\n", "148 13055 FY2009 False 1 1129807 \n", "149 13055 FY2009 False NaN NaN \n", "150 13546 FY2015 True NaN 1492006 \n", "151 13546 FY2014 False NaN NaN \n", "152 13546 FY2014 False NaN NaN \n", "153 13546 FY2013 False NaN NaN \n", "154 13546 FY2012 False NaN NaN \n", "155 16635 current True NaN NaN \n", "156 4792 FY2014 True NaN 54767505 \n", "157 4792 FY2014 False NaN 54767505 \n", "158 4792 FY2013 False NaN 61046694 \n", "159 4792 FY2012 False NaN 58820915 \n", "\n", " program_expenses_2016 tot_func_expns_prg_srvcs program_expense_2011 \\\n", "148 NaN NaN 1129807 \n", "149 NaN NaN NaN \n", "150 1492006 NaN NaN \n", "151 NaN NaN NaN \n", "152 NaN NaN NaN \n", "153 NaN NaN NaN \n", "154 NaN NaN NaN \n", "155 NaN NaN NaN \n", "156 54767505 54767505 NaN \n", "157 NaN 54767505 NaN \n", "158 NaN 61046694 NaN \n", "159 NaN 58820915 NaN \n", "\n", " total_functional_expense_2011 total_functional_expenses_2016 \\\n", "148 1542843 NaN \n", "149 NaN NaN \n", "150 NaN 1984182 \n", "151 NaN NaN \n", "152 NaN NaN \n", "153 NaN NaN \n", "154 NaN NaN \n", "155 NaN NaN \n", "156 NaN 63315385 \n", "157 NaN NaN \n", "158 NaN NaN \n", "159 NaN NaN \n", "\n", " tot_func_expns_tot \n", "148 NaN \n", "149 NaN \n", "150 NaN \n", "151 NaN \n", "152 NaN \n", "153 NaN \n", "154 NaN \n", "155 NaN \n", "156 63315385 \n", "157 63315385 \n", "158 69754948 \n", "159 67426560 " ] }, "execution_count": 792, "metadata": {}, "output_type": "execute_result" } ], "source": [ "efficiency_columns = ['org_id', 'FYE', 'latest_entry', '2011 data', 'program_expenses',\n", " 'program_expenses_2016', 'tot_func_expns_prg_srvcs',\n", " 'program_expense_2011', 'total_functional_expense_2011', 'total_functional_expenses_2016',\n", " 'tot_func_expns_tot']\n", "#'program_expense_percent_2016',\n", "df[efficiency_columns][148:160]" ] }, { "cell_type": "code", "execution_count": 793, "metadata": { "collapsed": false }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['total_expenses'] = np.nan\n", "print len(df[df['total_expenses'].notnull()])\n", "df['total_expenses'] = df['total_functional_expenses_2016']\n", "print len(df[df['total_expenses'].notnull()])\n", "df['total_expenses'] = np.where( (df['total_expenses'].isnull() & df['total_functional_expense_2011'].notnull()),\n", " df['total_functional_expense_2011'], df['total_expenses'])\n", "print len(df[df['total_expenses'].notnull()])\n", "df['total_expenses'] = np.where( ( df['total_expenses'].isnull() & df['tot_func_expns_tot'].notnull()),\n", " df['tot_func_expns_tot'], df['total_expenses'])\n", "print len(df[df['total_expenses'].notnull()])" ] }, { "cell_type": "code", "execution_count": 795, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>org_id</th><th>FYE</th><th>latest_entry</th><th>2011 data</th><th>total_expenses</th><th>total_functional_expenses_2016</th><th>total_functional_expense_2011</th><th>tot_func_expns_tot</th><th>program_expenses</th><th>program_expenses_2016</th><th>tot_func_expns_prg_srvcs</th><th>program_expense_2011</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>148</th><td>13055</td><td>FY2009</td><td>False</td><td>1</td><td>1542843</td><td>NaN</td><td>1542843</td><td>NaN</td><td>1129807</td><td>NaN</td><td>NaN</td><td>1129807</td></tr>\n",
"    <tr><th>149</th><td>13055</td><td>FY2009</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>150</th><td>13546</td><td>FY2015</td><td>True</td><td>NaN</td><td>1984182</td><td>1984182</td><td>NaN</td><td>NaN</td><td>1492006</td><td>1492006</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>151</th><td>13546</td><td>FY2014</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>152</th><td>13546</td><td>FY2014</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>153</th><td>13546</td><td>FY2013</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>154</th><td>13546</td><td>FY2012</td><td>False</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>155</th><td>16635</td><td>current</td><td>True</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"    <tr><th>156</th><td>4792</td><td>FY2014</td><td>True</td><td>NaN</td><td>63315385</td><td>63315385</td><td>NaN</td><td>63315385</td><td>54767505</td><td>54767505</td><td>54767505</td><td>NaN</td></tr>\n",
"    <tr><th>157</th><td>4792</td><td>FY2014</td><td>False</td><td>NaN</td><td>63315385</td><td>NaN</td><td>NaN</td><td>63315385</td><td>54767505</td><td>NaN</td><td>54767505</td><td>NaN</td></tr>\n",
"    <tr><th>158</th><td>4792</td><td>FY2013</td><td>False</td><td>NaN</td><td>69754948</td><td>NaN</td><td>NaN</td><td>69754948</td><td>61046694</td><td>NaN</td><td>61046694</td><td>NaN</td></tr>\n",
"    <tr><th>159</th><td>4792</td><td>FY2012</td><td>False</td><td>NaN</td><td>67426560</td><td>NaN</td><td>NaN</td><td>67426560</td><td>58820915</td><td>NaN</td><td>58820915</td><td>NaN</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " org_id FYE latest_entry 2011 data total_expenses \\\n", "148 13055 FY2009 False 1 1542843 \n", "149 13055 FY2009 False NaN NaN \n", "150 13546 FY2015 True NaN 1984182 \n", "151 13546 FY2014 False NaN NaN \n", "152 13546 FY2014 False NaN NaN \n", "153 13546 FY2013 False NaN NaN \n", "154 13546 FY2012 False NaN NaN \n", "155 16635 current True NaN NaN \n", "156 4792 FY2014 True NaN 63315385 \n", "157 4792 FY2014 False NaN 63315385 \n", "158 4792 FY2013 False NaN 69754948 \n", "159 4792 FY2012 False NaN 67426560 \n", "\n", " total_functional_expenses_2016 total_functional_expense_2011 \\\n", "148 NaN 1542843 \n", "149 NaN NaN \n", "150 1984182 NaN \n", "151 NaN NaN \n", "152 NaN NaN \n", "153 NaN NaN \n", "154 NaN NaN \n", "155 NaN NaN \n", "156 63315385 NaN \n", "157 NaN NaN \n", "158 NaN NaN \n", "159 NaN NaN \n", "\n", " tot_func_expns_tot program_expenses program_expenses_2016 \\\n", "148 NaN 1129807 NaN \n", "149 NaN NaN NaN \n", "150 NaN 1492006 1492006 \n", "151 NaN NaN NaN \n", "152 NaN NaN NaN \n", "153 NaN NaN NaN \n", "154 NaN NaN NaN \n", "155 NaN NaN NaN \n", "156 63315385 54767505 54767505 \n", "157 63315385 54767505 NaN \n", "158 69754948 61046694 NaN \n", "159 67426560 58820915 NaN \n", "\n", " tot_func_expns_prg_srvcs program_expense_2011 \n", "148 NaN 1129807 \n", "149 NaN NaN \n", "150 NaN NaN \n", "151 NaN NaN \n", "152 NaN NaN \n", "153 NaN NaN \n", "154 NaN NaN \n", "155 NaN NaN \n", "156 54767505 NaN \n", "157 54767505 NaN \n", "158 61046694 NaN \n", "159 58820915 NaN " ] }, "execution_count": 795, "metadata": {}, "output_type": "execute_result" } ], "source": [ "efficiency_columns = ['org_id', 'FYE', 'latest_entry', '2011 data', 'total_expenses',\n", " 'total_functional_expenses_2016', 'total_functional_expense_2011', 'tot_func_expns_tot',\n", " 'program_expenses',\n", " 'program_expenses_2016', 'tot_func_expns_prg_srvcs',\n", " 'program_expense_2011', \n", " ]\n", "#'program_expense_percent_2016',\n", 
"df[efficiency_columns][148:160]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create combined version of efficiency variable " ] }, { "cell_type": "code", "execution_count": 796, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df['program_efficiency'] = df['program_expenses']/df['total_expenses']" ] }, { "cell_type": "code", "execution_count": 797, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idFYElatest_entry2011 dataprogram_efficiencyprogram_expensestotal_expensestotal_functional_expenses_2016total_functional_expense_2011tot_func_expns_totprogram_expenses_2016tot_func_expns_prg_srvcsprogram_expense_2011
14813055FY2009False10.73228911298071542843NaN1542843NaNNaNNaN1129807
14913055FY2009FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
15013546FY2015TrueNaN0.751950149200619841821984182NaNNaN1492006NaNNaN
15113546FY2014FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
15213546FY2014FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
15313546FY2013FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
15413546FY2012FalseNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
15516635currentTrueNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
1564792FY2014TrueNaN0.864995547675056331538563315385NaN633153855476750554767505NaN
1574792FY2014FalseNaN0.8649955476750563315385NaNNaN63315385NaN54767505NaN
1584792FY2013FalseNaN0.8751596104669469754948NaNNaN69754948NaN61046694NaN
1594792FY2012FalseNaN0.8723705882091567426560NaNNaN67426560NaN58820915NaN
\n", "
" ], "text/plain": [ " org_id FYE latest_entry 2011 data program_efficiency \\\n", "148 13055 FY2009 False 1 0.732289 \n", "149 13055 FY2009 False NaN NaN \n", "150 13546 FY2015 True NaN 0.751950 \n", "151 13546 FY2014 False NaN NaN \n", "152 13546 FY2014 False NaN NaN \n", "153 13546 FY2013 False NaN NaN \n", "154 13546 FY2012 False NaN NaN \n", "155 16635 current True NaN NaN \n", "156 4792 FY2014 True NaN 0.864995 \n", "157 4792 FY2014 False NaN 0.864995 \n", "158 4792 FY2013 False NaN 0.875159 \n", "159 4792 FY2012 False NaN 0.872370 \n", "\n", " program_expenses total_expenses total_functional_expenses_2016 \\\n", "148 1129807 1542843 NaN \n", "149 NaN NaN NaN \n", "150 1492006 1984182 1984182 \n", "151 NaN NaN NaN \n", "152 NaN NaN NaN \n", "153 NaN NaN NaN \n", "154 NaN NaN NaN \n", "155 NaN NaN NaN \n", "156 54767505 63315385 63315385 \n", "157 54767505 63315385 NaN \n", "158 61046694 69754948 NaN \n", "159 58820915 67426560 NaN \n", "\n", " total_functional_expense_2011 tot_func_expns_tot program_expenses_2016 \\\n", "148 1542843 NaN NaN \n", "149 NaN NaN NaN \n", "150 NaN NaN 1492006 \n", "151 NaN NaN NaN \n", "152 NaN NaN NaN \n", "153 NaN NaN NaN \n", "154 NaN NaN NaN \n", "155 NaN NaN NaN \n", "156 NaN 63315385 54767505 \n", "157 NaN 63315385 NaN \n", "158 NaN 69754948 NaN \n", "159 NaN 67426560 NaN \n", "\n", " tot_func_expns_prg_srvcs program_expense_2011 \n", "148 NaN 1129807 \n", "149 NaN NaN \n", "150 NaN NaN \n", "151 NaN NaN \n", "152 NaN NaN \n", "153 NaN NaN \n", "154 NaN NaN \n", "155 NaN NaN \n", "156 54767505 NaN \n", "157 54767505 NaN \n", "158 61046694 NaN \n", "159 58820915 NaN " ] }, "execution_count": 797, "metadata": {}, "output_type": "execute_result" } ], "source": [ "efficiency_columns = ['org_id', 'FYE', 'latest_entry', '2011 data', 'program_efficiency',\n", " 'program_expenses', 'total_expenses',\n", " 'total_functional_expenses_2016', 'total_functional_expense_2011', 'tot_func_expns_tot',\n", " \n", " 'program_expenses_2016', 
'tot_func_expns_prg_srvcs',\n", " 'program_expense_2011', \n", " ]\n", "#'program_expense_percent_2016',\n", "df[efficiency_columns][148:160]" ] }, { "cell_type": "code", "execution_count": 799, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
2011 data48631.0000000.000000e+0011.0000001.0000001.0000001.000000e+00
program_efficiency218940.8054001.036348e-0100.7565680.8177580.8711051.010186e+00
program_expenses2189424379274.5940901.002906e+0801962544.2500005066605.00000016148624.7500003.157482e+09
total_expenses2189428601170.2742301.133036e+08393012545517.7500006333874.50000019654408.7500003.422010e+09
total_functional_expenses_2016798314220740.4246526.432642e+07492401847566.0000003489737.0000008646332.5000003.047227e+09
total_functional_expense_2011483317395155.3689227.292829e+071507312188637.0000004769351.00000011766482.0000003.354177e+09
tot_func_expns_tot1096447136722.8431231.522043e+08393015348063.75000013955866.00000038626359.0000003.422010e+09
program_expenses_2016798312064035.6439935.691937e+072271439223.5000002800356.0000007151241.0000002.754650e+09
tot_func_expns_prg_srvcs1096440173099.8529731.347221e+0804153747.25000011157879.50000032509353.2500003.157482e+09
program_expense_2011483314784010.6163876.451777e+07284831694422.0000003808132.0000009557716.0000003.091879e+09
\n", "
" ], "text/plain": [ " count mean std min \\\n", "2011 data 4863 1.000000 0.000000e+00 1 \n", "program_efficiency 21894 0.805400 1.036348e-01 0 \n", "program_expenses 21894 24379274.594090 1.002906e+08 0 \n", "total_expenses 21894 28601170.274230 1.133036e+08 39301 \n", "total_functional_expenses_2016 7983 14220740.424652 6.432642e+07 49240 \n", "total_functional_expense_2011 4833 17395155.368922 7.292829e+07 150731 \n", "tot_func_expns_tot 10964 47136722.843123 1.522043e+08 39301 \n", "program_expenses_2016 7983 12064035.643993 5.691937e+07 227 \n", "tot_func_expns_prg_srvcs 10964 40173099.852973 1.347221e+08 0 \n", "program_expense_2011 4833 14784010.616387 6.451777e+07 28483 \n", "\n", " 25% 50% \\\n", "2011 data 1.000000 1.000000 \n", "program_efficiency 0.756568 0.817758 \n", "program_expenses 1962544.250000 5066605.000000 \n", "total_expenses 2545517.750000 6333874.500000 \n", "total_functional_expenses_2016 1847566.000000 3489737.000000 \n", "total_functional_expense_2011 2188637.000000 4769351.000000 \n", "tot_func_expns_tot 5348063.750000 13955866.000000 \n", "program_expenses_2016 1439223.500000 2800356.000000 \n", "tot_func_expns_prg_srvcs 4153747.250000 11157879.500000 \n", "program_expense_2011 1694422.000000 3808132.000000 \n", "\n", " 75% max \n", "2011 data 1.000000 1.000000e+00 \n", "program_efficiency 0.871105 1.010186e+00 \n", "program_expenses 16148624.750000 3.157482e+09 \n", "total_expenses 19654408.750000 3.422010e+09 \n", "total_functional_expenses_2016 8646332.500000 3.047227e+09 \n", "total_functional_expense_2011 11766482.000000 3.354177e+09 \n", "tot_func_expns_tot 38626359.000000 3.422010e+09 \n", "program_expenses_2016 7151241.000000 2.754650e+09 \n", "tot_func_expns_prg_srvcs 32509353.250000 3.157482e+09 \n", "program_expense_2011 9557716.000000 3.091879e+09 " ] }, "execution_count": 799, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[efficiency_columns].describe().T" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 800, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with with Age, State, Category dummies, Total Revenues, and Efficiency.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## COMPLEXITY" ] }, { "cell_type": "code", "execution_count": 826, "metadata": { "collapsed": true }, "outputs": [], "source": [ "complexity_columns_SOI = ['FYE', 'contri_grnts_cy', 'federated_campaigns', 'memshp_dues', \n", " 'fndrsng_events', 'rltd_orgs', 'govt_grnts', 'prog_srvc_rev_cy',\n", " 'invst_incm_cy', 'oth_rev_cy']\n", "##### NOTE: 'invst_incm_cy' + 'oth_rev_cy' MIGHT BE 'OTHER REVENUE' FOR CN" ] }, { "cell_type": "code", "execution_count": 819, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016program_service_revenue_2016other_revenue_2016
17443FY201319258100000141671324906
\n", "
" ], "text/plain": [ " FYE contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "17443 FY2013 192581 0 \n", "\n", " membership_dues_2016 fundraising_events_2016 \\\n", "17443 0 0 \n", "\n", " related_organizations_2016 government_grants_2016 \\\n", "17443 0 0 \n", "\n", " program_service_revenue_2016 other_revenue_2016 \n", "17443 141671 324906 " ] }, "execution_count": 819, "metadata": {}, "output_type": "execute_result" } ], "source": [ "complexity_columns = ['FYE', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', \n", " 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', \n", " 'program_service_revenue_2016', 'other_revenue_2016']\n", "#'total_contributions_2016', 'total_primary_revenue_2016', 'total_revenue_2016',\n", "df[(df['EIN']=='362606232')&(df['latest_entry']=='True')][complexity_columns]" ] }, { "cell_type": "code", "execution_count": 820, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
contri_grnts_cyfederated_campaignsmemshp_duesfndrsng_eventsrltd_orgsgovt_grntsprog_srvc_rev_cyinvst_incm_cyoth_rev_cy
7261192581000001416713249060
\n", "
" ], "text/plain": [ " contri_grnts_cy federated_campaigns memshp_dues fndrsng_events \\\n", "7261 192581 0 0 0 \n", "\n", " rltd_orgs govt_grnts prog_srvc_rev_cy invst_incm_cy oth_rev_cy \n", "7261 0 0 141671 324906 0 " ] }, "execution_count": 820, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_2013[SOI_2013['EIN']=='362606232'][complexity_columns_SOI]" ] }, { "cell_type": "code", "execution_count": 801, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['contributions_gifts_grants_2016'] = df['contributions_gifts_grants_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['federated_campaigns_2016'] = df['federated_campaigns_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['membership_dues_2016'] = df['membership_dues_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['fundraising_events_2016'] = df['fundraising_events_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['related_organizations_2016'] = df['related_organizations_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['government_grants_2016'] = df['government_grants_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['program_service_revenue_2016'] = df['program_service_revenue_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)\n", "df['other_revenue_2016'] = df['other_revenue_2016'].replace( '[\\$,)]',\n", " '', regex=True ).replace( '[(]','-', regex=True ).astype(float)" ] }, { "cell_type": "code", "execution_count": 802, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016program_service_revenue_2016other_revenue_2016
17443FY201319258100000141671324906
\n", "
" ], "text/plain": [ " FYE contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "17443 FY2013 192581 0 \n", "\n", " membership_dues_2016 fundraising_events_2016 \\\n", "17443 0 0 \n", "\n", " related_organizations_2016 government_grants_2016 \\\n", "17443 0 0 \n", "\n", " program_service_revenue_2016 other_revenue_2016 \n", "17443 141671 324906 " ] }, "execution_count": 802, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['EIN']=='362606232')&(df['latest_entry']=='True')][complexity_columns]" ] }, { "cell_type": "code", "execution_count": 803, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
contri_grnts_cyfederated_campaignsmemshp_duesfndrsng_eventsrltd_orgsgovt_grntsprog_srvc_rev_cyinvst_incm_cyoth_rev_cy
7261192581000001416713249060
\n", "
" ], "text/plain": [ " contri_grnts_cy federated_campaigns memshp_dues fndrsng_events \\\n", "7261 192581 0 0 0 \n", "\n", " rltd_orgs govt_grnts prog_srvc_rev_cy invst_incm_cy oth_rev_cy \n", "7261 0 0 141671 324906 0 " ] }, "execution_count": 803, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_2013[SOI_2013['EIN']=='362606232'][complexity_columns_SOI]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
One of the needed SOI variables (*fndrsng_events*) was not kept earlier, so re-merge it from the SOI data." ] }, { "cell_type": "code", "execution_count": 810, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "53\n", "8563\n", "3\n", "8563\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINFYEfndrsng_events
001020246720080
\n", "
" ], "text/plain": [ " EIN FYE fndrsng_events\n", "0 010202467 2008 0" ] }, "execution_count": 810, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_data_valid_EINS = pd.read_pickle('SOI_data_valid_EINS.pkl')\n", "print len(SOI_data_valid_EINS.columns)\n", "print len(SOI_data_valid_EINS)\n", "SOI_data_valid_EINS = SOI_data_valid_EINS[['EIN', 'FYE', 'fndrsng_events']]\n", "print len(SOI_data_valid_EINS.columns)\n", "print len(SOI_data_valid_EINS)\n", "SOI_data_valid_EINS.head(1)" ] }, { "cell_type": "code", "execution_count": 811, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "253\n", "84958\n", "254\n", "84958\n" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left').columns)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left'))" ] }, { "cell_type": "code", "execution_count": 812, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.rename(columns={'_merge':'_merge_v3'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 813, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 FY2008\n", "1 FY2009\n", "Name: FYE, dtype: object" ] }, "execution_count": 813, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOI_data_valid_EINS['FYE'] = 'FY' + SOI_data_valid_EINS['FYE']\n", "SOI_data_valid_EINS['FYE'][:2]" ] }, { "cell_type": "code", "execution_count": 814, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "253\n", "84958\n", "254\n", "255\n", "84958\n" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left').columns)\n", "df = pd.merge(df, SOI_data_valid_EINS, left_on=['EIN','FYE'], 
right_on=['EIN','FYE'], how='left', indicator=True)\n", "print len(df.columns)\n", "print len(df)" ] }, { "cell_type": "code", "execution_count": 815, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "left_only 73994\n", "both 10964\n", "right_only 0\n", "dtype: int64" ] }, "execution_count": 815, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['_merge'].value_counts()" ] }, { "cell_type": "code", "execution_count": 824, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016program_service_revenue_2016other_revenue_2016
17443FY201319258100000141671324906
\n", "
" ], "text/plain": [ " FYE contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "17443 FY2013 192581 0 \n", "\n", " membership_dues_2016 fundraising_events_2016 \\\n", "17443 0 0 \n", "\n", " related_organizations_2016 government_grants_2016 \\\n", "17443 0 0 \n", "\n", " program_service_revenue_2016 other_revenue_2016 \n", "17443 141671 324906 " ] }, "execution_count": 824, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['EIN']=='362606232')&(df['latest_entry']=='True')][complexity_columns]" ] }, { "cell_type": "code", "execution_count": 827, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontri_grnts_cyfederated_campaignsmemshp_duesfndrsng_eventsrltd_orgsgovt_grntsprog_srvc_rev_cyinvst_incm_cyoth_rev_cy
17443FY2013192581000001416713249060
\n", "
" ], "text/plain": [ " FYE contri_grnts_cy federated_campaigns memshp_dues \\\n", "17443 FY2013 192581 0 0 \n", "\n", " fndrsng_events rltd_orgs govt_grnts prog_srvc_rev_cy invst_incm_cy \\\n", "17443 0 0 0 141671 324906 \n", "\n", " oth_rev_cy \n", "17443 0 " ] }, "execution_count": 827, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['EIN']=='362606232')&(df['latest_entry']=='True')][complexity_columns_SOI]" ] }, { "cell_type": "code", "execution_count": 828, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['FYE', 'contri_grnts_cy', 'federated_campaigns', 'memshp_dues', 'fndrsng_events', 'rltd_orgs', 'govt_grnts', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy']\n" ] } ], "source": [ "print complexity_columns_SOI" ] }, { "cell_type": "code", "execution_count": 829, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontri_grnts_cyfederated_campaignsmemshp_duesfndrsng_eventsrltd_orgsgovt_grntsprog_srvc_rev_cyinvst_incm_cyoth_rev_cyother_revenue_SOI
17443FY2013192581000001416713249060324906
\n", "
" ], "text/plain": [ " FYE contri_grnts_cy federated_campaigns memshp_dues \\\n", "17443 FY2013 192581 0 0 \n", "\n", " fndrsng_events rltd_orgs govt_grnts prog_srvc_rev_cy invst_incm_cy \\\n", "17443 0 0 0 141671 324906 \n", "\n", " oth_rev_cy other_revenue_SOI \n", "17443 0 324906 " ] }, "execution_count": 829, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['other_revenue_SOI'] = df['invst_incm_cy'] + df['oth_rev_cy']\n", "complexity_columns_SOI = complexity_columns_SOI + ['other_revenue_SOI']\n", "df[(df['EIN']=='362606232')&(df['latest_entry']=='True')][complexity_columnsn_SOI]" ] }, { "cell_type": "code", "execution_count": 830, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 9\n", "1 3\n", "2 9\n", "3 9\n", "4 9\n", "dtype: int64" ] }, "execution_count": 830, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[complexity_columns].astype(bool).sum(axis=1)[:5]" ] }, { "cell_type": "code", "execution_count": 836, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FYEcontributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016program_service_revenue_2016other_revenue_2016
0currentNaNNaNNaNNaNNaNNaNNaNNaN
1FY2013513345000000252778
2FY2013NaNNaNNaNNaNNaNNaNNaNNaN
3FY2013NaNNaNNaNNaNNaNNaNNaNNaN
4FY2012NaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " FYE contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "0 current NaN NaN \n", "1 FY2013 513345 0 \n", "2 FY2013 NaN NaN \n", "3 FY2013 NaN NaN \n", "4 FY2012 NaN NaN \n", "\n", " membership_dues_2016 fundraising_events_2016 related_organizations_2016 \\\n", "0 NaN NaN NaN \n", "1 0 0 0 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " government_grants_2016 program_service_revenue_2016 other_revenue_2016 \n", "0 NaN NaN NaN \n", "1 0 0 252778 \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN " ] }, "execution_count": 836, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[complexity_columns][:5]" ] }, { "cell_type": "code", "execution_count": 839, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'program_service_revenue_2016', 'other_revenue_2016']\n" ] } ], "source": [ "complexity_columns.remove('FYE')\n", "print complexity_columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create *complexity_2016* \n", "http://stackoverflow.com/questions/23663623/pandas-conditional-count-across-row" ] }, { "cell_type": "code", "execution_count": 841, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 2\n", "2 0\n", "Name: complexity_2016, dtype: int64" ] }, "execution_count": 841, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['complexity_2016'] = (df[complexity_columns] > 0).sum(1)\n", "df['complexity_2016'][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create *complexity_SOI*" ] }, { "cell_type": "code", "execution_count": 843, "metadata": { "collapsed": false }, "outputs": [], "source": [ "complexity_columns_SOI.remove('FYE')\n", 
"complexity_columns_SOI.remove('invst_incm_cy')\n", "complexity_columns_SOI.remove('oth_rev_cy')" ] }, { "cell_type": "code", "execution_count": 844, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['contri_grnts_cy', 'federated_campaigns', 'memshp_dues', 'fndrsng_events', 'rltd_orgs', 'govt_grnts', 'prog_srvc_rev_cy', 'other_revenue_SOI']\n" ] } ], "source": [ "print complexity_columns_SOI" ] }, { "cell_type": "code", "execution_count": 845, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "Name: complexity_SOI, dtype: int64" ] }, "execution_count": 845, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['complexity_SOI'] = (df[complexity_columns_SOI] > 0).sum(1)\n", "df['complexity_SOI'][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Create combined *complexity* variable. Note that *complexity_2016* is a row-wise count and is therefore never null (rows without 2016 data get 0), so the *complexity_SOI* fallback in the cell below never fires as written; the printed counts (84958 both before and after the fallback step) reflect this." ] }, { "cell_type": "code", "execution_count": 847, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "84958\n", "84958\n" ] } ], "source": [ "df['complexity'] = np.nan\n", "print len(df[df['complexity'].notnull()])\n", "df['complexity'] = df['complexity_2016']\n", "print len(df[df['complexity'].notnull()])\n", "df['complexity'] = np.where( (df['complexity'].isnull() & df['complexity_SOI'].notnull()),\n", " df['complexity_SOI'], df['complexity'])\n", "print len(df[df['complexity'].notnull()])" ] }, { "cell_type": "code", "execution_count": 848, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "7983" ] }, "execution_count": 848, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['complexity']>0])" ] }, { "cell_type": "code", "execution_count": 849, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with with Age, State, Category dummies, Total Revenues, Efficiency, Complexity.pkl')" ] }, { "cell_type": "code", "execution_count": 1227, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 84958.000000\n", "mean 0.373031\n", "std 1.220945\n", "min 0.000000\n", "25% 0.000000\n", "50% 0.000000\n", "75% 0.000000\n", "max 8.000000\n", "Name: complexity, dtype: float64" ] }, "execution_count": 1227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['complexity'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SOX POLICIES" ] }, { "cell_type": "code", "execution_count": 853, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011conflict_of_interest_policy_2016whistleblower_policy_2016records_retention_policy_2016cnflct_int_plcywhistleblower_plcydoc_retention_plcy
0NaNNaNNaNNaNNaNNaNNaNNaNNaN
1NaNNaNNaN[_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif]NaNNaNNaN
2NaNNaNNaNNaNNaNNaNNaNNaNNaN
3NaNNaNNaNNaNNaNNaNNaNNaNNaN
4NaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " conflict_of_interest_policy_2011 whistleblower_policy_2011 \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "\n", " records_retention_policy_2011 conflict_of_interest_policy_2016 \\\n", "0 NaN NaN \n", "1 NaN [_gfx_/icons/checked.gif] \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN \n", "\n", " whistleblower_policy_2016 records_retention_policy_2016 cnflct_int_plcy \\\n", "0 NaN NaN NaN \n", "1 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN " ] }, "execution_count": 853, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]\n", "df[SOX_columns][:5]" ] }, { "cell_type": "code", "execution_count": 925, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['conflict_of_interest_policy'] = np.nan\n", "print len(df[df['conflict_of_interest_policy'].notnull()])\n", "df['conflict_of_interest_policy'] = df['conflict_of_interest_policy_2016']\n", "print len(df[df['conflict_of_interest_policy'].notnull()])\n", "df['conflict_of_interest_policy'] = np.where( (df['conflict_of_interest_policy'].isnull() & df['conflict_of_interest_policy_2011'].notnull()),\n", " df['conflict_of_interest_policy_2011'], df['conflict_of_interest_policy'])\n", "print len(df[df['conflict_of_interest_policy'].notnull()])\n", "df['conflict_of_interest_policy'] = np.where( ( df['conflict_of_interest_policy'].isnull() & 
df['cnflct_int_plcy'].notnull()),\n", " df['cnflct_int_plcy'], df['conflict_of_interest_policy'])\n", "print len(df[df['conflict_of_interest_policy'].notnull()])" ] }, { "cell_type": "code", "execution_count": 855, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['whistleblower_policy'] = np.nan\n", "print len(df[df['whistleblower_policy'].notnull()])\n", "df['whistleblower_policy'] = df['whistleblower_policy_2016']\n", "print len(df[df['whistleblower_policy'].notnull()])\n", "df['whistleblower_policy'] = np.where( (df['whistleblower_policy'].isnull() & df['whistleblower_policy_2011'].notnull()),\n", " df['whistleblower_policy_2011'], df['whistleblower_policy'])\n", "print len(df[df['whistleblower_policy'].notnull()])\n", "df['whistleblower_policy'] = np.where( ( df['whistleblower_policy'].isnull() & df['whistleblower_plcy'].notnull()),\n", " df['whistleblower_plcy'], df['whistleblower_policy'])\n", "print len(df[df['whistleblower_policy'].notnull()])" ] }, { "cell_type": "code", "execution_count": 856, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "7983\n", "12816\n", "21894\n" ] } ], "source": [ "df['records_retention_policy'] = np.nan\n", "print len(df[df['records_retention_policy'].notnull()])\n", "df['records_retention_policy'] = df['records_retention_policy_2016']\n", "print len(df[df['records_retention_policy'].notnull()])\n", "df['records_retention_policy'] = np.where( (df['records_retention_policy'].isnull() & df['records_retention_policy_2011'].notnull()),\n", " df['records_retention_policy_2011'], df['records_retention_policy'])\n", "print len(df[df['records_retention_policy'].notnull()])\n", "df['records_retention_policy'] = np.where( ( df['records_retention_policy'].isnull() & df['doc_retention_plcy'].notnull()),\n", " df['doc_retention_plcy'], 
df['records_retention_policy'])\n", "print len(df[df['records_retention_policy'].notnull()])" ] }, { "cell_type": "code", "execution_count": 878, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " conflict_of_interest_policy whistleblower_policy \\\n", "172 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "177 Y Y \n", "178 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes yes \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy conflict_of_interest_policy_2016 \\\n", "172 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "177 Y NaN \n", "178 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " whistleblower_policy_2016 records_retention_policy_2016 \\\n", "172 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "177 NaN NaN \n", "178 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " conflict_of_interest_policy_2011 whistleblower_policy_2011 \\\n", "172 NaN NaN \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 NaN NaN \n", "177 NaN NaN \n", "178 NaN NaN \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes yes \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy_2011 
cnflct_int_plcy whistleblower_plcy \\\n", "172 NaN NaN NaN \n", "173 NaN NaN NaN \n", "174 NaN NaN NaN \n", "175 NaN NaN NaN \n", "176 NaN NaN NaN \n", "177 NaN Y Y \n", "178 NaN NaN NaN \n", "179 NaN NaN NaN \n", "180 NaN NaN NaN \n", "181 NaN NaN NaN \n", "182 NaN NaN NaN \n", "183 NaN NaN NaN \n", "184 NaN NaN NaN \n", "185 yes NaN NaN \n", "186 NaN NaN NaN \n", "187 NaN NaN NaN \n", "188 NaN NaN NaN \n", "189 NaN NaN NaN \n", "\n", " doc_retention_plcy \n", "172 NaN \n", "173 NaN \n", "174 NaN \n", "175 NaN \n", "176 NaN \n", "177 Y \n", "178 NaN \n", "179 NaN \n", "180 NaN \n", "181 NaN \n", "182 NaN \n", "183 NaN \n", "184 NaN \n", "185 NaN \n", "186 NaN \n", "187 NaN \n", "188 NaN \n", "189 NaN " ] }, "execution_count": 878, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['conflict_of_interest_policy', 'whistleblower_policy', 'records_retention_policy',\n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]\n", "df[SOX_columns][172:190]" ] }, { "cell_type": "code", "execution_count": 934, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 [_gfx_/icons/checked.gif]\n", "15 _gfx_/icons/checked.gif\n", "30 _gfx_/icons/checked.gif\n", "34 _gfx_/icons/checked.gif\n", "43 _gfx_/icons/checked.gif\n", "Name: conflict_of_interest_policy_2016, dtype: object" ] }, "execution_count": 934, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['conflict_of_interest_policy_2016'].notnull()]['conflict_of_interest_policy_2016'][:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
For the 2016 data I stored each value as a *list* instead of a string (among other things, this meant I could not view frequencies). Let me fix that here. " ] }, { "cell_type": "code", "execution_count": 932, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nan \n", "_gfx_/icons/checked.gif \n" ] } ], "source": [ "for index, row in df[:2].iterrows():\n", "    print row['conflict_of_interest_policy'], type(row['conflict_of_interest_policy'])\n", "    if type(row['conflict_of_interest_policy'])==list:\n", "        #print 'yes', type(str(row['conflict_of_interest_policy'][0])), str(row['conflict_of_interest_policy'][0])\n", "        df.ix[index, 'conflict_of_interest_policy'] = str(row['conflict_of_interest_policy'][0])\n", "    if type(row['whistleblower_policy'])==list:\n", "        #print 'yes', type(str(row['whistleblower_policy'][0])), str(row['whistleblower_policy'][0])\n", "        df.ix[index, 'whistleblower_policy'] = str(row['whistleblower_policy'][0])\n", "    if type(row['records_retention_policy'])==list:\n", "        try: \n", "            #print 'yes', type(str(row['records_retention_policy'][0])), str(row['records_retention_policy'][0])\n", "            df.ix[index, 'records_retention_policy'] = str(row['records_retention_policy'][0]) \n", "        except:\n", "            #print index\n", "            pass" ] }, { "cell_type": "code", "execution_count": 948, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "yes 37604\n" ] } ], "source": [ "for index, row in df.iterrows():\n", "    if type(row['records_retention_policy'])==list:\n", "        try: \n", "            print 'yes', index, type(str(row['records_retention_policy'][0])), str(row['records_retention_policy'])\n", "            df.ix[index, 'records_retention_policy'] = np.nan \n", "        except:\n", "            #print index\n", "            pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 950, "metadata": { "collapsed": false }, "outputs": 
[ { "data": { "text/html": [ "
" ], "text/plain": [ " conflict_of_interest_policy conflict_of_interest_policy_v2 \\\n", "37603 NaN NaN \n", "37604 _gfx_/icons/checked.gif 1 \n", "\n", " whistleblower_policy records_retention_policy \\\n", "37603 NaN NaN \n", "37604 _gfx_/icons/checked.gif [] \n", "\n", " conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "37603 NaN NaN \n", "37604 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " records_retention_policy_2016 conflict_of_interest_policy_2011 \\\n", "37603 NaN NaN \n", "37604 [] NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 cnflct_int_plcy \\\n", "37603 NaN NaN NaN \n", "37604 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy \n", "37603 NaN NaN \n", "37604 NaN NaN " ] }, "execution_count": 950, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[37603:37605][['conflict_of_interest_policy', 'conflict_of_interest_policy_v2', 'whistleblower_policy',\n", " 'records_retention_policy', \n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]]" ] }, { "cell_type": "code", "execution_count": 986, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.set_value(37604, 'records_retention_policy', '_gfx_/icons/checked.gif')" ] }, { "cell_type": "code", "execution_count": 987, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " conflict_of_interest_policy conflict_of_interest_policy_v2 \\\n", "37603 NaN NaN \n", "37604 _gfx_/icons/checked.gif 1 \n", "\n", " whistleblower_policy records_retention_policy \\\n", "37603 NaN NaN \n", "37604 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "\n", " conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "37603 NaN NaN \n", "37604 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " records_retention_policy_2016 conflict_of_interest_policy_2011 \\\n", "37603 NaN NaN \n", "37604 [] NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 cnflct_int_plcy \\\n", "37603 NaN NaN NaN \n", "37604 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy \n", "37603 NaN NaN \n", "37604 NaN NaN " ] }, "execution_count": 987, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[37603:37605][['conflict_of_interest_policy', 'conflict_of_interest_policy_v2', 'whistleblower_policy',\n", " 'records_retention_policy', \n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]]" ] }, { "cell_type": "code", "execution_count": 988, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Y 8802\n", "_gfx_/icons/checked.gif 7779\n", "yes 4513\n", "NO 320\n", "N 276\n", "_gfx_/icons/checkboxX.gif 204\n", "Name: conflict_of_interest_policy, dtype: int64" ] }, "execution_count": 988, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['conflict_of_interest_policy'].value_counts()" ] }, { "cell_type": "code", "execution_count": 989, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['conflict_of_interest_policy_v2'] = np.nan\n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== 
'_gfx_/icons/checked.gif', \n", " 1, df['conflict_of_interest_policy_v2']) \n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== '_gfx_/icons/checkboxX.gif', \n", " 0, df['conflict_of_interest_policy_v2']) \n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== 'Y', \n", " 1, df['conflict_of_interest_policy_v2']) \n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== 'N', \n", " 0, df['conflict_of_interest_policy_v2']) \n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== 'yes', \n", " 1, df['conflict_of_interest_policy_v2']) \n", "df['conflict_of_interest_policy_v2'] = np.where(df['conflict_of_interest_policy']== 'NO', \n", " 0, df['conflict_of_interest_policy_v2']) " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 990, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21094\n", "800\n" ] }, { "data": { "text/plain": [ "1 21094\n", "0 800\n", "Name: conflict_of_interest_policy_v2, dtype: int64" ] }, "execution_count": 990, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print 8802+7779+4513\n", "print 320+276+204\n", "df['conflict_of_interest_policy_v2'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Now fix records_retention_policy" ] }, { "cell_type": "code", "execution_count": 991, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Y 8138\n", "_gfx_/icons/checked.gif 7247\n", "yes 3864\n", "NO 969\n", "N 940\n", "_gfx_/icons/checkboxX.gif 736\n", "Name: records_retention_policy, dtype: int64" ] }, "execution_count": 991, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['records_retention_policy'].value_counts() " ] }, { "cell_type": "code", "execution_count": 
992, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['records_retention_policy_v2'] = np.nan\n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== '_gfx_/icons/checked.gif', \n", " 1, df['records_retention_policy_v2']) \n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== '_gfx_/icons/checkboxX.gif', \n", " 0, df['records_retention_policy_v2']) \n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== 'Y', \n", " 1, df['records_retention_policy_v2']) \n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== 'N', \n", " 0, df['records_retention_policy_v2']) \n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== 'yes', \n", " 1, df['records_retention_policy_v2']) \n", "df['records_retention_policy_v2'] = np.where(df['records_retention_policy']== 'NO', \n", " 0, df['records_retention_policy_v2']) " ] }, { "cell_type": "code", "execution_count": 993, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19249\n", "2645\n" ] }, { "data": { "text/plain": [ "1 19249\n", "0 2645\n", "Name: records_retention_policy_v2, dtype: int64" ] }, "execution_count": 993, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print 8138+7247+3864\n", "print 969+940+736\n", "df['records_retention_policy_v2'].value_counts() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Now fix whistleblower_policy" ] }, { "cell_type": "code", "execution_count": 994, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Y 8145\n", "_gfx_/icons/checked.gif 7297\n", "yes 3867\n", "NO 966\n", "N 933\n", "_gfx_/icons/checkboxX.gif 686\n", "Name: whistleblower_policy, dtype: int64" ] }, "execution_count": 994, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['whistleblower_policy'].value_counts() " ] }, { "cell_type": "code", 
"execution_count": 995, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['whistleblower_policy_v2'] = np.nan\n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== '_gfx_/icons/checked.gif', \n", " 1, df['whistleblower_policy_v2']) \n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== '_gfx_/icons/checkboxX.gif', \n", " 0, df['whistleblower_policy_v2']) \n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== 'Y', \n", " 1, df['whistleblower_policy_v2']) \n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== 'N', \n", " 0, df['whistleblower_policy_v2']) \n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== 'yes', \n", " 1, df['whistleblower_policy_v2']) \n", "df['whistleblower_policy_v2'] = np.where(df['whistleblower_policy']== 'NO', \n", " 0, df['whistleblower_policy_v2']) " ] }, { "cell_type": "code", "execution_count": 996, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19309\n", "2585\n" ] }, { "data": { "text/plain": [ "1 19309\n", "0 2585\n", "Name: whistleblower_policy_v2, dtype: int64" ] }, "execution_count": 996, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print 8145+7297+3867\n", "print 966+933+686\n", "df['whistleblower_policy_v2'].value_counts() " ] }, { "cell_type": "code", "execution_count": 997, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " conflict_of_interest_policy conflict_of_interest_policy_v2 \\\n", "172 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif 1 \n", "177 Y 1 \n", "178 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes 1 \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " whistleblower_policy whistleblower_policy_v2 \\\n", "172 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif 1 \n", "177 Y 1 \n", "178 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes 1 \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy records_retention_policy_v2 \\\n", "172 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif 1 \n", "177 Y 1 \n", "178 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes 1 \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "172 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "177 NaN NaN \n", "178 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy_2016 conflict_of_interest_policy_2011 \\\n", "172 _gfx_/icons/checked.gif NaN \n", "173 NaN NaN \n", "174 NaN NaN 
\n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif NaN \n", "177 NaN NaN \n", "178 _gfx_/icons/checked.gif NaN \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN yes \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 cnflct_int_plcy \\\n", "172 NaN NaN NaN \n", "173 NaN NaN NaN \n", "174 NaN NaN NaN \n", "175 NaN NaN NaN \n", "176 NaN NaN NaN \n", "177 NaN NaN Y \n", "178 NaN NaN NaN \n", "179 NaN NaN NaN \n", "180 NaN NaN NaN \n", "181 NaN NaN NaN \n", "182 NaN NaN NaN \n", "183 NaN NaN NaN \n", "184 NaN NaN NaN \n", "185 yes yes NaN \n", "186 NaN NaN NaN \n", "187 NaN NaN NaN \n", "188 NaN NaN NaN \n", "189 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy \n", "172 NaN NaN \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 NaN NaN \n", "177 Y Y \n", "178 NaN NaN \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN " ] }, "execution_count": 997, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['conflict_of_interest_policy', 'conflict_of_interest_policy_v2', 'whistleblower_policy',\n", " 'whistleblower_policy_v2', 'records_retention_policy', 'records_retention_policy_v2',\n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]\n", "df[SOX_columns][172:190]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create combined *SOX_policy* variable" ] }, { "cell_type": "code", "execution_count": 998, "metadata": { "collapsed": false }, "outputs": [ { "data": { 
"text/plain": [ "0 NaN\n", "1 3\n", "2 NaN\n", "Name: SOX_policies, dtype: float64" ] }, "execution_count": 998, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['conflict_of_interest_policy_v2', 'whistleblower_policy_v2', 'records_retention_policy_v2']\n", "#df['SOX_policies'] = (df[SOX_columns] > 0).sum(1)\n", "df['SOX_policies'] = np.where(df['conflict_of_interest_policy_v2'].notnull(),\n", " (df[SOX_columns] > 0).sum(1), np.nan)\n", "df['SOX_policies'][:3]" ] }, { "cell_type": "code", "execution_count": 999, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SOX_policies conflict_of_interest_policy conflict_of_interest_policy_v2 \\\n", "172 3 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN NaN \n", "174 NaN NaN NaN \n", "175 NaN NaN NaN \n", "176 3 _gfx_/icons/checked.gif 1 \n", "177 3 Y 1 \n", "178 3 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN NaN \n", "180 NaN NaN NaN \n", "181 NaN NaN NaN \n", "182 NaN NaN NaN \n", "183 NaN NaN NaN \n", "184 NaN NaN NaN \n", "185 3 yes 1 \n", "186 NaN NaN NaN \n", "187 NaN NaN NaN \n", "188 NaN NaN NaN \n", "189 NaN NaN NaN \n", "\n", " whistleblower_policy whistleblower_policy_v2 \\\n", "172 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif 1 \n", "177 Y 1 \n", "178 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes 1 \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy records_retention_policy_v2 \\\n", "172 _gfx_/icons/checked.gif 1 \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif 1 \n", "177 Y 1 \n", "178 _gfx_/icons/checked.gif 1 \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 yes 1 \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " conflict_of_interest_policy_2016 whistleblower_policy_2016 \\\n", "172 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "177 NaN NaN \n", "178 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " records_retention_policy_2016 conflict_of_interest_policy_2011 
\\\n", "172 _gfx_/icons/checked.gif NaN \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 _gfx_/icons/checked.gif NaN \n", "177 NaN NaN \n", "178 _gfx_/icons/checked.gif NaN \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN yes \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 cnflct_int_plcy \\\n", "172 NaN NaN NaN \n", "173 NaN NaN NaN \n", "174 NaN NaN NaN \n", "175 NaN NaN NaN \n", "176 NaN NaN NaN \n", "177 NaN NaN Y \n", "178 NaN NaN NaN \n", "179 NaN NaN NaN \n", "180 NaN NaN NaN \n", "181 NaN NaN NaN \n", "182 NaN NaN NaN \n", "183 NaN NaN NaN \n", "184 NaN NaN NaN \n", "185 yes yes NaN \n", "186 NaN NaN NaN \n", "187 NaN NaN NaN \n", "188 NaN NaN NaN \n", "189 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy \n", "172 NaN NaN \n", "173 NaN NaN \n", "174 NaN NaN \n", "175 NaN NaN \n", "176 NaN NaN \n", "177 Y Y \n", "178 NaN NaN \n", "179 NaN NaN \n", "180 NaN NaN \n", "181 NaN NaN \n", "182 NaN NaN \n", "183 NaN NaN \n", "184 NaN NaN \n", "185 NaN NaN \n", "186 NaN NaN \n", "187 NaN NaN \n", "188 NaN NaN \n", "189 NaN NaN " ] }, "execution_count": 999, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['SOX_policies', 'conflict_of_interest_policy', 'conflict_of_interest_policy_v2', 'whistleblower_policy',\n", " 'whistleblower_policy_v2', 'records_retention_policy', 'records_retention_policy_v2',\n", " 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016',\n", " 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011',\n", " 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy'\n", " ]\n", "df[SOX_columns][172:190]" ] }, { "cell_type": "code", "execution_count": 1005, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": 
"stream", "text": [ "21894\n" ] }, { "data": { "text/plain": [ "3 18283\n", "2 1798\n", "1 1207\n", "0 606\n", "Name: SOX_policies, dtype: int64" ] }, "execution_count": 1005, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['SOX_policies'].value_counts().sum()\n", "df['SOX_policies'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1006, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n" ] }, { "data": { "text/plain": [ "1 21288\n", "0 606\n", "Name: SOX_policies_binary, dtype: int64" ] }, "execution_count": 1006, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print df['SOX_policies_binary'].value_counts().sum()\n", "df['SOX_policies_binary'] = df['SOX_policies']\n", "df['SOX_policies_binary'] = np.where(df['SOX_policies_binary']>=1, 1, df['SOX_policies'])\n", "df['SOX_policies_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1001, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "conflict_of_interest_policy_v2\n" ] }, { "data": { "text/plain": [ "1 21094\n", "0 800\n", "Name: conflict_of_interest_policy_v2, dtype: int64" ] }, "execution_count": 1001, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns = ['conflict_of_interest_policy_v2', 'whistleblower_policy_v2', 'records_retention_policy_v2']\n", "print df[SOX_columns[0]].value_counts().sum()\n", "print SOX_columns[0]\n", "df[SOX_columns[0]].value_counts()" ] }, { "cell_type": "code", "execution_count": 1002, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "whistleblower_policy_v2\n", "21894\n" ] }, { "data": { "text/plain": [ "1 19309\n", "0 2585\n", "Name: whistleblower_policy_v2, dtype: int64" ] }, "execution_count": 1002, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print SOX_columns[1]\n", "print 
df[SOX_columns[1]].value_counts().sum()\n", "df[SOX_columns[1]].value_counts()" ] }, { "cell_type": "code", "execution_count": 1003, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "records_retention_policy_v2\n", "21894\n" ] }, { "data": { "text/plain": [ "1 19249\n", "0 2645\n", "Name: records_retention_policy_v2, dtype: int64" ] }, "execution_count": 1003, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print SOX_columns[2]\n", "print df[SOX_columns[2]].value_counts().sum()\n", "df[SOX_columns[2]].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
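The recodes in this part of the notebook (`SOX_policies_binary` for at least one policy, `SOX_policies_all_binary` for all three) are both thresholds on the `SOX_policies` count. Here is a minimal sketch of that pattern on toy data, not the real `df`. The frame `toy` and the helper `to_indicator` are hypothetical names used only for illustration, and the helper is one possible way to write the recode rather than the code used in these cells. It keeps missing counts as NaN, so unrated rows stay out of the value counts just as they do here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df['SOX_policies']: number of policies in place (0-3),
# NaN for rows that carry no ratings data.
toy = pd.DataFrame({'SOX_policies': [3, 2, 1, 0, np.nan]})


def to_indicator(counts, threshold):
    """Return 1.0 where counts >= threshold, 0.0 where below, NaN where missing."""
    flag = (counts >= threshold).astype(float)
    # A comparison against NaN evaluates to False, so restore the missing values.
    return flag.where(counts.notnull())


# At least one of the three policies is in place.
toy['SOX_policies_binary'] = to_indicator(toy['SOX_policies'], 1)
# All three policies are in place.
toy['SOX_policies_all_binary'] = to_indicator(toy['SOX_policies'], 3)
```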
Create binary version for *all three policies*" ] }, { "cell_type": "code", "execution_count": 1149, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 18283\n", "0 3611\n", "Name: SOX_policies_all_binary, dtype: int64" ] }, "execution_count": 1149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['SOX_policies_all_binary'] = np.nan\n", "df['SOX_policies_all_binary'] = df['SOX_policies']\n", "df['SOX_policies_all_binary'] = np.where( ((df['SOX_policies_all_binary']==1) | (df['SOX_policies_all_binary']==2)),\n", " 0, df['SOX_policies_all_binary'])\n", "df['SOX_policies_all_binary'] = np.where(df['SOX_policies_all_binary']==3, 1, df['SOX_policies_all_binary'])\n", "df['SOX_policies_all_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1150, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 18283\n", "2 1798\n", "1 1207\n", "0 606\n", "Name: SOX_policies, dtype: int64 \n", "\n", "1 21288\n", "0 606\n", "Name: SOX_policies_binary, dtype: int64 \n", "\n", "1 18283\n", "0 3611\n", "Name: SOX_policies_all_binary, dtype: int64\n" ] } ], "source": [ "print df['SOX_policies'].value_counts(), '\\n'\n", "print df['SOX_policies_binary'].value_counts(), '\\n'\n", "print df['SOX_policies_all_binary'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 1152, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with Age, State, Category dummies, Total Revenues, Efficiency, Complexity, SOX.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### '2016 data' Indicator" ] }, { "cell_type": "code", "execution_count": 1010, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958 
84958\n" ] }, { "data": { "text/plain": [ "0 76654\n", "1 8304\n", "Name: 2016_data, dtype: int64" ] }, "execution_count": 1010, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['2016_data'] = np.where(df['latest_entry']=='True', 1,0)\n", "print len(df), df['2016_data'].value_counts().sum()\n", "df['2016_data'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1011, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
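<table>
<thead>
<tr><th>latest_entry</th><th>False</th><th>False</th><th>True</th></tr>
<tr><th>2016_data</th><th></th><th></th><th></th></tr>
</thead>
<tbody>
<tr><th>0</th><td>48</td><td>75593</td><td>0</td></tr>
<tr><th>1</th><td>0</td><td>0</td><td>8304</td></tr>
</tbody>
</table>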
\n", "
" ], "text/plain": [ "latest_entry False False True\n", "2016_data \n", "0 48 75593 0\n", "1 0 0 8304" ] }, "execution_count": 1011, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df['2016_data'], df['latest_entry'])" ] }, { "cell_type": "code", "execution_count": 1014, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 4863\n", "Name: 2011_data, dtype: int64 \n", "\n", "0 80095\n", "1 4863\n", "Name: 2011_data, dtype: int64\n" ] } ], "source": [ "df.rename(columns={'2011 data':'2011_data'}, inplace=True)\n", "print df['2011_data'].value_counts(), '\\n'\n", "df['2011_data'] = np.where(df['2011_data']==1, 1,0)\n", "print df['2011_data'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 1034, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with Age, State, Category dummies, Total Revenues, Efficiency, Complexity, SOX.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DONOR ADVISORY" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['advisory text - current advisory']= df['advisory text - current advisory'].str.strip()" ] }, { "cell_type": "code", "execution_count": 1036, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
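<table>
<thead>
<tr><th></th><th>org_id</th><th>Date Published</th><th>FYE</th><th>Overall Rating</th><th>advisory text - current advisory</th><th>advisory text - past advisory</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>16722</td><td>2016-08-12 00:00:00</td><td>current</td><td>current (2016) donor advisory</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td><td>NaN</td></tr>
<tr><th>1</th><td>10166</td><td>2016-06-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>2</th><td>10166</td><td>2015-12-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>3</th><td>10166</td><td>2015-08-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>4</th><td>10166</td><td>2014-08-01 00:00:00</td><td>FY2012</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>5</th><td>10166</td><td>2013-11-01 00:00:00</td><td>FY2012</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>6</th><td>10166</td><td>2012-09-01 00:00:00</td><td>FY2011</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>7</th><td>10166</td><td>2012-04-01 00:00:00</td><td>FY2010</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>8</th><td>10166</td><td>2012-03-01 00:00:00</td><td>FY2010</td><td>3 stars</td><td>NaN</td><td>NaN</td></tr>
<tr><th>9</th><td>10166</td><td>2011-01-05 00:00:00</td><td>FY2009</td><td>Donor Advisory</td><td>NaN</td><td>This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...</td></tr>
</tbody>
</table>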
\n", "
" ], "text/plain": [ " org_id Date Published FYE Overall Rating \\\n", "0 16722 2016-08-12 00:00:00 current current (2016) donor advisory \n", "1 10166 2016-06-01 00:00:00 FY2013 3 stars \n", "2 10166 2015-12-01 00:00:00 FY2013 3 stars \n", "3 10166 2015-08-01 00:00:00 FY2013 3 stars \n", "4 10166 2014-08-01 00:00:00 FY2012 3 stars \n", "5 10166 2013-11-01 00:00:00 FY2012 3 stars \n", "6 10166 2012-09-01 00:00:00 FY2011 3 stars \n", "7 10166 2012-04-01 00:00:00 FY2010 3 stars \n", "8 10166 2012-03-01 00:00:00 FY2010 3 stars \n", "9 10166 2011-01-05 00:00:00 FY2009 Donor Advisory \n", "\n", " advisory text - current advisory \\\n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 NaN \n", "\n", " advisory text - past advisory \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati... " ] }, "execution_count": 1036, "metadata": {}, "output_type": "execute_result" } ], "source": [ "advisory_columns = ['org_id', 'Date Published', 'FYE', 'Overall Rating', \n", " 'advisory text - current advisory', 'advisory text - past advisory'\n", " ]\n", "df[advisory_columns][:10]" ] }, { "cell_type": "code", "execution_count": 1038, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>
<thead>
<tr><th></th><th>org_id</th><th>Date Published</th><th>FYE</th><th>Overall Rating</th><th>Advisory Text</th><th>advisory text - current advisory</th><th>advisory text - past advisory</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>16722</td><td>2016-08-12 00:00:00</td><td>current</td><td>current (2016) donor advisory</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td><td>NaN</td></tr>
<tr><th>1</th><td>10166</td><td>2016-06-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>2</th><td>10166</td><td>2015-12-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>3</th><td>10166</td><td>2015-08-01 00:00:00</td><td>FY2013</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>4</th><td>10166</td><td>2014-08-01 00:00:00</td><td>FY2012</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>5</th><td>10166</td><td>2013-11-01 00:00:00</td><td>FY2012</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>6</th><td>10166</td><td>2012-09-01 00:00:00</td><td>FY2011</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>7</th><td>10166</td><td>2012-04-01 00:00:00</td><td>FY2010</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>8</th><td>10166</td><td>2012-03-01 00:00:00</td><td>FY2010</td><td>3 stars</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>9</th><td>10166</td><td>2011-01-05 00:00:00</td><td>FY2009</td><td>Donor Advisory</td><td>This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...</td><td>NaN</td><td>This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati...</td></tr>
</tbody>
</table>
\n", "
" ], "text/plain": [ " org_id Date Published FYE Overall Rating \\\n", "0 16722 2016-08-12 00:00:00 current current (2016) donor advisory \n", "1 10166 2016-06-01 00:00:00 FY2013 3 stars \n", "2 10166 2015-12-01 00:00:00 FY2013 3 stars \n", "3 10166 2015-08-01 00:00:00 FY2013 3 stars \n", "4 10166 2014-08-01 00:00:00 FY2012 3 stars \n", "5 10166 2013-11-01 00:00:00 FY2012 3 stars \n", "6 10166 2012-09-01 00:00:00 FY2011 3 stars \n", "7 10166 2012-04-01 00:00:00 FY2010 3 stars \n", "8 10166 2012-03-01 00:00:00 FY2010 3 stars \n", "9 10166 2011-01-05 00:00:00 FY2009 Donor Advisory \n", "\n", " Advisory Text \\\n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati... 
\n", "\n", " advisory text - current advisory \\\n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 NaN \n", "\n", " advisory text - past advisory \n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 This donor advisory was published on Wednesday, January 5, 2011.In accordance with our.policy for removing Donor Advisories., Charity Navigator removed the Donor Advisory for Angel Flight Northeast on March 1, 2012 because the Donor Advisory had been in place for more than a year (since January 5, 2011) and because the issue that prompted the Donor Advisory has been resolved..Charity Navigator had published a Donor Advisory for this  charity because we became aware of the following informati... " ] }, "execution_count": 1038, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Advisory Text'] = df['advisory text - current advisory']\n", "df['Advisory Text'] = np.where( (df['Advisory Text'].isnull() & df['advisory text - past advisory'].notnull()),\n", " df['advisory text - past advisory'], df['Advisory Text'])\n", "advisory_columns = ['org_id', 'Date Published', 'FYE', 'Overall Rating', 'Advisory Text',\n", " 'advisory text - current advisory', 'advisory text - past advisory'\n", " ]\n", "df[advisory_columns][:10] " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df['donor_advisory'] = df['Overall Rating'].str.contains('advisory', case=False)\n", "#df['donor_advisory'] = df['donor_advisory'].convert_objects(convert_numeric=True) #OLD CODE\n", "df['donor_advisory'] = pd.to_numeric(df['donor_advisory'])\n", "df['Advisory Text'] = df['advisory text - current advisory']" ] }, { "cell_type": "code", 
"execution_count": 1052, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
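<table>
<thead>
<tr><th></th><th>org_id</th><th>FYE</th><th>2016_data</th><th>Overall Rating</th><th>donor_advisory</th><th>current_donor_advisory</th><th>Advisory Text</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>16722</td><td>current</td><td>1</td><td>current (2016) donor advisory</td><td>1</td><td>1</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td></tr>
<tr><th>1</th><td>10166</td><td>FY2013</td><td>1</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>2</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>3</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>4</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>5</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>6</th><td>10166</td><td>FY2011</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>7</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>8</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>9</th><td>10166</td><td>FY2009</td><td>0</td><td>Donor Advisory</td><td>1</td><td>0</td><td>NaN</td></tr>
</tbody>
</table>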
\n", "
" ], "text/plain": [ " org_id FYE 2016_data Overall Rating donor_advisory \\\n", "0 16722 current 1 current (2016) donor advisory 1 \n", "1 10166 FY2013 1 3 stars 0 \n", "2 10166 FY2013 0 3 stars 0 \n", "3 10166 FY2013 0 3 stars 0 \n", "4 10166 FY2012 0 3 stars 0 \n", "5 10166 FY2012 0 3 stars 0 \n", "6 10166 FY2011 0 3 stars 0 \n", "7 10166 FY2010 0 3 stars 0 \n", "8 10166 FY2010 0 3 stars 0 \n", "9 10166 FY2009 0 Donor Advisory 1 \n", "\n", " current_donor_advisory \\\n", "0 1 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "5 0 \n", "6 0 \n", "7 0 \n", "8 0 \n", "9 0 \n", "\n", " Advisory Text \n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 NaN " ] }, "execution_count": 1052, "metadata": {}, "output_type": "execute_result" } ], "source": [ "advisory_columns = ['org_id', 'FYE', '2016_data', 'Overall Rating', 'donor_advisory', 'current_donor_advisory',\n", " 'Advisory Text', \n", " ]\n", "df[advisory_columns][:10] " ] }, { "cell_type": "code", "execution_count": 1049, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df = df.drop('2016_donor_advisory', 1) \n", "#df = df.drop('donor_advisory_2016', 1)" ] }, { "cell_type": "code", "execution_count": 1057, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1061" ] }, "execution_count": 1057, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['donor_advisory'].isnull()])" ] }, { "cell_type": "code", "execution_count": 1054, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "org_id object\n", "FYE object\n", "2016_data int64\n", "Overall Rating object\n", "donor_advisory float64\n", "current_donor_advisory float64\n", "Advisory Text object\n", "dtype: object" ] }, "execution_count": 1054, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "df[advisory_columns].dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create *2016_donor_advisory* variable\n", "We need this for '2011' test" ] }, { "cell_type": "code", "execution_count": 1062, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n", "321\n", "321\n", "321\n" ] } ], "source": [ "print len(df[(df['2016_data']==1) & (df['donor_advisory']==1)])\n", "print len(df[(df['2016_data']==1) & (df['donor_advisory']==1)]['org_id'].tolist())\n", "advisories_2016 = df[(df['2016_data']==1) & (df['donor_advisory']==1)]['org_id'].tolist()\n", "print len(advisories_2016)\n", "print len(set(advisories_2016))" ] }, { "cell_type": "code", "execution_count": 1064, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 84590\n", "1 368\n", "Name: 2016_donor_advisory, dtype: int64 \n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
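<table>
<thead>
<tr><th></th><th>org_id</th><th>FYE</th><th>2016_data</th><th>Overall Rating</th><th>2016_donor_advisory</th><th>donor_advisory</th><th>current_donor_advisory</th><th>Advisory Text</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>16722</td><td>current</td><td>1</td><td>current (2016) donor advisory</td><td>1</td><td>1</td><td>1</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td></tr>
<tr><th>1</th><td>10166</td><td>FY2013</td><td>1</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>2</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>3</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>4</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>5</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>6</th><td>10166</td><td>FY2011</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>7</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>8</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>9</th><td>10166</td><td>FY2009</td><td>0</td><td>Donor Advisory</td><td>0</td><td>1</td><td>0</td><td>NaN</td></tr>
</tbody>
</table>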
\n", "
" ], "text/plain": [ " org_id FYE 2016_data Overall Rating \\\n", "0 16722 current 1 current (2016) donor advisory \n", "1 10166 FY2013 1 3 stars \n", "2 10166 FY2013 0 3 stars \n", "3 10166 FY2013 0 3 stars \n", "4 10166 FY2012 0 3 stars \n", "5 10166 FY2012 0 3 stars \n", "6 10166 FY2011 0 3 stars \n", "7 10166 FY2010 0 3 stars \n", "8 10166 FY2010 0 3 stars \n", "9 10166 FY2009 0 Donor Advisory \n", "\n", " 2016_donor_advisory donor_advisory current_donor_advisory \\\n", "0 1 1 1 \n", "1 0 0 0 \n", "2 0 0 0 \n", "3 0 0 0 \n", "4 0 0 0 \n", "5 0 0 0 \n", "6 0 0 0 \n", "7 0 0 0 \n", "8 0 0 0 \n", "9 0 1 0 \n", "\n", " Advisory Text \n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 NaN " ] }, "execution_count": 1064, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['2016_donor_advisory'] = np.nan\n", "df['2016_donor_advisory'] = np.where( df['org_id'].isin(advisories_2016), 1, 0)\n", "print df['2016_donor_advisory'].value_counts(), '\\n'\n", "advisory_columns = ['org_id', 'FYE', '2016_data', 'Overall Rating', '2016_donor_advisory', \n", " 'donor_advisory', 'current_donor_advisory', 'Advisory Text'\n", " ]\n", "df[advisory_columns][:10] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Verify data" ] }, { "cell_type": "code", "execution_count": 1065, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "47" ] }, "execution_count": 1065, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[(df['2011_data']==1) & (df['2016_donor_advisory']==1)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Post-2011 Donor Advisory" ] }, { "cell_type": "code", "execution_count": 1076, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "391\n", "391\n", "391\n" ] } ], "source": [ "print len(set(df[df['donor_advisory']==1]['org_id'].tolist()))\n", "advisories_2011 = set(df[df['donor_advisory']==1]['org_id'].tolist())\n", "print len(advisories_2011)\n", "print len(set(advisories_2011))" ] }, { "cell_type": "code", "execution_count": 1077, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 83736\n", "1 1222\n", "Name: 2011_to_2016_donor_advisory, dtype: int64 \n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
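<table>
<thead>
<tr><th></th><th>org_id</th><th>FYE</th><th>2016_data</th><th>Overall Rating</th><th>2016_donor_advisory</th><th>2011_to_2016_donor_advisory</th><th>donor_advisory</th><th>current_donor_advisory</th><th>Advisory Text</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>16722</td><td>current</td><td>1</td><td>current (2016) donor advisory</td><td>1</td><td>1</td><td>1</td><td>1</td><td>On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\"</td></tr>
<tr><th>1</th><td>10166</td><td>FY2013</td><td>1</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>2</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>3</th><td>10166</td><td>FY2013</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>4</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>5</th><td>10166</td><td>FY2012</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>6</th><td>10166</td><td>FY2011</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>7</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>8</th><td>10166</td><td>FY2010</td><td>0</td><td>3 stars</td><td>0</td><td>1</td><td>0</td><td>0</td><td>NaN</td></tr>
<tr><th>9</th><td>10166</td><td>FY2009</td><td>0</td><td>Donor Advisory</td><td>0</td><td>1</td><td>1</td><td>0</td><td>NaN</td></tr>
</tbody>
</table>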
\n", "
" ], "text/plain": [ " org_id FYE 2016_data Overall Rating \\\n", "0 16722 current 1 current (2016) donor advisory \n", "1 10166 FY2013 1 3 stars \n", "2 10166 FY2013 0 3 stars \n", "3 10166 FY2013 0 3 stars \n", "4 10166 FY2012 0 3 stars \n", "5 10166 FY2012 0 3 stars \n", "6 10166 FY2011 0 3 stars \n", "7 10166 FY2010 0 3 stars \n", "8 10166 FY2010 0 3 stars \n", "9 10166 FY2009 0 Donor Advisory \n", "\n", " 2016_donor_advisory 2011_to_2016_donor_advisory donor_advisory \\\n", "0 1 1 1 \n", "1 0 1 0 \n", "2 0 1 0 \n", "3 0 1 0 \n", "4 0 1 0 \n", "5 0 1 0 \n", "6 0 1 0 \n", "7 0 1 0 \n", "8 0 1 0 \n", "9 0 1 1 \n", "\n", " current_donor_advisory \\\n", "0 1 \n", "1 0 \n", "2 0 \n", "3 0 \n", "4 0 \n", "5 0 \n", "6 0 \n", "7 0 \n", "8 0 \n", "9 0 \n", "\n", " Advisory Text \n", "0 On August 1, 2016, the New Hampshire Union Leader published an article titled, \"Former Portsmouth youth softball president accused of stealing thousands from nonprofit.\" \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "4 NaN \n", "5 NaN \n", "6 NaN \n", "7 NaN \n", "8 NaN \n", "9 NaN " ] }, "execution_count": 1077, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['2011_to_2016_donor_advisory'] = np.nan\n", "df['2011_to_2016_donor_advisory'] = np.where( df['org_id'].isin(advisories_2011), 1, 0)\n", "print df['2011_to_2016_donor_advisory'].value_counts(), '\\n'\n", "advisory_columns = ['org_id', 'FYE', '2016_data', 'Overall Rating', '2016_donor_advisory', \n", " '2011_to_2016_donor_advisory',\n", " 'donor_advisory', 'current_donor_advisory', 'Advisory Text'\n", " ]\n", "df[advisory_columns][:10] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 1078, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with Age, State, Category dummies, Total Revenues, Efficiency, Complexity, 
SOX, Donor Advisory.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Material Diversion" ] }, { "cell_type": "code", "execution_count": 1117, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "N 10921\n", "Y 43\n", "Name: mtrl_divrsn_or_misuse, dtype: int64" ] }, "execution_count": 1117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mtrl_divrsn_or_misuse'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1118, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 83506\n", "1 391\n", "Name: donor_advisory, dtype: int64" ] }, "execution_count": 1118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['donor_advisory'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1116, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "mtrl_divrsn_or_misuse N Y\n", "donor_advisory \n", "0 9892 35\n", "1 10 2" ] }, "execution_count": 1116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df['donor_advisory'], df['mtrl_divrsn_or_misuse'])" ] }, { "cell_type": "code", "execution_count": 1126, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "no_material_division_2011 NO OptOut yes\n", "donor_advisory \n", "0 17 6 4768" ] }, "execution_count": 1126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df[df['2011_data']==1]['donor_advisory'], df[df['2011_data']==1]['no_material_division_2011'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Fix the 2016 version: list-valued entries are replaced by their first element, stored as a string." ] }, { "cell_type": "code", "execution_count": 1124, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Replace list-valued cells with their first element as a plain string.\n", "# Note: df[2:] skips the first two rows, as in the original run.\n", "for index, row in df[2:].iterrows():\n", "    value = row['no_material_division_2016']\n", "    if isinstance(value, list):\n", "        df.loc[index, 'no_material_division_2016'] = str(value[0])" ] }, { "cell_type": "code", "execution_count": 1130, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "no_material_division_2016 _gfx_/icons/checkboxOptOut.png \\\n", "donor_advisory_2011_to_2016 \n", "0 10 \n", "1 3 \n", "\n", "no_material_division_2016 _gfx_/icons/checkboxX.gif \\\n", "donor_advisory_2011_to_2016 \n", "0 4 \n", "1 1 \n", "\n", "no_material_division_2016 _gfx_/icons/checked.gif \n", "donor_advisory_2011_to_2016 \n", "0 7899 \n", "1 66 " ] }, "execution_count": 1130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df['donor_advisory_2011_to_2016'], df['no_material_division_2016'])" ] }, { "cell_type": "code", "execution_count": 1128, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "no_material_division_2016 _gfx_/icons/checkboxOptOut.png \\\n", "donor_advisory \n", "0 13 \n", "\n", "no_material_division_2016 _gfx_/icons/checkboxX.gif _gfx_/icons/checked.gif \n", "donor_advisory \n", "0 5 7965 " ] }, "execution_count": 1128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df[df['2016_data']==1]['donor_advisory'], df[df['2016_data']==1]['no_material_division_2016'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 1113, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "mtrl_divrsn_or_misuse N Y\n", "donor_advisory_2011_to_2016 \n", "0 10822 33\n", "1 99 10" ] }, "execution_count": 1113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df['donor_advisory_2011_to_2016'], df['mtrl_divrsn_or_misuse'])" ] }, { "cell_type": "code", "execution_count": 1115, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "mtrl_divrsn_or_misuse N Y\n", "past_donor_advisory \n", "0 9892 35\n", "1 10 2" ] }, "execution_count": 1115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df['past_donor_advisory'], df['mtrl_divrsn_or_misuse'])" ] }, { "cell_type": "code", "execution_count": 1159, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 
'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', '2011_data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 
'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge_v2', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'govt_revenue_2011_binary', 'other_revenue_2011_binary', 'complexity_2011', 'advisory', 'SOX_policies_2011', 'total_revenue_2011_logged', 'total_revenue', 'total_revenue_logged', 'program_efficiency_2016', 'state', 'tot_func_expns_prg_srvcs', 'tot_func_expns_tot', '_merge_v3', 
'program_expenses', 'total_expenses', 'program_efficiency', 'fndrsng_events', '_merge', 'other_revenue_SOI', 'complexity_2016', 'complexity_SOI', 'complexity', 'conflict_of_interest_policy', 'whistleblower_policy', 'records_retention_policy', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', '2016_data', 'Advisory Text', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'SOX_policies_all_binary']\n" ] } ], "source": [ "print df.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Variables" ] }, { "cell_type": "code", "execution_count": 1082, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.rename(columns={'2016_donor_advisory':'donor_advisory_2016'}, inplace=True)\n", "df.rename(columns={'2011_to_2016_donor_advisory':'donor_advisory_2011_to_2016'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 1189, "metadata": { "collapsed": true }, "outputs": [], "source": [ "DVs = ['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', \n", " 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2']\n", "indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data']\n", "IVs = ['SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n", "controls = ['program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state']\n", "fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy']\n", "SOI_check = ['tot_rev']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", 
"execution_count": 1160, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged dataset with Age, State, Category dummies, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Output version of dataset with only 2011 rows and logit columns" ] }, { "cell_type": "code", "execution_count": 1190, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "cols = DVs + indicators + IVs + controls + SOI_check + fixed_effects\n", "print cols" ] }, { "cell_type": "code", "execution_count": 1178, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4857\n", "4857\n" ] }, { "data": { "text/plain": [ "['5259', '9389', '11542', '4027', '4024']" ] }, "execution_count": 1178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2011_data']==1]['org_id'].tolist())\n", "print len(set(df[df['2011_data']==1]['org_id'].tolist()))\n", "org_ids_2011 = list(set(df[df['2011_data']==1]['org_id'].tolist()))\n", "print 
len(org_ids_2011)\n", "org_ids_2011[:5]" ] }, { "cell_type": "code", "execution_count": 1180, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "69616" ] }, "execution_count": 1180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[(df['org_id'].isin(org_ids_2011))])" ] }, { "cell_type": "code", "execution_count": 1181, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4857" ] }, "execution_count": 1181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[(df['org_id'].isin(org_ids_2011))]['org_id'].tolist())\n", "len(set(df[(df['org_id'].isin(org_ids_2011))]['org_id'].tolist()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
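Before saving: the counts above show 4,863 rows flagged `2011_data` but only 4,857 distinct `org_id` values, so a few organizations carry more than one 2011 row. A minimal sketch for listing such repeats, using plain Python on a toy list of IDs (the function name and the example values are hypothetical, not taken from the dataset):

```python
from collections import Counter


def duplicated_ids(ids):
    '''Return a dict of IDs that occur more than once, with their counts.'''
    counts = Counter(ids)
    return {org_id: n for org_id, n in counts.items() if n > 1}


# Toy IDs for illustration only; these are not values from the dataset.
example_ids = ['101', '102', '102', '103', '103', '103']
repeats = duplicated_ids(example_ids)
```

In the notebook, the same helper could be called on `df[df['2011_data']==1]['org_id'].tolist()` to see which organizations account for the six extra rows.
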
Save as Excel file." ] }, { "cell_type": "code", "execution_count": 1179, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df[(df['org_id'].isin(org_ids_2011))][cols].to_excel('2011 dataset.xlsx')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create dummies for Test #5\n", "Divide the 4,857 orgs into three groups: i) those with no SOX policies in 2011 and still no SOX policies in 2016; ii) those with SOX policies in both 2011 and 2016; and iii) those with no SOX policies in 2011 but SOX policies in 2016. Create dummy variables for each group and see whether those in group iii) do better than those in i) or ii). This is a relatively low-cost 'pre-post' test.\n", "\n", "# Try SOX_policies in 2016 - SOX_policies in 2011\n", "\n", "# --> THIS COULD BE A PROBLEM FOR THOSE WITH A 2016 DONOR ADVISORY - WE DON'T HAVE THEIR 990 DETAILS --> PERHAPS ASK DAN TO GET THOSE? IN ANY CASE, THEY WILL LIKELY HAVE TO BE DOWNLOADED (OTHERWISE, WE ONLY HAVE THEIR '2011' SOX POLICY DATA, NOT THE 2016 DATA)\n", "\n", "# --> E.G., WE NEED 2016 SOX DATA FOR ORG_ID 10087\n", "\n", "# DESCRIPTIVE DATA -- SHOW HOW MANY GOT OR ADDED SOX POLICIES FROM 2011 TO 2016; ALSO DO THIS SINCE 2008 FOR THE AVAILABLE 'SOI' ORGANIZATIONS --> PERHAPS JUST FOR OUR SAMPLE PLUS FOR THE ENTIRE SOI DATASET" ] }, { "cell_type": "code", "execution_count": 1187, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69616\n", "4857\n", "47\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "429 1 1 1 \n", "543 1 1 1 \n", "1873 1 1 1 \n", "3663 1 1 1 \n", "5836 1 1 1 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "429 NaN NaN \n", "543 NaN NaN \n", "1873 NaN NaN \n", "3663 NaN NaN \n", "5836 NaN NaN \n", "\n", " whistleblower_policy_v2 org_id EIN FYE ratings_system \\\n", "429 NaN 4994 133552154 current current \n", "543 NaN 10087 581925867 current current \n", "1873 NaN 6705 112716763 current current \n", "3663 NaN 8626 133119118 current current \n", "5836 NaN 11671 300038297 current current \n", "\n", " 2011_data 2016_data SOX_policies SOX_policies_binary \\\n", "429 0 1 NaN NaN \n", "543 0 1 NaN NaN \n", "1873 0 1 NaN NaN \n", "3663 0 1 NaN NaN \n", "5836 0 1 NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity age \\\n", "429 NaN NaN 0 22 \n", "543 NaN NaN 0 25 \n", "1873 NaN NaN 0 31 \n", "3663 NaN NaN 0 34 \n", "5836 NaN NaN 0 13 \n", "\n", " total_revenue_logged category state tot_rev \\\n", "429 NaN Community Development NY NaN \n", "543 NaN Human Services LA NaN \n", "1873 NaN Human Services NY NaN \n", "3663 NaN Religion NY NaN \n", "5836 NaN Community Development CA NaN \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "429 0 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 0 0 \n", "\n", " category_Community Development category_Education \\\n", "429 1 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 1 0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "429 0 0 0 \n", "543 0 0 1 \n", "1873 0 0 1 \n", "3663 0 0 0 \n", "5836 0 0 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "429 0 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 0 0 \n", "\n", " category_Religion category_Research and Public Policy \n", "429 0 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 1 0 \n", "5836 0 0 " ] }, 
"execution_count": 1187, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[(df['org_id'].isin(org_ids_2011))])\n", "print len(df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1)])\n", "print len(df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1) & (df['donor_advisory']==1)])\n", "df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1) & (df['donor_advisory']==1)][cols].to_excel('47 missing SOX.xls')\n", "df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1) & (df['donor_advisory']==1)][cols][:5]" ] }, { "cell_type": "code", "execution_count": 1191, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df[(df['org_id'].isin(org_ids_2011)) & (df['donor_advisory_2016']==1)][cols].to_excel('47 missing SOX v2.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merge in hand-coded data for 47 orgs" ] }, { "cell_type": "code", "execution_count": 1237, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "429 1 1 1 \n", "543 1 1 1 \n", "1873 1 1 1 \n", "3663 1 1 1 \n", "5836 1 1 1 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "429 NaN NaN \n", "543 NaN NaN \n", "1873 NaN NaN \n", "3663 NaN NaN \n", "5836 NaN NaN \n", "\n", " whistleblower_policy_v2 org_id EIN FYE Form 990 FYE \\\n", "429 NaN 4994 133552154 current current \n", "543 NaN 10087 581925867 current current \n", "1873 NaN 6705 112716763 current current \n", "3663 NaN 8626 133119118 current current \n", "5836 NaN 11671 300038297 current current \n", "\n", " ratings_system 2011_data 2016_data SOX_policies SOX_policies_binary \\\n", "429 current 0 1 NaN NaN \n", "543 current 0 1 NaN NaN \n", "1873 current 0 1 NaN NaN \n", "3663 current 0 1 NaN NaN \n", "5836 current 0 1 NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity age \\\n", "429 NaN NaN 0 22 \n", "543 NaN NaN 0 25 \n", "1873 NaN NaN 0 31 \n", "3663 NaN NaN 0 34 \n", "5836 NaN NaN 0 13 \n", "\n", " total_revenue_logged category state tot_rev \\\n", "429 NaN Community Development NY NaN \n", "543 NaN Human Services LA NaN \n", "1873 NaN Human Services NY NaN \n", "3663 NaN Religion NY NaN \n", "5836 NaN Community Development CA NaN \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "429 0 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 0 0 \n", "\n", " category_Community Development category_Education \\\n", "429 1 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 1 0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "429 0 0 0 \n", "543 0 0 1 \n", "1873 0 0 1 \n", "3663 0 0 0 \n", "5836 0 0 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "429 0 0 \n", "543 0 0 \n", "1873 0 0 \n", "3663 0 0 \n", "5836 0 0 \n", "\n", " category_Religion category_Research and Public Policy \n", "429 0 0 \n", "543 0 0 \n", "1873 0 
0 \n", "3663 1 0 \n", "5836 0 0 " ] }, "execution_count": 1237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1) & (df['donor_advisory']==1)][cols][:5]" ] }, { "cell_type": "code", "execution_count": 1297, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE donor_advisory conflict_of_interest \\\n", "0 10087 581925867 FY2015 1 1 \n", "1 10552 942719901 FY2012 1 1 \n", "2 10902 262224994 FY2014 1 1 \n", "3 11009 953523852 FY2014 1 1 \n", "4 11327 720760857 FY2014 1 1 \n", "\n", " records_retention whistleblower complexity total_revenue \\\n", "0 0 0 5 445953 \n", "1 0 1 3 3935913 \n", "2 1 1 3 706895 \n", "3 1 1 2 3620634 \n", "4 1 1 5 4244456 \n", "\n", " program_efficiency program_expense total_expense \n", "0 NaN 229316 370526 \n", "1 NaN 3842824 4134682 \n", "2 NaN 356046 565973 \n", "3 NaN 490708 3455917 \n", "4 NaN 4210946 4942239 " ] }, "execution_count": 1297, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47 = pd.read_excel('47 missing SOX_updated.xls')\n", "missing_47['EIN'] = missing_47['EIN'].astype('str')\n", "missing_47['FYE'] = 'FY' + missing_47['FYE'].astype('str')\n", "print len(missing_47)\n", "missing_47.head()" ] }, { "cell_type": "code", "execution_count": 1298, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE donor_advisory conflict_of_interest \\\n", "0 10087 581925867 FY2015 1 1 \n", "1 10552 942719901 FY2012 1 1 \n", "2 10902 262224994 FY2014 1 1 \n", "3 11009 953523852 FY2014 1 1 \n", "4 11327 720760857 FY2014 1 1 \n", "\n", " records_retention whistleblower complexity total_revenue \\\n", "0 0 0 5 445953 \n", "1 0 1 3 3935913 \n", "2 1 1 3 706895 \n", "3 1 1 2 3620634 \n", "4 1 1 5 4244456 \n", "\n", " program_efficiency program_expense total_expense \n", "0 0.618893 229316 370526 \n", "1 0.929412 3842824 4134682 \n", "2 0.629087 356046 565973 \n", "3 0.141991 490708 3455917 \n", "4 0.852032 4210946 4942239 " ] }, "execution_count": 1298, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47['program_efficiency'] = missing_47['program_expense']/missing_47['total_expense']\n", "missing_47.head()" ] }, { "cell_type": "code", "execution_count": 1299, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 1299, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(missing_47[missing_47['total_revenue']<=0])" ] }, { "cell_type": "code", "execution_count": 1300, "metadata": { "collapsed": true }, "outputs": [], "source": [ "missing_47['total_revenue_logged'] = np.log(missing_47['total_revenue'])" ] }, { "cell_type": "code", "execution_count": 1301, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "org_id 47 7450.595745 2612.550567 3416.000000 \n", "donor_advisory 47 1.000000 0.000000 1.000000 \n", "conflict_of_interest 45 0.822222 0.386646 0.000000 \n", "records_retention 45 0.711111 0.458368 0.000000 \n", "whistleblower 45 0.755556 0.434613 0.000000 \n", "complexity 45 3.311111 1.411354 1.000000 \n", "total_revenue 45 9772891.822222 22888854.408858 234562.000000 \n", "program_efficiency 44 0.709359 0.223116 0.079828 \n", "program_expense 45 7559185.044444 19107996.439403 0.000000 \n", "total_expense 45 10303757.133333 26365217.699557 0.000000 \n", "total_revenue_logged 45 14.871346 1.447309 12.365475 \n", "\n", " 25% 50% 75% \\\n", "org_id 4889.500000 7651.000000 9583.000000 \n", "donor_advisory 1.000000 1.000000 1.000000 \n", "conflict_of_interest 1.000000 1.000000 1.000000 \n", "records_retention 0.000000 1.000000 1.000000 \n", "whistleblower 1.000000 1.000000 1.000000 \n", "complexity 2.000000 3.000000 4.000000 \n", "total_revenue 971459.000000 2477684.000000 5347792.000000 \n", "program_efficiency 0.626538 0.745544 0.860538 \n", "program_expense 516106.000000 1707840.000000 4210946.000000 \n", "total_expense 1116610.000000 2770228.000000 5607875.000000 \n", "total_revenue_logged 13.786554 14.722835 15.492194 \n", "\n", " max \n", "org_id 1.274000e+04 \n", "donor_advisory 1.000000e+00 \n", "conflict_of_interest 1.000000e+00 \n", "records_retention 1.000000e+00 \n", "whistleblower 1.000000e+00 \n", "complexity 7.000000e+00 \n", "total_revenue 1.215109e+08 \n", "program_efficiency 1.000000e+00 \n", "program_expense 1.121012e+08 \n", "total_expense 1.626340e+08 \n", "total_revenue_logged 1.861551e+01 " ] }, "execution_count": 1301, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47.describe().T" ] }, { "cell_type": "code", "execution_count": 1302, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "Name: SOX_policies, dtype: 
float64" ] }, "execution_count": 1302, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns_47 = ['conflict_of_interest', 'whistleblower', 'records_retention']\n", "missing_47['SOX_policies'] = np.where(missing_47['conflict_of_interest'].notnull(),\n", " (missing_47[SOX_columns_47] > 0).sum(1), np.nan)\n", "missing_47['SOX_policies'][:3]" ] }, { "cell_type": "code", "execution_count": 1303, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SOX_policies conflict_of_interest whistleblower records_retention\n", "0 1 1 0 0\n", "1 2 1 1 0\n", "2 3 1 1 1\n", "3 3 1 1 1\n", "4 3 1 1 1\n", "5 0 0 0 0\n", "6 3 1 1 1\n", "7 3 1 1 1" ] }, "execution_count": 1303, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns_47 = ['SOX_policies'] + SOX_columns_47\n", "missing_47[SOX_columns_47][:8]" ] }, { "cell_type": "code", "execution_count": 1304, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "45\n" ] }, { "data": { "text/plain": [ "1 38\n", "0 7\n", "Name: SOX_policies_binary, dtype: int64" ] }, "execution_count": 1304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47['SOX_policies_binary'] = missing_47['SOX_policies']\n", "missing_47['SOX_policies_binary'] = np.where(missing_47['SOX_policies_binary']>=1, 1, missing_47['SOX_policies'])\n", "print missing_47['SOX_policies_binary'].value_counts().sum()\n", "missing_47['SOX_policies_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1305, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 30\n", "0 15\n", "Name: SOX_policies_all_binary, dtype: int64\n" ] } ], "source": [ "missing_47['SOX_policies_all_binary'] = np.nan\n", "missing_47['SOX_policies_all_binary'] = missing_47['SOX_policies']\n", "missing_47['SOX_policies_all_binary'] = np.where( ((missing_47['SOX_policies_all_binary']==1) | (missing_47['SOX_policies_all_binary']==2)),\n", " 0, missing_47['SOX_policies_all_binary'])\n", "missing_47['SOX_policies_all_binary'] = np.where(missing_47['SOX_policies_all_binary']==3, 1, missing_47['SOX_policies_all_binary'])\n", "print missing_47['SOX_policies_all_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1307, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3 30\n", "0 7\n", "2 5\n", "1 
3\n", "Name: SOX_policies, dtype: int64 \n", "\n", "1 38\n", "0 7\n", "Name: SOX_policies_binary, dtype: int64 \n", "\n", "1 30\n", "0 15\n", "Name: SOX_policies_all_binary, dtype: int64\n" ] } ], "source": [ "print missing_47['SOX_policies'].value_counts(), '\\n'\n", "print missing_47['SOX_policies_binary'].value_counts(), '\\n'\n", "print missing_47['SOX_policies_all_binary'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1308, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SOX_policies_binary SOX_policies_all_binary SOX_policies \\\n", "0 1 0 1 \n", "1 1 0 2 \n", "2 1 1 3 \n", "3 1 1 3 \n", "4 1 1 3 \n", "5 0 0 0 \n", "6 1 1 3 \n", "7 1 1 3 \n", "\n", " conflict_of_interest whistleblower records_retention \n", "0 1 0 0 \n", "1 1 1 0 \n", "2 1 1 1 \n", "3 1 1 1 \n", "4 1 1 1 \n", "5 0 0 0 \n", "6 1 1 1 \n", "7 1 1 1 " ] }, "execution_count": 1308, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SOX_columns_47 = ['SOX_policies_binary', 'SOX_policies_all_binary'] + SOX_columns_47\n", "missing_47[SOX_columns_47][:8]" ] }, { "cell_type": "code", "execution_count": 1309, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[u'org_id', u'EIN', u'FYE', u'donor_advisory', u'conflict_of_interest', u'records_retention', u'whistleblower', u'complexity', u'total_revenue', u'program_efficiency', u'program_expense', u'total_expense', 'total_revenue_logged', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n" ] } ], "source": [ "print missing_47.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 1310, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE conflict_of_interest records_retention \\\n", "0 10087 581925867 FY2015 1 0 \n", "1 10552 942719901 FY2012 1 0 \n", "2 10902 262224994 FY2014 1 1 \n", "\n", " whistleblower SOX_policies SOX_policies_all_binary SOX_policies_binary \\\n", "0 0 1 0 1 \n", "1 1 2 0 1 \n", "2 1 3 1 1 \n", "\n", " total_revenue total_revenue_logged program_expense total_expense \\\n", "0 445953 13.007969 229316 370526 \n", "1 3935913 15.185653 3842824 4134682 \n", "2 706895 13.468637 356046 565973 \n", "\n", " program_efficiency complexity \n", "0 0.618893 5 \n", "1 0.929412 3 \n", "2 0.629087 3 " ] }, "execution_count": 1310, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47 = missing_47[['org_id', 'EIN', 'FYE', 'conflict_of_interest', 'records_retention', 'whistleblower', \n", " 'SOX_policies', 'SOX_policies_all_binary', 'SOX_policies_binary',\n", " 'total_revenue', 'total_revenue_logged',\n", " 'program_expense', 'total_expense', 'program_efficiency', 'complexity']] \n", "#'EIN', 'donor_advisory', \n", "missing_47[:3]" ] }, { "cell_type": "code", "execution_count": 1311, "metadata": { "collapsed": true }, "outputs": [], "source": [ "missing_47['org_id'] = missing_47['org_id'].astype('str')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Rename columns. I checked that all 47 rows will be merged in as new rows, so I can use the same column names as in the existing dataset, with a `_47` suffix marking the newly added values. " ] }, { "cell_type": "code", "execution_count": 1313, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN_47 FYE conflict_of_interest_policy_47 \\\n", "0 10087 581925867 FY2015 1 \n", "1 10552 942719901 FY2012 1 \n", "2 10902 262224994 FY2014 1 \n", "\n", " records_retention_policy_47 whistleblower_policy_47 SOX_policies_47 \\\n", "0 0 0 1 \n", "1 0 1 2 \n", "2 1 1 3 \n", "\n", " SOX_policies_all_binary_47 SOX_policies_binary_47 tot_rev_47 \\\n", "0 0 1 445953 \n", "1 0 1 3935913 \n", "2 1 1 706895 \n", "\n", " total_revenue_logged_47 program_expenses_47 total_expenses_47 \\\n", "0 13.007969 229316 370526 \n", "1 15.185653 3842824 4134682 \n", "2 13.468637 356046 565973 \n", "\n", " program_efficiency_47 complexity_47 \n", "0 0.618893 5 \n", "1 0.929412 3 \n", "2 0.629087 3 " ] }, "execution_count": 1313, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_47.columns = ['org_id', 'EIN_47', 'FYE', 'conflict_of_interest_policy_47', 'records_retention_policy_47', \n", " 'whistleblower_policy_47', \n", " 'SOX_policies_47', 'SOX_policies_all_binary_47', 'SOX_policies_binary_47',\n", " 'tot_rev_47', 'total_revenue_logged_47',\n", " 'program_expenses_47', 'total_expenses_47', 'program_efficiency_47', 'complexity_47'] \n", "missing_47[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Save the `missing_47` DataFrame to a pickle file." ] }, { "cell_type": "code", "execution_count": 1314, "metadata": { "collapsed": true }, "outputs": [], "source": [ "missing_47.to_pickle('missing_47.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Merge into main dataframe" ] }, { "cell_type": "code", "execution_count": 1315, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Free up the '_merge' name: the merge below uses indicator=True, which creates a column with that name\n", "df.rename(columns={'_merge':'_merge_v4'}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 1316, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "272\n", "84958\n" ] } ], "source": [ "#df.to_pickle('df.pkl')\n", "#df = pd.read_pickle('df.pkl')\n", "print(len(df.columns))\n", "print(len(df))" ] }, { "cell_type": "code", "execution_count": 1318, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "272\n", "84958\n", "285\n", "84958\n", "285\n", "85005\n", "286\n", "85005\n" ] } ], "source": [ "# Compare the left and outer merges before committing: both add 13 columns, but only the outer merge adds the 47 new rows\n", "print(len(df.columns))\n", "print(len(df))\n", "print(len(pd.merge(df, missing_47, on=['org_id','FYE'], how='left').columns))\n", "print(len(pd.merge(df, missing_47, on=['org_id','FYE'], how='left')))\n", "print(len(pd.merge(df, missing_47, on=['org_id','FYE'], how='outer').columns))\n", "print(len(pd.merge(df, missing_47, on=['org_id','FYE'], how='outer')))\n", "# Keep the outer merge; indicator=True adds a '_merge' column recording the source of each row\n", "df = pd.merge(df, missing_47, on=['org_id','FYE'], how='outer', indicator=True)\n", "print(len(df.columns))\n", "print(len(df))" ] }, { "cell_type": "code", "execution_count": 1319, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "left_only 84958\n", "right_only 47\n", "both 0\n", "dtype: int64" ] }, "execution_count": 1319, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['_merge'].value_counts()" ] }, { "cell_type": "code", 
"execution_count": 1320, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'org_url', 'name', 'category', 'category-full', 'Date Published', 'Form 990 FYE', 'Form 990 FYE, v2', 'FYE', 'Earliest Rating Publication Date', 'ratings_system', 'Overall Score', 'Overall Rating', 'advisory text - current advisory', 'advisory text - past advisory', 'current_or_past_donor_advisory', 'current_donor_advisory', 'past_donor_advisory', 'latest_entry', 'current_ratings_url', 'ein_2016', 'Publication_date_and_FY_2016', 'Publication Date_2016', 'FYE_2016', 'donor_alert_2016', 'overall_rating_2016', 'efficiency_rating_rating_2016', 'AT_rating_2016', 'overall_rating_star_2016', 'financial_rating_star_2016', 'AT_rating_star_2016', 'program_expense_percent_2016', 'admin_expense_percent_2016', 'fund_expense_percent_2016', 'fund_efficiency_2016', 'working_capital_ratio_2016', 'program_expense_growth_2016', 'liabilities_to_assets_2016', 'independent_board_2016', 'no_material_division_2016', 'audited_financials_2016', 'no_loans_related_2016', 'documents_minutes_2016', 'form_990_2016', 'conflict_of_interest_policy_2016', 'whistleblower_policy_2016', 'records_retention_policy_2016', 'CEO_listed_2016', 'process_CEO_compensation_2016', 'no_board_compensation_2016', 'donor_privacy_policy_2016', 'board_listed_2016', 'audited_financials_web_2016', 'form_990_web_2016', 'staff_listed_2016', 'contributions_gifts_grants_2016', 'federated_campaigns_2016', 'membership_dues_2016', 'fundraising_events_2016', 'related_organizations_2016', 'government_grants_2016', 'total_contributions_2016', 'program_service_revenue_2016', 'total_primary_revenue_2016', 'other_revenue_2016', 'total_revenue_2016', 'program_expenses_2016', 'administrative_expenses_2016', 'fundraising_expenses_2016', 'total_functional_expenses_2016', 'payments_to_affiliates_2016', 'excess_or_deficit_2016', 'net_assets_2016', 'comp_2016', 'cp_2016', 'mission_2016', 
'2011_data', 'charity_name_2011', 'category_2011', 'city_2011', 'state_2011', 'cause_2011', 'tag_line_2011', 'url_2011', 'ein_2011', 'fye_2011', 'overall_rating_2011', 'overall_rating_2011_plus_30', 'overall_rating_2011_plus_30_v2', 'overall_rating_star_2011', 'overall_rating_star_2011_text', 'efficiency_rating_2011', 'AT_rating_2011', 'financial_rating_star_2011', 'AT_rating_star_2011', 'program_expense_percent_2011', 'admin_expense_percent_2011', 'fund_expense_percent_2011', 'fund_efficiency_2011', 'primary_revenue_growth_2011', 'program_expense_growth_2011', 'working_capital_ratio_2011', 'independent_board_2011', 'no_material_division_2011', 'audited_financials_2011', 'no_loans_related_2011', 'documents_minutes_2011', 'form_990_2011', 'conflict_of_interest_policy_2011', 'whistleblower_policy_2011', 'records_retention_policy_2011', 'CEO_listed_2011', 'process_CEO_compensation_2011', 'no_board_compensation_2011', 'donor_privacy_policy_2011', 'board_listed_2011', 'audited_financials_web_2011', 'form_990_web_2011', 'staff_listed_2011', 'primary_revenue_2011', 'other_revenue_2011', 'total_revenue_2011', 'govt_revenue_2011', 'program_expense_2011', 'admin_expense_2011', 'fund_expense_2011', 'total_functional_expense_2011', 'affiliate_payments_2011', 'budget_surplus_2011', 'net_assets_2011', 'leader_comp_2011', 'leader_comp_percent_2011', 'email_2011', 'website_2011', '2016 Advisory - Date Posted', '2016 Advisory - Charity Name', '2016 Advisory - advisory_url', '2016 Advisory - advisory', '_merge_v1', 'to_be_merged', u'NEW ROW', 'NAME_2015_BMF', 'STREET_2015_BMF', 'CITY_2015_BMF', 'STATE_2015_BMF', 'ZIP_2015_BMF', 'RULING_2015_BMF', 'ACTIVITY_2015_BMF', 'TAX_PERIOD_2015_BMF', 'ASSET_AMT_2015_BMF', 'INCOME_AMT_2015_BMF', 'REVENUE_AMT_2015_BMF', 'NTEE_CD_2015_BMF', '2015 BMF', 'ruledate_2004_BMF', 'name_MSTRALL', 'state_MSTRALL', 'NTEE1_MSTRALL', 'nteecc_MSTRALL', 'zip_MSTRALL', 'fips_MSTRALL', 'taxper_MSTRALL', 'income_MSTRALL', 'F990REV_MSTRALL', 'assets_MSTRALL', 
'ruledate_MSTRALL', 'deductcd_MSTRALL', 'accper_MSTRALL', 'rule_date_v1', 'taxpd', 'NAME_SOI', 'yr_frmtn', 'pt1_num_vtng_gvrn_bdy_mems', 'pt1_num_ind_vtng_mems', 'num_vtng_gvrn_bdy_mems', 'num_ind_vtng_mems', 'tot_num_empls', 'tot_num_vlntrs', 'contri_grnts_cy', 'prog_srvc_rev_cy', 'invst_incm_cy', 'oth_rev_cy', 'grnts_and_smlr_amts_cy', 'tot_prof_fndrsng_exp_cy', 'tot_fndrsng_exp_cy', 'pt1_tot_asts_eoy', 'aud_fincl_stmts', 'mtrl_divrsn_or_misuse', 'cnflct_int_plcy', 'whistleblower_plcy', 'doc_retention_plcy', 'federated_campaigns', 'memshp_dues', 'rltd_orgs', 'govt_grnts', 'all_oth_contri', 'nncsh_contri', 'tot_contri', 'psr_tot', 'inv_incm_tot_rev', 'bonds_tot_rev', 'roylrev_tot_rev', 'net_rent_tot_rev', 'gain_or_loss_sec', 'gain_or_loss_oth', 'oth_rev_tot', 'tot_rev', 'mgmt_srvc_fee_tot', 'fee_for_srvc_leg_tot', 'fee_for_srvc_acct_tot', 'fee_for_srvc_lbby_tot', 'fee_for_srvc_prof_tot', 'fee_for_srvc_invst_tot', 'fee_for_srvc_oth_tot', 'fs_audited', 'audit_committee', 'vlntr_hrs', '_merge_v2', 'rule_date', 'ruledate_2004_BMF_v2', 'ruledate_MSTRALL_v2', 'yr_frmtn_v2', 'age', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'govt_revenue_2011_binary', 'other_revenue_2011_binary', 'complexity_2011', 'advisory', 'SOX_policies_2011', 'total_revenue_2011_logged', 'total_revenue', 'total_revenue_logged', 'program_efficiency_2016', 'state', 'tot_func_expns_prg_srvcs', 'tot_func_expns_tot', '_merge_v3', 'program_expenses', 'total_expenses', 'program_efficiency', 'fndrsng_events', '_merge_v4', 'other_revenue_SOI', 'complexity_2016', 'complexity_SOI', 'complexity', 'conflict_of_interest_policy', 'whistleblower_policy', 'records_retention_policy', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 
'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', '2016_data', 'Advisory Text', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'SOX_policies_all_binary', 'total_revenue_no_neg', 'EIN_47', 'conflict_of_interest_policy_47', 'records_retention_policy_47', 'whistleblower_policy_47', 'SOX_policies_47', 'SOX_policies_all_binary_47', 'SOX_policies_binary_47', 'tot_rev_47', 'total_revenue_logged_47', 'program_expenses_47', 'total_expenses_47', 'program_efficiency_47', 'complexity_47', '_merge']\n" ] } ], "source": [ "print df.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 1321, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", "
" ], "text/plain": [ " org_id EIN org_url name category category-full Date Published \\\n", "84958 10087 NaN NaN NaN NaN NaN NaN \n", "84959 10552 NaN NaN NaN NaN NaN NaN \n", "\n", " Form 990 FYE Form 990 FYE, v2 FYE Earliest Rating Publication Date \\\n", "84958 NaN NaT FY2015 NaN \n", "84959 NaN NaT FY2012 NaN \n", "\n", " ratings_system Overall Score Overall Rating \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " advisory text - current advisory advisory text - past advisory \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " current_or_past_donor_advisory current_donor_advisory \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " past_donor_advisory latest_entry current_ratings_url ein_2016 \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " Publication_date_and_FY_2016 Publication Date_2016 FYE_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " donor_alert_2016 overall_rating_2016 efficiency_rating_rating_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " AT_rating_2016 overall_rating_star_2016 financial_rating_star_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " AT_rating_star_2016 program_expense_percent_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " admin_expense_percent_2016 fund_expense_percent_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fund_efficiency_2016 working_capital_ratio_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_expense_growth_2016 liabilities_to_assets_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " independent_board_2016 no_material_division_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " audited_financials_2016 no_loans_related_2016 documents_minutes_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_2016 conflict_of_interest_policy_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " whistleblower_policy_2016 records_retention_policy_2016 
CEO_listed_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " process_CEO_compensation_2016 no_board_compensation_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " donor_privacy_policy_2016 board_listed_2016 audited_financials_web_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_web_2016 staff_listed_2016 contributions_gifts_grants_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " federated_campaigns_2016 membership_dues_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fundraising_events_2016 related_organizations_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " government_grants_2016 total_contributions_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_service_revenue_2016 total_primary_revenue_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " other_revenue_2016 total_revenue_2016 program_expenses_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " administrative_expenses_2016 fundraising_expenses_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " total_functional_expenses_2016 payments_to_affiliates_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " excess_or_deficit_2016 net_assets_2016 comp_2016 cp_2016 mission_2016 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " 2011_data charity_name_2011 category_2011 city_2011 state_2011 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " cause_2011 tag_line_2011 url_2011 ein_2011 fye_2011 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " overall_rating_2011 overall_rating_2011_plus_30 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " 
financial_rating_star_2011 AT_rating_star_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " working_capital_ratio_2011 independent_board_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 govt_revenue_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " 2016 Advisory - Date Posted 2016 Advisory - Charity Name \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " 2016 
Advisory - advisory_url 2016 Advisory - advisory _merge_v1 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " to_be_merged NEW ROW NAME_2015_BMF STREET_2015_BMF CITY_2015_BMF \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " STATE_2015_BMF ZIP_2015_BMF RULING_2015_BMF ACTIVITY_2015_BMF \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " TAX_PERIOD_2015_BMF ASSET_AMT_2015_BMF INCOME_AMT_2015_BMF \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF 2015 BMF ruledate_2004_BMF \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " name_MSTRALL state_MSTRALL NTEE1_MSTRALL nteecc_MSTRALL zip_MSTRALL \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " fips_MSTRALL taxper_MSTRALL income_MSTRALL F990REV_MSTRALL \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " assets_MSTRALL ruledate_MSTRALL deductcd_MSTRALL accper_MSTRALL \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " rule_date_v1 taxpd NAME_SOI yr_frmtn pt1_num_vtng_gvrn_bdy_mems \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " pt1_num_ind_vtng_mems num_vtng_gvrn_bdy_mems num_ind_vtng_mems \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " tot_num_empls tot_num_vlntrs contri_grnts_cy prog_srvc_rev_cy \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " invst_incm_cy oth_rev_cy grnts_and_smlr_amts_cy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " tot_prof_fndrsng_exp_cy tot_fndrsng_exp_cy pt1_tot_asts_eoy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy federated_campaigns memshp_dues \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " rltd_orgs govt_grnts all_oth_contri 
nncsh_contri tot_contri \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " psr_tot inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " net_rent_tot_rev gain_or_loss_sec gain_or_loss_oth oth_rev_tot \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " tot_rev mgmt_srvc_fee_tot fee_for_srvc_leg_tot \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " fee_for_srvc_acct_tot fee_for_srvc_lbby_tot fee_for_srvc_prof_tot \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " fee_for_srvc_invst_tot fee_for_srvc_oth_tot fs_audited \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " audit_committee vlntr_hrs _merge_v2 rule_date ruledate_2004_BMF_v2 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " ruledate_MSTRALL_v2 yr_frmtn_v2 age category_Animals \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_Education category_Environment category_Health \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_International category_Religion \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_Research and Public Policy govt_revenue_2011_binary \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " other_revenue_2011_binary complexity_2011 advisory \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " SOX_policies_2011 total_revenue_2011_logged total_revenue \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " total_revenue_logged program_efficiency_2016 state \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " tot_func_expns_prg_srvcs tot_func_expns_tot _merge_v3 \\\n", "84958 NaN NaN NaN \n", "84959 NaN 
NaN NaN \n", "\n", " program_expenses total_expenses program_efficiency fndrsng_events \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " _merge_v4 other_revenue_SOI complexity_2016 complexity_SOI \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " complexity conflict_of_interest_policy whistleblower_policy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " records_retention_policy conflict_of_interest_policy_v2 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " SOX_policies_binary 2016_data Advisory Text donor_advisory \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " SOX_policies_all_binary total_revenue_no_neg EIN_47 \\\n", "84958 NaN NaN 581925867 \n", "84959 NaN NaN 942719901 \n", "\n", " conflict_of_interest_policy_47 records_retention_policy_47 \\\n", "84958 1 0 \n", "84959 1 0 \n", "\n", " whistleblower_policy_47 SOX_policies_47 SOX_policies_all_binary_47 \\\n", "84958 0 1 0 \n", "84959 1 2 0 \n", "\n", " SOX_policies_binary_47 tot_rev_47 total_revenue_logged_47 \\\n", "84958 1 445953 13.007969 \n", "84959 1 3935913 15.185653 \n", "\n", " program_expenses_47 total_expenses_47 program_efficiency_47 \\\n", "84958 229316 370526 0.618893 \n", "84959 3842824 4134682 0.929412 \n", "\n", " complexity_47 _merge \n", "84958 5 right_only \n", "84959 3 right_only " ] }, "execution_count": 1321, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['_merge']=='right_only'][:2]" ] }, { "cell_type": "code", "execution_count": 1322, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.rename(columns={'_merge':'_merge_47'}, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Replace values" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print len(df[df['EIN'].notnull()])\n", "df['EIN'] = np.where( ( (df['EIN'].isnull()) & (df['EIN_47'].notnull()) ),\n", " df['EIN_47'], df['EIN'])\n", "print len(df[df['EIN'].notnull()])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print len(df[df['conflict_of_interest_policy_v2'].notnull()])\n", "df['conflict_of_interest_policy_v2'] = np.where( ( (df['conflict_of_interest_policy_v2'].isnull()) \n", " & (df['conflict_of_interest_policy_47'].notnull()) ),\n", " df['conflict_of_interest_policy_47'], df['conflict_of_interest_policy_v2'])\n", "print len(df[df['conflict_of_interest_policy_v2'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1328, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['records_retention_policy_v2'].notnull()])\n", "df['records_retention_policy_v2'] = np.where( ( (df['records_retention_policy_v2'].isnull()) \n", " & (df['records_retention_policy_47'].notnull()) ),\n", " df['records_retention_policy_47'], df['records_retention_policy_v2'])\n", "print len(df[df['records_retention_policy_v2'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1340, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['whistleblower_policy_v2'].notnull()])\n", "df['whistleblower_policy_v2'] = np.where( ( (df['whistleblower_policy_v2'].isnull()) & (df['whistleblower_policy_47'].notnull()) ),\n", " df['whistleblower_policy_47'], df['whistleblower_policy_v2'])\n", "print len(df[df['whistleblower_policy_v2'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1330, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['SOX_policies'].notnull()])\n", "df['SOX_policies'] = np.where( ( (df['SOX_policies'].isnull()) & (df['SOX_policies_47'].notnull()) ),\n", " df['SOX_policies_47'], df['SOX_policies'])\n", "print len(df[df['SOX_policies'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1331, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['SOX_policies_all_binary'].notnull()])\n", "df['SOX_policies_all_binary'] = np.where( ( (df['SOX_policies_all_binary'].isnull()) & (df['SOX_policies_all_binary_47'].notnull()) ),\n", " df['SOX_policies_all_binary_47'], df['SOX_policies_all_binary'])\n", "print len(df[df['SOX_policies_all_binary'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1332, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['SOX_policies_binary'].notnull()])\n", "df['SOX_policies_binary'] = np.where( ( (df['SOX_policies_binary'].isnull()) & (df['SOX_policies_binary_47'].notnull()) ),\n", " df['SOX_policies_binary_47'], df['SOX_policies_binary'])\n", "print len(df[df['SOX_policies_binary'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1333, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10964\n", "11009\n" ] } ], "source": [ "print len(df[df['tot_rev'].notnull()])\n", "df['tot_rev'] = np.where( ( (df['tot_rev'].isnull()) & (df['tot_rev_47'].notnull()) ),\n", " df['tot_rev_47'], df['tot_rev'])\n", "print len(df[df['tot_rev'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1334, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['total_revenue_logged'].notnull()])\n", 
"df['total_revenue_logged'] = np.where( ( (df['total_revenue_logged'].isnull()) & (df['total_revenue_logged_47'].notnull()) ),\n", " df['total_revenue_logged_47'], df['total_revenue_logged'])\n", "print len(df[df['total_revenue_logged'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1335, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['program_expenses'].notnull()])\n", "df['program_expenses'] = np.where( ( (df['program_expenses'].isnull()) & (df['program_expenses_47'].notnull()) ),\n", " df['program_expenses_47'], df['program_expenses'])\n", "print len(df[df['program_expenses'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1336, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21939\n" ] } ], "source": [ "print len(df[df['total_expenses'].notnull()])\n", "df['total_expenses'] = np.where( ( (df['total_expenses'].isnull()) & (df['total_expenses_47'].notnull()) ),\n", " df['total_expenses_47'], df['total_expenses'])\n", "print len(df[df['total_expenses'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1337, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21894\n", "21938\n" ] } ], "source": [ "print len(df[df['program_efficiency'].notnull()])\n", "df['program_efficiency'] = np.where( ( (df['program_efficiency'].isnull()) & (df['program_efficiency_47'].notnull()) ),\n", " df['program_efficiency_47'], df['program_efficiency'])\n", "print len(df[df['program_efficiency'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1338, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "84958\n", "85003\n" ] } ], "source": [ "print len(df[df['complexity'].notnull()])\n", "df['complexity'] = np.where( ( (df['complexity'].isnull()) & 
(df['complexity_47'].notnull()) ),\n", " df['complexity_47'], df['complexity'])\n", "print len(df[df['complexity'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1341, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
org_idEINorg_urlnamecategorycategory-fullDate PublishedForm 990 FYEForm 990 FYE, v2FYEEarliest Rating Publication Dateratings_systemOverall ScoreOverall Ratingadvisory text - current advisoryadvisory text - past advisorycurrent_or_past_donor_advisorycurrent_donor_advisorypast_donor_advisorylatest_entrycurrent_ratings_urlein_2016Publication_date_and_FY_2016Publication Date_2016FYE_2016donor_alert_2016overall_rating_2016efficiency_rating_rating_2016AT_rating_2016overall_rating_star_2016financial_rating_star_2016AT_rating_star_2016program_expense_percent_2016admin_expense_percent_2016fund_expense_percent_2016fund_efficiency_2016working_capital_ratio_2016program_expense_growth_2016liabilities_to_assets_2016independent_board_2016no_material_division_2016audited_financials_2016no_loans_related_2016documents_minutes_2016form_990_2016conflict_of_interest_policy_2016whistleblower_policy_2016records_retention_policy_2016CEO_listed_2016process_CEO_compensation_2016no_board_compensation_2016donor_privacy_policy_2016board_listed_2016audited_financials_web_2016form_990_web_2016staff_listed_2016contributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016total_contributions_2016program_service_revenue_2016total_primary_revenue_2016other_revenue_2016total_revenue_2016program_expenses_2016administrative_expenses_2016fundraising_expenses_2016total_functional_expenses_2016payments_to_affiliates_2016excess_or_deficit_2016net_assets_2016comp_2016cp_2016mission_20162011_datacharity_name_2011category_2011city_2011state_2011cause_2011tag_line_2011url_2011ein_2011fye_2011overall_rating_2011overall_rating_2011_plus_30overall_rating_2011_plus_30_v2overall_rating_star_2011overall_rating_star_2011_textefficiency_rating_2011AT_rating_2011financial_rating_star_2011AT_rating_star_2011program_expense_percent_2011admin_expense_percent_2011fund_expense_percent_2011fund_efficiency_2011primary_revenue_growth_2011progra
m_expense_growth_2011working_capital_ratio_2011independent_board_2011no_material_division_2011audited_financials_2011no_loans_related_2011documents_minutes_2011form_990_2011conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011CEO_listed_2011process_CEO_compensation_2011no_board_compensation_2011donor_privacy_policy_2011board_listed_2011audited_financials_web_2011form_990_web_2011staff_listed_2011primary_revenue_2011other_revenue_2011total_revenue_2011govt_revenue_2011program_expense_2011admin_expense_2011fund_expense_2011total_functional_expense_2011affiliate_payments_2011budget_surplus_2011net_assets_2011leader_comp_2011leader_comp_percent_2011email_2011website_20112016 Advisory - Date Posted2016 Advisory - Charity Name2016 Advisory - advisory_url2016 Advisory - advisory_merge_v1to_be_mergedNEW ROWNAME_2015_BMFSTREET_2015_BMFCITY_2015_BMFSTATE_2015_BMFZIP_2015_BMFRULING_2015_BMFACTIVITY_2015_BMFTAX_PERIOD_2015_BMFASSET_AMT_2015_BMFINCOME_AMT_2015_BMFREVENUE_AMT_2015_BMFNTEE_CD_2015_BMF2015 
BMFruledate_2004_BMFname_MSTRALLstate_MSTRALLNTEE1_MSTRALLnteecc_MSTRALLzip_MSTRALLfips_MSTRALLtaxper_MSTRALLincome_MSTRALLF990REV_MSTRALLassets_MSTRALLruledate_MSTRALLdeductcd_MSTRALLaccper_MSTRALLrule_date_v1taxpdNAME_SOIyr_frmtnpt1_num_vtng_gvrn_bdy_memspt1_num_ind_vtng_memsnum_vtng_gvrn_bdy_memsnum_ind_vtng_memstot_num_emplstot_num_vlntrscontri_grnts_cyprog_srvc_rev_cyinvst_incm_cyoth_rev_cygrnts_and_smlr_amts_cytot_prof_fndrsng_exp_cytot_fndrsng_exp_cypt1_tot_asts_eoyaud_fincl_stmtsmtrl_divrsn_or_misusecnflct_int_plcywhistleblower_plcydoc_retention_plcyfederated_campaignsmemshp_duesrltd_orgsgovt_grntsall_oth_contrinncsh_contritot_contripsr_totinv_incm_tot_revbonds_tot_revroylrev_tot_revnet_rent_tot_revgain_or_loss_secgain_or_loss_othoth_rev_tottot_revmgmt_srvc_fee_totfee_for_srvc_leg_totfee_for_srvc_acct_totfee_for_srvc_lbby_totfee_for_srvc_prof_totfee_for_srvc_invst_totfee_for_srvc_oth_totfs_auditedaudit_committeevlntr_hrs_merge_v2rule_dateruledate_2004_BMF_v2ruledate_MSTRALL_v2yr_frmtn_v2agecategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policygovt_revenue_2011_binaryother_revenue_2011_binarycomplexity_2011advisorySOX_policies_2011total_revenue_2011_loggedtotal_revenuetotal_revenue_loggedprogram_efficiency_2016statetot_func_expns_prg_srvcstot_func_expns_tot_merge_v3program_expensestotal_expensesprogram_efficiencyfndrsng_events_merge_v4other_revenue_SOIcomplexity_2016complexity_SOIcomplexityconflict_of_interest_policywhistleblower_policyrecords_retention_policyconflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binary2016_dataAdvisory 
Textdonor_advisorydonor_advisory_2016donor_advisory_2011_to_2016SOX_policies_all_binarytotal_revenue_no_negEIN_47conflict_of_interest_policy_47records_retention_policy_47whistleblower_policy_47SOX_policies_47SOX_policies_all_binary_47SOX_policies_binary_47tot_rev_47total_revenue_logged_47program_expenses_47total_expenses_47program_efficiency_47complexity_47_merge_47
8495810087581925867NaNNaNNaNNaNNaNNaNNaTFY2015NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN445953NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN13.007969NaNNaNNaNNaNNaN2293163705260.618893NaNNaNNaNNaNNaN5NaN0NaN10011NaNNaNNaNNaNNaN0NaN58192586710010144595313.0079692293163705260.6188935right_only
8495910552942719901NaNNaNNaNNaNNaNNaNNaTFY2012NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN3935913NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN15.185653NaNNaNNaNNaNNaN384282441346820.929412NaNNaNNaNNaNNaN3NaN1NaN10121NaNNaNNaNNaNNaN0NaN942719901101201393591315.185653384282441346820.9294123right_only
\n", "
" ], "text/plain": [ " org_id EIN org_url name category category-full Date Published \\\n", "84958 10087 581925867 NaN NaN NaN NaN NaN \n", "84959 10552 942719901 NaN NaN NaN NaN NaN \n", "\n", " Form 990 FYE Form 990 FYE, v2 FYE Earliest Rating Publication Date \\\n", "84958 NaN NaT FY2015 NaN \n", "84959 NaN NaT FY2012 NaN \n", "\n", " ratings_system Overall Score Overall Rating \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " advisory text - current advisory advisory text - past advisory \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " current_or_past_donor_advisory current_donor_advisory \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " past_donor_advisory latest_entry current_ratings_url ein_2016 \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " Publication_date_and_FY_2016 Publication Date_2016 FYE_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " donor_alert_2016 overall_rating_2016 efficiency_rating_rating_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " AT_rating_2016 overall_rating_star_2016 financial_rating_star_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " AT_rating_star_2016 program_expense_percent_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " admin_expense_percent_2016 fund_expense_percent_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fund_efficiency_2016 working_capital_ratio_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_expense_growth_2016 liabilities_to_assets_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " independent_board_2016 no_material_division_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " audited_financials_2016 no_loans_related_2016 documents_minutes_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_2016 conflict_of_interest_policy_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " whistleblower_policy_2016 
records_retention_policy_2016 CEO_listed_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " process_CEO_compensation_2016 no_board_compensation_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " donor_privacy_policy_2016 board_listed_2016 audited_financials_web_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_web_2016 staff_listed_2016 contributions_gifts_grants_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " federated_campaigns_2016 membership_dues_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fundraising_events_2016 related_organizations_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " government_grants_2016 total_contributions_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_service_revenue_2016 total_primary_revenue_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " other_revenue_2016 total_revenue_2016 program_expenses_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " administrative_expenses_2016 fundraising_expenses_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " total_functional_expenses_2016 payments_to_affiliates_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " excess_or_deficit_2016 net_assets_2016 comp_2016 cp_2016 mission_2016 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " 2011_data charity_name_2011 category_2011 city_2011 state_2011 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " cause_2011 tag_line_2011 url_2011 ein_2011 fye_2011 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " overall_rating_2011 overall_rating_2011_plus_30 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "84958 NaN NaN NaN \n", "84959 
NaN NaN NaN \n", "\n", " financial_rating_star_2011 AT_rating_star_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " working_capital_ratio_2011 independent_board_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " documents_minutes_2011 form_990_2011 conflict_of_interest_policy_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 govt_revenue_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " 2016 Advisory - Date Posted 2016 Advisory - Charity Name \\\n", "84958 NaN NaN \n", "84959 NaN 
NaN \n", "\n", " 2016 Advisory - advisory_url 2016 Advisory - advisory _merge_v1 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " to_be_merged NEW ROW NAME_2015_BMF STREET_2015_BMF CITY_2015_BMF \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " STATE_2015_BMF ZIP_2015_BMF RULING_2015_BMF ACTIVITY_2015_BMF \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " TAX_PERIOD_2015_BMF ASSET_AMT_2015_BMF INCOME_AMT_2015_BMF \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " REVENUE_AMT_2015_BMF NTEE_CD_2015_BMF 2015 BMF ruledate_2004_BMF \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " name_MSTRALL state_MSTRALL NTEE1_MSTRALL nteecc_MSTRALL zip_MSTRALL \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " fips_MSTRALL taxper_MSTRALL income_MSTRALL F990REV_MSTRALL \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " assets_MSTRALL ruledate_MSTRALL deductcd_MSTRALL accper_MSTRALL \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " rule_date_v1 taxpd NAME_SOI yr_frmtn pt1_num_vtng_gvrn_bdy_mems \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " pt1_num_ind_vtng_mems num_vtng_gvrn_bdy_mems num_ind_vtng_mems \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " tot_num_empls tot_num_vlntrs contri_grnts_cy prog_srvc_rev_cy \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " invst_incm_cy oth_rev_cy grnts_and_smlr_amts_cy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " tot_prof_fndrsng_exp_cy tot_fndrsng_exp_cy pt1_tot_asts_eoy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " aud_fincl_stmts mtrl_divrsn_or_misuse cnflct_int_plcy \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " whistleblower_plcy doc_retention_plcy federated_campaigns memshp_dues \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " rltd_orgs 
govt_grnts all_oth_contri nncsh_contri tot_contri \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " psr_tot inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " net_rent_tot_rev gain_or_loss_sec gain_or_loss_oth oth_rev_tot \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " tot_rev mgmt_srvc_fee_tot fee_for_srvc_leg_tot \\\n", "84958 445953 NaN NaN \n", "84959 3935913 NaN NaN \n", "\n", " fee_for_srvc_acct_tot fee_for_srvc_lbby_tot fee_for_srvc_prof_tot \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " fee_for_srvc_invst_tot fee_for_srvc_oth_tot fs_audited \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " audit_committee vlntr_hrs _merge_v2 rule_date ruledate_2004_BMF_v2 \\\n", "84958 NaN NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN NaN \n", "\n", " ruledate_MSTRALL_v2 yr_frmtn_v2 age category_Animals \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_Education category_Environment category_Health \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_International category_Religion \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " category_Research and Public Policy govt_revenue_2011_binary \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " other_revenue_2011_binary complexity_2011 advisory \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " SOX_policies_2011 total_revenue_2011_logged total_revenue \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " total_revenue_logged program_efficiency_2016 state \\\n", "84958 13.007969 NaN NaN \n", "84959 15.185653 NaN NaN \n", "\n", " tot_func_expns_prg_srvcs tot_func_expns_tot 
_merge_v3 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "\n", " program_expenses total_expenses program_efficiency fndrsng_events \\\n", "84958 229316 370526 0.618893 NaN \n", "84959 3842824 4134682 0.929412 NaN \n", "\n", " _merge_v4 other_revenue_SOI complexity_2016 complexity_SOI \\\n", "84958 NaN NaN NaN NaN \n", "84959 NaN NaN NaN NaN \n", "\n", " complexity conflict_of_interest_policy whistleblower_policy \\\n", "84958 5 NaN 0 \n", "84959 3 NaN 1 \n", "\n", " records_retention_policy conflict_of_interest_policy_v2 \\\n", "84958 NaN 1 \n", "84959 NaN 1 \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "84958 0 0 1 \n", "84959 0 1 2 \n", "\n", " SOX_policies_binary 2016_data Advisory Text donor_advisory \\\n", "84958 1 NaN NaN NaN \n", "84959 1 NaN NaN NaN \n", "\n", " donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "\n", " SOX_policies_all_binary total_revenue_no_neg EIN_47 \\\n", "84958 0 NaN 581925867 \n", "84959 0 NaN 942719901 \n", "\n", " conflict_of_interest_policy_47 records_retention_policy_47 \\\n", "84958 1 0 \n", "84959 1 0 \n", "\n", " whistleblower_policy_47 SOX_policies_47 SOX_policies_all_binary_47 \\\n", "84958 0 1 0 \n", "84959 1 2 0 \n", "\n", " SOX_policies_binary_47 tot_rev_47 total_revenue_logged_47 \\\n", "84958 1 445953 13.007969 \n", "84959 1 3935913 15.185653 \n", "\n", " program_expenses_47 total_expenses_47 program_efficiency_47 \\\n", "84958 229316 370526 0.618893 \n", "84959 3842824 4134682 0.929412 \n", "\n", " complexity_47 _merge_47 \n", "84958 5 right_only \n", "84959 3 right_only " ] }, "execution_count": 1341, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['_merge_47']=='right_only'][:2]" ] }, { "cell_type": "code", "execution_count": 1344, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 
'donor_advisory_2011_to_2016', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print(cols)" ] }, { "cell_type": "code", "execution_count": 1351, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cols = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data',\n", " 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', \n", " 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', \n", " 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', \n", " 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'tot_rev',\n", " 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', \n", " 'category_Community Development', 'category_Education', 'category_Environment', \n", " 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', \n", " 'category_International', 'category_Religion', 'category_Research and Public Policy',\n", " ]" ] }, { "cell_type": "code", "execution_count": 1352, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "84958 10087 581925867 FY2015 NaN NaN NaN \n", "84959 10552 942719901 FY2012 NaN NaN NaN \n", "84960 10902 262224994 FY2014 NaN NaN NaN \n", "84961 11009 953523852 FY2014 NaN NaN NaN \n", "84962 11327 720760857 FY2014 NaN NaN NaN \n", "84963 11671 300038297 FY2015 NaN NaN NaN \n", "84964 11787 421568866 FY2013 NaN NaN NaN \n", "84965 12740 201226416 FY2014 NaN NaN NaN \n", "84966 3416 581766061 FY2013 NaN NaN NaN \n", "84967 3432 741109733 FY2014 NaN NaN NaN \n", "84968 3495 760574835 FY2014 NaN NaN NaN \n", "84969 3696 362167011 FY2014 NaN NaN NaN \n", "84970 4172 530173054 FY2014 NaN NaN NaN \n", "84971 4292 941347046 FY2014 NaN NaN NaN \n", "84972 4441 42129889 FY2014 NaN NaN NaN \n", "84973 4518 731284606 FY2015 NaN NaN NaN \n", "84974 4574 952844062 FY2014 NaN NaN NaN \n", "84975 4608 222680030 FY2014 NaN NaN NaN \n", "84976 4778 112613334 FY2015 NaN NaN NaN \n", "84977 4785 390806314 FY2014 NaN NaN NaN \n", "84978 4994 133552154 FY2011 NaN NaN NaN \n", "84979 5445 521614093 FY2014 NaN NaN NaN \n", "84980 5602 611080398 FY2010 NaN NaN NaN \n", "84981 5652 731011191 FY2014 NaN NaN NaN \n", "84982 5668 741945661 FY2014 NaN NaN NaN \n", "84983 6033 135590516 FY2014 NaN NaN NaN \n", "84984 6705 112716763 FY2013 NaN NaN NaN \n", "84985 6897 311016441 FY2014 NaN NaN NaN \n", "84986 7051 440665046 FY2014 NaN NaN NaN \n", "84987 7229 592111273 FY2014 NaN NaN NaN \n", "84988 7299 742372030 FY2014 NaN NaN NaN \n", "84989 7651 113059922 FY2014 NaN NaN NaN \n", "84990 7909 911488652 FY2014 NaN NaN NaN \n", "84991 7973 521165147 FY2014 NaN NaN NaN \n", "84992 8005 581494135 FY2014 NaN NaN NaN \n", "84993 8358 431196717 FY2015 NaN NaN NaN \n", "84994 8404 610523288 FY2014 NaN NaN NaN \n", "84995 8626 133119118 FY2015 NaN NaN NaN \n", "84996 8717 860335082 FY2014 NaN NaN NaN \n", "84997 8722 910996619 FY2014 NaN NaN NaN \n", "84998 9107 141631995 FY2014 NaN NaN NaN \n", "84999 9190 911298249 
FY2014 NaN NaN NaN \n", "85000 9557 521629221 FY2013 NaN NaN NaN \n", "85001 9609 581909303 FY2015 NaN NaN NaN \n", "85002 9761 911959600 FY2014 NaN NaN NaN \n", "85003 9765 930386792 FY2014 NaN NaN NaN \n", "85004 9967 582197227 FY2014 NaN NaN NaN \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "84960 NaN NaN NaN \n", "84961 NaN NaN NaN \n", "84962 NaN NaN NaN \n", "84963 NaN NaN NaN \n", "84964 NaN NaN NaN \n", "84965 NaN NaN NaN \n", "84966 NaN NaN NaN \n", "84967 NaN NaN NaN \n", "84968 NaN NaN NaN \n", "84969 NaN NaN NaN \n", "84970 NaN NaN NaN \n", "84971 NaN NaN NaN \n", "84972 NaN NaN NaN \n", "84973 NaN NaN NaN \n", "84974 NaN NaN NaN \n", "84975 NaN NaN NaN \n", "84976 NaN NaN NaN \n", "84977 NaN NaN NaN \n", "84978 NaN NaN NaN \n", "84979 NaN NaN NaN \n", "84980 NaN NaN NaN \n", "84981 NaN NaN NaN \n", "84982 NaN NaN NaN \n", "84983 NaN NaN NaN \n", "84984 NaN NaN NaN \n", "84985 NaN NaN NaN \n", "84986 NaN NaN NaN \n", "84987 NaN NaN NaN \n", "84988 NaN NaN NaN \n", "84989 NaN NaN NaN \n", "84990 NaN NaN NaN \n", "84991 NaN NaN NaN \n", "84992 NaN NaN NaN \n", "84993 NaN NaN NaN \n", "84994 NaN NaN NaN \n", "84995 NaN NaN NaN \n", "84996 NaN NaN NaN \n", "84997 NaN NaN NaN \n", "84998 NaN NaN NaN \n", "84999 NaN NaN NaN \n", "85000 NaN NaN NaN \n", "85001 NaN NaN NaN \n", "85002 NaN NaN NaN \n", "85003 NaN NaN NaN \n", "85004 NaN NaN NaN \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "84958 NaN 1 \n", "84959 NaN 1 \n", "84960 NaN 1 \n", "84961 NaN 1 \n", "84962 NaN 1 \n", "84963 NaN 0 \n", "84964 NaN 1 \n", "84965 NaN 1 \n", "84966 NaN 1 \n", "84967 NaN 1 \n", "84968 NaN 1 \n", "84969 NaN 1 \n", "84970 NaN 1 \n", "84971 NaN 1 \n", "84972 NaN 1 \n", "84973 NaN 1 \n", "84974 NaN 1 \n", "84975 NaN 1 \n", "84976 NaN 0 \n", "84977 NaN 1 \n", "84978 NaN NaN \n", "84979 NaN 1 \n", "84980 NaN NaN \n", "84981 NaN 1 \n", "84982 NaN 1 \n", "84983 NaN 0 \n", "84984 NaN 
1 \n", "84985 NaN 1 \n", "84986 NaN 1 \n", "84987 NaN 0 \n", "84988 NaN 0 \n", "84989 NaN 1 \n", "84990 NaN 0 \n", "84991 NaN 1 \n", "84992 NaN 1 \n", "84993 NaN 1 \n", "84994 NaN 1 \n", "84995 NaN 0 \n", "84996 NaN 1 \n", "84997 NaN 1 \n", "84998 NaN 1 \n", "84999 NaN 1 \n", "85000 NaN 1 \n", "85001 NaN 1 \n", "85002 NaN 1 \n", "85003 NaN 1 \n", "85004 NaN 0 \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "84958 0 0 1 \n", "84959 0 1 2 \n", "84960 1 1 3 \n", "84961 1 1 3 \n", "84962 1 1 3 \n", "84963 0 0 0 \n", "84964 1 1 3 \n", "84965 1 1 3 \n", "84966 1 1 3 \n", "84967 1 1 3 \n", "84968 1 1 3 \n", "84969 1 1 3 \n", "84970 0 0 1 \n", "84971 1 1 3 \n", "84972 1 1 3 \n", "84973 1 1 3 \n", "84974 1 1 3 \n", "84975 1 1 3 \n", "84976 0 0 0 \n", "84977 1 1 3 \n", "84978 NaN NaN NaN \n", "84979 1 1 3 \n", "84980 NaN NaN NaN \n", "84981 1 1 3 \n", "84982 1 1 3 \n", "84983 0 0 0 \n", "84984 1 1 3 \n", "84985 1 1 3 \n", "84986 1 1 3 \n", "84987 0 0 0 \n", "84988 1 1 2 \n", "84989 1 1 3 \n", "84990 0 0 0 \n", "84991 0 1 2 \n", "84992 1 1 3 \n", "84993 1 1 3 \n", "84994 1 1 3 \n", "84995 0 0 0 \n", "84996 1 1 3 \n", "84997 1 1 3 \n", "84998 1 1 3 \n", "84999 0 1 2 \n", "85000 1 1 3 \n", "85001 1 1 3 \n", "85002 0 0 1 \n", "85003 1 0 2 \n", "85004 0 0 0 \n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "84958 1 0 0.618893 \n", "84959 1 0 0.929412 \n", "84960 1 1 0.629087 \n", "84961 1 1 0.141991 \n", "84962 1 1 0.852032 \n", "84963 0 0 1.000000 \n", "84964 1 1 0.274614 \n", "84965 1 1 0.079828 \n", "84966 1 1 0.352063 \n", "84967 1 1 0.929439 \n", "84968 1 1 0.945672 \n", "84969 1 1 0.844013 \n", "84970 1 0 0.364271 \n", "84971 1 1 0.435513 \n", "84972 1 1 0.721359 \n", "84973 1 1 0.849729 \n", "84974 1 1 0.689285 \n", "84975 1 1 0.818691 \n", "84976 0 0 0.790548 \n", "84977 1 1 0.812558 \n", "84978 NaN NaN NaN \n", "84979 1 1 0.852131 \n", "84980 NaN NaN NaN \n", "84981 1 1 0.640762 \n", "84982 1 1 
0.568572 \n", "84983 0 0 0.977070 \n", "84984 1 1 0.795181 \n", "84985 1 1 0.652118 \n", "84986 1 1 0.955955 \n", "84987 0 0 0.898682 \n", "84988 1 0 0.671127 \n", "84989 1 1 0.419107 \n", "84990 0 0 0.649908 \n", "84991 1 0 0.554423 \n", "84992 1 1 0.600435 \n", "84993 1 1 0.874641 \n", "84994 1 1 0.758392 \n", "84995 0 0 0.973898 \n", "84996 1 1 0.641964 \n", "84997 1 1 0.719780 \n", "84998 1 1 0.732696 \n", "84999 1 0 NaN \n", "85000 1 1 0.685504 \n", "85001 1 1 0.913450 \n", "85002 1 0 0.904631 \n", "85003 1 0 0.836547 \n", "85004 0 0 0.855837 \n", "\n", " complexity age total_revenue_logged tot_rev state category \\\n", "84958 5 NaN 13.007969 445953 NaN NaN \n", "84959 3 NaN 15.185653 3935913 NaN NaN \n", "84960 3 NaN 13.468637 706895 NaN NaN \n", "84961 2 NaN 15.102160 3620634 NaN NaN \n", "84962 5 NaN 15.261124 4244456 NaN NaN \n", "84963 2 NaN 13.518482 743023 NaN NaN \n", "84964 1 NaN 15.427470 5012622 NaN NaN \n", "84965 2 NaN 15.528412 5545025 NaN NaN \n", "84966 4 NaN 15.915062 8162508 NaN NaN \n", "84967 7 NaN 17.185632 29082050 NaN NaN \n", "84968 2 NaN 12.365475 234562 NaN NaN \n", "84969 5 NaN 18.372664 95311778 NaN NaN \n", "84970 6 NaN 14.720467 2471824 NaN NaN \n", "84971 4 NaN 16.332460 12390738 NaN NaN \n", "84972 4 NaN 16.950810 22995526 NaN NaN \n", "84973 3 NaN 15.388586 4821453 NaN NaN \n", "84974 4 NaN 18.615514 121510885 NaN NaN \n", "84975 5 NaN 16.979085 23655000 NaN NaN \n", "84976 4 NaN 16.324828 12296531 NaN NaN \n", "84977 6 NaN 17.201226 29539116 NaN NaN \n", "84978 NaN NaN NaN NaN NaN NaN \n", "84979 5 NaN 14.797452 2669634 NaN NaN \n", "84980 NaN NaN NaN NaN NaN NaN \n", "84981 3 NaN 14.049514 1263649 NaN NaN \n", "84982 4 NaN 15.017164 3325610 NaN NaN \n", "84983 3 NaN 14.255583 1552819 NaN NaN \n", "84984 2 NaN 13.234134 559128 NaN NaN \n", "84985 3 NaN 14.981906 3210399 NaN NaN \n", "84986 3 NaN 13.719331 908301 NaN NaN \n", "84987 3 NaN 14.674448 2360651 NaN NaN \n", "84988 3 NaN 13.786554 971459 NaN NaN \n", "84989 2 NaN 
13.057601 468645 NaN NaN \n", "84990 2 NaN 12.538616 278902 NaN NaN \n", "84991 2 NaN 13.688772 880964 NaN NaN \n", "84992 1 NaN 14.011518 1216536 NaN NaN \n", "84993 4 NaN 15.368564 4725877 NaN NaN \n", "84994 5 NaN 13.373186 642541 NaN NaN \n", "84995 3 NaN 13.924612 1115275 NaN NaN \n", "84996 3 NaN 14.715918 2460605 NaN NaN \n", "84997 3 NaN 14.583403 2155219 NaN NaN \n", "84998 3 NaN 15.492194 5347792 NaN NaN \n", "84999 1 NaN 14.722835 2477684 NaN NaN \n", "85000 2 NaN 14.459109 1903318 NaN NaN \n", "85001 4 NaN 14.830139 2758339 NaN NaN \n", "85002 3 NaN 13.072391 475628 NaN NaN \n", "85003 4 NaN 15.892490 7980330 NaN NaN \n", "85004 1 NaN 14.111410 1344335 NaN NaN \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "84960 NaN NaN \n", "84961 NaN NaN \n", "84962 NaN NaN \n", "84963 NaN NaN \n", "84964 NaN NaN \n", "84965 NaN NaN \n", "84966 NaN NaN \n", "84967 NaN NaN \n", "84968 NaN NaN \n", "84969 NaN NaN \n", "84970 NaN NaN \n", "84971 NaN NaN \n", "84972 NaN NaN \n", "84973 NaN NaN \n", "84974 NaN NaN \n", "84975 NaN NaN \n", "84976 NaN NaN \n", "84977 NaN NaN \n", "84978 NaN NaN \n", "84979 NaN NaN \n", "84980 NaN NaN \n", "84981 NaN NaN \n", "84982 NaN NaN \n", "84983 NaN NaN \n", "84984 NaN NaN \n", "84985 NaN NaN \n", "84986 NaN NaN \n", "84987 NaN NaN \n", "84988 NaN NaN \n", "84989 NaN NaN \n", "84990 NaN NaN \n", "84991 NaN NaN \n", "84992 NaN NaN \n", "84993 NaN NaN \n", "84994 NaN NaN \n", "84995 NaN NaN \n", "84996 NaN NaN \n", "84997 NaN NaN \n", "84998 NaN NaN \n", "84999 NaN NaN \n", "85000 NaN NaN \n", "85001 NaN NaN \n", "85002 NaN NaN \n", "85003 NaN NaN \n", "85004 NaN NaN \n", "\n", " category_Community Development category_Education \\\n", "84958 NaN NaN \n", "84959 NaN NaN \n", "84960 NaN NaN \n", "84961 NaN NaN \n", "84962 NaN NaN \n", "84963 NaN NaN \n", "84964 NaN NaN \n", "84965 NaN NaN \n", "84966 NaN NaN \n", "84967 NaN NaN \n", "84968 NaN NaN \n", "84969 NaN NaN \n", 
"84970 NaN NaN \n", "84971 NaN NaN \n", "84972 NaN NaN \n", "84973 NaN NaN \n", "84974 NaN NaN \n", "84975 NaN NaN \n", "84976 NaN NaN \n", "84977 NaN NaN \n", "84978 NaN NaN \n", "84979 NaN NaN \n", "84980 NaN NaN \n", "84981 NaN NaN \n", "84982 NaN NaN \n", "84983 NaN NaN \n", "84984 NaN NaN \n", "84985 NaN NaN \n", "84986 NaN NaN \n", "84987 NaN NaN \n", "84988 NaN NaN \n", "84989 NaN NaN \n", "84990 NaN NaN \n", "84991 NaN NaN \n", "84992 NaN NaN \n", "84993 NaN NaN \n", "84994 NaN NaN \n", "84995 NaN NaN \n", "84996 NaN NaN \n", "84997 NaN NaN \n", "84998 NaN NaN \n", "84999 NaN NaN \n", "85000 NaN NaN \n", "85001 NaN NaN \n", "85002 NaN NaN \n", "85003 NaN NaN \n", "85004 NaN NaN \n", "\n", " category_Environment category_Health category_Human Services \\\n", "84958 NaN NaN NaN \n", "84959 NaN NaN NaN \n", "84960 NaN NaN NaN \n", "84961 NaN NaN NaN \n", "84962 NaN NaN NaN \n", "84963 NaN NaN NaN \n", "84964 NaN NaN NaN \n", "84965 NaN NaN NaN \n", "84966 NaN NaN NaN \n", "84967 NaN NaN NaN \n", "84968 NaN NaN NaN \n", "84969 NaN NaN NaN \n", "84970 NaN NaN NaN \n", "84971 NaN NaN NaN \n", "84972 NaN NaN NaN \n", "84973 NaN NaN NaN \n", "84974 NaN NaN NaN \n", "84975 NaN NaN NaN \n", "84976 NaN NaN NaN \n", "84977 NaN NaN NaN \n", "84978 NaN NaN NaN \n", "84979 NaN NaN NaN \n", "84980 NaN NaN NaN \n", "84981 NaN NaN NaN \n", "84982 NaN NaN NaN \n", "84983 NaN NaN NaN \n", "84984 NaN NaN NaN \n", "84985 NaN NaN NaN \n", "84986 NaN NaN NaN \n", "84987 NaN NaN NaN \n", "84988 NaN NaN NaN \n", "84989 NaN NaN NaN \n", "84990 NaN NaN NaN \n", "84991 NaN NaN NaN \n", "84992 NaN NaN NaN \n", "84993 NaN NaN NaN \n", "84994 NaN NaN NaN \n", "84995 NaN NaN NaN \n", "84996 NaN NaN NaN \n", "84997 NaN NaN NaN \n", "84998 NaN NaN NaN \n", "84999 NaN NaN NaN \n", "85000 NaN NaN NaN \n", "85001 NaN NaN NaN \n", "85002 NaN NaN NaN \n", "85003 NaN NaN NaN \n", "85004 NaN NaN NaN \n", "\n", " category_Human and Civil Rights category_International \\\n", "84958 NaN NaN \n", 
"84959 NaN NaN \n", "84960 NaN NaN \n", "84961 NaN NaN \n", "84962 NaN NaN \n", "84963 NaN NaN \n", "84964 NaN NaN \n", "84965 NaN NaN \n", "84966 NaN NaN \n", "84967 NaN NaN \n", "84968 NaN NaN \n", "84969 NaN NaN \n", "84970 NaN NaN \n", "84971 NaN NaN \n", "84972 NaN NaN \n", "84973 NaN NaN \n", "84974 NaN NaN \n", "84975 NaN NaN \n", "84976 NaN NaN \n", "84977 NaN NaN \n", "84978 NaN NaN \n", "84979 NaN NaN \n", "84980 NaN NaN \n", "84981 NaN NaN \n", "84982 NaN NaN \n", "84983 NaN NaN \n", "84984 NaN NaN \n", "84985 NaN NaN \n", "84986 NaN NaN \n", "84987 NaN NaN \n", "84988 NaN NaN \n", "84989 NaN NaN \n", "84990 NaN NaN \n", "84991 NaN NaN \n", "84992 NaN NaN \n", "84993 NaN NaN \n", "84994 NaN NaN \n", "84995 NaN NaN \n", "84996 NaN NaN \n", "84997 NaN NaN \n", "84998 NaN NaN \n", "84999 NaN NaN \n", "85000 NaN NaN \n", "85001 NaN NaN \n", "85002 NaN NaN \n", "85003 NaN NaN \n", "85004 NaN NaN \n", "\n", " category_Religion category_Research and Public Policy \n", "84958 NaN NaN \n", "84959 NaN NaN \n", "84960 NaN NaN \n", "84961 NaN NaN \n", "84962 NaN NaN \n", "84963 NaN NaN \n", "84964 NaN NaN \n", "84965 NaN NaN \n", "84966 NaN NaN \n", "84967 NaN NaN \n", "84968 NaN NaN \n", "84969 NaN NaN \n", "84970 NaN NaN \n", "84971 NaN NaN \n", "84972 NaN NaN \n", "84973 NaN NaN \n", "84974 NaN NaN \n", "84975 NaN NaN \n", "84976 NaN NaN \n", "84977 NaN NaN \n", "84978 NaN NaN \n", "84979 NaN NaN \n", "84980 NaN NaN \n", "84981 NaN NaN \n", "84982 NaN NaN \n", "84983 NaN NaN \n", "84984 NaN NaN \n", "84985 NaN NaN \n", "84986 NaN NaN \n", "84987 NaN NaN \n", "84988 NaN NaN \n", "84989 NaN NaN \n", "84990 NaN NaN \n", "84991 NaN NaN \n", "84992 NaN NaN \n", "84993 NaN NaN \n", "84994 NaN NaN \n", "84995 NaN NaN \n", "84996 NaN NaN \n", "84997 NaN NaN \n", "84998 NaN NaN \n", "84999 NaN NaN \n", "85000 NaN NaN \n", "85001 NaN NaN \n", "85002 NaN NaN \n", "85003 NaN NaN \n", "85004 NaN NaN " ] }, "execution_count": 1352, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "df[df['_merge_47']=='right_only'][cols]" ] }, { "cell_type": "code", "execution_count": 1353, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " count mean std \\\n", "2011_data 84958 0.057240 2.323021e-01 \n", "2016_data 84958 0.097742 2.969678e-01 \n", "donor_advisory 83897 0.004660 6.810882e-02 \n", "donor_advisory_2016 84958 0.004332 6.567222e-02 \n", "donor_advisory_2011_to_2016 84958 0.014384 1.190666e-01 \n", "conflict_of_interest_policy_v2 21939 0.963171 1.883470e-01 \n", "records_retention_policy_v2 21939 0.878846 3.263137e-01 \n", "whistleblower_policy_v2 21939 0.881672 3.230036e-01 \n", "SOX_policies 21939 2.723688 6.913304e-01 \n", "SOX_policies_binary 21939 0.972059 1.648079e-01 \n", "SOX_policies_all_binary 21939 0.834724 3.714383e-01 \n", "program_efficiency 21938 0.805207 1.040898e-01 \n", "complexity 85003 0.374587 1.222913e+00 \n", "age 83830 39.508147 1.931018e+01 \n", "total_revenue_logged 21939 15.859294 1.707720e+00 \n", "tot_rev 11009 49733944.346716 1.580116e+08 \n", "category_Animals 84958 0.072565 2.594231e-01 \n", "category_Arts, Culture, Humanities 84958 0.135585 3.423490e-01 \n", "category_Community Development 84958 0.087737 2.829144e-01 \n", "category_Education 84958 0.061183 2.396674e-01 \n", "category_Environment 84958 0.059865 2.372377e-01 \n", "category_Health 84958 0.115386 3.194896e-01 \n", "category_Human Services 84958 0.248735 4.322822e-01 \n", "category_Human and Civil Rights 84958 0.038184 1.916403e-01 \n", "category_International 84958 0.084983 2.788583e-01 \n", "category_Religion 84958 0.059582 2.367129e-01 \n", "category_Research and Public Policy 84958 0.024271 1.538896e-01 \n", "\n", " min 25% \\\n", "2011_data 0 0.000000 \n", "2016_data 0 0.000000 \n", "donor_advisory 0 0.000000 \n", "donor_advisory_2016 0 0.000000 \n", "donor_advisory_2011_to_2016 0 0.000000 \n", "conflict_of_interest_policy_v2 0 1.000000 \n", "records_retention_policy_v2 0 1.000000 \n", "whistleblower_policy_v2 0 1.000000 \n", "SOX_policies 0 3.000000 \n", "SOX_policies_binary 0 1.000000 \n", "SOX_policies_all_binary 0 1.000000 \n", "program_efficiency 0 0.756433 \n", 
"complexity 0 0.000000 \n", "age 0 24.000000 \n", "total_revenue_logged 0 14.772093 \n", "tot_rev -218265025 5421305.000000 \n", "category_Animals 0 0.000000 \n", "category_Arts, Culture, Humanities 0 0.000000 \n", "category_Community Development 0 0.000000 \n", "category_Education 0 0.000000 \n", "category_Environment 0 0.000000 \n", "category_Health 0 0.000000 \n", "category_Human Services 0 0.000000 \n", "category_Human and Civil Rights 0 0.000000 \n", "category_International 0 0.000000 \n", "category_Religion 0 0.000000 \n", "category_Research and Public Policy 0 0.000000 \n", "\n", " 50% 75% \\\n", "2011_data 0.000000 0.000000 \n", "2016_data 0.000000 0.000000 \n", "donor_advisory 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 \n", "conflict_of_interest_policy_v2 1.000000 1.000000 \n", "records_retention_policy_v2 1.000000 1.000000 \n", "whistleblower_policy_v2 1.000000 1.000000 \n", "SOX_policies 3.000000 3.000000 \n", "SOX_policies_binary 1.000000 1.000000 \n", "SOX_policies_all_binary 1.000000 1.000000 \n", "program_efficiency 0.817720 0.871105 \n", "complexity 0.000000 0.000000 \n", "age 35.000000 52.000000 \n", "total_revenue_logged 15.697286 16.860018 \n", "tot_rev 15121649.000000 42128948.000000 \n", "category_Animals 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 \n", "category_Human Services 0.000000 0.000000 \n", "category_Human and Civil Rights 0.000000 0.000000 \n", "category_International 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 \n", "\n", " max \n", "2011_data 1.000000e+00 \n", "2016_data 1.000000e+00 \n", "donor_advisory 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 \n", 
"donor_advisory_2011_to_2016 1.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000e+00 \n", "records_retention_policy_v2 1.000000e+00 \n", "whistleblower_policy_v2 1.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies_binary 1.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 \n", "program_efficiency 1.010186e+00 \n", "complexity 8.000000e+00 \n", "age 1.080000e+02 \n", "total_revenue_logged 2.204279e+01 \n", "tot_rev 3.741635e+09 \n", "category_Animals 1.000000e+00 \n", "category_Arts, Culture, Humanities 1.000000e+00 \n", "category_Community Development 1.000000e+00 \n", "category_Education 1.000000e+00 \n", "category_Environment 1.000000e+00 \n", "category_Health 1.000000e+00 \n", "category_Human Services 1.000000e+00 \n", "category_Human and Civil Rights 1.000000e+00 \n", "category_International 1.000000e+00 \n", "category_Religion 1.000000e+00 \n", "category_Research and Public Policy 1.000000e+00 " ] }, "execution_count": 1353, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[cols].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 1355, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "85005\n" ] } ], "source": [ "print len(df)\n", "df.to_pickle('Merged CN dataset with Age, State, Category, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory (with added 990 data).pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Merge in e-file data\n", "NOTE: There was a strange NumPy import error when I attempted to read in the `*.pkl` version of the e-file data, so the CSV version is read in instead." ] }, { "cell_type": "code", "execution_count": 1389, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "19\n", "538\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " EIN OrganizationName \\\n", "0 030498214 NEWARK NOW INC \n", "1 030498214 NEWARK NOW INC \n", "\n", " URL \\\n", "0 https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml \n", "1 https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml \n", "\n", " SubmittedOn TaxPeriod FYE whistleblower_policy \\\n", "0 2013-12-31 201212 FY2012 0 \n", "1 2013-02-14 201112 FY2011 0 \n", "\n", " conflict_of_interest_policy records_retention_policy SOX_policies \\\n", "0 1 1 2 \n", "1 1 1 2 \n", "\n", " SOX_policies_binary SOX_policies_all_binary tot_rev tot_rev_no_neg \\\n", "0 1 0 2515399 2515399 \n", "1 1 0 2161209 2161209 \n", "\n", " total_revenue_logged total_expenses program_expenses program_efficiency \\\n", "0 14.737942 2100922 1598626 0.760916 \n", "1 14.586178 2481310 1715793 0.691487 \n", "\n", " complexity \n", "0 2 \n", "1 0 " ] }, "execution_count": 1389, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Earlier attempts to read the pickle and Excel versions are kept commented out for reference.\n", "#dfe = pd.read_pickle('efile 990s.pkl')\n", "#dfe = pd.read_excel('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).xls', \n", "# dtype={'EIN': object})\n", "# Read EIN as a string (dtype object) so leading zeros, e.g. '030498214', are preserved.\n", "dfe = pd.read_csv('e-file 990s for 2016 donor advisory organizations, v4 (key columns only).csv', \n", " dtype={'EIN': object})\n", "# Print the number of columns and rows, then preview the first two rows.\n", "print len(dfe.columns)\n", "print len(dfe)\n", "dfe[:2]" ] }, { "cell_type": "code", "execution_count": 1393, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
TaxPeriod538201236.8698881.418918e+02201006201112.000000201212.000000201312.0000002.015090e+05
whistleblower_policy5380.5892194.924334e-0100.0000001.0000001.0000001.000000e+00
conflict_of_interest_policy5380.8438663.633200e-0101.0000001.0000001.0000001.000000e+00
records_retention_policy5380.7118964.533013e-0100.0000001.0000001.0000001.000000e+00
SOX_policies5382.1449811.097391e+0001.0000003.0000003.0000003.000000e+00
SOX_policies_binary5380.8605953.466912e-0101.0000001.0000001.0000001.000000e+00
SOX_policies_all_binary5380.5446104.984695e-0100.0000001.0000001.0000001.000000e+00
tot_rev53858570634.4776953.234413e+08-2182650251320586.2500004261393.50000013341627.0000003.741635e+09
tot_rev_no_neg53858976436.4535323.232302e+0811320586.2500004261393.50000013341627.0000003.741635e+09
total_revenue_logged53815.2496682.287752e+00014.09358415.26509916.4063842.204279e+01
total_expenses53856322737.5483273.003909e+0801375425.7500004421643.50000012851086.7500003.287631e+09
program_expenses52449944834.8568702.653191e+083884946640.2500003140793.00000010166690.7500002.884879e+09
program_efficiency5380.7589502.257745e-0100.6945520.8398740.9007391.000000e+00
complexity5382.4312271.278642e+0002.0000002.0000003.0000007.000000e+00
\n", "
" ], "text/plain": [ " count mean std min \\\n", "TaxPeriod 538 201236.869888 1.418918e+02 201006 \n", "whistleblower_policy 538 0.589219 4.924334e-01 0 \n", "conflict_of_interest_policy 538 0.843866 3.633200e-01 0 \n", "records_retention_policy 538 0.711896 4.533013e-01 0 \n", "SOX_policies 538 2.144981 1.097391e+00 0 \n", "SOX_policies_binary 538 0.860595 3.466912e-01 0 \n", "SOX_policies_all_binary 538 0.544610 4.984695e-01 0 \n", "tot_rev 538 58570634.477695 3.234413e+08 -218265025 \n", "tot_rev_no_neg 538 58976436.453532 3.232302e+08 1 \n", "total_revenue_logged 538 15.249668 2.287752e+00 0 \n", "total_expenses 538 56322737.548327 3.003909e+08 0 \n", "program_expenses 524 49944834.856870 2.653191e+08 3884 \n", "program_efficiency 538 0.758950 2.257745e-01 0 \n", "complexity 538 2.431227 1.278642e+00 0 \n", "\n", " 25% 50% 75% \\\n", "TaxPeriod 201112.000000 201212.000000 201312.000000 \n", "whistleblower_policy 0.000000 1.000000 1.000000 \n", "conflict_of_interest_policy 1.000000 1.000000 1.000000 \n", "records_retention_policy 0.000000 1.000000 1.000000 \n", "SOX_policies 1.000000 3.000000 3.000000 \n", "SOX_policies_binary 1.000000 1.000000 1.000000 \n", "SOX_policies_all_binary 0.000000 1.000000 1.000000 \n", "tot_rev 1320586.250000 4261393.500000 13341627.000000 \n", "tot_rev_no_neg 1320586.250000 4261393.500000 13341627.000000 \n", "total_revenue_logged 14.093584 15.265099 16.406384 \n", "total_expenses 1375425.750000 4421643.500000 12851086.750000 \n", "program_expenses 946640.250000 3140793.000000 10166690.750000 \n", "program_efficiency 0.694552 0.839874 0.900739 \n", "complexity 2.000000 2.000000 3.000000 \n", "\n", " max \n", "TaxPeriod 2.015090e+05 \n", "whistleblower_policy 1.000000e+00 \n", "conflict_of_interest_policy 1.000000e+00 \n", "records_retention_policy 1.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies_binary 1.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 \n", "tot_rev 3.741635e+09 \n", "tot_rev_no_neg 3.741635e+09 
\n", "total_revenue_logged 2.204279e+01 \n", "total_expenses 3.287631e+09 \n", "program_expenses 2.884879e+09 \n", "program_efficiency 1.000000e+00 \n", "complexity 7.000000e+00 " ] }, "execution_count": 1393, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfe.describe().T" ] }, { "cell_type": "code", "execution_count": 1392, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "EIN object\n", "OrganizationName object\n", "URL object\n", "SubmittedOn object\n", "TaxPeriod int64\n", "FYE object\n", "whistleblower_policy int64\n", "conflict_of_interest_policy int64\n", "records_retention_policy int64\n", "SOX_policies int64\n", "SOX_policies_binary int64\n", "SOX_policies_all_binary int64\n", "tot_rev int64\n", "tot_rev_no_neg int64\n", "total_revenue_logged float64\n", "total_expenses int64\n", "program_expenses float64\n", "program_efficiency float64\n", "complexity int64\n", "dtype: object" ] }, "execution_count": 1392, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfe.dtypes" ] }, { "cell_type": "code", "execution_count": 1371, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#dfe['EIN'] = dfe['EIN'].astype('str')" ] }, { "cell_type": "code", "execution_count": 1394, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'OrganizationName', 'URL', 'SubmittedOn', 'TaxPeriod', 'FYE', 'whistleblower_policy', 'conflict_of_interest_policy', 'records_retention_policy', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'tot_rev', 'tot_rev_no_neg', 'total_revenue_logged', 'total_expenses', 'program_expenses', 'program_efficiency', 'complexity']\n" ] } ], "source": [ "print dfe.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 1395, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINOrganizationName_efileURL_efileSubmittedOn_efileTaxPeriod_efileFYEwhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efiletotal_expenses_efileprogram_expenses_efileprogram_efficiency_efilecomplexity_efile
0030498214NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml2013-12-31201212FY20120112102515399251539914.737942210092215986260.7609162
1030498214NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml2013-02-14201112FY20110112102161209216120914.586178248131017157930.6914870
2030498214NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml2012-05-25201012FY20100112103990564399056415.199443391264329307850.7490550
\n", "
" ], "text/plain": [ " EIN OrganizationName_efile \\\n", "0 030498214 NEWARK NOW INC \n", "1 030498214 NEWARK NOW INC \n", "2 030498214 NEWARK NOW INC \n", "\n", " URL_efile \\\n", "0 https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml \n", "1 https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml \n", "2 https://s3.amazonaws.com/irs-form-990/201220909349300327_public.xml \n", "\n", " SubmittedOn_efile TaxPeriod_efile FYE whistleblower_policy_efile \\\n", "0 2013-12-31 201212 FY2012 0 \n", "1 2013-02-14 201112 FY2011 0 \n", "2 2012-05-25 201012 FY2010 0 \n", "\n", " conflict_of_interest_policy_efile records_retention_policy_efile \\\n", "0 1 1 \n", "1 1 1 \n", "2 1 1 \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "0 2 1 \n", "1 2 1 \n", "2 2 1 \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "0 0 2515399 2515399 \n", "1 0 2161209 2161209 \n", "2 0 3990564 3990564 \n", "\n", " total_revenue_logged_efile total_expenses_efile program_expenses_efile \\\n", "0 14.737942 2100922 1598626 \n", "1 14.586178 2481310 1715793 \n", "2 15.199443 3912643 2930785 \n", "\n", " program_efficiency_efile complexity_efile \n", "0 0.760916 2 \n", "1 0.691487 0 \n", "2 0.749055 0 " ] }, "execution_count": 1395, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfe.columns = ['EIN', 'OrganizationName_efile', 'URL_efile', 'SubmittedOn_efile', 'TaxPeriod_efile', 'FYE', \n", " 'whistleblower_policy_efile', 'conflict_of_interest_policy_efile', 'records_retention_policy_efile', \n", " 'SOX_policies_efile', 'SOX_policies_binary_efile', 'SOX_policies_all_binary_efile', \n", " 'tot_rev_efile', 'tot_rev_no_neg_efile', 'total_revenue_logged_efile', \n", " 'total_expenses_efile', 'program_expenses_efile', 'program_efficiency_efile', 'complexity_efile'] \n", "dfe[:3]" ] }, { "cell_type": "code", "execution_count": 1396, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINFYEOrganizationName_efileURL_efileSubmittedOn_efileTaxPeriod_efilewhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efileprogram_expenses_efiletotal_expenses_efileprogram_efficiency_efilecomplexity_efile
0030498214FY2012NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml2013-12-312012120112102515399251539914.737942159862621009220.7609162
1030498214FY2011NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml2013-02-142011120112102161209216120914.586178171579324813100.6914870
\n", "
" ], "text/plain": [ " EIN FYE OrganizationName_efile \\\n", "0 030498214 FY2012 NEWARK NOW INC \n", "1 030498214 FY2011 NEWARK NOW INC \n", "\n", " URL_efile \\\n", "0 https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml \n", "1 https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml \n", "\n", " SubmittedOn_efile TaxPeriod_efile whistleblower_policy_efile \\\n", "0 2013-12-31 201212 0 \n", "1 2013-02-14 201112 0 \n", "\n", " conflict_of_interest_policy_efile records_retention_policy_efile \\\n", "0 1 1 \n", "1 1 1 \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "0 2 1 \n", "1 2 1 \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "0 0 2515399 2515399 \n", "1 0 2161209 2161209 \n", "\n", " total_revenue_logged_efile program_expenses_efile total_expenses_efile \\\n", "0 14.737942 1598626 2100922 \n", "1 14.586178 1715793 2481310 \n", "\n", " program_efficiency_efile complexity_efile \n", "0 0.760916 2 \n", "1 0.691487 0 " ] }, "execution_count": 1396, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfe = dfe[['EIN', 'FYE', 'OrganizationName_efile', 'URL_efile', 'SubmittedOn_efile', 'TaxPeriod_efile', \n", " 'whistleblower_policy_efile', 'conflict_of_interest_policy_efile', 'records_retention_policy_efile', \n", " 'SOX_policies_efile', 'SOX_policies_binary_efile', 'SOX_policies_all_binary_efile', \n", " 'tot_rev_efile', 'tot_rev_no_neg_efile', 'total_revenue_logged_efile', \n", " 'program_expenses_efile', 'total_expenses_efile', 'program_efficiency_efile', 'complexity_efile']]\n", "dfe[:2]" ] }, { "cell_type": "code", "execution_count": 1397, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dfe.to_pickle('dfe.pkl')" ] }, { "cell_type": "code", "execution_count": 1398, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "538" ] }, "execution_count": 1398, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dfe)" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "##### Merge into main dataframe" ] }, { "cell_type": "code", "execution_count": 1380, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df.to_pickle('df.pkl')" ] }, { "cell_type": "code", "execution_count": 1399, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYEForm 990 FYEratings_system2011_data2016_datadonor_advisorydonor_advisory_2016donor_advisory_2011_to_2016conflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binarySOX_policies_all_binaryprogram_efficiencycomplexityagetotal_revenue_loggedtot_revstatecategorycategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policy
016722020503776currentcurrentcurrent01111NaNNaNNaNNaNNaNNaNNaN05NaNNaNNHHuman Services00000010000
110166043314346FY20132013-12CN 2.1010011113110.8708652813.549098NaNMAHealth00000100000
\n", "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "0 16722 020503776 current current current 0 \n", "1 10166 043314346 FY2013 2013-12 CN 2.1 0 \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "0 1 1 1 \n", "1 1 0 0 \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "0 1 NaN \n", "1 1 1 \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "0 NaN NaN NaN \n", "1 1 1 3 \n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "0 NaN NaN NaN \n", "1 1 1 0.870865 \n", "\n", " complexity age total_revenue_logged tot_rev state category \\\n", "0 0 5 NaN NaN NH Human Services \n", "1 2 8 13.549098 NaN MA Health \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "0 0 0 \n", "1 0 0 \n", "\n", " category_Community Development category_Education category_Environment \\\n", "0 0 0 0 \n", "1 0 0 0 \n", "\n", " category_Health category_Human Services category_Human and Civil Rights \\\n", "0 0 1 0 \n", "1 1 0 0 \n", "\n", " category_International category_Religion \\\n", "0 0 0 \n", "1 0 0 \n", "\n", " category_Research and Public Policy \n", "0 0 \n", "1 0 " ] }, "execution_count": 1399, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[cols][:2]" ] }, { "cell_type": "code", "execution_count": 1400, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EINFYEOrganizationName_efileURL_efileSubmittedOn_efileTaxPeriod_efilewhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efileprogram_expenses_efiletotal_expenses_efileprogram_efficiency_efilecomplexity_efile
0030498214FY2012NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml2013-12-312012120112102515399251539914.737942159862621009220.7609162
1030498214FY2011NEWARK NOW INChttps://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml2013-02-142011120112102161209216120914.586178171579324813100.6914870
\n", "
" ], "text/plain": [ " EIN FYE OrganizationName_efile \\\n", "0 030498214 FY2012 NEWARK NOW INC \n", "1 030498214 FY2011 NEWARK NOW INC \n", "\n", " URL_efile \\\n", "0 https://s3.amazonaws.com/irs-form-990/201303199349303505_public.xml \n", "1 https://s3.amazonaws.com/irs-form-990/201320249349300417_public.xml \n", "\n", " SubmittedOn_efile TaxPeriod_efile whistleblower_policy_efile \\\n", "0 2013-12-31 201212 0 \n", "1 2013-02-14 201112 0 \n", "\n", " conflict_of_interest_policy_efile records_retention_policy_efile \\\n", "0 1 1 \n", "1 1 1 \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "0 2 1 \n", "1 2 1 \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "0 0 2515399 2515399 \n", "1 0 2161209 2161209 \n", "\n", " total_revenue_logged_efile program_expenses_efile total_expenses_efile \\\n", "0 14.737942 1598626 2100922 \n", "1 14.586178 1715793 2481310 \n", "\n", " program_efficiency_efile complexity_efile \n", "0 0.760916 2 \n", "1 0.691487 0 " ] }, "execution_count": 1400, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfe[:2]" ] }, { "cell_type": "code", "execution_count": 1403, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "286\n", "85005\n", "303\n", "85006\n", "303\n", "85401\n", "304\n", "85401\n" ] } ], "source": [ "print len(df.columns)\n", "print len(df)\n", "print len(pd.merge(df, dfe, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left').columns)\n", "print len(pd.merge(df, dfe, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='left'))\n", "print len(pd.merge(df, dfe, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='outer').columns)\n", "print len(pd.merge(df, dfe, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='outer'))\n", "df = pd.merge(df, dfe, left_on=['EIN','FYE'], right_on=['EIN','FYE'], how='outer', indicator=True)\n", "print len(df.columns)\n", "print len(df)" ] }, { "cell_type": "code", 
"execution_count": 1404, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "left_only 84860\n", "right_only 395\n", "both 146\n", "dtype: int64" ] }, "execution_count": 1404, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.rename(columns={'_merge':'_merge_efile'}, inplace=True)\n", "df['_merge_efile'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
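Note on the row counts above: the left merge returns 85,006 rows against 85,005 in `df`, which suggests that at least one `EIN`/`FYE` pair appears more than once in `dfe`. Likewise, the `_merge_efile` counts tally merged rows rather than source rows, which is why `both` (146) plus `right_only` (395) can exceed the 538 rows in `dfe`. A minimal sketch of how such duplicates could be checked, using toy frames with invented values (`left` and `right` are hypothetical stand-ins for `df` and `dfe`):

```python
import pandas as pd

# Toy stand-ins for df (ratings rows) and dfe (e-file rows); all values are invented.
left = pd.DataFrame({'EIN': ['001', '001', '002'],
                     'FYE': ['FY2012', 'FY2013', 'FY2012'],
                     'org_id': ['1', '1', '2']})
right = pd.DataFrame({'EIN': ['001', '001', '003'],
                      'FYE': ['FY2012', 'FY2012', 'FY2011'],
                      'tot_rev_efile': [100, 110, 300]})

keys = ['EIN', 'FYE']

# Rows of the right-hand table whose key appears more than once.
dup_right = right[right.duplicated(subset=keys, keep=False)]
print(len(dup_right))

# A left merge gains one row for every extra right-hand match.
merged_left = pd.merge(left, right, on=keys, how='left')
print(len(merged_left) - len(left))

# The indicator column counts merged rows, not source rows.
merged_outer = pd.merge(left, right, on=keys, how='outer', indicator=True)
counts = merged_outer['_merge'].value_counts()
print(counts)
```

Running the same `duplicated` check on `dfe` before merging would show which filings share an `EIN`/`FYE` key, so that a choice between them (for example, the latest `SubmittedOn_efile`) can be made deliberately.
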
Save DF" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "print len(df)\n", "df.to_pickle('Merged CN dataset with Age, State, Category, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory (with added 990 data).pkl')" ] }, { "cell_type": "code", "execution_count": 1442, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'tot_rev', 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n", "['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 'tot_rev', 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'whistleblower_policy_efile', 'conflict_of_interest_policy_efile', 'records_retention_policy_efile', 
'SOX_policies_efile', 'SOX_policies_binary_efile', 'SOX_policies_all_binary_efile', 'tot_rev_efile', 'tot_rev_no_neg_efile', 'total_revenue_logged_efile', 'program_expenses_efile', 'total_expenses_efile', 'program_efficiency_efile', 'complexity_efile', '_merge_efile', 'latest_entry']\n" ] } ], "source": [ "print cols\n", "efile_cols = ['whistleblower_policy_efile', 'conflict_of_interest_policy_efile', 'records_retention_policy_efile',\n", "'SOX_policies_efile', 'SOX_policies_binary_efile', 'SOX_policies_all_binary_efile', 'tot_rev_efile', \n", "'tot_rev_no_neg_efile', 'total_revenue_logged_efile', 'program_expenses_efile', 'total_expenses_efile', \n", "'program_efficiency_efile', 'complexity_efile']\n", "cols2 = cols+efile_cols+['_merge_efile', 'latest_entry']\n", "print cols2" ] }, { "cell_type": "code", "execution_count": 1412, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYEForm 990 FYEratings_system2011_data2016_datadonor_advisorydonor_advisory_2016donor_advisory_2011_to_2016conflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binarySOX_policies_all_binaryprogram_efficiencycomplexityagetotal_revenue_loggedtot_revstatecategorycategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policywhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efileprogram_expenses_efiletotal_expenses_efileprogram_efficiency_efilecomplexity_efile
85006NaN030498214FY2011NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0112102161209216120914.586178171579324813100.6914870
85007NaN030498214FY2010NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN0112103990564399056415.199443293078539126430.7490550
85008NaN042701694FY2014NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaN11131176443776443713.5468954782196363460.7515083
\n", "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "85006 NaN 030498214 FY2011 NaN NaN NaN \n", "85007 NaN 030498214 FY2010 NaN NaN NaN \n", "85008 NaN 042701694 FY2014 NaN NaN NaN \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "85006 NaN NaN NaN \n", "85007 NaN NaN NaN \n", "85008 NaN NaN NaN \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "85006 NaN NaN \n", "85007 NaN NaN \n", "85008 NaN NaN \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "85006 NaN NaN NaN \n", "85007 NaN NaN NaN \n", "85008 NaN NaN NaN \n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "85006 NaN NaN NaN \n", "85007 NaN NaN NaN \n", "85008 NaN NaN NaN \n", "\n", " complexity age total_revenue_logged tot_rev state category \\\n", "85006 NaN NaN NaN NaN NaN NaN \n", "85007 NaN NaN NaN NaN NaN NaN \n", "85008 NaN NaN NaN NaN NaN NaN \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "85006 NaN NaN \n", "85007 NaN NaN \n", "85008 NaN NaN \n", "\n", " category_Community Development category_Education \\\n", "85006 NaN NaN \n", "85007 NaN NaN \n", "85008 NaN NaN \n", "\n", " category_Environment category_Health category_Human Services \\\n", "85006 NaN NaN NaN \n", "85007 NaN NaN NaN \n", "85008 NaN NaN NaN \n", "\n", " category_Human and Civil Rights category_International \\\n", "85006 NaN NaN \n", "85007 NaN NaN \n", "85008 NaN NaN \n", "\n", " category_Religion category_Research and Public Policy \\\n", "85006 NaN NaN \n", "85007 NaN NaN \n", "85008 NaN NaN \n", "\n", " whistleblower_policy_efile conflict_of_interest_policy_efile \\\n", "85006 0 1 \n", "85007 0 1 \n", "85008 1 1 \n", "\n", " records_retention_policy_efile SOX_policies_efile \\\n", "85006 1 2 \n", "85007 1 2 \n", "85008 1 3 \n", "\n", " SOX_policies_binary_efile SOX_policies_all_binary_efile \\\n", "85006 1 0 \n", "85007 1 0 \n", "85008 1 1 \n", "\n", " 
tot_rev_efile tot_rev_no_neg_efile total_revenue_logged_efile \\\n", "85006 2161209 2161209 14.586178 \n", "85007 3990564 3990564 15.199443 \n", "85008 764437 764437 13.546895 \n", "\n", " program_expenses_efile total_expenses_efile program_efficiency_efile \\\n", "85006 1715793 2481310 0.691487 \n", "85007 2930785 3912643 0.749055 \n", "85008 478219 636346 0.751508 \n", "\n", " complexity_efile \n", "85006 0 \n", "85007 0 \n", "85008 3 " ] }, "execution_count": 1412, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['_merge_efile']=='right_only'][cols2][:3]" ] }, { "cell_type": "code", "execution_count": 1414, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
org_idEINFYEForm 990 FYEratings_system2011_data2016_datadonor_advisorydonor_advisory_2016donor_advisory_2011_to_2016conflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binarySOX_policies_all_binaryprogram_efficiencycomplexityagetotal_revenue_loggedtot_revstatecategorycategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policywhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efileprogram_expenses_efiletotal_expenses_efileprogram_efficiency_efilecomplexity_efile
8389911671300038297FY2010NaNNaN10NaN111113110.84915201315.596243NaNCACommunity Development001000000001113115934202593420215.596243398511346930530.8491522
8390011327720760857FY2010NaNNaN10NaN111113110.85276404215.651125NaNLACommunity Development001000000001113116268983626898315.651125472194855372280.8527645
8390410087581925867FY2010NaNNaN10NaN111001100.80533002514.585510NaNLAHuman Services000000100000101102159766215976614.585510201930925074310.8053304
\n", "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "83899 11671 300038297 FY2010 NaN NaN 1 \n", "83900 11327 720760857 FY2010 NaN NaN 1 \n", "83904 10087 581925867 FY2010 NaN NaN 1 \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "83899 0 NaN 1 \n", "83900 0 NaN 1 \n", "83904 0 NaN 1 \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "83899 1 1 \n", "83900 1 1 \n", "83904 1 1 \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "83899 1 1 3 \n", "83900 1 1 3 \n", "83904 0 0 1 \n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "83899 1 1 0.849152 \n", "83900 1 1 0.852764 \n", "83904 1 0 0.805330 \n", "\n", " complexity age total_revenue_logged tot_rev state \\\n", "83899 0 13 15.596243 NaN CA \n", "83900 0 42 15.651125 NaN LA \n", "83904 0 25 14.585510 NaN LA \n", "\n", " category category_Animals \\\n", "83899 Community Development 0 \n", "83900 Community Development 0 \n", "83904 Human Services 0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "83899 0 1 \n", "83900 0 1 \n", "83904 0 0 \n", "\n", " category_Education category_Environment category_Health \\\n", "83899 0 0 0 \n", "83900 0 0 0 \n", "83904 0 0 0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "83899 0 0 \n", "83900 0 0 \n", "83904 1 0 \n", "\n", " category_International category_Religion \\\n", "83899 0 0 \n", "83900 0 0 \n", "83904 0 0 \n", "\n", " category_Research and Public Policy whistleblower_policy_efile \\\n", "83899 0 1 \n", "83900 0 1 \n", "83904 0 0 \n", "\n", " conflict_of_interest_policy_efile records_retention_policy_efile \\\n", "83899 1 1 \n", "83900 1 1 \n", "83904 1 0 \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "83899 3 1 \n", "83900 3 1 \n", "83904 1 1 \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "83899 1 5934202 5934202 
\n", "83900 1 6268983 6268983 \n", "83904 0 2159766 2159766 \n", "\n", " total_revenue_logged_efile program_expenses_efile \\\n", "83899 15.596243 3985113 \n", "83900 15.651125 4721948 \n", "83904 14.585510 2019309 \n", "\n", " total_expenses_efile program_efficiency_efile complexity_efile \n", "83899 4693053 0.849152 2 \n", "83900 5537228 0.852764 5 \n", "83904 2507431 0.805330 4 " ] }, "execution_count": 1414, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['_merge_efile']=='both'][cols2][:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Sort the DataFrame by EIN (ascending), then by latest_entry, FYE, and ratings_system (descending)." ] }, { "cell_type": "code", "execution_count": 1443, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df[df['org_id']=='16648'][cols2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Fix four EIN values that the sort revealed to be wrong." ] }, { "cell_type": "code", "execution_count": 1437, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# DataFrame.set_value was removed in pandas 1.0; df.at is the equivalent scalar setter.\n", "#df.at[66124, 'EIN'] = '042453412'\n", "#df.at[44363, 'EIN'] = '202440544'\n", "#df.at[35434, 'EIN'] = '364762261'\n", "#df.at[11784, 'EIN'] = np.nan" ] }, { "cell_type": "code", "execution_count": 1444, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50709 5954 010202467 FY2014 2014-12 CN 2.1 0 \n", "50710 5954 010202467 FY2013 2013-12 CN 2.0 0 \n", "50711 5954 010202467 FY2012 2012-12 CN 2.0 0 \n", "50712 5954 010202467 FY2012 2012-12 CN 2.0 0 \n", "50713 5954 010202467 FY2011 2011-12 CN 2.0 0 \n", "50714 5954 010202467 FY2010 2010-12 CN 2.0 0 \n", "50715 5954 010202467 FY2009 2009-12 CN 2.0 1 \n", "50716 5954 010202467 FY2009 2009-12 CN 1.0 0 \n", "50717 5954 010202467 FY2008 2008-12 CN 1.0 0 \n", "50718 5954 010202467 FY2007 2007-12 CN 1.0 0 \n", "50719 5954 010202467 FY2006 2006-12 CN 1.0 0 \n", "50720 5954 010202467 FY2005 2005-12 CN 1.0 0 \n", "50721 5954 010202467 FY2004 2004-12 CN 1.0 0 \n", "50722 5954 010202467 FY2003 2003-12 CN 1.0 0 \n", "50723 5954 010202467 FY2002 2002-12 CN 1.0 0 \n", "50724 5954 010202467 FY2001 2001-12 CN 1.0 0 \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "50709 1 0 0 \n", "50710 0 0 0 \n", "50711 0 0 0 \n", "50712 0 0 0 \n", "50713 0 0 0 \n", "50714 0 0 0 \n", "50715 0 0 0 \n", "50716 0 0 0 \n", "50717 0 0 0 \n", "50718 0 0 0 \n", "50719 0 0 0 \n", "50720 0 0 0 \n", "50721 0 0 0 \n", "50722 0 0 0 \n", "50723 0 0 0 \n", "50724 0 0 0 \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "50709 0 1 \n", "50710 0 1 \n", "50711 0 1 \n", "50712 0 1 \n", "50713 0 1 \n", "50714 0 1 \n", "50715 0 1 \n", "50716 0 1 \n", "50717 0 1 \n", "50718 0 NaN \n", "50719 0 NaN \n", "50720 0 NaN \n", "50721 0 NaN \n", "50722 0 NaN \n", "50723 0 NaN \n", "50724 0 NaN \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "50709 1 1 3 \n", "50710 1 1 3 \n", "50711 1 1 3 \n", "50712 1 1 3 \n", "50713 1 1 3 \n", "50714 1 1 3 \n", "50715 1 1 3 \n", "50716 1 1 3 \n", "50717 1 0 2 \n", "50718 NaN NaN NaN \n", "50719 NaN NaN NaN \n", "50720 NaN NaN NaN \n", "50721 NaN NaN NaN \n", "50722 NaN NaN NaN \n", "50723 NaN NaN NaN \n", "50724 NaN NaN NaN 
\n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "50709 1 1 0.794457 \n", "50710 1 1 0.800152 \n", "50711 1 1 0.795793 \n", "50712 1 1 0.795793 \n", "50713 1 1 0.824838 \n", "50714 1 1 0.818602 \n", "50715 1 1 0.788895 \n", "50716 1 1 0.788895 \n", "50717 1 0 0.818186 \n", "50718 NaN NaN NaN \n", "50719 NaN NaN NaN \n", "50720 NaN NaN NaN \n", "50721 NaN NaN NaN \n", "50722 NaN NaN NaN \n", "50723 NaN NaN NaN \n", "50724 NaN NaN NaN \n", "\n", " complexity age total_revenue_logged tot_rev state \\\n", "50709 6 62 16.377993 NaN ME \n", "50710 0 62 16.134520 10165601 ME \n", "50711 0 62 16.249742 11407051 ME \n", "50712 0 62 16.249742 11407051 ME \n", "50713 0 62 16.396478 13209918 ME \n", "50714 0 62 16.064515 9478299 ME \n", "50715 0 62 15.947563 8432154 ME \n", "50716 0 62 15.947563 8432154 ME \n", "50717 0 62 16.151735 10342120 ME \n", "50718 0 62 NaN NaN ME \n", "50719 0 62 NaN NaN ME \n", "50720 0 62 NaN NaN ME \n", "50721 0 62 NaN NaN ME \n", "50722 0 62 NaN NaN ME \n", "50723 0 62 NaN NaN ME \n", "50724 0 62 NaN NaN ME \n", "\n", " category category_Animals \\\n", "50709 Research and Public Policy 0 \n", "50710 Research and Public Policy 0 \n", "50711 Research and Public Policy 0 \n", "50712 Research and Public Policy 0 \n", "50713 Research and Public Policy 0 \n", "50714 Research and Public Policy 0 \n", "50715 Research and Public Policy 0 \n", "50716 Research and Public Policy 0 \n", "50717 Research and Public Policy 0 \n", "50718 Research and Public Policy 0 \n", "50719 Research and Public Policy 0 \n", "50720 Research and Public Policy 0 \n", "50721 Research and Public Policy 0 \n", "50722 Research and Public Policy 0 \n", "50723 Research and Public Policy 0 \n", "50724 Research and Public Policy 0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50709 0 0 \n", "50710 0 0 \n", "50711 0 0 \n", "50712 0 0 \n", "50713 0 0 \n", "50714 0 0 \n", "50715 0 0 \n", "50716 0 0 \n", "50717 0 0 
\n", "50718 0 0 \n", "50719 0 0 \n", "50720 0 0 \n", "50721 0 0 \n", "50722 0 0 \n", "50723 0 0 \n", "50724 0 0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50709 0 0 0 \n", "50710 0 0 0 \n", "50711 0 0 0 \n", "50712 0 0 0 \n", "50713 0 0 0 \n", "50714 0 0 0 \n", "50715 0 0 0 \n", "50716 0 0 0 \n", "50717 0 0 0 \n", "50718 0 0 0 \n", "50719 0 0 0 \n", "50720 0 0 0 \n", "50721 0 0 0 \n", "50722 0 0 0 \n", "50723 0 0 0 \n", "50724 0 0 0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50709 0 0 \n", "50710 0 0 \n", "50711 0 0 \n", "50712 0 0 \n", "50713 0 0 \n", "50714 0 0 \n", "50715 0 0 \n", "50716 0 0 \n", "50717 0 0 \n", "50718 0 0 \n", "50719 0 0 \n", "50720 0 0 \n", "50721 0 0 \n", "50722 0 0 \n", "50723 0 0 \n", "50724 0 0 \n", "\n", " category_International category_Religion \\\n", "50709 0 0 \n", "50710 0 0 \n", "50711 0 0 \n", "50712 0 0 \n", "50713 0 0 \n", "50714 0 0 \n", "50715 0 0 \n", "50716 0 0 \n", "50717 0 0 \n", "50718 0 0 \n", "50719 0 0 \n", "50720 0 0 \n", "50721 0 0 \n", "50722 0 0 \n", "50723 0 0 \n", "50724 0 0 \n", "\n", " category_Research and Public Policy whistleblower_policy_efile \\\n", "50709 1 NaN \n", "50710 1 NaN \n", "50711 1 NaN \n", "50712 1 NaN \n", "50713 1 NaN \n", "50714 1 NaN \n", "50715 1 NaN \n", "50716 1 NaN \n", "50717 1 NaN \n", "50718 1 NaN \n", "50719 1 NaN \n", "50720 1 NaN \n", "50721 1 NaN \n", "50722 1 NaN \n", "50723 1 NaN \n", "50724 1 NaN \n", "\n", " conflict_of_interest_policy_efile records_retention_policy_efile \\\n", "50709 NaN NaN \n", "50710 NaN NaN \n", "50711 NaN NaN \n", "50712 NaN NaN \n", "50713 NaN NaN \n", "50714 NaN NaN \n", "50715 NaN NaN \n", "50716 NaN NaN \n", "50717 NaN NaN \n", "50718 NaN NaN \n", "50719 NaN NaN \n", "50720 NaN NaN \n", "50721 NaN NaN \n", "50722 NaN NaN \n", "50723 NaN NaN \n", "50724 NaN NaN \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "50709 NaN NaN \n", "50710 NaN NaN \n", "50711 NaN NaN 
\n", "50712 NaN NaN \n", "50713 NaN NaN \n", "50714 NaN NaN \n", "50715 NaN NaN \n", "50716 NaN NaN \n", "50717 NaN NaN \n", "50718 NaN NaN \n", "50719 NaN NaN \n", "50720 NaN NaN \n", "50721 NaN NaN \n", "50722 NaN NaN \n", "50723 NaN NaN \n", "50724 NaN NaN \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "50709 NaN NaN NaN \n", "50710 NaN NaN NaN \n", "50711 NaN NaN NaN \n", "50712 NaN NaN NaN \n", "50713 NaN NaN NaN \n", "50714 NaN NaN NaN \n", "50715 NaN NaN NaN \n", "50716 NaN NaN NaN \n", "50717 NaN NaN NaN \n", "50718 NaN NaN NaN \n", "50719 NaN NaN NaN \n", "50720 NaN NaN NaN \n", "50721 NaN NaN NaN \n", "50722 NaN NaN NaN \n", "50723 NaN NaN NaN \n", "50724 NaN NaN NaN \n", "\n", " total_revenue_logged_efile program_expenses_efile \\\n", "50709 NaN NaN \n", "50710 NaN NaN \n", "50711 NaN NaN \n", "50712 NaN NaN \n", "50713 NaN NaN \n", "50714 NaN NaN \n", "50715 NaN NaN \n", "50716 NaN NaN \n", "50717 NaN NaN \n", "50718 NaN NaN \n", "50719 NaN NaN \n", "50720 NaN NaN \n", "50721 NaN NaN \n", "50722 NaN NaN \n", "50723 NaN NaN \n", "50724 NaN NaN \n", "\n", " total_expenses_efile program_efficiency_efile complexity_efile \\\n", "50709 NaN NaN NaN \n", "50710 NaN NaN NaN \n", "50711 NaN NaN NaN \n", "50712 NaN NaN NaN \n", "50713 NaN NaN NaN \n", "50714 NaN NaN NaN \n", "50715 NaN NaN NaN \n", "50716 NaN NaN NaN \n", "50717 NaN NaN NaN \n", "50718 NaN NaN NaN \n", "50719 NaN NaN NaN \n", "50720 NaN NaN NaN \n", "50721 NaN NaN NaN \n", "50722 NaN NaN NaN \n", "50723 NaN NaN NaN \n", "50724 NaN NaN NaN \n", "\n", " _merge_efile latest_entry \n", "50709 left_only True \n", "50710 left_only False \n", "50711 left_only False \n", "50712 left_only False \n", "50713 left_only False \n", "50714 left_only False \n", "50715 left_only False \n", "50716 left_only False \n", "50717 left_only False \n", "50718 left_only False \n", "50719 left_only False \n", "50720 left_only False \n", "50721 left_only False \n", "50722 left_only 
False \n", "50723 left_only False \n", "50724 left_only False " ] }, "execution_count": 1444, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values(by=['EIN', 'latest_entry', 'FYE', 'ratings_system'], ascending=[1, 0, 0, 0])[cols2][:16]" ] }, { "cell_type": "code", "execution_count": 1447, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 21940.000000\n", "mean 2.723701\n", "std 0.691317\n", "min 0.000000\n", "25% 3.000000\n", "50% 3.000000\n", "75% 3.000000\n", "max 3.000000\n", "Name: SOX_policies, dtype: float64" ] }, "execution_count": 1447, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['SOX_policies'].describe()" ] }, { "cell_type": "code", "execution_count": 1452, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "40348 3916 010211513 FY2014 2014-12 CN 2.1 0 \n", "40349 3916 010211513 FY2014 2014-12 CN 2.0 0 \n", "40350 3916 010211513 FY2013 2013-12 CN 2.0 0 \n", "40351 3916 010211513 FY2012 2012-12 CN 2.0 0 \n", "40352 3916 010211513 FY2011 2011-05 CN 2.0 0 \n", "40353 3916 010211513 FY2011 2011-05 CN 2.0 0 \n", "40354 3916 010211513 FY2010 2010-05 CN 2.0 1 \n", "40355 3916 010211513 FY2009 2009-05 CN 1.0 0 \n", "40356 3916 010211513 FY2008 2008-05 CN 1.0 0 \n", "40357 3916 010211513 FY2007 2007-05 CN 1.0 0 \n", "40358 3916 010211513 FY2006 2006-05 CN 1.0 0 \n", "40359 3916 010211513 FY2005 2005-05 CN 1.0 0 \n", "40360 3916 010211513 FY2004 2004-05 CN 1.0 0 \n", "40361 3916 010211513 FY2003 2003-05 CN 1.0 0 \n", "40362 3916 010211513 FY2002 2002-05 CN 1.0 0 \n", "40363 3916 010211513 FY2001 2001-05 CN 1.0 0 \n", "40364 3916 010211513 FY2000 2000-05 CN 1.0 0 \n", "\n", " 2016_data donor_advisory donor_advisory_2016 \\\n", "40348 1 0 0 \n", "40349 0 0 0 \n", "40350 0 0 0 \n", "40351 0 0 0 \n", "40352 0 0 0 \n", "40353 0 0 0 \n", "40354 0 0 0 \n", "40355 0 0 0 \n", "40356 0 0 0 \n", "40357 0 0 0 \n", "40358 0 0 0 \n", "40359 0 0 0 \n", "40360 0 0 0 \n", "40361 0 0 0 \n", "40362 0 0 0 \n", "40363 0 0 0 \n", "40364 0 0 0 \n", "\n", " donor_advisory_2011_to_2016 conflict_of_interest_policy_v2 \\\n", "40348 0 1 \n", "40349 0 NaN \n", "40350 0 1 \n", "40351 0 1 \n", "40352 0 1 \n", "40353 0 1 \n", "40354 0 1 \n", "40355 0 1 \n", "40356 0 NaN \n", "40357 0 NaN \n", "40358 0 NaN \n", "40359 0 NaN \n", "40360 0 NaN \n", "40361 0 NaN \n", "40362 0 NaN \n", "40363 0 NaN \n", "40364 0 NaN \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "40348 1 1 3 \n", "40349 NaN NaN NaN \n", "40350 1 1 3 \n", "40351 1 1 3 \n", "40352 1 1 3 \n", "40353 1 1 3 \n", "40354 1 1 3 \n", "40355 1 1 3 \n", "40356 NaN NaN NaN \n", "40357 NaN NaN NaN \n", "40358 NaN NaN NaN \n", "40359 NaN NaN 
NaN \n", "40360 NaN NaN NaN \n", "40361 NaN NaN NaN \n", "40362 NaN NaN NaN \n", "40363 NaN NaN NaN \n", "40364 NaN NaN NaN \n", "\n", " SOX_policies_binary SOX_policies_all_binary program_efficiency \\\n", "40348 1 1 0.833296 \n", "40349 NaN NaN NaN \n", "40350 1 1 0.835431 \n", "40351 1 1 0.849363 \n", "40352 1 1 0.855584 \n", "40353 1 1 0.855584 \n", "40354 1 1 0.858851 \n", "40355 1 1 0.793051 \n", "40356 NaN NaN NaN \n", "40357 NaN NaN NaN \n", "40358 NaN NaN NaN \n", "40359 NaN NaN NaN \n", "40360 NaN NaN NaN \n", "40361 NaN NaN NaN \n", "40362 NaN NaN NaN \n", "40363 NaN NaN NaN \n", "40364 NaN NaN NaN \n", "\n", " complexity age total_revenue_logged tot_rev state category \\\n", "40348 5 66 19.490857 NaN ME Health \n", "40349 0 66 NaN NaN ME Health \n", "40350 0 66 19.365103 257132786 ME Health \n", "40351 0 66 19.258274 231079981 ME Health \n", "40352 0 66 19.260154 231514645 ME Health \n", "40353 0 66 19.260154 231514645 ME Health \n", "40354 0 66 19.115237 200282021 ME Health \n", "40355 0 66 18.958910 171297125 ME Health \n", "40356 0 66 NaN NaN ME Health \n", "40357 0 66 NaN NaN ME Health \n", "40358 0 66 NaN NaN ME Health \n", "40359 0 66 NaN NaN ME Health \n", "40360 0 66 NaN NaN ME Health \n", "40361 0 66 NaN NaN ME Health \n", "40362 0 66 NaN NaN ME Health \n", "40363 0 66 NaN NaN ME Health \n", "40364 0 66 NaN NaN ME Health \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "40348 0 0 \n", "40349 0 0 \n", "40350 0 0 \n", "40351 0 0 \n", "40352 0 0 \n", "40353 0 0 \n", "40354 0 0 \n", "40355 0 0 \n", "40356 0 0 \n", "40357 0 0 \n", "40358 0 0 \n", "40359 0 0 \n", "40360 0 0 \n", "40361 0 0 \n", "40362 0 0 \n", "40363 0 0 \n", "40364 0 0 \n", "\n", " category_Community Development category_Education \\\n", "40348 0 0 \n", "40349 0 0 \n", "40350 0 0 \n", "40351 0 0 \n", "40352 0 0 \n", "40353 0 0 \n", "40354 0 0 \n", "40355 0 0 \n", "40356 0 0 \n", "40357 0 0 \n", "40358 0 0 \n", "40359 0 0 \n", "40360 0 0 \n", "40361 0 0 \n", 
"40362 0 0 \n", "40363 0 0 \n", "40364 0 0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "40348 0 1 0 \n", "40349 0 1 0 \n", "40350 0 1 0 \n", "40351 0 1 0 \n", "40352 0 1 0 \n", "40353 0 1 0 \n", "40354 0 1 0 \n", "40355 0 1 0 \n", "40356 0 1 0 \n", "40357 0 1 0 \n", "40358 0 1 0 \n", "40359 0 1 0 \n", "40360 0 1 0 \n", "40361 0 1 0 \n", "40362 0 1 0 \n", "40363 0 1 0 \n", "40364 0 1 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "40348 0 0 \n", "40349 0 0 \n", "40350 0 0 \n", "40351 0 0 \n", "40352 0 0 \n", "40353 0 0 \n", "40354 0 0 \n", "40355 0 0 \n", "40356 0 0 \n", "40357 0 0 \n", "40358 0 0 \n", "40359 0 0 \n", "40360 0 0 \n", "40361 0 0 \n", "40362 0 0 \n", "40363 0 0 \n", "40364 0 0 \n", "\n", " category_Religion category_Research and Public Policy \\\n", "40348 0 0 \n", "40349 0 0 \n", "40350 0 0 \n", "40351 0 0 \n", "40352 0 0 \n", "40353 0 0 \n", "40354 0 0 \n", "40355 0 0 \n", "40356 0 0 \n", "40357 0 0 \n", "40358 0 0 \n", "40359 0 0 \n", "40360 0 0 \n", "40361 0 0 \n", "40362 0 0 \n", "40363 0 0 \n", "40364 0 0 \n", "\n", " whistleblower_policy_efile conflict_of_interest_policy_efile \\\n", "40348 NaN NaN \n", "40349 NaN NaN \n", "40350 NaN NaN \n", "40351 NaN NaN \n", "40352 NaN NaN \n", "40353 NaN NaN \n", "40354 NaN NaN \n", "40355 NaN NaN \n", "40356 NaN NaN \n", "40357 NaN NaN \n", "40358 NaN NaN \n", "40359 NaN NaN \n", "40360 NaN NaN \n", "40361 NaN NaN \n", "40362 NaN NaN \n", "40363 NaN NaN \n", "40364 NaN NaN \n", "\n", " records_retention_policy_efile SOX_policies_efile \\\n", "40348 NaN NaN \n", "40349 NaN NaN \n", "40350 NaN NaN \n", "40351 NaN NaN \n", "40352 NaN NaN \n", "40353 NaN NaN \n", "40354 NaN NaN \n", "40355 NaN NaN \n", "40356 NaN NaN \n", "40357 NaN NaN \n", "40358 NaN NaN \n", "40359 NaN NaN \n", "40360 NaN NaN \n", "40361 NaN NaN \n", "40362 NaN NaN \n", "40363 NaN NaN \n", "40364 NaN NaN \n", "\n", " SOX_policies_binary_efile SOX_policies_all_binary_efile 
\\\n", "40348 NaN NaN \n", "40349 NaN NaN \n", "40350 NaN NaN \n", "40351 NaN NaN \n", "40352 NaN NaN \n", "40353 NaN NaN \n", "40354 NaN NaN \n", "40355 NaN NaN \n", "40356 NaN NaN \n", "40357 NaN NaN \n", "40358 NaN NaN \n", "40359 NaN NaN \n", "40360 NaN NaN \n", "40361 NaN NaN \n", "40362 NaN NaN \n", "40363 NaN NaN \n", "40364 NaN NaN \n", "\n", " tot_rev_efile tot_rev_no_neg_efile total_revenue_logged_efile \\\n", "40348 NaN NaN NaN \n", "40349 NaN NaN NaN \n", "40350 NaN NaN NaN \n", "40351 NaN NaN NaN \n", "40352 NaN NaN NaN \n", "40353 NaN NaN NaN \n", "40354 NaN NaN NaN \n", "40355 NaN NaN NaN \n", "40356 NaN NaN NaN \n", "40357 NaN NaN NaN \n", "40358 NaN NaN NaN \n", "40359 NaN NaN NaN \n", "40360 NaN NaN NaN \n", "40361 NaN NaN NaN \n", "40362 NaN NaN NaN \n", "40363 NaN NaN NaN \n", "40364 NaN NaN NaN \n", "\n", " program_expenses_efile total_expenses_efile program_efficiency_efile \\\n", "40348 NaN NaN NaN \n", "40349 NaN NaN NaN \n", "40350 NaN NaN NaN \n", "40351 NaN NaN NaN \n", "40352 NaN NaN NaN \n", "40353 NaN NaN NaN \n", "40354 NaN NaN NaN \n", "40355 NaN NaN NaN \n", "40356 NaN NaN NaN \n", "40357 NaN NaN NaN \n", "40358 NaN NaN NaN \n", "40359 NaN NaN NaN \n", "40360 NaN NaN NaN \n", "40361 NaN NaN NaN \n", "40362 NaN NaN NaN \n", "40363 NaN NaN NaN \n", "40364 NaN NaN NaN \n", "\n", " complexity_efile _merge_efile latest_entry \n", "40348 NaN left_only True \n", "40349 NaN left_only False \n", "40350 NaN left_only False \n", "40351 NaN left_only False \n", "40352 NaN left_only False \n", "40353 NaN left_only False \n", "40354 NaN left_only False \n", "40355 NaN left_only False \n", "40356 NaN left_only False \n", "40357 NaN left_only False \n", "40358 NaN left_only False \n", "40359 NaN left_only False \n", "40360 NaN left_only False \n", "40361 NaN left_only False \n", "40362 NaN left_only False \n", "40363 NaN left_only False \n", "40364 NaN left_only False " ] }, "execution_count": 1452, "metadata": {}, "output_type": "execute_result" } 
], "source": [ "df[df['org_id']=='3916'][cols2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
65 rows are missing an EIN. See email exchanges with Dan." ] }, { "cell_type": "code", "execution_count": 1424, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "65\n", "85336\n" ] } ], "source": [ "print(len(df[df['EIN'].isnull()]))\n", "print(len(df[df['EIN'].notnull()]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Replace Values\n", "In each cell below, missing values are filled from the matching `_efile` column. The two printed counts are the non-null rows before and after the fill." ] }, { "cell_type": "code", "execution_count": 1464, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21895\n", "22326\n" ] } ], "source": [ "print(len(df[df['conflict_of_interest_policy'].notnull()]))\n", "df['conflict_of_interest_policy'] = np.where( ( (df['conflict_of_interest_policy'].isnull()) \n", " & (df['conflict_of_interest_policy_efile'].notnull()) ),\n", " df['conflict_of_interest_policy_efile'], df['conflict_of_interest_policy'])\n", "print(len(df[df['conflict_of_interest_policy'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1537, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['conflict_of_interest_policy_v2'].notnull()]))\n", "df['conflict_of_interest_policy_v2'] = np.where( ( (df['conflict_of_interest_policy_v2'].isnull()) \n", " & (df['conflict_of_interest_policy_efile'].notnull()) ),\n", " df['conflict_of_interest_policy_efile'], df['conflict_of_interest_policy_v2'])\n", "print(len(df[df['conflict_of_interest_policy_v2'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1465, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21895\n", "22326\n" ] } ], "source": [ "print(len(df[df['records_retention_policy'].notnull()]))\n", "df['records_retention_policy'] = np.where( ( (df['records_retention_policy'].isnull()) \n", " & (df['records_retention_policy_efile'].notnull()) ),\n", "
df['records_retention_policy_efile'], df['records_retention_policy'])\n", "print(len(df[df['records_retention_policy'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1538, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['records_retention_policy_v2'].notnull()]))\n", "df['records_retention_policy_v2'] = np.where( ( (df['records_retention_policy_v2'].isnull()) \n", " & (df['records_retention_policy_efile'].notnull()) ),\n", " df['records_retention_policy_efile'], df['records_retention_policy_v2'])\n", "print(len(df[df['records_retention_policy_v2'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1466, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['whistleblower_policy'].notnull()]))\n", "df['whistleblower_policy'] = np.where( ( (df['whistleblower_policy'].isnull()) \n", " & (df['whistleblower_policy_efile'].notnull()) ),\n", " df['whistleblower_policy_efile'], df['whistleblower_policy'])\n", "print(len(df[df['whistleblower_policy'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1539, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['whistleblower_policy_v2'].notnull()]))\n", "df['whistleblower_policy_v2'] = np.where( ( (df['whistleblower_policy_v2'].isnull()) \n", " & (df['whistleblower_policy_efile'].notnull()) ),\n", " df['whistleblower_policy_efile'], df['whistleblower_policy_v2'])\n", "print(len(df[df['whistleblower_policy_v2'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1467, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['SOX_policies'].notnull()]))\n", 
"df['SOX_policies'] = np.where( ( (df['SOX_policies'].isnull()) & (df['SOX_policies_efile'].notnull()) ),\n", " df['SOX_policies_efile'], df['SOX_policies'])\n", "print(len(df[df['SOX_policies'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1468, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['SOX_policies_all_binary'].notnull()]))\n", "df['SOX_policies_all_binary'] = np.where( ( (df['SOX_policies_all_binary'].isnull()) \n", " & (df['SOX_policies_all_binary_efile'].notnull()) ),\n", " df['SOX_policies_all_binary_efile'], df['SOX_policies_all_binary'])\n", "print(len(df[df['SOX_policies_all_binary'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1469, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['SOX_policies_binary'].notnull()]))\n", "df['SOX_policies_binary'] = np.where( ( (df['SOX_policies_binary'].isnull()) \n", " & (df['SOX_policies_binary_efile'].notnull()) ),\n", " df['SOX_policies_binary_efile'], df['SOX_policies_binary'])\n", "print(len(df[df['SOX_policies_binary'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1470, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "11010\n", "11410\n" ] } ], "source": [ "print(len(df[df['tot_rev'].notnull()]))\n", "df['tot_rev'] = np.where( ( (df['tot_rev'].isnull()) & (df['tot_rev_efile'].notnull()) ),\n", " df['tot_rev_efile'], df['tot_rev'])\n", "print(len(df[df['tot_rev'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1471, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['total_revenue_logged'].notnull()]))\n", "df['total_revenue_logged'] = np.where( ( 
(df['total_revenue_logged'].isnull()) & (df['total_revenue_logged_efile'].notnull()) ),\n", " df['total_revenue_logged_efile'], df['total_revenue_logged'])\n", "print(len(df[df['total_revenue_logged'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1472, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22327\n" ] } ], "source": [ "print(len(df[df['program_expenses'].notnull()]))\n", "df['program_expenses'] = np.where( ( (df['program_expenses'].isnull()) & (df['program_expenses_efile'].notnull()) ),\n", " df['program_expenses_efile'], df['program_expenses'])\n", "print(len(df[df['program_expenses'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1473, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21940\n", "22337\n" ] } ], "source": [ "print(len(df[df['total_expenses'].notnull()]))\n", "df['total_expenses'] = np.where( ( (df['total_expenses'].isnull()) & (df['total_expenses_efile'].notnull()) ),\n", " df['total_expenses_efile'], df['total_expenses'])\n", "print(len(df[df['total_expenses'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1474, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21939\n", "22336\n" ] } ], "source": [ "print(len(df[df['program_efficiency'].notnull()]))\n", "df['program_efficiency'] = np.where( ( (df['program_efficiency'].isnull()) & (df['program_efficiency_efile'].notnull()) ),\n", " df['program_efficiency_efile'], df['program_efficiency'])\n", "print(len(df[df['program_efficiency'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1475, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "85004\n", "85401\n" ] } ], "source": [ "print(len(df[df['complexity'].notnull()]))\n", "df['complexity'] = np.where( ( (df['complexity'].isnull()) & 
(df['complexity_efile'].notnull()) ),\n", " df['complexity_efile'], df['complexity'])\n", "print(len(df[df['complexity'].notnull()]))" ] }, { "cell_type": "code", "execution_count": 1476, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 84959 0.057239 2.323008e-01 \n", "2016_data 84959 0.097741 2.969662e-01 \n", "donor_advisory 83897 0.004660 6.810882e-02 \n", "donor_advisory_2016 84959 0.004332 6.567184e-02 \n", "donor_advisory_2011_to_2016 84959 0.014383 1.190659e-01 \n", "conflict_of_interest_policy_v2 21940 0.963172 1.883429e-01 \n", "records_retention_policy_v2 21940 0.878851 3.263073e-01 \n", "whistleblower_policy_v2 21940 0.881677 3.229972e-01 \n", "SOX_policies 22337 2.710436 7.085330e-01 \n", "SOX_policies_binary 22337 0.969602 1.716837e-01 \n", "SOX_policies_all_binary 22337 0.828177 3.772346e-01 \n", "program_efficiency 22336 0.804355 1.079594e-01 \n", "complexity 85401 0.382923 1.228848e+00 \n", "age 83830 39.508147 1.931018e+01 \n", "total_revenue_logged 22337 15.839109 1.718828e+00 \n", "tot_rev 11410 48442379.383436 1.559548e+08 \n", "category_Animals 84959 0.072564 2.594217e-01 \n", "category_Arts, Culture, Humanities 84959 0.135583 3.423473e-01 \n", "category_Community Development 84959 0.087736 2.829129e-01 \n", "category_Education 84959 0.061182 2.396661e-01 \n", "category_Environment 84959 0.059864 2.372364e-01 \n", "category_Health 84959 0.115385 3.194880e-01 \n", "category_Human Services 84959 0.248732 4.322805e-01 \n", "category_Human and Civil Rights 84959 0.038183 1.916393e-01 \n", "category_International 84959 0.084982 2.788568e-01 \n", "category_Religion 84959 0.059582 2.367116e-01 \n", "category_Research and Public Policy 84959 0.024271 1.538888e-01 \n", "\n", " min 25% \\\n", "2011_data 0 0.000000 \n", "2016_data 0 0.000000 \n", "donor_advisory 0 0.000000 \n", "donor_advisory_2016 0 0.000000 \n", "donor_advisory_2011_to_2016 0 0.000000 \n", "conflict_of_interest_policy_v2 0 1.000000 \n", "records_retention_policy_v2 0 1.000000 \n", "whistleblower_policy_v2 0 1.000000 \n", "SOX_policies 0 3.000000 \n", "SOX_policies_binary 0 1.000000 \n", "SOX_policies_all_binary 0 1.000000 \n", "program_efficiency 0 0.755702 \n", 
"complexity 0 0.000000 \n", "age 0 24.000000 \n", "total_revenue_logged 0 14.755249 \n", "tot_rev -218265025 5098355.000000 \n", "category_Animals 0 0.000000 \n", "category_Arts, Culture, Humanities 0 0.000000 \n", "category_Community Development 0 0.000000 \n", "category_Education 0 0.000000 \n", "category_Environment 0 0.000000 \n", "category_Health 0 0.000000 \n", "category_Human Services 0 0.000000 \n", "category_Human and Civil Rights 0 0.000000 \n", "category_International 0 0.000000 \n", "category_Religion 0 0.000000 \n", "category_Research and Public Policy 0 0.000000 \n", "\n", " 50% 75% \\\n", "2011_data 0.000000 0.000000 \n", "2016_data 0.000000 0.000000 \n", "donor_advisory 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 \n", "conflict_of_interest_policy_v2 1.000000 1.000000 \n", "records_retention_policy_v2 1.000000 1.000000 \n", "whistleblower_policy_v2 1.000000 1.000000 \n", "SOX_policies 3.000000 3.000000 \n", "SOX_policies_binary 1.000000 1.000000 \n", "SOX_policies_all_binary 1.000000 1.000000 \n", "program_efficiency 0.817871 0.871641 \n", "complexity 0.000000 0.000000 \n", "age 35.000000 52.000000 \n", "total_revenue_logged 15.682158 16.843536 \n", "tot_rev 14394792.000000 40741209.000000 \n", "category_Animals 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 \n", "category_Human Services 0.000000 0.000000 \n", "category_Human and Civil Rights 0.000000 0.000000 \n", "category_International 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 \n", "\n", " max \n", "2011_data 1.000000e+00 \n", "2016_data 1.000000e+00 \n", "donor_advisory 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 \n", 
"donor_advisory_2011_to_2016 1.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000e+00 \n", "records_retention_policy_v2 1.000000e+00 \n", "whistleblower_policy_v2 1.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies_binary 1.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 \n", "program_efficiency 1.010186e+00 \n", "complexity 8.000000e+00 \n", "age 1.080000e+02 \n", "total_revenue_logged 2.204279e+01 \n", "tot_rev 3.741635e+09 \n", "category_Animals 1.000000e+00 \n", "category_Arts, Culture, Humanities 1.000000e+00 \n", "category_Community Development 1.000000e+00 \n", "category_Education 1.000000e+00 \n", "category_Environment 1.000000e+00 \n", "category_Health 1.000000e+00 \n", "category_Human Services 1.000000e+00 \n", "category_Human and Civil Rights 1.000000e+00 \n", "category_International 1.000000e+00 \n", "category_Religion 1.000000e+00 \n", "category_Research and Public Policy 1.000000e+00 " ] }, "execution_count": 1476, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[cols].describe().T" ] }, { "cell_type": "code", "execution_count": 1477, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "['_merge_v1',\n", " 'to_be_merged',\n", " '_merge_v2',\n", " '_merge_v3',\n", " '_merge_v4',\n", " '_merge_47',\n", " '_merge_efile']" ] }, "execution_count": 1477, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[x for x in list(df) if '_merge' in x]" ] }, { "cell_type": "code", "execution_count": 1495, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'age', 'total_revenue_logged', 
'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Sort DF" ] }, { "cell_type": "code", "execution_count": 1488, "metadata": { "collapsed": true }, "outputs": [], "source": [ " " ] }, { "cell_type": "code", "execution_count": 1494, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " org_id EIN FYE 2011_data 2016_data SOX_policies \\\n", "46482 15533 010211530 FY2014 0 1 2 \n", "46483 15533 010211530 FY2014 0 0 NaN \n", "46484 15533 010211530 FY2013 0 0 NaN \n", "10813 15222 010211543 FY2014 0 1 3 \n", "10814 15222 010211543 FY2014 0 0 NaN \n", "10815 15222 010211543 FY2013 0 0 NaN \n", "75794 15534 010211564 FY2014 0 1 3 \n", "75795 15534 010211564 FY2014 0 0 NaN \n", "75796 15534 010211564 FY2013 0 0 NaN \n", "60873 7736 010212442 FY2014 0 1 3 \n", "60874 7736 010212442 FY2013 0 0 NaN \n", "60875 7736 010212442 FY2013 0 0 NaN \n", "60876 7736 010212442 FY2012 0 0 NaN \n", "60877 7736 010212442 FY2011 0 0 NaN \n", "60878 7736 010212442 FY2011 0 0 NaN \n", "60879 7736 010212442 FY2010 1 0 3 \n", "60880 7736 010212442 FY2010 0 0 NaN \n", "60881 7736 010212442 FY2009 0 0 NaN \n", "60882 7736 010212442 FY2008 0 0 NaN \n", "60883 7736 010212442 FY2007 0 0 NaN \n", "60884 7736 010212442 FY2006 0 0 NaN \n", "60885 7736 010212442 FY2005 0 0 NaN \n", "60886 7736 010212442 FY2004 0 0 NaN \n", "60887 7736 010212442 FY2003 0 0 NaN \n", "5349 13258 010212541 FY2014 0 1 3 \n", "5350 13258 010212541 FY2014 0 0 NaN \n", "5351 13258 010212541 FY2013 0 0 1 \n", "5352 13258 010212541 FY2012 0 0 1 \n", "5353 13258 010212541 FY2012 0 0 1 \n", "5354 13258 010212541 FY2011 0 0 1 \n", "5355 13258 010212541 FY2010 0 0 1 \n", "84112 NaN 010212541 FY2009 0 0 1 \n", "84262 NaN 010212541 FY2008 0 0 0 \n", "7525 10965 010215910 FY2015 0 1 2 \n", "7526 10965 010215910 FY2015 0 0 NaN \n", "7527 10965 010215910 FY2014 0 0 NaN \n", "7528 10965 010215910 FY2013 0 0 NaN \n", "7529 10965 010215910 FY2012 0 0 NaN \n", "7530 10965 010215910 FY2011 0 0 NaN \n", "7531 10965 010215910 FY2010 1 0 1 \n", "\n", " total_revenue_logged tot_rev _merge_v4 _merge_47 _merge_efile \n", "46482 14.447180 NaN left_only left_only left_only \n", "46483 NaN NaN left_only left_only left_only \n", "46484 NaN NaN left_only left_only left_only \n", "10813 15.004547 NaN left_only 
left_only left_only \n", "10814 NaN NaN left_only left_only left_only \n", "10815 NaN NaN left_only left_only left_only \n", "75794 14.179904 NaN left_only left_only left_only \n", "75795 NaN NaN left_only left_only left_only \n", "75796 NaN NaN left_only left_only left_only \n", "60873 15.602503 NaN left_only left_only left_only \n", "60874 NaN NaN left_only left_only left_only \n", "60875 NaN NaN left_only left_only left_only \n", "60876 NaN NaN left_only left_only left_only \n", "60877 NaN NaN left_only left_only left_only \n", "60878 NaN NaN left_only left_only left_only \n", "60879 15.498073 NaN left_only left_only left_only \n", "60880 NaN NaN left_only left_only left_only \n", "60881 NaN NaN left_only left_only left_only \n", "60882 NaN NaN left_only left_only left_only \n", "60883 NaN NaN left_only left_only left_only \n", "60884 NaN NaN left_only left_only left_only \n", "60885 NaN NaN left_only left_only left_only \n", "60886 NaN NaN left_only left_only left_only \n", "60887 NaN NaN left_only left_only left_only \n", "5349 14.881789 NaN left_only left_only left_only \n", "5350 NaN NaN left_only left_only left_only \n", "5351 14.750471 2547112 both left_only left_only \n", "5352 15.158575 3830764 both left_only left_only \n", "5353 15.158575 3830764 both left_only left_only \n", "5354 14.252695 1548341 both left_only left_only \n", "5355 14.231660 1516112 both left_only left_only \n", "84112 13.987073 1187158 both left_only left_only \n", "84262 14.030316 1239621 both left_only left_only \n", "7525 13.783290 NaN left_only left_only left_only \n", "7526 NaN NaN left_only left_only left_only \n", "7527 NaN NaN left_only left_only left_only \n", "7528 NaN NaN left_only left_only left_only \n", "7529 NaN NaN left_only left_only left_only \n", "7530 NaN NaN left_only left_only left_only \n", "7531 13.853119 NaN left_only left_only left_only " ] }, "execution_count": 1494, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['org_id', 'EIN', 
'FYE', '2011_data', '2016_data', 'SOX_policies', 'total_revenue_logged',\n", " 'tot_rev', '_merge_v4', '_merge_47', '_merge_efile']][40:80]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 1487, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "85401\n" ] } ], "source": [ "#df = pd.read_pickle('Merged CN dataset with Age, State, Category, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory (with added 990 data).pkl')\n", "print len(df)\n", "df.to_pickle('Merged CN dataset with Age, State, Category, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory (with added 990 data).pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset checks for Test 1 and Test 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df[cols]" ] }, { "cell_type": "code", "execution_count": 1498, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4863" ] }, "execution_count": 1498, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['2011_data']==1])" ] }, { "cell_type": "code", "execution_count": 1504, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50715 0 0 0 \n", "40354 0 0 0 \n", "60879 0 0 0 \n", "7531 0 0 0 \n", "46514 0 0 0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50715 5954 010202467 FY2009 2009-12 CN 2.0 1 \n", "40354 3916 010211513 FY2010 2010-05 CN 2.0 1 \n", "60879 7736 010212442 FY2010 2010-08 CN 2.0 1 \n", "7531 10965 010215910 FY2010 2010-04 CN 2.0 1 \n", "46514 9318 010216837 FY2009 2009-12 CN 2.0 1 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "50715 0 1 1 \n", "40354 0 1 1 \n", "60879 0 1 1 \n", "7531 0 1 0 \n", "46514 0 1 1 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "50715 1 3 1 \n", "40354 1 3 1 \n", "60879 1 3 1 \n", "7531 0 1 1 \n", "46514 1 3 1 \n", "\n", " SOX_policies_all_binary program_efficiency complexity age \\\n", "50715 1 0.788895 0 62 \n", "40354 1 0.858851 0 66 \n", "60879 1 0.918651 0 70 \n", "7531 0 0.714058 0 39 \n", "46514 1 0.819764 0 75 \n", "\n", " total_revenue_logged category state tot_rev \n", "50715 15.947563 Research and Public Policy ME 8432154 \n", "40354 19.115237 Health ME 200282021 \n", "60879 15.498073 Human Services ME NaN \n", "7531 13.853119 Animals ME NaN \n", "46514 14.350657 Human Services ME NaN " ] }, "execution_count": 1504, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011_data']==1][cols][:5]" ] }, { "cell_type": "code", "execution_count": 1502, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 4816\n", "1 47\n", "Name: donor_advisory_2016, dtype: int64" ] }, "execution_count": 1502, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011_data']==1]['donor_advisory_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1503, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 4755\n", "1 108\n", "Name: 
donor_advisory_2011_to_2016, dtype: int64" ] }, "execution_count": 1503, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['2011_data']==1]['donor_advisory_2011_to_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1500, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "576" ] }, "execution_count": 1500, "metadata": {}, "output_type": "execute_result" } ], "source": [ "5439-4863" ] }, { "cell_type": "code", "execution_count": 1497, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "donor_advisory 4815 0.004984 7.043159e-02 \n", "donor_advisory_2016 4863 0.009665 9.784363e-02 \n", "donor_advisory_2011_to_2016 4863 0.022209 1.473763e-01 \n", "2011_data 4863 1.000000 0.000000e+00 \n", "2016_data 4863 0.000000 0.000000e+00 \n", "conflict_of_interest_policy_v2 4838 0.933650 2.489182e-01 \n", "records_retention_policy_v2 4838 0.799504 4.004130e-01 \n", "whistleblower_policy_v2 4838 0.799917 4.001033e-01 \n", "SOX_policies 4838 2.533072 8.696534e-01 \n", "SOX_policies_binary 4838 0.947292 2.234725e-01 \n", "SOX_policies_all_binary 4838 0.733981 4.419200e-01 \n", "program_efficiency 4838 0.804691 1.055729e-01 \n", "complexity 4863 0.000000 0.000000e+00 \n", "age 4860 40.051029 1.924022e+01 \n", "total_revenue_logged 4838 15.461725 1.654727e+00 \n", "tot_rev 1257 43126114.559268 1.378553e+08 \n", "\n", " min 25% \\\n", "donor_advisory 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 \n", "2011_data 1.000000 1.000000 \n", "2016_data 0.000000 0.000000 \n", "conflict_of_interest_policy_v2 0.000000 1.000000 \n", "records_retention_policy_v2 0.000000 1.000000 \n", "whistleblower_policy_v2 0.000000 1.000000 \n", "SOX_policies 0.000000 2.000000 \n", "SOX_policies_binary 0.000000 1.000000 \n", "SOX_policies_all_binary 0.000000 0.000000 \n", "program_efficiency 0.022177 0.755525 \n", "complexity 0.000000 0.000000 \n", "age 0.000000 25.000000 \n", "total_revenue_logged 0.000000 14.558843 \n", "tot_rev -42638874.000000 5469933.000000 \n", "\n", " 50% 75% max \n", "donor_advisory 0.000000 0.000000 1.000000e+00 \n", "donor_advisory_2016 0.000000 0.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 1.000000e+00 \n", "2011_data 1.000000 1.000000 1.000000e+00 \n", "2016_data 0.000000 0.000000 0.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000 1.000000 1.000000e+00 \n", "records_retention_policy_v2 1.000000 1.000000 1.000000e+00 
\n", "whistleblower_policy_v2 1.000000 1.000000 1.000000e+00 \n", "SOX_policies 3.000000 3.000000 3.000000e+00 \n", "SOX_policies_binary 1.000000 1.000000 1.000000e+00 \n", "SOX_policies_all_binary 1.000000 1.000000 1.000000e+00 \n", "program_efficiency 0.816568 0.870820 9.976872e-01 \n", "complexity 0.000000 0.000000 0.000000e+00 \n", "age 35.000000 52.000000 1.080000e+02 \n", "total_revenue_logged 15.358174 16.277376 2.200080e+01 \n", "tot_rev 13989527.000000 36968805.000000 3.587230e+09 " ] }, "execution_count": 1497, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols = DVs + indicators + IVs + controls + SOI_check\n", "df[df['2011_data']==1][cols].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Fix Age based on Guidestar Values and website searches" ] }, { "cell_type": "code", "execution_count": 1505, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "63063 0 0 0 \n", "59709 0 0 0 \n", "14805 0 0 0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "63063 6108 251730893 FY2010 2010-06 CN 2.0 1 \n", "59709 6951 363038894 FY2009 2009-12 CN 2.0 1 \n", "14805 7972 520941367 FY2010 2010-06 CN 2.0 1 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "63063 0 1 1 \n", "59709 0 0 0 \n", "14805 0 1 1 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "63063 1 3 1 \n", "59709 0 0 0 \n", "14805 0 2 1 \n", "\n", " SOX_policies_all_binary program_efficiency complexity age \\\n", "63063 1 0.798285 0 NaN \n", "59709 0 0.771800 0 NaN \n", "14805 0 0.820325 0 NaN \n", "\n", " total_revenue_logged category state tot_rev \n", "63063 14.156778 Human Services PA NaN \n", "59709 13.681441 Religion PA NaN \n", "14805 13.726762 Human and Civil Rights DC NaN " ] }, "execution_count": 1505, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['2011_data']==1) & (df['age'].isnull())][cols]" ] }, { "cell_type": "code", "execution_count": 1509, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83830\n", "83846\n" ] } ], "source": [ "print len(df[df['age'].notnull()])\n", "df['age'] = np.where(df['org_id']=='6108', 22, df['age'])\n", "print len(df[df['age'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1516, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83846\n", "83861\n" ] } ], "source": [ "print len(df[df['age'].notnull()])\n", "df['age'] = np.where(df['org_id']=='6951', 37, df['age'])\n", "print len(df[df['age'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1517, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "83861\n", "83874\n" ] } ], "source": 
[ "print len(df[df['age'].notnull()])\n", "df['age'] = np.where(df['org_id']=='7972', 45, df['age'])\n", "print len(df[df['age'].notnull()])" ] }, { "cell_type": "code", "execution_count": 1515, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FYE org_id age name SOX_policies donor_advisory\n", "14799 FY2014 7972 NaN Center of Concern 3 0\n", "14800 FY2014 7972 NaN Center of Concern NaN 0\n", "14801 FY2014 7972 NaN Center of Concern NaN 0\n", "14802 FY2013 7972 NaN Center of Concern NaN 0\n", "14803 FY2012 7972 NaN Center of Concern NaN 0\n", "14804 FY2011 7972 NaN Center of Concern NaN 0\n", "14805 FY2010 7972 NaN Center of Concern 2 0\n", "14806 FY2010 7972 NaN Center of Concern NaN 0\n", "14807 FY2008 7972 NaN Center of Concern NaN 0\n", "14808 FY2007 7972 NaN Center of Concern NaN 0\n", "14809 FY2006 7972 NaN Center of Concern NaN 0\n", "14810 FY2005 7972 NaN Center of Concern NaN 0\n", "14811 FY2004 7972 NaN Center of Concern NaN 0" ] }, "execution_count": 1515, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['EIN']=='520941367'][['FYE', 'org_id', 'age', 'name', 'SOX_policies', 'donor_advisory']]" ] }, { "cell_type": "code", "execution_count": 1524, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "donor_advisory 4815 0.004984 7.043159e-02 \n", "donor_advisory_2016 4863 0.009665 9.784363e-02 \n", "donor_advisory_2011_to_2016 4863 0.022209 1.473763e-01 \n", "2011_data 4863 1.000000 0.000000e+00 \n", "2016_data 4863 0.000000 0.000000e+00 \n", "conflict_of_interest_policy_v2 4838 0.933650 2.489182e-01 \n", "records_retention_policy_v2 4838 0.799504 4.004130e-01 \n", "whistleblower_policy_v2 4838 0.799917 4.001033e-01 \n", "SOX_policies 4838 2.533072 8.696534e-01 \n", "SOX_policies_binary 4838 0.947292 2.234725e-01 \n", "SOX_policies_all_binary 4838 0.733981 4.419200e-01 \n", "program_efficiency 4838 0.804691 1.055729e-01 \n", "complexity 4863 0.000000 0.000000e+00 \n", "age 4863 40.047707 1.923620e+01 \n", "total_revenue_logged 4838 15.461725 1.654727e+00 \n", "tot_rev 1257 43126114.559268 1.378553e+08 \n", "complexity_2011 4833 2.466791 5.144678e-01 \n", "\n", " min 25% \\\n", "donor_advisory 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 \n", "2011_data 1.000000 1.000000 \n", "2016_data 0.000000 0.000000 \n", "conflict_of_interest_policy_v2 0.000000 1.000000 \n", "records_retention_policy_v2 0.000000 1.000000 \n", "whistleblower_policy_v2 0.000000 1.000000 \n", "SOX_policies 0.000000 2.000000 \n", "SOX_policies_binary 0.000000 1.000000 \n", "SOX_policies_all_binary 0.000000 0.000000 \n", "program_efficiency 0.022177 0.755525 \n", "complexity 0.000000 0.000000 \n", "age 0.000000 25.000000 \n", "total_revenue_logged 0.000000 14.558843 \n", "tot_rev -42638874.000000 5469933.000000 \n", "complexity_2011 1.000000 2.000000 \n", "\n", " 50% 75% max \n", "donor_advisory 0.000000 0.000000 1.000000e+00 \n", "donor_advisory_2016 0.000000 0.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 1.000000e+00 \n", "2011_data 1.000000 1.000000 1.000000e+00 \n", "2016_data 0.000000 0.000000 0.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000 
1.000000 1.000000e+00 \n", "records_retention_policy_v2 1.000000 1.000000 1.000000e+00 \n", "whistleblower_policy_v2 1.000000 1.000000 1.000000e+00 \n", "SOX_policies 3.000000 3.000000 3.000000e+00 \n", "SOX_policies_binary 1.000000 1.000000 1.000000e+00 \n", "SOX_policies_all_binary 1.000000 1.000000 1.000000e+00 \n", "program_efficiency 0.816568 0.870820 9.976872e-01 \n", "complexity 0.000000 0.000000 0.000000e+00 \n", "age 35.000000 52.000000 1.080000e+02 \n", "total_revenue_logged 15.358174 16.277376 2.200080e+01 \n", "tot_rev 13989527.000000 36968805.000000 3.587230e+09 \n", "complexity_2011 2.000000 3.000000 3.000000e+00 " ] }, "execution_count": 1524, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cols = DVs + indicators + IVs + controls + SOI_check + ['complexity_2011']\n", "df[df['2011_data']==1][cols].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save 2011 Dataset" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "DVs = ['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016']\n", "indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 
'ratings_system', '2011_data', '2016_data']\n", "IVs = ['conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2',\n", " 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n", "controls = ['program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state']\n", "fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy']\n", "SOI_check = ['tot_rev']\n", "\n", "merge_cols = ['_merge_v1', '_merge_v2', '_merge_v3', '_merge_v4', '_merge_47', '_merge_efile']\n", "\n", "logit_cols = DVs + indicators + IVs + controls + SOI_check + fixed_effects\n", "print logit_cols" ] }, { "cell_type": "code", "execution_count": 1526, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "35\n", "4863\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50715 0 0 0 \n", "40354 0 0 0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50715 5954 010202467 FY2009 2009-12 CN 2.0 1 \n", "40354 3916 010211513 FY2010 2010-05 CN 2.0 1 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "50715 0 1 1 \n", "40354 0 1 1 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "50715 1 3 1 \n", "40354 1 3 1 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "50715 1 0.788895 0 \n", "40354 1 0.858851 0 \n", "\n", " complexity_2011 age total_revenue_logged category \\\n", "50715 3 62 15.947563 Research and Public Policy \n", "40354 3 66 19.115237 Health \n", "\n", " state tot_rev category_Animals category_Arts, Culture, Humanities \\\n", "50715 ME 8432154 0 0 \n", "40354 ME 200282021 0 0 \n", "\n", " category_Community Development category_Education \\\n", "50715 0 0 \n", "40354 0 0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "50715 0 0 0 \n", "40354 0 1 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "50715 0 0 \n", "40354 0 0 \n", "\n", " category_Religion category_Research and Public Policy \n", "50715 0 1 \n", "40354 0 0 " ] }, "execution_count": 1526, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2011_data']==1][logit_cols])\n", "df_2011 = df[df['2011_data']==1][logit_cols]\n", "print len(df_2011.columns)\n", "print len(df_2011)\n", "df_2011[:2]" ] }, { "cell_type": "code", "execution_count": 1522, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 4816\n", "1 47\n", "Name: donor_advisory_2016, dtype: int64\n", "0 4755\n", "1 108\n", "Name: donor_advisory_2011_to_2016, dtype: int64\n" ] } ], "source": [ "print df_2011['donor_advisory_2016'].value_counts()\n", "print 
df_2011['donor_advisory_2011_to_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 1527, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011.to_pickle('Tests 1-2 data.pkl')\n", "df_2011.to_excel('Tests 1-2 data.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fix *complexity* values for observations with a 2016 donor advisory\n", "The `complexity` value is 0 for all 321 of these observations, which is not correct. " ] }, { "cell_type": "code", "execution_count": 1614, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n", "0 321\n", "Name: complexity, dtype: int64\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "75349 1 1 1 \n", "55386 1 1 1 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "75349 15499 016009240 current current current 0 \n", "55386 16130 020136360 current current current 0 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "75349 1 NaN NaN \n", "55386 1 NaN NaN \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "75349 NaN NaN NaN \n", "55386 NaN NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "75349 NaN NaN 0 \n", "55386 NaN NaN 0 \n", "\n", " complexity_2011 age total_revenue_logged category state \\\n", "75349 NaN 47 NaN Human Services ME \n", "55386 NaN NaN NaN Education NaN \n", "\n", " tot_rev category_Animals category_Arts, Culture, Humanities \\\n", "75349 NaN 0 0 \n", "55386 NaN 0 0 \n", "\n", " category_Community Development category_Education \\\n", "75349 0 0 \n", "55386 0 1 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "75349 0 0 1 \n", "55386 0 0 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "75349 0 0 \n", "55386 0 0 \n", "\n", " category_Religion category_Research and Public Policy \n", "75349 0 0 \n", "55386 0 0 " ] }, "execution_count": 1614, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[(df['2016_data']==1) & (df['donor_advisory']==1)])\n", "print df[(df['2016_data']==1) & (df['donor_advisory']==1)]['complexity'].value_counts()\n", "df[(df['2016_data']==1) & (df['donor_advisory']==1)][logit_cols][:2]" ] }, { "cell_type": "code", "execution_count": 1615, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " count mean std min 25% \\\n", "donor_advisory 321 1.000000 0.000000 1 1 \n", "donor_advisory_2016 321 1.000000 0.000000 1 1 \n", "donor_advisory_2011_to_2016 321 1.000000 0.000000 1 1 \n", "2011_data 321 0.000000 0.000000 0 0 \n", "2016_data 321 1.000000 0.000000 1 1 \n", "conflict_of_interest_policy_v2 0 NaN NaN NaN NaN \n", "records_retention_policy_v2 0 NaN NaN NaN NaN \n", "whistleblower_policy_v2 0 NaN NaN NaN NaN \n", "SOX_policies 0 NaN NaN NaN NaN \n", "SOX_policies_binary 0 NaN NaN NaN NaN \n", "SOX_policies_all_binary 0 NaN NaN NaN NaN \n", "program_efficiency 0 NaN NaN NaN NaN \n", "complexity 321 0.000000 0.000000 0 0 \n", "complexity_2011 0 NaN NaN NaN NaN \n", "age 246 23.105691 17.206934 0 9 \n", "total_revenue_logged 0 NaN NaN NaN NaN \n", "tot_rev 0 NaN NaN NaN NaN \n", "category_Animals 321 0.034268 0.182201 0 0 \n", "category_Arts, Culture, Humanities 321 0.018692 0.135645 0 0 \n", "category_Community Development 321 0.133956 0.341137 0 0 \n", "category_Education 321 0.090343 0.287120 0 0 \n", "category_Environment 321 0.028037 0.165337 0 0 \n", "category_Health 321 0.121495 0.327212 0 0 \n", "category_Human Services 321 0.358255 0.480236 0 0 \n", "category_Human and Civil Rights 321 0.056075 0.230425 0 0 \n", "category_International 321 0.037383 0.189995 0 0 \n", "category_Religion 321 0.090343 0.287120 0 0 \n", "category_Research and Public Policy 321 0.031153 0.174001 0 0 \n", "\n", " 50% 75% max \n", "donor_advisory 1 1.00 1 \n", "donor_advisory_2016 1 1.00 1 \n", "donor_advisory_2011_to_2016 1 1.00 1 \n", "2011_data 0 0.00 0 \n", "2016_data 1 1.00 1 \n", "conflict_of_interest_policy_v2 NaN NaN NaN \n", "records_retention_policy_v2 NaN NaN NaN \n", "whistleblower_policy_v2 NaN NaN NaN \n", "SOX_policies NaN NaN NaN \n", "SOX_policies_binary NaN NaN NaN \n", "SOX_policies_all_binary NaN NaN NaN \n", "program_efficiency NaN NaN NaN \n", "complexity 0 0.00 0 \n", "complexity_2011 NaN NaN NaN \n", "age 20 33.75 81 \n", 
"total_revenue_logged NaN NaN NaN \n", "tot_rev NaN NaN NaN \n", "category_Animals 0 0.00 1 \n", "category_Arts, Culture, Humanities 0 0.00 1 \n", "category_Community Development 0 0.00 1 \n", "category_Education 0 0.00 1 \n", "category_Environment 0 0.00 1 \n", "category_Health 0 0.00 1 \n", "category_Human Services 0 1.00 1 \n", "category_Human and Civil Rights 0 0.00 1 \n", "category_International 0 0.00 1 \n", "category_Religion 0 0.00 1 \n", "category_Research and Public Policy 0 0.00 1 " ] }, "execution_count": 1615, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['2016_data']==1) & (df['donor_advisory']==1)][logit_cols].describe().T" ] }, { "cell_type": "code", "execution_count": 1620, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "85401\n", "85401\n" ] } ], "source": [ "print len(df)\n", "print df['complexity'].value_counts().sum()" ] }, { "cell_type": "code", "execution_count": 1621, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "321\n" ] } ], "source": [ "print len(df[df['complexity'].isnull()])\n", "df['complexity'] = np.where( ( (df['2016_data']==1) & (df['donor_advisory']==1)), np.nan, df['complexity'])\n", "print len(df[df['complexity'].isnull()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Check what the 2016 donor advisory orgs are missing" ] }, { "cell_type": "code", "execution_count": 1622, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "75349 1 1 1 \n", "55386 1 1 1 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "75349 15499 016009240 current current current 0 \n", "55386 16130 020136360 current current current 0 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "75349 1 NaN NaN \n", "55386 1 NaN NaN \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "75349 NaN NaN NaN \n", "55386 NaN NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "75349 NaN NaN NaN \n", "55386 NaN NaN NaN \n", "\n", " complexity_2011 age total_revenue_logged category state \\\n", "75349 NaN 47 NaN Human Services ME \n", "55386 NaN NaN NaN Education NaN \n", "\n", " tot_rev category_Animals category_Arts, Culture, Humanities \\\n", "75349 NaN 0 0 \n", "55386 NaN 0 0 \n", "\n", " category_Community Development category_Education \\\n", "75349 0 0 \n", "55386 0 1 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "75349 0 0 1 \n", "55386 0 0 0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "75349 0 0 \n", "55386 0 0 \n", "\n", " category_Religion category_Research and Public Policy \n", "75349 0 0 \n", "55386 0 0 " ] }, "execution_count": 1622, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[(df['donor_advisory_2016']==1) & (df['2016_data']==1)])\n", "df[(df['donor_advisory_2016']==1) & (df['2016_data']==1)][logit_cols][:2]" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n", "321\n" ] }, { "data": { "text/plain": [ "['020503776', nan, nan, nan, nan]" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "advisory_2016 = df[(df['donor_advisory_2016']==1) & 
(df['2016_data']==1)]['EIN'].tolist()\n", "print len(advisory_2016)\n", "print len(set(advisory_2016))\n", "advisory_2016[:5]" ] }, { "cell_type": "code", "execution_count": 1624, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df[df['EIN'].isin(advisory_2016)][logit_cols].to_excel('2016 advisory orgs.xls')" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "956" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df[df['EIN'].isin(advisory_2016)])" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ "latest_entry False True\n", "2016_data \n", "0.0 47 0\n", "1.0 0 321" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df[df['EIN'].isin(advisory_2016)]['2016_data'], df[df['EIN'].isin(advisory_2016)]['latest_entry'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Here's the normal way I'd do a groupby, but this pattern won't work for getting 'first' or 'last' entries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def f(x):\n", " return pd.Series(dict(Number_of_Public_Reply_Messages = x['reply_message'].sum(),\n", " Number_of_RTs = x['retweeted_status_dummy'].sum(),\n", " Number_of_tweets = x['content'].count(), \n", " Avg_number_lists = x['from_user_listed_count'].mean(),\n", " #rts = x['retweeted_status_dummy'].value_counts().max(), \n", " #TO GET MAXIMUM VALUE --> OTHERWISE IT WILL GENERATE VARIABLE WITH LIST OF VALUES '[1429, 450]'\n", " ))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "firm_day_count = df_2014.groupby([df_2014.index.date,'ticker']).apply(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
We could also do something like this (pseudocode)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "grouped = df.groupby(0)\n", "grouped['D'].agg({'result1' : \"sum\", 'result2' : \"mean\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
This is one way to do a groupby and grab the first value in each group. But if all of the aggregations are 'first', there's a shortcut (see below)." ] }, { "cell_type": "code", "execution_count": 1553, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def func_first(s, row):\n", " #df1 = s[s.retweeted_status_dummy==0] \n", " #df2 = s[s.retweeted_status_dummy==1]\n", " #df3 = df_2014\n", " a = s.groupby(row).agg({ 'SOX_policies':{'SOX_policies_first':\"first\"}})\n", " b = s.groupby(row).agg({ 'FYE':{'FYE_first':\"first\"}})\n", " c = s.groupby(row).agg({ 'EIN':{'EIN_first':\"first\"}}) \n", " d = s.groupby(row).agg({ 'org_id':{'org_id_first':\"first\"}}) \n", " #bb = df1.groupby(row).agg({ 'content':{'Number of Original Firm Tweets':\"count\"}}) #PROBLEM HERE\n", " #c = df_2014.groupby(row).agg({ 'retweeted_status_dummy':{'Number of RTs Sent by Firm':\"sum\"}}) \n", " #j1 = df1.groupby(row).agg({ 'retweet_count':{'Original Retweet Count for Firm':\"sum\"}}) #PROBLEM HERE\n", " #l = df_2014.groupby(row).agg({ 'from_user_followers_count':{'Number of Firm Followers (min)':\"min\"}})\n", " #m = df_2014.groupby(row).agg({ 'from_user_followers_count':{'Number of Firm Followers (max)':\"max\"}})\n", " #n = df_2014.groupby(row).agg({ 'from_user_followers_count':{'Number of Firm Followers (start)':\"first\"}})\n", " ##o = df_2014.groupby(row).agg({ 'from_user_followers_count':{'Number of Firm Followers (end)':\"last\"}})\n", " #p = df_2014.groupby(row).agg({ 'from_user_followers_count':{'Number of Firm Followers (mean)':\"mean\"}})\n", " #q = df_2014.groupby(row).agg({ 'from_user_listed_count':{'Number of Lists for Firm (min)':\"min\"}})\n", " #r = df_2014.groupby(row).agg({ 'from_user_listed_count':{'Number of Lists for Firm (max)':'max'}})\n", " #o = df_2014.groupby(row).agg({ 'from_user_listed_count':{'Number of Lists for Firm (start)':\"first\"}}) \n", " #t = df_2014.groupby(row).agg({ 'from_user_listed_count':{'Number of Lists for Firm
(end)':\"last\"}})\n", " #u = df_2014.groupby(row).agg({ 'from_user_listed_count':{'Number of Lists for Firm (mean)':\"mean\"}}) \n", " #p = df_2014.groupby(row).agg({ 'retweeted_user_followers_count':{'Total Follower Count for Users Retweeted by Firm':\"sum\"}})\n", " #s1 = pd.concat([a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p], axis=1)\n", " s1 = pd.concat([a,b,c], axis=1)\n", " s1.columns = s1.columns.droplevel()\n", " return s1" ] }, { "cell_type": "code", "execution_count": 1575, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "157\n", "98\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " SOX_policies_first FYE_first EIN_first\n", "EIN \n", "016009240 NaN current 016009240\n", "020136360 NaN current 020136360\n", "020503776 NaN current 020503776\n", "020508063 NaN current 020508063\n", "030498214 2 current 030498214\n", "042129889 3 current 042129889\n", "042453412 NaN current 042453412\n", "042701694 3 current 042701694\n", "042753817 NaN current 042753817\n", "042958082 NaN current 042958082" ] }, "execution_count": 1575, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_data = func_first(df[df['EIN'].isin(advisory_2016)], df[df['EIN'].isin(advisory_2016)]['EIN'])\n", "print len(first_data[first_data['SOX_policies_first'].notnull()])\n", "print len(first_data[first_data['SOX_policies_first'].isnull()])\n", "first_data[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
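This command doesn't work: `nth(0)` takes each group's first row as-is, NaNs included, so `SOX_policies` is missing in all 255 rows and `value_counts()` comes back empty." ] }, { "cell_type": "code", "execution_count": 1574, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "255\n", "Series([], Name: SOX_policies, dtype: int64)\n" ] }, { "data": { "text/html": [ "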
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " 2011_data 2016_data FYE Form 990 FYE SOX_policies \\\n", "EIN \n", "016009240 0 1 current current NaN \n", "020136360 0 1 current current NaN \n", "020503776 0 1 current current NaN \n", "020508063 0 1 current current NaN \n", "030498214 0 1 current current NaN \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "EIN \n", "016009240 NaN NaN 47 \n", "020136360 NaN NaN NaN \n", "020503776 NaN NaN 5 \n", "020508063 NaN NaN 17 \n", "030498214 NaN NaN 13 \n", "\n", " category category_Animals \\\n", "EIN \n", "016009240 Human Services 0 \n", "020136360 Education 0 \n", "020503776 Human Services 0 \n", "020508063 Animals 1 \n", "030498214 Community Development 0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "EIN \n", "016009240 0 0 \n", "020136360 0 0 \n", "020503776 0 0 \n", "020508063 0 0 \n", "030498214 0 1 \n", "\n", " category_Education category_Environment category_Health \\\n", "EIN \n", "016009240 0 0 0 \n", "020136360 1 0 0 \n", "020503776 0 0 0 \n", "020508063 0 0 0 \n", "030498214 0 0 0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "EIN \n", "016009240 1 0 \n", "020136360 0 0 \n", "020503776 1 0 \n", "020508063 0 0 \n", "030498214 0 0 \n", "\n", " category_International category_Religion \\\n", "EIN \n", "016009240 0 0 \n", "020136360 0 0 \n", "020503776 0 0 \n", "020508063 0 0 \n", "030498214 0 0 \n", "\n", " category_Research and Public Policy complexity complexity_2011 \\\n", "EIN \n", "016009240 0 0 NaN \n", "020136360 0 0 NaN \n", "020503776 0 0 NaN \n", "020508063 0 0 NaN \n", "030498214 0 0 NaN \n", "\n", " conflict_of_interest_policy_v2 donor_advisory \\\n", "EIN \n", "016009240 NaN 1 \n", "020136360 NaN 1 \n", "020503776 NaN 1 \n", "020508063 NaN 1 \n", "030498214 NaN 1 \n", "\n", " donor_advisory_2011_to_2016 donor_advisory_2016 org_id \\\n", "EIN \n", "016009240 1 1 15499 \n", "020136360 1 1 16130 \n", "020503776 1 1 16722 \n", "020508063 1 1 
7520 \n", "030498214 1 1 13486 \n", "\n", " program_efficiency ratings_system records_retention_policy_v2 \\\n", "EIN \n", "016009240 NaN current NaN \n", "020136360 NaN current NaN \n", "020503776 NaN current NaN \n", "020508063 NaN current NaN \n", "030498214 NaN current NaN \n", "\n", " state tot_rev total_revenue_logged whistleblower_policy_v2 \n", "EIN \n", "016009240 ME NaN NaN NaN \n", "020136360 NaN NaN NaN NaN \n", "020503776 NH NaN NaN NaN \n", "020508063 NH NaN NaN NaN \n", "030498214 NJ NaN NaN NaN " ] }, "execution_count": 1574, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').nth(0))\n", "print df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').nth(0)['SOX_policies'].value_counts()\n", "df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').nth(0)[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
This version works and is a better shortcut." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df = pd.read_pickle('Merged CN dataset with Age, State, Category, Total Revenues, Efficiency, Complexity, SOX, Donor Advisory (with added 990 data).pkl')" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'total_revenue']\n" ] } ], "source": [ "'''\n", "DVs = ['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016']\n", "indicators = ['org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data']\n", "IVs = ['conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2',\n", " 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary']\n", "controls = ['program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state']\n", "fixed_effects = ['category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy']\n", "SOI_check = ['tot_rev']\n", "extra = ['total_revenue']\n", "\n", "merge_cols = ['_merge_v1', '_merge_v2', '_merge_v3', '_merge_v4', '_merge_47', '_merge_efile']\n", "\n", "logit_cols = DVs + indicators + IVs 
+ controls + SOI_check + extra #+ fixed_effects\n", "print logit_cols\n", "'''" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 org_id \\\n", "0 1.0 1.0 1.0 16722 \n", "\n", " EIN FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "0 020503776 current current current 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "0 NaN NaN \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "0 NaN NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity complexity_2011 \\\n", "0 NaN NaN 0.0 NaN \n", "\n", " age total_revenue_logged category state tot_rev total_revenue \n", "0 5.0 NaN Human Services NH NaN NaN " ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df[logit_cols][:1]" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "255\n", "3.0 77\n", "2.0 31\n", "0.0 30\n", "1.0 19\n", "Name: SOX_policies, dtype: int64\n", "157\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "EIN \n", "020503776 1.0 1.0 1.0 \n", "020508063 1.0 1.0 1.0 \n", "030498214 1.0 1.0 1.0 \n", "042129889 1.0 1.0 1.0 \n", "042453412 1.0 1.0 1.0 \n", "042701694 1.0 1.0 1.0 \n", "042753817 1.0 1.0 1.0 \n", "042958082 1.0 1.0 1.0 \n", "\n", " org_id FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "EIN \n", "020503776 16722 current current current 0.0 1.0 \n", "020508063 7520 current current current 0.0 1.0 \n", "030498214 13486 current current current 0.0 1.0 \n", "042129889 4441 current current current 0.0 1.0 \n", "042453412 16648 current current current 0.0 1.0 \n", "042701694 13914 current current current 0.0 1.0 \n", "042753817 16666 current current current 0.0 1.0 \n", "042958082 16644 current current current 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "EIN \n", "020503776 NaN NaN \n", "020508063 NaN NaN \n", "030498214 1.0 1.0 \n", "042129889 1.0 1.0 \n", "042453412 NaN NaN \n", "042701694 NaN NaN \n", "042753817 NaN NaN \n", "042958082 NaN NaN \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "EIN \n", "020503776 NaN NaN NaN \n", "020508063 NaN NaN NaN \n", "030498214 0.0 2.0 1.0 \n", "042129889 1.0 3.0 1.0 \n", "042453412 NaN NaN NaN \n", "042701694 NaN 3.0 1.0 \n", "042753817 NaN NaN NaN \n", "042958082 NaN NaN NaN \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "EIN \n", "020503776 NaN NaN 0.0 \n", "020508063 NaN NaN 0.0 \n", "030498214 0.0 0.760916 0.0 \n", "042129889 1.0 0.829087 0.0 \n", "042453412 NaN NaN 0.0 \n", "042701694 1.0 0.751508 0.0 \n", "042753817 NaN NaN 0.0 \n", "042958082 NaN NaN 0.0 \n", "\n", " complexity_2011 age total_revenue_logged category \\\n", "EIN \n", "020503776 NaN 5.0 NaN Human Services \n", "020508063 NaN 17.0 NaN Animals \n", "030498214 NaN 13.0 14.737942 Community Development \n", "042129889 3.0 65.0 17.165174 Health \n", 
"042453412 NaN 46.0 NaN Community Development \n", "042701694 NaN 36.0 13.546895 Education \n", "042753817 NaN 0.0 NaN Community Development \n", "042958082 NaN 18.0 NaN Human Services \n", "\n", " state tot_rev total_revenue \n", "EIN \n", "020503776 NH NaN NaN \n", "020508063 NH NaN NaN \n", "030498214 NJ 2515399.0 2515399.0 \n", "042129889 MA 28493155.0 28493155.0 \n", "042453412 NaN NaN NaN \n", "042701694 MA 764437.0 NaN \n", "042753817 MA NaN NaN \n", "042958082 MA NaN NaN " ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').agg('first'))\n", "print df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').agg('first')['SOX_policies'].value_counts()\n", "print 77+31+30+19\n", "df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').agg('first')[2:10]" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "157\n", "98\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "EIN \n", "016009240 1.0 1.0 1.0 \n", "020136360 1.0 1.0 1.0 \n", "020503776 1.0 1.0 1.0 \n", "020508063 1.0 1.0 1.0 \n", "030498214 1.0 1.0 1.0 \n", "\n", " org_id FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "EIN \n", "016009240 15499 current current current 0.0 1.0 \n", "020136360 16130 current current current 0.0 1.0 \n", "020503776 16722 current current current 0.0 1.0 \n", "020508063 7520 current current current 0.0 1.0 \n", "030498214 13486 current current current 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "EIN \n", "016009240 NaN NaN \n", "020136360 NaN NaN \n", "020503776 NaN NaN \n", "020508063 NaN NaN \n", "030498214 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "EIN \n", "016009240 NaN NaN NaN \n", "020136360 NaN NaN NaN \n", "020503776 NaN NaN NaN \n", "020508063 NaN NaN NaN \n", "030498214 0.0 2.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "EIN \n", "016009240 NaN NaN 0.0 \n", "020136360 NaN NaN 0.0 \n", "020503776 NaN NaN 0.0 \n", "020508063 NaN NaN 0.0 \n", "030498214 0.0 0.760916 0.0 \n", "\n", " complexity_2011 age total_revenue_logged category \\\n", "EIN \n", "016009240 NaN 47.0 NaN Human Services \n", "020136360 NaN NaN NaN Education \n", "020503776 NaN 5.0 NaN Human Services \n", "020508063 NaN 17.0 NaN Animals \n", "030498214 NaN 13.0 14.737942 Community Development \n", "\n", " state tot_rev total_revenue \n", "EIN \n", "016009240 ME NaN NaN \n", "020136360 NaN NaN NaN \n", "020503776 NH NaN NaN \n", "020508063 NH NaN NaN \n", "030498214 NJ 2515399.0 2515399.0 " ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_data_2016_advisories = df[df['EIN'].isin(advisory_2016)][logit_cols].groupby('EIN').agg('first')\n", "print 
len(first_data_2016_advisories[first_data_2016_advisories['SOX_policies'].notnull()])\n", "print len(first_data_2016_advisories[first_data_2016_advisories['SOX_policies'].isnull()])\n", "first_data_2016_advisories[:5]" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_data_2016_advisories = first_data_2016_advisories.reset_index()" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'total_revenue']\n" ] } ], "source": [ "print first_data_2016_advisories.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export 2016 data columns for Test 4" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8304\n", "25\n", "8304\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 org_id \\\n", "0 1.0 1.0 1.0 16722 \n", "1 0.0 0.0 1.0 10166 \n", "\n", " EIN FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "0 020503776 current current current 0.0 1.0 \n", "1 043314346 FY2013 2013-12 CN 2.1 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "0 NaN NaN \n", "1 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "0 NaN NaN NaN \n", "1 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity complexity_2011 \\\n", "0 NaN NaN 0.0 NaN \n", "1 1.0 0.870865 2.0 NaN \n", "\n", " age total_revenue_logged category state tot_rev total_revenue \n", "0 5.0 NaN Human Services NH NaN NaN \n", "1 8.0 13.549098 Health MA NaN 766123.0 " ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['2016_data']==1][logit_cols])\n", "df_2016 = df[df['2016_data']==1][logit_cols]\n", "print len(df_2016.columns)\n", "print len(df_2016)\n", "df_2016[:2]" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 7983\n", "1.0 321\n", "Name: donor_advisory, dtype: int64 \n", "\n", "0.0 7983\n", "1.0 321\n", "Name: donor_advisory_2016, dtype: int64 \n", "\n" ] } ], "source": [ "print df_2016['donor_advisory'].value_counts(), '\\n'\n", "print df_2016['donor_advisory_2016'].value_counts(), '\\n'" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'EIN', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 
'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'total_revenue']\n" ] } ], "source": [ "print df_2016.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'total_revenue']\n" ] } ], "source": [ "print first_data_2016_advisories.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "donor_advisory 255.0 1.000000e+00 0.000000e+00 1.0 \n", "donor_advisory_2016 255.0 1.000000e+00 0.000000e+00 1.0 \n", "donor_advisory_2011_to_2016 255.0 1.000000e+00 0.000000e+00 1.0 \n", "2011_data 255.0 0.000000e+00 0.000000e+00 0.0 \n", "2016_data 255.0 1.000000e+00 0.000000e+00 1.0 \n", "conflict_of_interest_policy_v2 66.0 8.030303e-01 4.007569e-01 0.0 \n", "records_retention_policy_v2 66.0 6.818182e-01 4.693397e-01 0.0 \n", "whistleblower_policy_v2 66.0 6.666667e-01 4.750169e-01 0.0 \n", "SOX_policies 157.0 1.987261e+00 1.176627e+00 0.0 \n", "SOX_policies_binary 157.0 8.089172e-01 3.944122e-01 0.0 \n", "SOX_policies_all_binary 157.0 4.904459e-01 5.015084e-01 0.0 \n", "program_efficiency 157.0 7.309421e-01 2.578280e-01 0.0 \n", "complexity 255.0 0.000000e+00 0.000000e+00 0.0 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 1.0 \n", "age 245.0 2.313878e+01 1.723432e+01 0.0 \n", "total_revenue_logged 157.0 1.481904e+01 2.479390e+00 0.0 \n", "tot_rev 157.0 3.521763e+07 2.236996e+08 -7264312.0 \n", "total_revenue 61.0 8.349723e+07 3.551054e+08 -7264312.0 \n", "\n", " 25% 50% 75% max \n", "donor_advisory 1.0 1.0 1.0 1.000000e+00 \n", "donor_advisory_2016 1.0 1.0 1.0 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.0 1.0 1.0 1.000000e+00 \n", "2011_data 0.0 0.0 0.0 0.000000e+00 \n", "2016_data 1.0 1.0 1.0 1.000000e+00 \n", "conflict_of_interest_policy_v2 NaN NaN NaN 1.000000e+00 \n", "records_retention_policy_v2 NaN NaN NaN 1.000000e+00 \n", "whistleblower_policy_v2 NaN NaN NaN 1.000000e+00 \n", "SOX_policies NaN NaN NaN 3.000000e+00 \n", "SOX_policies_binary NaN NaN NaN 1.000000e+00 \n", "SOX_policies_all_binary NaN NaN NaN 1.000000e+00 \n", "program_efficiency NaN NaN NaN 1.000000e+00 \n", "complexity 0.0 0.0 0.0 0.000000e+00 \n", "complexity_2011 NaN NaN NaN 3.000000e+00 \n", "age NaN NaN NaN 8.100000e+01 \n", "total_revenue_logged NaN NaN NaN 2.168384e+01 \n", "tot_rev NaN NaN NaN 2.613209e+09 \n", 
"total_revenue NaN NaN NaN 2.613209e+09 " ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_data_2016_advisories.describe().T" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set([])\n", "set([])\n" ] } ], "source": [ "print set(first_data_2016_advisories.columns.tolist()) - set(df_2016.columns.tolist())\n", "print set(df_2016.columns.tolist()) - set(first_data_2016_advisories.columns.tolist())" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8049\n", "255\n", "8304\n", "8049\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 org_id \\\n", "1 0.0 0.0 1.0 10166 \n", "15 0.0 0.0 0.0 6466 \n", "\n", " EIN FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "1 043314346 FY2013 2013-12 CN 2.1 0.0 1.0 \n", "15 953667812 FY2014 2014-06 CN 2.1 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "1 1.0 1.0 \n", "15 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "1 1.0 3.0 1.0 \n", "15 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity complexity_2011 \\\n", "1 1.0 0.870865 2.0 NaN \n", "15 1.0 0.763738 5.0 NaN \n", "\n", " age total_revenue_logged category state tot_rev total_revenue \n", "1 8.0 13.549098 Health MA NaN 766123.0 \n", "15 35.0 15.697937 Education CA NaN 6569428.0 " ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df_2016[~df_2016['EIN'].isin(advisory_2016)])\n", "print len(df_2016[df_2016['EIN'].isin(advisory_2016)])\n", "print len(df_2016[df_2016['EIN'].isin(advisory_2016)]) + len(df_2016[~df_2016['EIN'].isin(advisory_2016)])\n", "df_2016_mod = df_2016[~df_2016['EIN'].isin(advisory_2016)]\n", "print len(df_2016_mod)\n", "df_2016_mod[:2]" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8304\n", "255\n", "8049\n", "25\n", "8304\n", "25\n" ] } ], "source": [ "print len(df_2016_mod.append(first_data_2016_advisories))\n", "print 8238-7983\n", "print len(df_2016_mod)\n", "print len(df_2016_mod.columns)\n", "df_2016_mod = df_2016_mod.append(first_data_2016_advisories)\n", "print len(df_2016_mod)\n", "print len(df_2016_mod.columns)" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "255\n" ] } ], "source": [ "print 
len(df_2016_mod[df_2016_mod['EIN'].isin(advisory_2016)])\n", "df_2016_mod[df_2016_mod['EIN'].isin(advisory_2016)].to_excel('df_2016_mod_partial.xls')" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "2011_data 8304.0 0.000000e+00 0.000000e+00 0.0 \n", "2016_data 8304.0 1.000000e+00 0.000000e+00 1.0 \n", "SOX_policies 8140.0 2.780713e+00 6.260091e-01 0.0 \n", "SOX_policies_all_binary 8140.0 8.683047e-01 3.381800e-01 0.0 \n", "SOX_policies_binary 8140.0 9.772727e-01 1.490418e-01 0.0 \n", "age 8226.0 3.723499e+01 1.932929e+01 0.0 \n", "complexity 8304.0 3.816474e+00 1.452459e+00 0.0 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 1.0 \n", "conflict_of_interest_policy_v2 8049.0 9.730401e-01 1.619762e-01 0.0 \n", "donor_advisory 8304.0 3.865607e-02 1.927855e-01 0.0 \n", "donor_advisory_2011_to_2016 8304.0 4.708574e-02 2.118350e-01 0.0 \n", "donor_advisory_2016 8304.0 3.865607e-02 1.927855e-01 0.0 \n", "program_efficiency 8140.0 8.012793e-01 1.100624e-01 0.0 \n", "records_retention_policy_v2 8049.0 9.059510e-01 2.919149e-01 0.0 \n", "tot_rev 794.0 5.614404e+07 1.813611e+08 -7264312.0 \n", "total_revenue 8044.0 1.663883e+07 8.464929e+07 -7264312.0 \n", "total_revenue_logged 8140.0 1.538204e+01 1.315085e+00 0.0 \n", "whistleblower_policy_v2 8049.0 9.120388e-01 2.832561e-01 0.0 \n", "\n", " 25% 50% 75% max \n", "2011_data 0.0 0.0 0.0 0.000000e+00 \n", "2016_data 1.0 1.0 1.0 1.000000e+00 \n", "SOX_policies NaN NaN NaN 3.000000e+00 \n", "SOX_policies_all_binary NaN NaN NaN 1.000000e+00 \n", "SOX_policies_binary NaN NaN NaN 1.000000e+00 \n", "age NaN NaN NaN 1.080000e+02 \n", "complexity 3.0 4.0 5.0 8.000000e+00 \n", "complexity_2011 NaN NaN NaN 3.000000e+00 \n", "conflict_of_interest_policy_v2 NaN NaN NaN 1.000000e+00 \n", "donor_advisory 0.0 0.0 0.0 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.0 0.0 0.0 1.000000e+00 \n", "donor_advisory_2016 0.0 0.0 0.0 1.000000e+00 \n", "program_efficiency NaN NaN NaN 1.000000e+00 \n", "records_retention_policy_v2 NaN NaN NaN 1.000000e+00 \n", "tot_rev NaN NaN NaN 2.974134e+09 \n", "total_revenue NaN NaN NaN 3.471552e+09 \n", "total_revenue_logged NaN NaN NaN 2.196787e+01 \n", 
"whistleblower_policy_v2 NaN NaN NaN 1.000000e+00 " ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2016_mod.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SIDEBAR -- CREATING UNLOGGED REVENUES VARIABLE FOR SUMMARY STATS TABLE (I HAD TO ADD IN UNLOGGED REVENUE COLUMNS TO *logit_cols*" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "8044\n", "8140\n" ] } ], "source": [ "df_2016_mod['revitup'] = np.nan\n", "print len(df_2016_mod[df_2016_mod['revitup'].notnull()])\n", "df_2016_mod['revitup'] = np.where( ( (df_2016_mod['revitup'].isnull()) & (df_2016_mod['total_revenue'].notnull()) ),\n", " df_2016_mod['total_revenue'], df_2016_mod['revitup'])\n", "print len(df_2016_mod[df_2016_mod['revitup'].notnull()])\n", "df_2016_mod['revitup'] = np.where( ( (df_2016_mod['revitup'].isnull()) & (df_2016_mod['tot_rev'].notnull()) ),\n", " df_2016_mod['tot_rev'], df_2016_mod['revitup'])\n", "print len(df_2016_mod[df_2016_mod['revitup'].notnull()])" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "2011_data 8304.0 0.000000e+00 0.000000e+00 0.0 \n", "2016_data 8304.0 1.000000e+00 0.000000e+00 1.0 \n", "SOX_policies 8140.0 2.780713e+00 6.260091e-01 0.0 \n", "SOX_policies_all_binary 8140.0 8.683047e-01 3.381800e-01 0.0 \n", "SOX_policies_binary 8140.0 9.772727e-01 1.490418e-01 0.0 \n", "age 8226.0 3.723499e+01 1.932929e+01 0.0 \n", "complexity 8304.0 3.816474e+00 1.452459e+00 0.0 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 1.0 \n", "conflict_of_interest_policy_v2 8049.0 9.730401e-01 1.619762e-01 0.0 \n", "donor_advisory 8304.0 3.865607e-02 1.927855e-01 0.0 \n", "donor_advisory_2011_to_2016 8304.0 4.708574e-02 2.118350e-01 0.0 \n", "donor_advisory_2016 8304.0 3.865607e-02 1.927855e-01 0.0 \n", "program_efficiency 8140.0 8.012793e-01 1.100624e-01 0.0 \n", "records_retention_policy_v2 8049.0 9.059510e-01 2.919149e-01 0.0 \n", "tot_rev 794.0 5.614404e+07 1.813611e+08 -7264312.0 \n", "total_revenue 8044.0 1.663883e+07 8.464929e+07 -7264312.0 \n", "total_revenue_logged 8140.0 1.538204e+01 1.315085e+00 0.0 \n", "whistleblower_policy_v2 8049.0 9.120388e-01 2.832561e-01 0.0 \n", "revitup 8140.0 1.650084e+07 8.416408e+07 -7264312.0 \n", "\n", " 25% 50% 75% max \n", "2011_data 0.0 0.0 0.0 0.000000e+00 \n", "2016_data 1.0 1.0 1.0 1.000000e+00 \n", "SOX_policies NaN NaN NaN 3.000000e+00 \n", "SOX_policies_all_binary NaN NaN NaN 1.000000e+00 \n", "SOX_policies_binary NaN NaN NaN 1.000000e+00 \n", "age NaN NaN NaN 1.080000e+02 \n", "complexity 3.0 4.0 5.0 8.000000e+00 \n", "complexity_2011 NaN NaN NaN 3.000000e+00 \n", "conflict_of_interest_policy_v2 NaN NaN NaN 1.000000e+00 \n", "donor_advisory 0.0 0.0 0.0 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.0 0.0 0.0 1.000000e+00 \n", "donor_advisory_2016 0.0 0.0 0.0 1.000000e+00 \n", "program_efficiency NaN NaN NaN 1.000000e+00 \n", "records_retention_policy_v2 NaN NaN NaN 1.000000e+00 \n", "tot_rev NaN NaN NaN 2.974134e+09 \n", "total_revenue NaN NaN NaN 3.471552e+09 
\n", "total_revenue_logged NaN NaN NaN 2.196787e+01 \n", "whistleblower_policy_v2 NaN NaN NaN 1.000000e+00 \n", "revitup NaN NaN NaN 3.471552e+09 " ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2016_mod.describe().T" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pd.set_option('display.float_format', lambda x: '%.3f' % x)" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 8140.000\n", "mean 16500839.106\n", "std 84164083.162\n", "min -7264312.000\n", "25% nan\n", "50% nan\n", "75% nan\n", "max 3471552268.000\n", "Name: revitup, dtype: float64" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2016_mod['revitup'].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# BACK TO REGULARLY SCHEDULED PROGRAMMING" ] }, { "cell_type": "code", "execution_count": 1653, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
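<table>
<tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>
<tr><th>2011_data</th><td>255</td><td>0.000000</td><td>0.000000e+00</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>0.000000e+00</td></tr>
<tr><th>2016_data</th><td>255</td><td>1.000000</td><td>0.000000e+00</td><td>1</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>SOX_policies</th><td>157</td><td>1.987261</td><td>1.176627e+00</td><td>0</td><td>1.000000</td><td>2.000000</td><td>3.000000</td><td>3.000000e+00</td></tr>
<tr><th>SOX_policies_all_binary</th><td>157</td><td>0.490446</td><td>5.015084e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>SOX_policies_binary</th><td>157</td><td>0.808917</td><td>3.944122e-01</td><td>0</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>age</th><td>245</td><td>23.138776</td><td>1.723432e+01</td><td>0</td><td>9.000000</td><td>20.000000</td><td>34.000000</td><td>8.100000e+01</td></tr>
<tr><th>category_Animals</th><td>255</td><td>0.035294</td><td>1.848851e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Arts, Culture, Humanities</th><td>255</td><td>0.023529</td><td>1.518757e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Community Development</th><td>255</td><td>0.094118</td><td>2.925665e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Education</th><td>255</td><td>0.101961</td><td>3.031918e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Environment</th><td>255</td><td>0.023529</td><td>1.518757e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Health</th><td>255</td><td>0.129412</td><td>3.363152e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Human Services</th><td>255</td><td>0.349020</td><td>4.775976e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Human and Civil Rights</th><td>255</td><td>0.047059</td><td>2.121812e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_International</th><td>255</td><td>0.043137</td><td>2.035656e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Religion</th><td>255</td><td>0.113725</td><td>3.181019e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>category_Research and Public Policy</th><td>255</td><td>0.039216</td><td>1.944895e-01</td><td>0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>1.000000e+00</td></tr>
<tr><th>complexity</th><td>157</td><td>1.687898</td><td>1.604695e+00</td><td>0</td><td>0.000000</td><td>2.000000</td><td>3.000000</td><td>6.000000e+00</td></tr>
<tr><th>complexity_2011</th><td>41</td><td>2.121951</td><td>5.096627e-01</td><td>1</td><td>2.000000</td><td>2.000000</td><td>2.000000</td><td>3.000000e+00</td></tr>
<tr><th>conflict_of_interest_policy_v2</th><td>157</td><td>0.789809</td><td>4.087480e-01</td><td>0</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>donor_advisory</th><td>255</td><td>1.000000</td><td>0.000000e+00</td><td>1</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>donor_advisory_2011_to_2016</th><td>255</td><td>1.000000</td><td>0.000000e+00</td><td>1</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>donor_advisory_2016</th><td>255</td><td>1.000000</td><td>0.000000e+00</td><td>1</td><td>1.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>program_efficiency</th><td>157</td><td>0.730550</td><td>2.587130e-01</td><td>0</td><td>0.634433</td><td>0.829270</td><td>0.899881</td><td>1.000000e+00</td></tr>
<tr><th>records_retention_policy_v2</th><td>157</td><td>0.656051</td><td>4.765444e-01</td><td>0</td><td>0.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
<tr><th>tot_rev</th><td>157</td><td>49792270.955414</td><td>3.187400e+08</td><td>0</td><td>905369.000000</td><td>2758339.000000</td><td>7851441.000000</td><td>3.741635e+09</td></tr>
<tr><th>total_revenue_logged</th><td>157</td><td>14.965950</td><td>2.246415e+00</td><td>0</td><td>13.855518</td><td>14.997545</td><td>15.884031</td><td>2.204279e+01</td></tr>
<tr><th>whistleblower_policy_v2</th><td>157</td><td>0.541401</td><td>4.998775e-01</td><td>0</td><td>0.000000</td><td>1.000000</td><td>1.000000</td><td>1.000000e+00</td></tr>
</table>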
\n", "
" ], "text/plain": [ " count mean std \\\n", "2011_data 255 0.000000 0.000000e+00 \n", "2016_data 255 1.000000 0.000000e+00 \n", "SOX_policies 157 1.987261 1.176627e+00 \n", "SOX_policies_all_binary 157 0.490446 5.015084e-01 \n", "SOX_policies_binary 157 0.808917 3.944122e-01 \n", "age 245 23.138776 1.723432e+01 \n", "category_Animals 255 0.035294 1.848851e-01 \n", "category_Arts, Culture, Humanities 255 0.023529 1.518757e-01 \n", "category_Community Development 255 0.094118 2.925665e-01 \n", "category_Education 255 0.101961 3.031918e-01 \n", "category_Environment 255 0.023529 1.518757e-01 \n", "category_Health 255 0.129412 3.363152e-01 \n", "category_Human Services 255 0.349020 4.775976e-01 \n", "category_Human and Civil Rights 255 0.047059 2.121812e-01 \n", "category_International 255 0.043137 2.035656e-01 \n", "category_Religion 255 0.113725 3.181019e-01 \n", "category_Research and Public Policy 255 0.039216 1.944895e-01 \n", "complexity 157 1.687898 1.604695e+00 \n", "complexity_2011 41 2.121951 5.096627e-01 \n", "conflict_of_interest_policy_v2 157 0.789809 4.087480e-01 \n", "donor_advisory 255 1.000000 0.000000e+00 \n", "donor_advisory_2011_to_2016 255 1.000000 0.000000e+00 \n", "donor_advisory_2016 255 1.000000 0.000000e+00 \n", "program_efficiency 157 0.730550 2.587130e-01 \n", "records_retention_policy_v2 157 0.656051 4.765444e-01 \n", "tot_rev 157 49792270.955414 3.187400e+08 \n", "total_revenue_logged 157 14.965950 2.246415e+00 \n", "whistleblower_policy_v2 157 0.541401 4.998775e-01 \n", "\n", " min 25% 50% \\\n", "2011_data 0 0.000000 0.000000 \n", "2016_data 1 1.000000 1.000000 \n", "SOX_policies 0 1.000000 2.000000 \n", "SOX_policies_all_binary 0 0.000000 0.000000 \n", "SOX_policies_binary 0 1.000000 1.000000 \n", "age 0 9.000000 20.000000 \n", "category_Animals 0 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0 0.000000 0.000000 \n", "category_Community Development 0 0.000000 0.000000 \n", "category_Education 0 0.000000 0.000000 \n", 
"category_Environment 0 0.000000 0.000000 \n", "category_Health 0 0.000000 0.000000 \n", "category_Human Services 0 0.000000 0.000000 \n", "category_Human and Civil Rights 0 0.000000 0.000000 \n", "category_International 0 0.000000 0.000000 \n", "category_Religion 0 0.000000 0.000000 \n", "category_Research and Public Policy 0 0.000000 0.000000 \n", "complexity 0 0.000000 2.000000 \n", "complexity_2011 1 2.000000 2.000000 \n", "conflict_of_interest_policy_v2 0 1.000000 1.000000 \n", "donor_advisory 1 1.000000 1.000000 \n", "donor_advisory_2011_to_2016 1 1.000000 1.000000 \n", "donor_advisory_2016 1 1.000000 1.000000 \n", "program_efficiency 0 0.634433 0.829270 \n", "records_retention_policy_v2 0 0.000000 1.000000 \n", "tot_rev 0 905369.000000 2758339.000000 \n", "total_revenue_logged 0 13.855518 14.997545 \n", "whistleblower_policy_v2 0 0.000000 1.000000 \n", "\n", " 75% max \n", "2011_data 0.000000 0.000000e+00 \n", "2016_data 1.000000 1.000000e+00 \n", "SOX_policies 3.000000 3.000000e+00 \n", "SOX_policies_all_binary 1.000000 1.000000e+00 \n", "SOX_policies_binary 1.000000 1.000000e+00 \n", "age 34.000000 8.100000e+01 \n", "category_Animals 0.000000 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000 1.000000e+00 \n", "category_Community Development 0.000000 1.000000e+00 \n", "category_Education 0.000000 1.000000e+00 \n", "category_Environment 0.000000 1.000000e+00 \n", "category_Health 0.000000 1.000000e+00 \n", "category_Human Services 1.000000 1.000000e+00 \n", "category_Human and Civil Rights 0.000000 1.000000e+00 \n", "category_International 0.000000 1.000000e+00 \n", "category_Religion 0.000000 1.000000e+00 \n", "category_Research and Public Policy 0.000000 1.000000e+00 \n", "complexity 3.000000 6.000000e+00 \n", "complexity_2011 2.000000 3.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000 1.000000e+00 \n", "donor_advisory 1.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000 1.000000e+00 \n", "donor_advisory_2016 1.000000 
1.000000e+00 \n", "program_efficiency 0.899881 1.000000e+00 \n", "records_retention_policy_v2 1.000000 1.000000e+00 \n", "tot_rev 7851441.000000 3.741635e+09 \n", "total_revenue_logged 15.884031 2.204279e+01 \n", "whistleblower_policy_v2 1.000000 1.000000e+00 " ] }, "execution_count": 1653, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2016_mod[df_2016_mod['EIN'].isin(advisory_2016)].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DFs" ] }, { "cell_type": "code", "execution_count": 1654, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.to_pickle('df.pkl')\n", "df_2016.to_pickle('2016 - Test 4 data.pkl')\n", "df_2016.to_excel('2016 - Test 4 data.xls')\n", "df_2016_mod.to_pickle('Test 4 data.pkl')\n", "df_2016_mod.to_excel('Test 4 data.xls')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Test 5 Prep" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of columns: 304\n", "Number of observations: 85401\n" ] }, { "data": { "text/html": [ "
\n", "
org_idEINorg_urlnamecategorycategory-fullDate PublishedForm 990 FYEForm 990 FYE, v2FYEEarliest Rating Publication Dateratings_systemOverall ScoreOverall Ratingadvisory text - current advisoryadvisory text - past advisorycurrent_or_past_donor_advisorycurrent_donor_advisorypast_donor_advisorylatest_entrycurrent_ratings_urlein_2016Publication_date_and_FY_2016Publication Date_2016FYE_2016donor_alert_2016overall_rating_2016efficiency_rating_rating_2016AT_rating_2016overall_rating_star_2016financial_rating_star_2016AT_rating_star_2016program_expense_percent_2016admin_expense_percent_2016fund_expense_percent_2016fund_efficiency_2016working_capital_ratio_2016program_expense_growth_2016liabilities_to_assets_2016independent_board_2016no_material_division_2016audited_financials_2016no_loans_related_2016documents_minutes_2016form_990_2016conflict_of_interest_policy_2016whistleblower_policy_2016records_retention_policy_2016CEO_listed_2016process_CEO_compensation_2016no_board_compensation_2016donor_privacy_policy_2016board_listed_2016audited_financials_web_2016form_990_web_2016staff_listed_2016contributions_gifts_grants_2016federated_campaigns_2016membership_dues_2016fundraising_events_2016related_organizations_2016government_grants_2016total_contributions_2016program_service_revenue_2016total_primary_revenue_2016other_revenue_2016total_revenue_2016program_expenses_2016administrative_expenses_2016fundraising_expenses_2016total_functional_expenses_2016payments_to_affiliates_2016excess_or_deficit_2016net_assets_2016comp_2016cp_2016mission_20162011_datacharity_name_2011category_2011city_2011state_2011cause_2011tag_line_2011url_2011ein_2011fye_2011overall_rating_2011overall_rating_2011_plus_30overall_rating_2011_plus_30_v2overall_rating_star_2011overall_rating_star_2011_textefficiency_rating_2011AT_rating_2011financial_rating_star_2011AT_rating_star_2011program_expense_percent_2011admin_expense_percent_2011fund_expense_percent_2011fund_efficiency_2011primary_revenue_growth_2011progra
m_expense_growth_2011working_capital_ratio_2011independent_board_2011no_material_division_2011audited_financials_2011no_loans_related_2011documents_minutes_2011form_990_2011conflict_of_interest_policy_2011whistleblower_policy_2011records_retention_policy_2011CEO_listed_2011process_CEO_compensation_2011no_board_compensation_2011donor_privacy_policy_2011board_listed_2011audited_financials_web_2011form_990_web_2011staff_listed_2011primary_revenue_2011other_revenue_2011total_revenue_2011govt_revenue_2011program_expense_2011admin_expense_2011fund_expense_2011total_functional_expense_2011affiliate_payments_2011budget_surplus_2011net_assets_2011leader_comp_2011leader_comp_percent_2011email_2011website_20112016 Advisory - Date Posted2016 Advisory - Charity Name2016 Advisory - advisory_url2016 Advisory - advisory_merge_v1to_be_mergedNEW ROWNAME_2015_BMFSTREET_2015_BMFCITY_2015_BMFSTATE_2015_BMFZIP_2015_BMFRULING_2015_BMFACTIVITY_2015_BMFTAX_PERIOD_2015_BMFASSET_AMT_2015_BMFINCOME_AMT_2015_BMFREVENUE_AMT_2015_BMFNTEE_CD_2015_BMF2015 
BMFruledate_2004_BMFname_MSTRALLstate_MSTRALLNTEE1_MSTRALLnteecc_MSTRALLzip_MSTRALLfips_MSTRALLtaxper_MSTRALLincome_MSTRALLF990REV_MSTRALLassets_MSTRALLruledate_MSTRALLdeductcd_MSTRALLaccper_MSTRALLrule_date_v1taxpdNAME_SOIyr_frmtnpt1_num_vtng_gvrn_bdy_memspt1_num_ind_vtng_memsnum_vtng_gvrn_bdy_memsnum_ind_vtng_memstot_num_emplstot_num_vlntrscontri_grnts_cyprog_srvc_rev_cyinvst_incm_cyoth_rev_cygrnts_and_smlr_amts_cytot_prof_fndrsng_exp_cytot_fndrsng_exp_cypt1_tot_asts_eoyaud_fincl_stmtsmtrl_divrsn_or_misusecnflct_int_plcywhistleblower_plcydoc_retention_plcyfederated_campaignsmemshp_duesrltd_orgsgovt_grntsall_oth_contrinncsh_contritot_contripsr_totinv_incm_tot_revbonds_tot_revroylrev_tot_revnet_rent_tot_revgain_or_loss_secgain_or_loss_othoth_rev_tottot_revmgmt_srvc_fee_totfee_for_srvc_leg_totfee_for_srvc_acct_totfee_for_srvc_lbby_totfee_for_srvc_prof_totfee_for_srvc_invst_totfee_for_srvc_oth_totfs_auditedaudit_committeevlntr_hrs_merge_v2rule_dateruledate_2004_BMF_v2ruledate_MSTRALL_v2yr_frmtn_v2agecategory_Animalscategory_Arts, Culture, Humanitiescategory_Community Developmentcategory_Educationcategory_Environmentcategory_Healthcategory_Human Servicescategory_Human and Civil Rightscategory_Internationalcategory_Religioncategory_Research and Public Policygovt_revenue_2011_binaryother_revenue_2011_binarycomplexity_2011advisorySOX_policies_2011total_revenue_2011_loggedtotal_revenuetotal_revenue_loggedprogram_efficiency_2016statetot_func_expns_prg_srvcstot_func_expns_tot_merge_v3program_expensestotal_expensesprogram_efficiencyfndrsng_events_merge_v4other_revenue_SOIcomplexity_2016complexity_SOIcomplexityconflict_of_interest_policywhistleblower_policyrecords_retention_policyconflict_of_interest_policy_v2records_retention_policy_v2whistleblower_policy_v2SOX_policiesSOX_policies_binary2016_dataAdvisory 
Textdonor_advisorydonor_advisory_2016donor_advisory_2011_to_2016SOX_policies_all_binarytotal_revenue_no_negEIN_47conflict_of_interest_policy_47records_retention_policy_47whistleblower_policy_47SOX_policies_47SOX_policies_all_binary_47SOX_policies_binary_47tot_rev_47total_revenue_logged_47program_expenses_47total_expenses_47program_efficiency_47complexity_47_merge_47OrganizationName_efileURL_efileSubmittedOn_efileTaxPeriod_efilewhistleblower_policy_efileconflict_of_interest_policy_efilerecords_retention_policy_efileSOX_policies_efileSOX_policies_binary_efileSOX_policies_all_binary_efiletot_rev_efiletot_rev_no_neg_efiletotal_revenue_logged_efileprogram_expenses_efiletotal_expenses_efileprogram_efficiency_efilecomplexity_efile_merge_efile
507095954010202467http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=5954Mount Desert Island Biological LaboratoryResearch and Public PolicyResearch and Public Policy : Non-Medical Science & Technology Research2016-06-01 00:00:002014-122014-12-01FY20142003-06-09CN 2.191.314 starsNaNNaN0.00.00.0Truehttp://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=595401-0202467This rating was published 06/01/2016 and includes data from FY2014, the most recent 990 received at that time.06/01/2016FY2014NaN91.3187.72100.0043479.615.94.30.041.842.319.9[_gfx_/icons/checked.gif]_gfx_/icons/checked.gif[_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif][_gfx_/icons/checked.gif]2334773.00.04600.0800.00.07932282.0$10,272,4551711837.0$11,984,292983676.012967968.08605100.0$1,671,713$554,60810831421.0$0$2,136,547$23,690,097$233,2172.15%The MDI Biological Laboratory is a rapidly growing, independent non-profit biomedical research institution. 
Its mission is to improve human health and well-being through basic research, education, and development ventures that transform discoveries into cures.0.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only0.0NaNMOUNT DESERT ISLAND BIOLOGICAL LABORATORYPO BOX 35SALSBURY COVEME04672-0035195403.0161180059.0201412.029607771.013022814.012967968.0U5001.0195403.0MT DESERT ISLAND BIO LABORATORYMEUU5004672-0000230092000122202514.02202514.04973233.01954031121954NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only19541954.01954nan62.00.00.00.00.00.00.00.00.00.00.01.0NaNNaNNaN0.0NaNNaN12967968.016.3779930.794457MENaNNaNleft_only8605100.010831421.00.794457NaNleft_onlyNaN6.00.06.0_gfx_/icons/checked.gif_gfx_/icons/checked.gif_gfx_/icons/checked.gif1.01.01.03.01.01.0NaN0.00.00.01.012967968.0NaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_onlyNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNNaNleft_only
\n", "
" ], "text/plain": [ " org_id EIN \\\n", "50709 5954 010202467 \n", "\n", " org_url \\\n", "50709 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=5954 \n", "\n", " name category \\\n", "50709 Mount Desert Island Biological Laboratory Research and Public Policy \n", "\n", " category-full \\\n", "50709 Research and Public Policy : Non-Medical Science & Technology Research \n", "\n", " Date Published Form 990 FYE Form 990 FYE, v2 FYE \\\n", "50709 2016-06-01 00:00:00 2014-12 2014-12-01 FY2014 \n", "\n", " Earliest Rating Publication Date ratings_system Overall Score \\\n", "50709 2003-06-09 CN 2.1 91.31 \n", "\n", " Overall Rating advisory text - current advisory \\\n", "50709 4 stars NaN \n", "\n", " advisory text - past advisory current_or_past_donor_advisory \\\n", "50709 NaN 0.0 \n", "\n", " current_donor_advisory past_donor_advisory latest_entry \\\n", "50709 0.0 0.0 True \n", "\n", " current_ratings_url \\\n", "50709 http://www.charitynavigator.org/index.cfm?bay=search.summary&orgid=5954 \n", "\n", " ein_2016 \\\n", "50709 01-0202467 \n", "\n", " Publication_date_and_FY_2016 \\\n", "50709 This rating was published 06/01/2016 and includes data from FY2014, the most recent 990 received at that time. 
\n", "\n", " Publication Date_2016 FYE_2016 donor_alert_2016 overall_rating_2016 \\\n", "50709 06/01/2016 FY2014 NaN 91.31 \n", "\n", " efficiency_rating_rating_2016 AT_rating_2016 overall_rating_star_2016 \\\n", "50709 87.72 100.00 4 \n", "\n", " financial_rating_star_2016 AT_rating_star_2016 \\\n", "50709 3 4 \n", "\n", " program_expense_percent_2016 admin_expense_percent_2016 \\\n", "50709 79.6 15.9 \n", "\n", " fund_expense_percent_2016 fund_efficiency_2016 \\\n", "50709 4.3 0.04 \n", "\n", " working_capital_ratio_2016 program_expense_growth_2016 \\\n", "50709 1.84 2.3 \n", "\n", " liabilities_to_assets_2016 independent_board_2016 \\\n", "50709 19.9 [_gfx_/icons/checked.gif] \n", "\n", " no_material_division_2016 audited_financials_2016 \\\n", "50709 _gfx_/icons/checked.gif [_gfx_/icons/checked.gif] \n", "\n", " no_loans_related_2016 documents_minutes_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " form_990_2016 conflict_of_interest_policy_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " whistleblower_policy_2016 records_retention_policy_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " CEO_listed_2016 process_CEO_compensation_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " no_board_compensation_2016 donor_privacy_policy_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " board_listed_2016 audited_financials_web_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " form_990_web_2016 staff_listed_2016 \\\n", "50709 [_gfx_/icons/checked.gif] [_gfx_/icons/checked.gif] \n", "\n", " contributions_gifts_grants_2016 federated_campaigns_2016 \\\n", "50709 2334773.0 0.0 \n", "\n", " membership_dues_2016 fundraising_events_2016 \\\n", "50709 4600.0 800.0 \n", "\n", " related_organizations_2016 government_grants_2016 \\\n", "50709 0.0 7932282.0 \n", "\n", " 
total_contributions_2016 program_service_revenue_2016 \\\n", "50709 $10,272,455 1711837.0 \n", "\n", " total_primary_revenue_2016 other_revenue_2016 total_revenue_2016 \\\n", "50709 $11,984,292 983676.0 12967968.0 \n", "\n", " program_expenses_2016 administrative_expenses_2016 \\\n", "50709 8605100.0 $1,671,713 \n", "\n", " fundraising_expenses_2016 total_functional_expenses_2016 \\\n", "50709 $554,608 10831421.0 \n", "\n", " payments_to_affiliates_2016 excess_or_deficit_2016 net_assets_2016 \\\n", "50709 $0 $2,136,547 $23,690,097 \n", "\n", " comp_2016 cp_2016 \\\n", "50709 $233,217 2.15% \n", "\n", " mission_2016 \\\n", "50709 The MDI Biological Laboratory is a rapidly growing, independent non-profit biomedical research institution. Its mission is to improve human health and well-being through basic research, education, and development ventures that transform discoveries into cures. \n", "\n", " 2011_data charity_name_2011 category_2011 city_2011 state_2011 \\\n", "50709 0.0 NaN NaN NaN NaN \n", "\n", " cause_2011 tag_line_2011 url_2011 ein_2011 fye_2011 \\\n", "50709 NaN NaN NaN NaN NaN \n", "\n", " overall_rating_2011 overall_rating_2011_plus_30 \\\n", "50709 NaN NaN \n", "\n", " overall_rating_2011_plus_30_v2 overall_rating_star_2011 \\\n", "50709 NaN NaN \n", "\n", " overall_rating_star_2011_text efficiency_rating_2011 AT_rating_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " financial_rating_star_2011 AT_rating_star_2011 \\\n", "50709 NaN NaN \n", "\n", " program_expense_percent_2011 admin_expense_percent_2011 \\\n", "50709 NaN NaN \n", "\n", " fund_expense_percent_2011 fund_efficiency_2011 \\\n", "50709 NaN NaN \n", "\n", " primary_revenue_growth_2011 program_expense_growth_2011 \\\n", "50709 NaN NaN \n", "\n", " working_capital_ratio_2011 independent_board_2011 \\\n", "50709 NaN NaN \n", "\n", " no_material_division_2011 audited_financials_2011 no_loans_related_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " documents_minutes_2011 form_990_2011 
conflict_of_interest_policy_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " whistleblower_policy_2011 records_retention_policy_2011 CEO_listed_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " process_CEO_compensation_2011 no_board_compensation_2011 \\\n", "50709 NaN NaN \n", "\n", " donor_privacy_policy_2011 board_listed_2011 audited_financials_web_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " form_990_web_2011 staff_listed_2011 primary_revenue_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " other_revenue_2011 total_revenue_2011 govt_revenue_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " program_expense_2011 admin_expense_2011 fund_expense_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " total_functional_expense_2011 affiliate_payments_2011 \\\n", "50709 NaN NaN \n", "\n", " budget_surplus_2011 net_assets_2011 leader_comp_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " leader_comp_percent_2011 email_2011 website_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " 2016 Advisory - Date Posted 2016 Advisory - Charity Name \\\n", "50709 NaN NaN \n", "\n", " 2016 Advisory - advisory_url 2016 Advisory - advisory _merge_v1 \\\n", "50709 NaN NaN left_only \n", "\n", " to_be_merged NEW ROW NAME_2015_BMF \\\n", "50709 0.0 NaN MOUNT DESERT ISLAND BIOLOGICAL LABORATORY \n", "\n", " STREET_2015_BMF CITY_2015_BMF STATE_2015_BMF ZIP_2015_BMF \\\n", "50709 PO BOX 35 SALSBURY COVE ME 04672-0035 \n", "\n", " RULING_2015_BMF ACTIVITY_2015_BMF TAX_PERIOD_2015_BMF \\\n", "50709 195403.0 161180059.0 201412.0 \n", "\n", " ASSET_AMT_2015_BMF INCOME_AMT_2015_BMF REVENUE_AMT_2015_BMF \\\n", "50709 29607771.0 13022814.0 12967968.0 \n", "\n", " NTEE_CD_2015_BMF 2015 BMF ruledate_2004_BMF \\\n", "50709 U500 1.0 195403.0 \n", "\n", " name_MSTRALL state_MSTRALL NTEE1_MSTRALL \\\n", "50709 MT DESERT ISLAND BIO LABORATORY ME U \n", "\n", " nteecc_MSTRALL zip_MSTRALL fips_MSTRALL taxper_MSTRALL income_MSTRALL \\\n", "50709 U50 04672-0000 23009 200012 2202514.0 \n", "\n", " F990REV_MSTRALL assets_MSTRALL ruledate_MSTRALL 
deductcd_MSTRALL \\\n", "50709 2202514.0 4973233.0 195403 1 \n", "\n", " accper_MSTRALL rule_date_v1 taxpd NAME_SOI yr_frmtn \\\n", "50709 12 1954 NaN NaN NaN \n", "\n", " pt1_num_vtng_gvrn_bdy_mems pt1_num_ind_vtng_mems \\\n", "50709 NaN NaN \n", "\n", " num_vtng_gvrn_bdy_mems num_ind_vtng_mems tot_num_empls \\\n", "50709 NaN NaN NaN \n", "\n", " tot_num_vlntrs contri_grnts_cy prog_srvc_rev_cy invst_incm_cy \\\n", "50709 NaN NaN NaN NaN \n", "\n", " oth_rev_cy grnts_and_smlr_amts_cy tot_prof_fndrsng_exp_cy \\\n", "50709 NaN NaN NaN \n", "\n", " tot_fndrsng_exp_cy pt1_tot_asts_eoy aud_fincl_stmts \\\n", "50709 NaN NaN NaN \n", "\n", " mtrl_divrsn_or_misuse cnflct_int_plcy whistleblower_plcy \\\n", "50709 NaN NaN NaN \n", "\n", " doc_retention_plcy federated_campaigns memshp_dues rltd_orgs \\\n", "50709 NaN NaN NaN NaN \n", "\n", " govt_grnts all_oth_contri nncsh_contri tot_contri psr_tot \\\n", "50709 NaN NaN NaN NaN NaN \n", "\n", " inv_incm_tot_rev bonds_tot_rev roylrev_tot_rev net_rent_tot_rev \\\n", "50709 NaN NaN NaN NaN \n", "\n", " gain_or_loss_sec gain_or_loss_oth oth_rev_tot tot_rev \\\n", "50709 NaN NaN NaN NaN \n", "\n", " mgmt_srvc_fee_tot fee_for_srvc_leg_tot fee_for_srvc_acct_tot \\\n", "50709 NaN NaN NaN \n", "\n", " fee_for_srvc_lbby_tot fee_for_srvc_prof_tot fee_for_srvc_invst_tot \\\n", "50709 NaN NaN NaN \n", "\n", " fee_for_srvc_oth_tot fs_audited audit_committee vlntr_hrs _merge_v2 \\\n", "50709 NaN NaN NaN NaN left_only \n", "\n", " rule_date ruledate_2004_BMF_v2 ruledate_MSTRALL_v2 yr_frmtn_v2 age \\\n", "50709 1954 1954.0 1954 nan 62.0 \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "50709 0.0 0.0 \n", "\n", " category_Community Development category_Education \\\n", "50709 0.0 0.0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "50709 0.0 0.0 0.0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "50709 0.0 0.0 \n", "\n", " category_Religion category_Research 
and Public Policy \\\n", "50709 0.0 1.0 \n", "\n", " govt_revenue_2011_binary other_revenue_2011_binary complexity_2011 \\\n", "50709 NaN NaN NaN \n", "\n", " advisory SOX_policies_2011 total_revenue_2011_logged total_revenue \\\n", "50709 0.0 NaN NaN 12967968.0 \n", "\n", " total_revenue_logged program_efficiency_2016 state \\\n", "50709 16.377993 0.794457 ME \n", "\n", " tot_func_expns_prg_srvcs tot_func_expns_tot _merge_v3 \\\n", "50709 NaN NaN left_only \n", "\n", " program_expenses total_expenses program_efficiency fndrsng_events \\\n", "50709 8605100.0 10831421.0 0.794457 NaN \n", "\n", " _merge_v4 other_revenue_SOI complexity_2016 complexity_SOI \\\n", "50709 left_only NaN 6.0 0.0 \n", "\n", " complexity conflict_of_interest_policy whistleblower_policy \\\n", "50709 6.0 _gfx_/icons/checked.gif _gfx_/icons/checked.gif \n", "\n", " records_retention_policy conflict_of_interest_policy_v2 \\\n", "50709 _gfx_/icons/checked.gif 1.0 \n", "\n", " records_retention_policy_v2 whistleblower_policy_v2 SOX_policies \\\n", "50709 1.0 1.0 3.0 \n", "\n", " SOX_policies_binary 2016_data Advisory Text donor_advisory \\\n", "50709 1.0 1.0 NaN 0.0 \n", "\n", " donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50709 0.0 0.0 \n", "\n", " SOX_policies_all_binary total_revenue_no_neg EIN_47 \\\n", "50709 1.0 12967968.0 NaN \n", "\n", " conflict_of_interest_policy_47 records_retention_policy_47 \\\n", "50709 NaN NaN \n", "\n", " whistleblower_policy_47 SOX_policies_47 SOX_policies_all_binary_47 \\\n", "50709 NaN NaN NaN \n", "\n", " SOX_policies_binary_47 tot_rev_47 total_revenue_logged_47 \\\n", "50709 NaN NaN NaN \n", "\n", " program_expenses_47 total_expenses_47 program_efficiency_47 \\\n", "50709 NaN NaN NaN \n", "\n", " complexity_47 _merge_47 OrganizationName_efile URL_efile \\\n", "50709 NaN left_only NaN NaN \n", "\n", " SubmittedOn_efile TaxPeriod_efile whistleblower_policy_efile \\\n", "50709 NaN NaN NaN \n", "\n", " conflict_of_interest_policy_efile 
records_retention_policy_efile \\\n", "50709 NaN NaN \n", "\n", " SOX_policies_efile SOX_policies_binary_efile \\\n", "50709 NaN NaN \n", "\n", " SOX_policies_all_binary_efile tot_rev_efile tot_rev_no_neg_efile \\\n", "50709 NaN NaN NaN \n", "\n", " total_revenue_logged_efile program_expenses_efile \\\n", "50709 NaN NaN \n", "\n", " total_expenses_efile program_efficiency_efile complexity_efile \\\n", "50709 NaN NaN NaN \n", "\n", " _merge_efile \n", "50709 left_only " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_pickle('df.pkl')\n", "print \"Number of columns:\", len(df.columns)\n", "print \"Number of observations:\", len(df)\n", "df.head(1)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4857\n", "4857\n" ] } ], "source": [ "print len(df[df['2011_data']==1]['EIN'].tolist())\n", "orgs_2011 = list(set(df[df['2011_data']==1]['EIN'].tolist()))\n", "print len(orgs_2011)\n", "print len(set(orgs_2011))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n", "47\n", "47\n" ] }, { "data": { "text/plain": [ "['042129889', '112613334', '112716763', '113059922', '133119118']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[(df['EIN'].isin(orgs_2011)) & (df['donor_advisory']==1) & (df['2016_data']==1)])\n", "advisory_orgs_2011 = df[(df['EIN'].isin(orgs_2011)) & (df['donor_advisory']==1) & (df['2016_data']==1)]['EIN'].tolist()\n", "print len(advisory_orgs_2011)\n", "print len(set(advisory_orgs_2011))\n", "advisory_orgs_2011[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Sort DF" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df = 
df.sort_values(by=['EIN', 'latest_entry', 'FYE', 'ratings_system'], ascending=[1, 0, 0, 0])" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df['year'] = df['FYE'].str[2:]\n", "#print df['year'][:3], '\\n'\n", "#print df['year'].value_counts()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df['year'] = np.where(df['year']=='rrent', 9999, df['year'])\n", "#print df['year'].value_counts()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = df.sort_values(by=['EIN', '2016_data', 'FYE', 'ratings_system'], ascending=[1, 0, 0, 0])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#logit_cols2 = ['year'] + logit_cols" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create list of 2016 advisory orgs for the '2011' orgs -- grab 'first' values for each variable" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "281\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " year donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "68048 9999 1.0 1.0 1.0 \n", "84959 2014 NaN 0.0 0.0 \n", "84921 2013 NaN 0.0 0.0 \n", "84859 2012 NaN 0.0 0.0 \n", "84673 2011 NaN 0.0 0.0 \n", "83937 2010 NaN 1.0 1.0 \n", "83951 2009 NaN 0.0 0.0 \n", "82801 9999 1.0 1.0 1.0 \n", "84681 2011 NaN 0.0 0.0 \n", "83933 2010 NaN 1.0 1.0 \n", "83963 2009 NaN 0.0 0.0 \n", "84978 2015 NaN NaN NaN \n", "85021 2014 NaN NaN NaN \n", "85022 2013 NaN NaN NaN \n", "85023 2012 NaN NaN NaN \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "68048 4441 042129889 current current current 0.0 \n", "84959 NaN 042129889 FY2014 NaN NaN 0.0 \n", "84921 NaN 042129889 FY2013 NaN NaN 0.0 \n", "84859 NaN 042129889 FY2012 NaN NaN 0.0 \n", "84673 NaN 042129889 FY2011 NaN NaN 0.0 \n", "83937 4441 042129889 FY2010 NaN NaN 1.0 \n", "83951 NaN 042129889 FY2009 NaN NaN 0.0 \n", "82801 4778 112613334 current current current 0.0 \n", "84681 NaN 112613334 FY2011 NaN NaN 0.0 \n", "83933 4778 112613334 FY2010 NaN NaN 1.0 \n", "83963 NaN 112613334 FY2009 NaN NaN 0.0 \n", "84978 4778 112613334 FY2015 NaN NaN NaN \n", "85021 NaN 112613334 FY2014 NaN NaN NaN \n", "85022 NaN 112613334 FY2013 NaN NaN NaN \n", "85023 NaN 112613334 FY2012 NaN NaN NaN \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "68048 1.0 NaN NaN \n", "84959 0.0 1.0 1.0 \n", "84921 0.0 1.0 1.0 \n", "84859 0.0 1.0 1.0 \n", "84673 0.0 1.0 1.0 \n", "83937 0.0 1.0 1.0 \n", "83951 0.0 1.0 1.0 \n", "82801 1.0 NaN NaN \n", "84681 0.0 0.0 0.0 \n", "83933 0.0 0.0 0.0 \n", "83963 0.0 0.0 0.0 \n", "84978 NaN 0.0 0.0 \n", "85021 NaN 0.0 0.0 \n", "85022 NaN 0.0 0.0 \n", "85023 NaN 0.0 0.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "68048 NaN NaN NaN \n", "84959 1.0 3.0 1.0 \n", "84921 1.0 3.0 1.0 \n", "84859 1.0 3.0 1.0 \n", "84673 1.0 3.0 1.0 \n", "83937 1.0 3.0 1.0 \n", "83951 1.0 3.0 1.0 \n", "82801 NaN NaN 
NaN \n", "84681 0.0 0.0 0.0 \n", "83933 0.0 0.0 0.0 \n", "83963 0.0 0.0 0.0 \n", "84978 0.0 0.0 0.0 \n", "85021 0.0 0.0 0.0 \n", "85022 0.0 0.0 0.0 \n", "85023 0.0 0.0 0.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "68048 NaN NaN NaN \n", "84959 1.0 0.721359 0.0 \n", "84921 1.0 0.735218 0.0 \n", "84859 1.0 0.755544 0.0 \n", "84673 1.0 0.779146 0.0 \n", "83937 1.0 0.829087 0.0 \n", "83951 1.0 0.819263 0.0 \n", "82801 NaN NaN NaN \n", "84681 0.0 0.662812 0.0 \n", "83933 0.0 0.773735 0.0 \n", "83963 0.0 0.597282 0.0 \n", "84978 0.0 0.790548 4.0 \n", "85021 0.0 0.778689 3.0 \n", "85022 0.0 0.668662 2.0 \n", "85023 0.0 0.579630 2.0 \n", "\n", " complexity_2011 age total_revenue_logged category state \\\n", "68048 NaN 65.0 NaN Health MA \n", "84959 NaN NaN 16.950810 NaN NaN \n", "84921 NaN NaN 17.070658 NaN NaN \n", "84859 NaN NaN 17.168203 NaN NaN \n", "84673 NaN NaN 17.089253 NaN NaN \n", "83937 3.0 65.0 17.165174 Health MA \n", "83951 NaN NaN 16.996901 NaN NaN \n", "82801 NaN 32.0 NaN Education NY \n", "84681 NaN NaN 16.424554 NaN NaN \n", "83933 3.0 32.0 16.248756 Education NY \n", "83963 NaN NaN 16.157449 NaN NaN \n", "84978 NaN NaN 16.324828 NaN NaN \n", "85021 NaN NaN 16.360544 NaN NaN \n", "85022 NaN NaN 16.424509 NaN NaN \n", "85023 NaN NaN 16.303710 NaN NaN \n", "\n", " tot_rev category_Animals category_Arts, Culture, Humanities \\\n", "68048 NaN 0.0 0.0 \n", "84959 22995526.0 0.0 0.0 \n", "84921 25923449.0 0.0 0.0 \n", "84859 28579592.0 0.0 0.0 \n", "84673 26410002.0 0.0 0.0 \n", "83937 28493155.0 0.0 0.0 \n", "83951 24080206.0 0.0 0.0 \n", "82801 NaN 0.0 0.0 \n", "84681 13586048.0 0.0 0.0 \n", "83933 11395808.0 0.0 0.0 \n", "83963 10401383.0 0.0 0.0 \n", "84978 12296531.0 NaN NaN \n", "85021 12743652.0 NaN NaN \n", "85022 13585438.0 NaN NaN \n", "85023 12039583.0 NaN NaN \n", "\n", " category_Community Development category_Education \\\n", "68048 0.0 0.0 \n", "84959 0.0 0.0 \n", "84921 0.0 0.0 \n", "84859 0.0 0.0 \n", "84673 0.0 
0.0 \n", "83937 0.0 0.0 \n", "83951 0.0 0.0 \n", "82801 0.0 1.0 \n", "84681 0.0 0.0 \n", "83933 0.0 1.0 \n", "83963 0.0 0.0 \n", "84978 NaN NaN \n", "85021 NaN NaN \n", "85022 NaN NaN \n", "85023 NaN NaN \n", "\n", " category_Environment category_Health category_Human Services \\\n", "68048 0.0 1.0 0.0 \n", "84959 0.0 0.0 0.0 \n", "84921 0.0 0.0 0.0 \n", "84859 0.0 0.0 0.0 \n", "84673 0.0 0.0 0.0 \n", "83937 0.0 1.0 0.0 \n", "83951 0.0 0.0 0.0 \n", "82801 0.0 0.0 0.0 \n", "84681 0.0 0.0 0.0 \n", "83933 0.0 0.0 0.0 \n", "83963 0.0 0.0 0.0 \n", "84978 NaN NaN NaN \n", "85021 NaN NaN NaN \n", "85022 NaN NaN NaN \n", "85023 NaN NaN NaN \n", "\n", " category_Human and Civil Rights category_International \\\n", "68048 0.0 0.0 \n", "84959 0.0 0.0 \n", "84921 0.0 0.0 \n", "84859 0.0 0.0 \n", "84673 0.0 0.0 \n", "83937 0.0 0.0 \n", "83951 0.0 0.0 \n", "82801 0.0 0.0 \n", "84681 0.0 0.0 \n", "83933 0.0 0.0 \n", "83963 0.0 0.0 \n", "84978 NaN NaN \n", "85021 NaN NaN \n", "85022 NaN NaN \n", "85023 NaN NaN \n", "\n", " category_Religion category_Research and Public Policy \n", "68048 0.0 0.0 \n", "84959 0.0 0.0 \n", "84921 0.0 0.0 \n", "84859 0.0 0.0 \n", "84673 0.0 0.0 \n", "83937 0.0 0.0 \n", "83951 0.0 0.0 \n", "82801 0.0 0.0 \n", "84681 0.0 0.0 \n", "83933 0.0 0.0 \n", "83963 0.0 0.0 \n", "84978 NaN NaN \n", "85021 NaN NaN \n", "85022 NaN NaN \n", "85023 NaN NaN " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['EIN'].isin(advisory_orgs_2011)])\n", "df[df['EIN'].isin(advisory_orgs_2011)][logit_cols2].to_excel('2011 orgs with 2016 advisory.xls')\n", "df[df['EIN'].isin(advisory_orgs_2011)][logit_cols2][:15]" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n", "3.0 26\n", "0.0 11\n", "2.0 7\n", "1.0 3\n", "Name: SOX_policies, dtype: int64\n", "47\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "EIN \n", "112716763 1.0 1.0 1.0 \n", "113059922 1.0 1.0 1.0 \n", "133119118 1.0 1.0 1.0 \n", "133552154 1.0 1.0 1.0 \n", "135590516 1.0 1.0 1.0 \n", "141631995 1.0 1.0 1.0 \n", "201226416 1.0 1.0 1.0 \n", "222680030 1.0 1.0 1.0 \n", "\n", " org_id FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "EIN \n", "112716763 6705 current current current 0.0 1.0 \n", "113059922 7651 current current current 0.0 1.0 \n", "133119118 8626 current current current 0.0 1.0 \n", "133552154 4994 current current current 0.0 1.0 \n", "135590516 6033 current current current 0.0 1.0 \n", "141631995 9107 current current current 0.0 1.0 \n", "201226416 12740 current current current 0.0 1.0 \n", "222680030 4608 current current current 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "EIN \n", "112716763 1.0 1.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 0.0 \n", "133552154 1.0 1.0 \n", "135590516 0.0 0.0 \n", "141631995 1.0 1.0 \n", "201226416 1.0 1.0 \n", "222680030 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "EIN \n", "112716763 1.0 3.0 1.0 \n", "113059922 0.0 0.0 0.0 \n", "133119118 0.0 0.0 0.0 \n", "133552154 1.0 3.0 1.0 \n", "135590516 0.0 0.0 0.0 \n", "141631995 1.0 3.0 1.0 \n", "201226416 1.0 3.0 1.0 \n", "222680030 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "EIN \n", "112716763 1.0 0.837264 0.0 \n", "113059922 0.0 0.602668 0.0 \n", "133119118 0.0 0.909447 0.0 \n", "133552154 1.0 0.908104 0.0 \n", "135590516 0.0 0.777904 0.0 \n", "141631995 1.0 0.697810 0.0 \n", "201226416 1.0 0.467928 0.0 \n", "222680030 1.0 0.818691 0.0 \n", "\n", " complexity_2011 age total_revenue_logged category \\\n", "EIN \n", "112716763 2.0 31.0 14.261826 Human Services \n", "113059922 2.0 25.0 15.274481 Health \n", "133119118 2.0 34.0 13.785303 Religion \n", "133552154 2.0 22.0 
16.953921 Community Development \n", "135590516 2.0 74.0 14.571678 International \n", "141631995 1.0 34.0 15.596783 Religion \n", "201226416 2.0 11.0 16.450163 Health \n", "222680030 3.0 29.0 16.979085 Human Services \n", "\n", " state tot_rev category_Animals \\\n", "EIN \n", "112716763 NY 559128.0 0.0 \n", "113059922 NY 468645.0 0.0 \n", "133119118 NY 970244.0 0.0 \n", "133552154 NY 23067174.0 0.0 \n", "135590516 CA 1552819.0 0.0 \n", "141631995 CA 5347792.0 0.0 \n", "201226416 TN 5545025.0 0.0 \n", "222680030 NJ 23655000.0 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "EIN \n", "112716763 0.0 0.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 0.0 \n", "133552154 0.0 1.0 \n", "135590516 0.0 0.0 \n", "141631995 0.0 0.0 \n", "201226416 0.0 0.0 \n", "222680030 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "EIN \n", "112716763 0.0 0.0 0.0 \n", "113059922 0.0 0.0 1.0 \n", "133119118 0.0 0.0 0.0 \n", "133552154 0.0 0.0 0.0 \n", "135590516 0.0 0.0 0.0 \n", "141631995 0.0 0.0 0.0 \n", "201226416 0.0 0.0 1.0 \n", "222680030 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "EIN \n", "112716763 1.0 0.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 0.0 \n", "133552154 0.0 0.0 \n", "135590516 0.0 0.0 \n", "141631995 0.0 0.0 \n", "201226416 0.0 0.0 \n", "222680030 1.0 0.0 \n", "\n", " category_International category_Religion \\\n", "EIN \n", "112716763 0.0 0.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 1.0 \n", "133552154 0.0 0.0 \n", "135590516 1.0 0.0 \n", "141631995 0.0 1.0 \n", "201226416 0.0 0.0 \n", "222680030 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "EIN \n", "112716763 0.0 \n", "113059922 0.0 \n", "133119118 0.0 \n", "133552154 0.0 \n", "135590516 0.0 \n", "141631995 0.0 \n", "201226416 0.0 \n", "222680030 0.0 " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print 
len(df[df['EIN'].isin(advisory_orgs_2011)][logit_cols].groupby('EIN').agg('first'))\n", "print df[df['EIN'].isin(advisory_orgs_2011)][logit_cols].groupby('EIN').agg('first')['SOX_policies'].value_counts()\n", "print 26+11+7+3\n", "df[df['EIN'].isin(advisory_orgs_2011)][logit_cols].groupby('EIN').agg('first')[2:10]" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n", "0\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "EIN \n", "042129889 1.0 1.0 1.0 \n", "112613334 1.0 1.0 1.0 \n", "112716763 1.0 1.0 1.0 \n", "113059922 1.0 1.0 1.0 \n", "133119118 1.0 1.0 1.0 \n", "\n", " org_id FYE Form 990 FYE ratings_system 2011_data 2016_data \\\n", "EIN \n", "042129889 4441 current current current 0.0 1.0 \n", "112613334 4778 current current current 0.0 1.0 \n", "112716763 6705 current current current 0.0 1.0 \n", "113059922 7651 current current current 0.0 1.0 \n", "133119118 8626 current current current 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "EIN \n", "042129889 1.0 1.0 \n", "112613334 0.0 0.0 \n", "112716763 1.0 1.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 0.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "EIN \n", "042129889 1.0 3.0 1.0 \n", "112613334 0.0 0.0 0.0 \n", "112716763 1.0 3.0 1.0 \n", "113059922 0.0 0.0 0.0 \n", "133119118 0.0 0.0 0.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "EIN \n", "042129889 1.0 0.721359 0.0 \n", "112613334 0.0 0.662812 0.0 \n", "112716763 1.0 0.837264 0.0 \n", "113059922 0.0 0.602668 0.0 \n", "133119118 0.0 0.909447 0.0 \n", "\n", " complexity_2011 age total_revenue_logged category state \\\n", "EIN \n", "042129889 3.0 65.0 16.950810 Health MA \n", "112613334 3.0 32.0 16.424554 Education NY \n", "112716763 2.0 31.0 14.261826 Human Services NY \n", "113059922 2.0 25.0 15.274481 Health NY \n", "133119118 2.0 34.0 13.785303 Religion NY \n", "\n", " tot_rev category_Animals category_Arts, Culture, Humanities \\\n", "EIN \n", "042129889 22995526.0 0.0 0.0 \n", "112613334 13586048.0 0.0 0.0 \n", "112716763 559128.0 0.0 0.0 \n", "113059922 468645.0 0.0 0.0 \n", "133119118 970244.0 0.0 0.0 \n", "\n", " category_Community Development category_Education \\\n", "EIN \n", "042129889 0.0 0.0 \n", "112613334 0.0 1.0 \n", "112716763 0.0 0.0 \n", "113059922 0.0 
0.0 \n", "133119118 0.0 0.0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "EIN \n", "042129889 0.0 1.0 0.0 \n", "112613334 0.0 0.0 0.0 \n", "112716763 0.0 0.0 1.0 \n", "113059922 0.0 1.0 0.0 \n", "133119118 0.0 0.0 0.0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "EIN \n", "042129889 0.0 0.0 \n", "112613334 0.0 0.0 \n", "112716763 0.0 0.0 \n", "113059922 0.0 0.0 \n", "133119118 0.0 0.0 \n", "\n", " category_Religion category_Research and Public Policy \n", "EIN \n", "042129889 0.0 0.0 \n", "112613334 0.0 0.0 \n", "112716763 0.0 0.0 \n", "113059922 0.0 0.0 \n", "133119118 1.0 0.0 " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_data_2016_advisories_2011_orgs = df[df['EIN'].isin(advisory_orgs_2011)][logit_cols].groupby('EIN').agg('first')\n", "print len(first_data_2016_advisories_2011_orgs[first_data_2016_advisories_2011_orgs['SOX_policies'].notnull()])\n", "print len(first_data_2016_advisories_2011_orgs[first_data_2016_advisories_2011_orgs['SOX_policies'].isnull()])\n", "first_data_2016_advisories_2011_orgs[:5]" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": true }, "outputs": [], "source": [ "first_data_2016_advisories_2011_orgs = first_data_2016_advisories_2011_orgs.reset_index()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 
'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print first_data_2016_advisories_2011_orgs.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Export data and columns for Test 5 obs" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4857\n", "['521558579', '592729694', '521272309', '351483868', '131683279']\n" ] } ], "source": [ "print len(orgs_2011)\n", "print orgs_2011[:5]" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4857\n" ] } ], "source": [ "print len(set(df_2011_orgs['EIN'].tolist()))" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69850\n", "35\n", "69850\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50709 0.0 0.0 0.0 \n", "50710 0.0 0.0 0.0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50709 5954 010202467 FY2014 2014-12 CN 2.1 0.0 \n", "50710 5954 010202467 FY2013 2013-12 CN 2.0 0.0 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "50709 1.0 1.0 1.0 \n", "50710 0.0 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "50709 1.0 3.0 1.0 \n", "50710 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "50709 1.0 0.794457 6.0 \n", "50710 1.0 0.800152 0.0 \n", "\n", " complexity_2011 age total_revenue_logged \\\n", "50709 NaN 62.0 16.377993 \n", "50710 NaN 62.0 16.134520 \n", "\n", " category state tot_rev category_Animals \\\n", "50709 Research and Public Policy ME NaN 0.0 \n", "50710 Research and Public Policy ME 10165601.0 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50709 0.0 0.0 0.0 \n", "50710 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "50709 1.0 \n", "50710 1.0 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df[df['EIN'].isin(orgs_2011)])\n", "df_2011_orgs = df[df['EIN'].isin(orgs_2011)][logit_cols]\n", "print len(df_2011_orgs.columns)\n", "print len(df_2011_orgs)\n", "df_2011_orgs[:2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### NOTE: There are 94 advisories in 2016 because they 47 are counted twice -- once with FYE 'current' and once with the actual FYE as coded by Dan." 
] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 69460\n", "1.0 108\n", "Name: donor_advisory, dtype: int64 \n", "\n", "0.0 69624\n", "1.0 94\n", "Name: donor_advisory_2016, dtype: int64 \n", "\n" ] } ], "source": [ "print df_2011_orgs['donor_advisory'].value_counts(), '\\n'\n", "print df_2011_orgs['donor_advisory_2016'].value_counts(), '\\n'" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n" ] }, { "data": { "text/plain": [ "1.0 47\n", "Name: donor_advisory_2016, dtype: int64" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(first_data_2016_advisories_2011_orgs)\n", "first_data_2016_advisories_2011_orgs['donor_advisory_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'donor_advisory', 'donor_advisory_2016', 'donor_advisory_2011_to_2016', 'org_id', 'FYE', 'Form 990 FYE', 'ratings_system', '2011_data', '2016_data', 'conflict_of_interest_policy_v2', 'records_retention_policy_v2', 'whistleblower_policy_v2', 'SOX_policies', 'SOX_policies_binary', 'SOX_policies_all_binary', 'program_efficiency', 'complexity', 'complexity_2011', 'age', 'total_revenue_logged', 'category', 'state', 'tot_rev', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print first_data_2016_advisories_2011_orgs.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, 
"outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "donor_advisory 47.0 1.000000e+00 0.000000e+00 \n", "donor_advisory_2016 47.0 1.000000e+00 0.000000e+00 \n", "donor_advisory_2011_to_2016 47.0 1.000000e+00 0.000000e+00 \n", "2011_data 47.0 0.000000e+00 0.000000e+00 \n", "2016_data 47.0 1.000000e+00 0.000000e+00 \n", "conflict_of_interest_policy_v2 47.0 7.659574e-01 4.279763e-01 \n", "records_retention_policy_v2 47.0 6.170213e-01 4.913686e-01 \n", "whistleblower_policy_v2 47.0 6.382979e-01 4.856879e-01 \n", "SOX_policies 47.0 2.021277e+00 1.259561e+00 \n", "SOX_policies_binary 47.0 7.659574e-01 4.279763e-01 \n", "SOX_policies_all_binary 47.0 5.531915e-01 5.025375e-01 \n", "program_efficiency 47.0 7.360223e-01 1.854859e-01 \n", "complexity 47.0 0.000000e+00 0.000000e+00 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 \n", "age 47.0 3.478723e+01 1.776652e+01 \n", "total_revenue_logged 47.0 1.526122e+01 1.310822e+00 \n", "tot_rev 47.0 1.035996e+07 2.307296e+07 \n", "category_Animals 47.0 6.382979e-02 2.470922e-01 \n", "category_Arts, Culture, Humanities 47.0 4.255319e-02 2.040297e-01 \n", "category_Community Development 47.0 8.510638e-02 2.820567e-01 \n", "category_Education 47.0 8.510638e-02 2.820567e-01 \n", "category_Environment 47.0 0.000000e+00 0.000000e+00 \n", "category_Health 47.0 1.063830e-01 3.116605e-01 \n", "category_Human Services 47.0 1.914894e-01 3.977271e-01 \n", "category_Human and Civil Rights 47.0 4.255319e-02 2.040297e-01 \n", "category_International 47.0 1.063830e-01 3.116605e-01 \n", "category_Religion 47.0 2.340426e-01 4.279763e-01 \n", "category_Research and Public Policy 47.0 4.255319e-02 2.040297e-01 \n", "\n", " min 25% \\\n", "donor_advisory 1.000000 1.000000e+00 \n", "donor_advisory_2016 1.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000 1.000000e+00 \n", "2011_data 0.000000 0.000000e+00 \n", "2016_data 1.000000 1.000000e+00 \n", "conflict_of_interest_policy_v2 0.000000 1.000000e+00 \n", "records_retention_policy_v2 0.000000 
0.000000e+00 \n", "whistleblower_policy_v2 0.000000 0.000000e+00 \n", "SOX_policies 0.000000 1.000000e+00 \n", "SOX_policies_binary 0.000000 1.000000e+00 \n", "SOX_policies_all_binary 0.000000 0.000000e+00 \n", "program_efficiency 0.114712 6.655270e-01 \n", "complexity 0.000000 0.000000e+00 \n", "complexity_2011 1.000000 NaN \n", "age 1.000000 2.500000e+01 \n", "total_revenue_logged 13.072391 1.427300e+01 \n", "tot_rev 234562.000000 1.240092e+06 \n", "category_Animals 0.000000 0.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000 0.000000e+00 \n", "category_Community Development 0.000000 0.000000e+00 \n", "category_Education 0.000000 0.000000e+00 \n", "category_Environment 0.000000 0.000000e+00 \n", "category_Health 0.000000 0.000000e+00 \n", "category_Human Services 0.000000 0.000000e+00 \n", "category_Human and Civil Rights 0.000000 0.000000e+00 \n", "category_International 0.000000 0.000000e+00 \n", "category_Religion 0.000000 0.000000e+00 \n", "category_Research and Public Policy 0.000000 0.000000e+00 \n", "\n", " 50% 75% max \n", "donor_advisory 1.000000e+00 1.000000e+00 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 1.000000e+00 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000e+00 1.000000e+00 1.000000e+00 \n", "2011_data 0.000000e+00 0.000000e+00 0.000000e+00 \n", "2016_data 1.000000e+00 1.000000e+00 1.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000e+00 1.000000e+00 1.000000e+00 \n", "records_retention_policy_v2 1.000000e+00 1.000000e+00 1.000000e+00 \n", "whistleblower_policy_v2 1.000000e+00 1.000000e+00 1.000000e+00 \n", "SOX_policies 3.000000e+00 3.000000e+00 3.000000e+00 \n", "SOX_policies_binary 1.000000e+00 1.000000e+00 1.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 1.000000e+00 1.000000e+00 \n", "program_efficiency 7.779041e-01 8.696869e-01 9.743193e-01 \n", "complexity 0.000000e+00 0.000000e+00 0.000000e+00 \n", "complexity_2011 NaN NaN 3.000000e+00 \n", "age 3.200000e+01 3.950000e+01 7.400000e+01 \n", 
"total_revenue_logged 1.507667e+01 1.619310e+01 1.876587e+01 \n", "tot_rev 2.758339e+06 7.215746e+06 1.412263e+08 \n", "category_Animals 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Community Development 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Education 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Environment 0.000000e+00 0.000000e+00 0.000000e+00 \n", "category_Health 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Human Services 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Human and Civil Rights 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_International 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Religion 0.000000e+00 0.000000e+00 1.000000e+00 \n", "category_Research and Public Policy 0.000000e+00 0.000000e+00 1.000000e+00 " ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_data_2016_advisories_2011_orgs.describe().T" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "set([])\n", "set([])\n" ] } ], "source": [ "print set(first_data_2016_advisories_2011_orgs.columns.tolist()) - set(df_2011_orgs.columns.tolist())\n", "print set(df_2011_orgs.columns.tolist()) - set(first_data_2016_advisories_2011_orgs.columns.tolist())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Get rid of *latest_entry* for each 2016 donor advisory org -- so, we're getting rid of 47 rows" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69569\n", "281\n", "64993\n", "4857\n", "69850\n", "69850\n", "47\n", "69803\n", "69803\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " donor_advisory donor_advisory_2016 donor_advisory_2011_to_2016 \\\n", "50709 0.0 0.0 0.0 \n", "50710 0.0 0.0 0.0 \n", "\n", " org_id EIN FYE Form 990 FYE ratings_system 2011_data \\\n", "50709 5954 010202467 FY2014 2014-12 CN 2.1 0.0 \n", "50710 5954 010202467 FY2013 2013-12 CN 2.0 0.0 \n", "\n", " 2016_data conflict_of_interest_policy_v2 records_retention_policy_v2 \\\n", "50709 1.0 1.0 1.0 \n", "50710 0.0 1.0 1.0 \n", "\n", " whistleblower_policy_v2 SOX_policies SOX_policies_binary \\\n", "50709 1.0 3.0 1.0 \n", "50710 1.0 3.0 1.0 \n", "\n", " SOX_policies_all_binary program_efficiency complexity \\\n", "50709 1.0 0.794457 6.0 \n", "50710 1.0 0.800152 0.0 \n", "\n", " complexity_2011 age total_revenue_logged \\\n", "50709 NaN 62.0 16.377993 \n", "50710 NaN 62.0 16.134520 \n", "\n", " category state tot_rev category_Animals \\\n", "50709 Research and Public Policy ME NaN 0.0 \n", "50710 Research and Public Policy ME 10165601.0 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50709 0.0 0.0 0.0 \n", "50710 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "50709 1.0 \n", "50710 1.0 " ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df_2011_orgs[~df_2011_orgs['EIN'].isin(advisory_orgs_2011)])\n", "print len(df_2011_orgs[df_2011_orgs['EIN'].isin(advisory_orgs_2011)])\n", "print len(df_2011_orgs[~(df_2011_orgs['2016_data']==1)])\n", "print len(df_2011_orgs[df_2011_orgs['2016_data']==1])\n", "print 64993+4857 \n", "print len(df_2011_orgs)\n", "print len( df_2011_orgs[(df_2011_orgs['2016_data']==1) & 
(df_2011_orgs['EIN'].isin(advisory_orgs_2011))])\n", "print len( df_2011_orgs[~((df_2011_orgs['2016_data']==1) & (df_2011_orgs['EIN'].isin(advisory_orgs_2011)))])\n", "df_2011_orgs_mod = df_2011_orgs[~((df_2011_orgs['2016_data']==1) & (df_2011_orgs['EIN'].isin(advisory_orgs_2011)))]\n", "print len(df_2011_orgs_mod)\n", "df_2011_orgs_mod[:2]" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69850\n", "255\n", "69803\n", "35\n", "69850\n", "35\n" ] } ], "source": [ "print len(df_2011_orgs_mod.append(first_data_2016_advisories_2011_orgs))\n", "print 8238-7983\n", "print len(df_2011_orgs_mod)\n", "print len(df_2011_orgs_mod.columns)\n", "df_2011_orgs_mod = df_2011_orgs_mod.append(first_data_2016_advisories_2011_orgs)\n", "print len(df_2011_orgs_mod)\n", "print len(df_2011_orgs_mod.columns)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "281\n" ] } ], "source": [ "print len(df_2011_orgs_mod[df_2011_orgs_mod['EIN'].isin(advisory_orgs_2011)])\n", "df_2011_orgs_mod[df_2011_orgs_mod['EIN'].isin(advisory_orgs_2011)].to_excel('df_2011_orgs_mod_partial.xls')" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 69718.0 6.975243e-02 2.547312e-01 \n", "2016_data 69718.0 6.966637e-02 2.545857e-01 \n", "SOX_policies 17307.0 2.745883e+00 6.696433e-01 \n", "SOX_policies_all_binary 17307.0 8.486739e-01 3.583768e-01 \n", "SOX_policies_binary 17307.0 9.733634e-01 1.610236e-01 \n", "age 69616.0 4.069758e+01 1.926887e+01 \n", "category_Animals 69718.0 8.302017e-02 2.759147e-01 \n", "category_Arts, Culture, Humanities 69718.0 1.404085e-01 3.474128e-01 \n", "category_Community Development 69718.0 8.068218e-02 2.723484e-01 \n", "category_Education 69718.0 5.581055e-02 2.295572e-01 \n", "category_Environment 69718.0 6.768697e-02 2.512098e-01 \n", "category_Health 69718.0 1.184773e-01 3.231748e-01 \n", "category_Human Services 69718.0 2.352621e-01 4.241655e-01 \n", "category_Human and Civil Rights 69718.0 3.755128e-02 1.901097e-01 \n", "category_International 69718.0 9.198485e-02 2.890066e-01 \n", "category_Religion 69718.0 6.219341e-02 2.415082e-01 \n", "category_Research and Public Policy 69718.0 2.545971e-02 1.575178e-01 \n", "complexity 69850.0 2.844094e-01 1.088412e+00 \n", "complexity_2011 4874.0 2.463890e+00 5.153392e-01 \n", "conflict_of_interest_policy_v2 17307.0 9.653319e-01 1.829429e-01 \n", "donor_advisory 69568.0 1.552438e-03 3.937068e-02 \n", "donor_advisory_2011_to_2016 69718.0 1.302390e-02 1.133775e-01 \n", "donor_advisory_2016 69718.0 1.348289e-03 3.669455e-02 \n", "program_efficiency 17306.0 8.087123e-01 1.012502e-01 \n", "records_retention_policy_v2 17307.0 8.875599e-01 3.159162e-01 \n", "tot_rev 9462.0 5.273146e+07 1.483925e+08 \n", "total_revenue_logged 17307.0 1.608389e+01 1.741080e+00 \n", "whistleblower_policy_v2 17307.0 8.929913e-01 3.091333e-01 \n", "\n", " min 25% 50% 75% max \n", "2011_data 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "2016_data 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "SOX_policies 0.000000e+00 NaN NaN NaN 3.000000e+00 \n", "SOX_policies_all_binary 0.000000e+00 NaN NaN NaN 1.000000e+00 
\n", "SOX_policies_binary 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "age 0.000000e+00 NaN NaN NaN 1.080000e+02 \n", "category_Animals 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Community Development 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Education 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Environment 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Health 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Human Services 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Human and Civil Rights 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_International 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Religion 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "category_Research and Public Policy 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "complexity 0.000000e+00 0.0 0.0 0.0 8.000000e+00 \n", "complexity_2011 1.000000e+00 NaN NaN NaN 3.000000e+00 \n", "conflict_of_interest_policy_v2 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "donor_advisory 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "donor_advisory_2016 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "program_efficiency 3.833359e-03 NaN NaN NaN 1.010186e+00 \n", "records_retention_policy_v2 0.000000e+00 NaN NaN NaN 1.000000e+00 \n", "tot_rev -7.919805e+07 NaN NaN NaN 3.587230e+09 \n", "total_revenue_logged 0.000000e+00 NaN NaN NaN 2.200080e+01 \n", "whistleblower_policy_v2 0.000000e+00 NaN NaN NaN 1.000000e+00 " ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.describe().T" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 149.0 3.154362e-01 4.662566e-01 \n", "2016_data 149.0 3.154362e-01 4.662566e-01 \n", "SOX_policies 276.0 2.202899e+00 1.179580e+00 \n", "SOX_policies_all_binary 276.0 6.340580e-01 4.825683e-01 \n", "SOX_policies_binary 276.0 8.188406e-01 3.858498e-01 \n", "age 94.0 3.478723e+01 1.767075e+01 \n", "category_Animals 149.0 4.026846e-02 1.972512e-01 \n", "category_Arts, Culture, Humanities 149.0 2.684564e-02 1.621773e-01 \n", "category_Community Development 149.0 5.369128e-02 2.261677e-01 \n", "category_Education 149.0 5.369128e-02 2.261677e-01 \n", "category_Environment 149.0 0.000000e+00 0.000000e+00 \n", "category_Health 149.0 6.711409e-02 2.510634e-01 \n", "category_Human Services 149.0 1.208054e-01 3.270001e-01 \n", "category_Human and Civil Rights 149.0 2.684564e-02 1.621773e-01 \n", "category_International 149.0 6.711409e-02 2.510634e-01 \n", "category_Religion 149.0 1.476510e-01 3.559502e-01 \n", "category_Research and Public Policy 149.0 2.684564e-02 1.621773e-01 \n", "complexity 281.0 1.160142e+00 1.594403e+00 \n", "complexity_2011 82.0 2.121951e+00 5.065069e-01 \n", "conflict_of_interest_policy_v2 276.0 8.043478e-01 3.974225e-01 \n", "donor_advisory 47.0 1.000000e+00 0.000000e+00 \n", "donor_advisory_2011_to_2016 149.0 6.308725e-01 4.841961e-01 \n", "donor_advisory_2016 149.0 6.308725e-01 4.841961e-01 \n", "program_efficiency 275.0 7.532153e-01 1.767741e-01 \n", "records_retention_policy_v2 276.0 6.847826e-01 4.654464e-01 \n", "tot_rev 249.0 1.471758e+07 2.983683e+07 \n", "total_revenue_logged 276.0 1.530327e+01 1.710497e+00 \n", "whistleblower_policy_v2 276.0 7.137681e-01 4.528202e-01 \n", "\n", " min 25% 50% 75% max \n", "2011_data 0.000000 NaN NaN NaN 1.000000e+00 \n", "2016_data 0.000000 NaN NaN NaN 1.000000e+00 \n", "SOX_policies 0.000000 NaN NaN NaN 3.000000e+00 \n", "SOX_policies_all_binary 0.000000 NaN NaN NaN 1.000000e+00 \n", "SOX_policies_binary 0.000000 NaN NaN NaN 1.000000e+00 \n", "age 
1.000000 NaN NaN NaN 7.400000e+01 \n", "category_Animals 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Community Development 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Education 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Environment 0.000000 NaN NaN NaN 0.000000e+00 \n", "category_Health 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Human Services 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Human and Civil Rights 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_International 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Religion 0.000000 NaN NaN NaN 1.000000e+00 \n", "category_Research and Public Policy 0.000000 NaN NaN NaN 1.000000e+00 \n", "complexity 0.000000 0.0 0.0 2.0 7.000000e+00 \n", "complexity_2011 1.000000 NaN NaN NaN 3.000000e+00 \n", "conflict_of_interest_policy_v2 0.000000 NaN NaN NaN 1.000000e+00 \n", "donor_advisory 1.000000 NaN NaN NaN 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000 NaN NaN NaN 1.000000e+00 \n", "donor_advisory_2016 0.000000 NaN NaN NaN 1.000000e+00 \n", "program_efficiency 0.079828 NaN NaN NaN 1.000000e+00 \n", "records_retention_policy_v2 0.000000 NaN NaN NaN 1.000000e+00 \n", "tot_rev -56435.000000 NaN NaN NaN 1.786763e+08 \n", "total_revenue_logged 0.000000 NaN NaN NaN 1.900109e+01 \n", "whistleblower_policy_v2 0.000000 NaN NaN NaN 1.000000e+00 " ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['EIN'].isin(advisory_orgs_2011)].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.to_pickle('df.pkl')\n", "first_data_2016_advisories_2011_orgs.to_pickle('first_data_2016_advisories_2011_orgs.pkl')\n", "df_2011_orgs_mod.to_pickle('Test 4 data.pkl')\n", "df_2011_orgs_mod.to_excel('Test 4 
data.xlsx')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Inspect DF" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sort by EIN (ascending); within each EIN put the 2016 row first, then the latest FYE and ratings system\n", "df_2011_orgs_mod = df_2011_orgs_mod.sort_values(by=['EIN', '2016_data', 'FYE', 'ratings_system'], ascending=[True, False, False, False])" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "35\n", "69850\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 2011_data 2016_data EIN FYE Form 990 FYE SOX_policies \\\n", "50709 0.0 1.0 010202467 FY2014 2014-12 3.0 \n", "50710 0.0 0.0 010202467 FY2013 2013-12 3.0 \n", "50711 0.0 0.0 010202467 FY2012 2012-12 3.0 \n", "50712 0.0 0.0 010202467 FY2012 2012-12 3.0 \n", "50713 0.0 0.0 010202467 FY2011 2011-12 3.0 \n", "50714 0.0 0.0 010202467 FY2010 2010-12 3.0 \n", "50715 1.0 0.0 010202467 FY2009 2009-12 3.0 \n", "50716 0.0 0.0 010202467 FY2009 2009-12 3.0 \n", "50717 0.0 0.0 010202467 FY2008 2008-12 2.0 \n", "50718 0.0 0.0 010202467 FY2007 2007-12 NaN \n", "50719 0.0 0.0 010202467 FY2006 2006-12 NaN \n", "50720 0.0 0.0 010202467 FY2005 2005-12 NaN \n", "50721 0.0 0.0 010202467 FY2004 2004-12 NaN \n", "50722 0.0 0.0 010202467 FY2003 2003-12 NaN \n", "50723 0.0 0.0 010202467 FY2002 2002-12 NaN \n", "50724 0.0 0.0 010202467 FY2001 2001-12 NaN \n", "40348 0.0 1.0 010211513 FY2014 2014-12 3.0 \n", "40349 0.0 0.0 010211513 FY2014 2014-12 NaN \n", "40350 0.0 0.0 010211513 FY2013 2013-12 3.0 \n", "40351 0.0 0.0 010211513 FY2012 2012-12 3.0 \n", "40352 0.0 0.0 010211513 FY2011 2011-05 3.0 \n", "40353 0.0 0.0 010211513 FY2011 2011-05 3.0 \n", "40354 1.0 0.0 010211513 FY2010 2010-05 3.0 \n", "40355 0.0 0.0 010211513 FY2009 2009-05 3.0 \n", "40356 0.0 0.0 010211513 FY2008 2008-05 NaN \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "50709 1.0 1.0 62.0 \n", "50710 1.0 1.0 62.0 \n", "50711 1.0 1.0 62.0 \n", "50712 1.0 1.0 62.0 \n", "50713 1.0 1.0 62.0 \n", "50714 1.0 1.0 62.0 \n", "50715 1.0 1.0 62.0 \n", "50716 1.0 1.0 62.0 \n", "50717 0.0 1.0 62.0 \n", "50718 NaN NaN 62.0 \n", "50719 NaN NaN 62.0 \n", "50720 NaN NaN 62.0 \n", "50721 NaN NaN 62.0 \n", "50722 NaN NaN 62.0 \n", "50723 NaN NaN 62.0 \n", "50724 NaN NaN 62.0 \n", "40348 1.0 1.0 66.0 \n", "40349 NaN NaN 66.0 \n", "40350 1.0 1.0 66.0 \n", "40351 1.0 1.0 66.0 \n", "40352 1.0 1.0 66.0 \n", "40353 1.0 1.0 66.0 \n", "40354 1.0 1.0 66.0 \n", "40355 1.0 1.0 66.0 \n", "40356 NaN NaN 66.0 \n", 
"\n", " category category_Animals \\\n", "50709 Research and Public Policy 0.0 \n", "50710 Research and Public Policy 0.0 \n", "50711 Research and Public Policy 0.0 \n", "50712 Research and Public Policy 0.0 \n", "50713 Research and Public Policy 0.0 \n", "50714 Research and Public Policy 0.0 \n", "50715 Research and Public Policy 0.0 \n", "50716 Research and Public Policy 0.0 \n", "50717 Research and Public Policy 0.0 \n", "50718 Research and Public Policy 0.0 \n", "50719 Research and Public Policy 0.0 \n", "50720 Research and Public Policy 0.0 \n", "50721 Research and Public Policy 0.0 \n", "50722 Research and Public Policy 0.0 \n", "50723 Research and Public Policy 0.0 \n", "50724 Research and Public Policy 0.0 \n", "40348 Health 0.0 \n", "40349 Health 0.0 \n", "40350 Health 0.0 \n", "40351 Health 0.0 \n", "40352 Health 0.0 \n", "40353 Health 0.0 \n", "40354 Health 0.0 \n", "40355 Health 0.0 \n", "40356 Health 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "50711 0.0 0.0 \n", "50712 0.0 0.0 \n", "50713 0.0 0.0 \n", "50714 0.0 0.0 \n", "50715 0.0 0.0 \n", "50716 0.0 0.0 \n", "50717 0.0 0.0 \n", "50718 0.0 0.0 \n", "50719 0.0 0.0 \n", "50720 0.0 0.0 \n", "50721 0.0 0.0 \n", "50722 0.0 0.0 \n", "50723 0.0 0.0 \n", "50724 0.0 0.0 \n", "40348 0.0 0.0 \n", "40349 0.0 0.0 \n", "40350 0.0 0.0 \n", "40351 0.0 0.0 \n", "40352 0.0 0.0 \n", "40353 0.0 0.0 \n", "40354 0.0 0.0 \n", "40355 0.0 0.0 \n", "40356 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50709 0.0 0.0 0.0 \n", "50710 0.0 0.0 0.0 \n", "50711 0.0 0.0 0.0 \n", "50712 0.0 0.0 0.0 \n", "50713 0.0 0.0 0.0 \n", "50714 0.0 0.0 0.0 \n", "50715 0.0 0.0 0.0 \n", "50716 0.0 0.0 0.0 \n", "50717 0.0 0.0 0.0 \n", "50718 0.0 0.0 0.0 \n", "50719 0.0 0.0 0.0 \n", "50720 0.0 0.0 0.0 \n", "50721 0.0 0.0 0.0 \n", "50722 0.0 0.0 0.0 \n", "50723 0.0 0.0 0.0 \n", "50724 0.0 0.0 0.0 \n", "40348 0.0 0.0 1.0 \n", 
"40349 0.0 0.0 1.0 \n", "40350 0.0 0.0 1.0 \n", "40351 0.0 0.0 1.0 \n", "40352 0.0 0.0 1.0 \n", "40353 0.0 0.0 1.0 \n", "40354 0.0 0.0 1.0 \n", "40355 0.0 0.0 1.0 \n", "40356 0.0 0.0 1.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "50711 0.0 0.0 \n", "50712 0.0 0.0 \n", "50713 0.0 0.0 \n", "50714 0.0 0.0 \n", "50715 0.0 0.0 \n", "50716 0.0 0.0 \n", "50717 0.0 0.0 \n", "50718 0.0 0.0 \n", "50719 0.0 0.0 \n", "50720 0.0 0.0 \n", "50721 0.0 0.0 \n", "50722 0.0 0.0 \n", "50723 0.0 0.0 \n", "50724 0.0 0.0 \n", "40348 0.0 0.0 \n", "40349 0.0 0.0 \n", "40350 0.0 0.0 \n", "40351 0.0 0.0 \n", "40352 0.0 0.0 \n", "40353 0.0 0.0 \n", "40354 0.0 0.0 \n", "40355 0.0 0.0 \n", "40356 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "50709 0.0 0.0 \n", "50710 0.0 0.0 \n", "50711 0.0 0.0 \n", "50712 0.0 0.0 \n", "50713 0.0 0.0 \n", "50714 0.0 0.0 \n", "50715 0.0 0.0 \n", "50716 0.0 0.0 \n", "50717 0.0 0.0 \n", "50718 0.0 0.0 \n", "50719 0.0 0.0 \n", "50720 0.0 0.0 \n", "50721 0.0 0.0 \n", "50722 0.0 0.0 \n", "50723 0.0 0.0 \n", "50724 0.0 0.0 \n", "40348 0.0 0.0 \n", "40349 0.0 0.0 \n", "40350 0.0 0.0 \n", "40351 0.0 0.0 \n", "40352 0.0 0.0 \n", "40353 0.0 0.0 \n", "40354 0.0 0.0 \n", "40355 0.0 0.0 \n", "40356 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity complexity_2011 \\\n", "50709 1.0 6.0 NaN \n", "50710 1.0 0.0 NaN \n", "50711 1.0 0.0 NaN \n", "50712 1.0 0.0 NaN \n", "50713 1.0 0.0 NaN \n", "50714 1.0 0.0 NaN \n", "50715 1.0 0.0 3.0 \n", "50716 1.0 0.0 NaN \n", "50717 1.0 0.0 NaN \n", "50718 1.0 0.0 NaN \n", "50719 1.0 0.0 NaN \n", "50720 1.0 0.0 NaN \n", "50721 1.0 0.0 NaN \n", "50722 1.0 0.0 NaN \n", "50723 1.0 0.0 NaN \n", "50724 1.0 0.0 NaN \n", "40348 0.0 5.0 NaN \n", "40349 0.0 0.0 NaN \n", "40350 0.0 0.0 NaN \n", "40351 0.0 0.0 NaN \n", "40352 0.0 0.0 NaN \n", "40353 0.0 0.0 NaN \n", "40354 0.0 0.0 3.0 \n", "40355 0.0 0.0 NaN \n", "40356 0.0 0.0 NaN \n", 
"\n", " conflict_of_interest_policy_v2 donor_advisory \\\n", "50709 1.0 0.0 \n", "50710 1.0 0.0 \n", "50711 1.0 0.0 \n", "50712 1.0 0.0 \n", "50713 1.0 0.0 \n", "50714 1.0 0.0 \n", "50715 1.0 0.0 \n", "50716 1.0 0.0 \n", "50717 1.0 0.0 \n", "50718 NaN 0.0 \n", "50719 NaN 0.0 \n", "50720 NaN 0.0 \n", "50721 NaN 0.0 \n", "50722 NaN 0.0 \n", "50723 NaN 0.0 \n", "50724 NaN 0.0 \n", "40348 1.0 0.0 \n", "40349 NaN 0.0 \n", "40350 1.0 0.0 \n", "40351 1.0 0.0 \n", "40352 1.0 0.0 \n", "40353 1.0 0.0 \n", "40354 1.0 0.0 \n", "40355 1.0 0.0 \n", "40356 NaN 0.0 \n", "\n", " donor_advisory_2011_to_2016 donor_advisory_2016 org_id \\\n", "50709 0.0 0.0 5954 \n", "50710 0.0 0.0 5954 \n", "50711 0.0 0.0 5954 \n", "50712 0.0 0.0 5954 \n", "50713 0.0 0.0 5954 \n", "50714 0.0 0.0 5954 \n", "50715 0.0 0.0 5954 \n", "50716 0.0 0.0 5954 \n", "50717 0.0 0.0 5954 \n", "50718 0.0 0.0 5954 \n", "50719 0.0 0.0 5954 \n", "50720 0.0 0.0 5954 \n", "50721 0.0 0.0 5954 \n", "50722 0.0 0.0 5954 \n", "50723 0.0 0.0 5954 \n", "50724 0.0 0.0 5954 \n", "40348 0.0 0.0 3916 \n", "40349 0.0 0.0 3916 \n", "40350 0.0 0.0 3916 \n", "40351 0.0 0.0 3916 \n", "40352 0.0 0.0 3916 \n", "40353 0.0 0.0 3916 \n", "40354 0.0 0.0 3916 \n", "40355 0.0 0.0 3916 \n", "40356 0.0 0.0 3916 \n", "\n", " program_efficiency ratings_system records_retention_policy_v2 state \\\n", "50709 0.794457 CN 2.1 1.0 ME \n", "50710 0.800152 CN 2.0 1.0 ME \n", "50711 0.795793 CN 2.0 1.0 ME \n", "50712 0.795793 CN 2.0 1.0 ME \n", "50713 0.824838 CN 2.0 1.0 ME \n", "50714 0.818602 CN 2.0 1.0 ME \n", "50715 0.788895 CN 2.0 1.0 ME \n", "50716 0.788895 CN 1.0 1.0 ME \n", "50717 0.818186 CN 1.0 1.0 ME \n", "50718 NaN CN 1.0 NaN ME \n", "50719 NaN CN 1.0 NaN ME \n", "50720 NaN CN 1.0 NaN ME \n", "50721 NaN CN 1.0 NaN ME \n", "50722 NaN CN 1.0 NaN ME \n", "50723 NaN CN 1.0 NaN ME \n", "50724 NaN CN 1.0 NaN ME \n", "40348 0.833296 CN 2.1 1.0 ME \n", "40349 NaN CN 2.0 NaN ME \n", "40350 0.835431 CN 2.0 1.0 ME \n", "40351 0.849363 CN 2.0 1.0 ME \n", 
"40352 0.855584 CN 2.0 1.0 ME \n", "40353 0.855584 CN 2.0 1.0 ME \n", "40354 0.858851 CN 2.0 1.0 ME \n", "40355 0.793051 CN 1.0 1.0 ME \n", "40356 NaN CN 1.0 NaN ME \n", "\n", " tot_rev total_revenue_logged whistleblower_policy_v2 \n", "50709 NaN 16.377993 1.0 \n", "50710 10165601.0 16.134520 1.0 \n", "50711 11407051.0 16.249742 1.0 \n", "50712 11407051.0 16.249742 1.0 \n", "50713 13209918.0 16.396478 1.0 \n", "50714 9478299.0 16.064515 1.0 \n", "50715 8432154.0 15.947563 1.0 \n", "50716 8432154.0 15.947563 1.0 \n", "50717 10342120.0 16.151735 0.0 \n", "50718 NaN NaN NaN \n", "50719 NaN NaN NaN \n", "50720 NaN NaN NaN \n", "50721 NaN NaN NaN \n", "50722 NaN NaN NaN \n", "50723 NaN NaN NaN \n", "50724 NaN NaN NaN \n", "40348 NaN 19.490857 1.0 \n", "40349 NaN NaN NaN \n", "40350 257132786.0 19.365103 1.0 \n", "40351 231079981.0 19.258274 1.0 \n", "40352 231514645.0 19.260154 1.0 \n", "40353 231514645.0 19.260154 1.0 \n", "40354 200282021.0 19.115237 1.0 \n", "40355 171297125.0 18.958910 1.0 \n", "40356 NaN NaN NaN " ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df_2011_orgs_mod.columns)\n", "print len(df_2011_orgs_mod)\n", "df_2011_orgs_mod[:25]" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.0 69624\n", "1.0 94\n", "Name: donor_advisory_2016, dtype: int64" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod['donor_advisory_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2011_data', '2016_data', 'EIN', 'FYE', 'Form 990 FYE', 'SOX_policies', 'SOX_policies_all_binary', 'SOX_policies_binary', 'age', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 
'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'complexity', 'complexity_2011', 'conflict_of_interest_policy_v2', 'donor_advisory', 'donor_advisory_2011_to_2016', 'donor_advisory_2016', 'org_id', 'program_efficiency', 'ratings_system', 'records_retention_policy_v2', 'state', 'tot_rev', 'total_revenue_logged', 'whistleblower_policy_v2']\n" ] } ], "source": [ "print df_2011_orgs_mod.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['EIN'].isin(advisory_orgs_2011)].to_excel('data for 47 orgs.xls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Create Change Variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod['NEW_SOX'] = np.where()" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4863\n", "4838\n", "4857\n", "4857\n" ] } ], "source": [ "print len(df_2011_orgs_mod[df_2011_orgs_mod['2011_data']==1])\n", "print len(df_2011_orgs_mod[(df_2011_orgs_mod['2011_data']==1) & (df_2011_orgs_mod['SOX_policies'].notnull())])\n", "print len(df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1])\n", "print len(df_2011_orgs_mod[(df_2011_orgs_mod['2016_data']==1) & (df_2011_orgs_mod['SOX_policies'].notnull())])" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 4863.0 1.000000e+00 0.000000e+00 \n", "2016_data 4863.0 0.000000e+00 0.000000e+00 \n", "SOX_policies 4838.0 2.533072e+00 8.696534e-01 \n", "SOX_policies_all_binary 4838.0 7.339810e-01 4.419200e-01 \n", "SOX_policies_binary 4838.0 9.472923e-01 2.234725e-01 \n", "age 4863.0 4.004771e+01 1.923620e+01 \n", "category_Animals 4863.0 7.731853e-02 2.671237e-01 \n", "category_Arts, Culture, Humanities 4863.0 1.392145e-01 3.462058e-01 \n", "category_Community Development 4863.0 7.978614e-02 2.709897e-01 \n", "category_Education 4863.0 5.819453e-02 2.341350e-01 \n", "category_Environment 4863.0 6.580300e-02 2.479629e-01 \n", "category_Health 4863.0 1.192679e-01 3.241369e-01 \n", "category_Human Services 4863.0 2.475838e-01 4.316531e-01 \n", "category_Human and Civil Rights 4863.0 3.742546e-02 1.898215e-01 \n", "category_International 4863.0 8.924532e-02 2.851268e-01 \n", "category_Religion 4863.0 6.107341e-02 2.394895e-01 \n", "category_Research and Public Policy 4863.0 2.508739e-02 1.564067e-01 \n", "complexity 4863.0 0.000000e+00 0.000000e+00 \n", "complexity_2011 4833.0 2.466791e+00 5.144678e-01 \n", "conflict_of_interest_policy_v2 4838.0 9.336503e-01 2.489182e-01 \n", "donor_advisory 4815.0 4.984424e-03 7.043159e-02 \n", "donor_advisory_2011_to_2016 4863.0 2.220851e-02 1.473763e-01 \n", "donor_advisory_2016 4863.0 9.664816e-03 9.784363e-02 \n", "program_efficiency 4838.0 8.046909e-01 1.055729e-01 \n", "records_retention_policy_v2 4838.0 7.995039e-01 4.004130e-01 \n", "tot_rev 1257.0 4.312611e+07 1.378553e+08 \n", "total_revenue_logged 4838.0 1.546172e+01 1.654727e+00 \n", "whistleblower_policy_v2 4838.0 7.999173e-01 4.001033e-01 \n", "\n", " min 25% 50% 75% \\\n", "2011_data 1.000000e+00 1.0 1.0 1.0 \n", "2016_data 0.000000e+00 0.0 0.0 0.0 \n", "SOX_policies 0.000000e+00 NaN NaN NaN \n", "SOX_policies_all_binary 0.000000e+00 NaN NaN NaN \n", "SOX_policies_binary 0.000000e+00 NaN NaN NaN \n", "age 0.000000e+00 25.0 
35.0 52.0 \n", "category_Animals 0.000000e+00 0.0 0.0 0.0 \n", "category_Arts, Culture, Humanities 0.000000e+00 0.0 0.0 0.0 \n", "category_Community Development 0.000000e+00 0.0 0.0 0.0 \n", "category_Education 0.000000e+00 0.0 0.0 0.0 \n", "category_Environment 0.000000e+00 0.0 0.0 0.0 \n", "category_Health 0.000000e+00 0.0 0.0 0.0 \n", "category_Human Services 0.000000e+00 0.0 0.0 0.0 \n", "category_Human and Civil Rights 0.000000e+00 0.0 0.0 0.0 \n", "category_International 0.000000e+00 0.0 0.0 0.0 \n", "category_Religion 0.000000e+00 0.0 0.0 0.0 \n", "category_Research and Public Policy 0.000000e+00 0.0 0.0 0.0 \n", "complexity 0.000000e+00 0.0 0.0 0.0 \n", "complexity_2011 1.000000e+00 NaN NaN NaN \n", "conflict_of_interest_policy_v2 0.000000e+00 NaN NaN NaN \n", "donor_advisory 0.000000e+00 NaN NaN NaN \n", "donor_advisory_2011_to_2016 0.000000e+00 0.0 0.0 0.0 \n", "donor_advisory_2016 0.000000e+00 0.0 0.0 0.0 \n", "program_efficiency 2.217704e-02 NaN NaN NaN \n", "records_retention_policy_v2 0.000000e+00 NaN NaN NaN \n", "tot_rev -4.263887e+07 NaN NaN NaN \n", "total_revenue_logged 0.000000e+00 NaN NaN NaN \n", "whistleblower_policy_v2 0.000000e+00 NaN NaN NaN \n", "\n", " max \n", "2011_data 1.000000e+00 \n", "2016_data 0.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 \n", "SOX_policies_binary 1.000000e+00 \n", "age 1.080000e+02 \n", "category_Animals 1.000000e+00 \n", "category_Arts, Culture, Humanities 1.000000e+00 \n", "category_Community Development 1.000000e+00 \n", "category_Education 1.000000e+00 \n", "category_Environment 1.000000e+00 \n", "category_Health 1.000000e+00 \n", "category_Human Services 1.000000e+00 \n", "category_Human and Civil Rights 1.000000e+00 \n", "category_International 1.000000e+00 \n", "category_Religion 1.000000e+00 \n", "category_Research and Public Policy 1.000000e+00 \n", "complexity 0.000000e+00 \n", "complexity_2011 3.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000e+00 \n", 
"donor_advisory 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 \n", "program_efficiency 9.976872e-01 \n", "records_retention_policy_v2 1.000000e+00 \n", "tot_rev 3.587230e+09 \n", "total_revenue_logged 2.200080e+01 \n", "whistleblower_policy_v2 1.000000e+00 " ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['2011_data']==1].describe().T" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 4857.0 0.000000e+00 0.000000e+00 \n", "2016_data 4857.0 1.000000e+00 0.000000e+00 \n", "SOX_policies 4857.0 2.870496e+00 4.882352e-01 \n", "SOX_policies_all_binary 4857.0 9.192917e-01 2.724146e-01 \n", "SOX_policies_binary 4857.0 9.859996e-01 1.175042e-01 \n", "age 4857.0 4.003521e+01 1.922688e+01 \n", "category_Animals 4857.0 7.700226e-02 2.666225e-01 \n", "category_Arts, Culture, Humanities 4857.0 1.393865e-01 3.463850e-01 \n", "category_Community Development 4857.0 7.967881e-02 2.708232e-01 \n", "category_Education 4857.0 5.826642e-02 2.342707e-01 \n", "category_Environment 4857.0 6.588429e-02 2.481053e-01 \n", "category_Health 4857.0 1.192094e-01 3.240681e-01 \n", "category_Human Services 4857.0 2.476838e-01 4.317116e-01 \n", "category_Human and Civil Rights 4857.0 3.747169e-02 1.899342e-01 \n", "category_International 4857.0 8.935557e-02 2.852857e-01 \n", "category_Religion 4857.0 6.094297e-02 2.392503e-01 \n", "category_Research and Public Policy 4857.0 2.511839e-02 1.565008e-01 \n", "complexity 4857.0 4.023060e+00 1.338153e+00 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 \n", "conflict_of_interest_policy_v2 4857.0 9.820877e-01 1.326464e-01 \n", "donor_advisory 4857.0 9.676755e-03 9.790347e-02 \n", "donor_advisory_2011_to_2016 4857.0 2.223595e-02 1.474652e-01 \n", "donor_advisory_2016 4857.0 9.676755e-03 9.790347e-02 \n", "program_efficiency 4857.0 8.040641e-01 1.074864e-01 \n", "records_retention_policy_v2 4857.0 9.407041e-01 2.362019e-01 \n", "tot_rev 593.0 6.462455e+07 1.746480e+08 \n", "total_revenue_logged 4857.0 1.576144e+01 1.381395e+00 \n", "whistleblower_policy_v2 4857.0 9.477043e-01 2.226455e-01 \n", "\n", " min 25% 50% \\\n", "2011_data 0.000000 0.000000 0.000000 \n", "2016_data 1.000000 1.000000 1.000000 \n", "SOX_policies 0.000000 3.000000 3.000000 \n", "SOX_policies_all_binary 0.000000 1.000000 1.000000 \n", "SOX_policies_binary 0.000000 1.000000 1.000000 \n", "age 0.000000 
25.000000 35.000000 \n", "category_Animals 0.000000 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 0.000000 \n", "category_Human Services 0.000000 0.000000 0.000000 \n", "category_Human and Civil Rights 0.000000 0.000000 0.000000 \n", "category_International 0.000000 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 0.000000 \n", "complexity 0.000000 3.000000 4.000000 \n", "complexity_2011 1.000000 NaN NaN \n", "conflict_of_interest_policy_v2 0.000000 1.000000 1.000000 \n", "donor_advisory 0.000000 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 0.000000 \n", "program_efficiency 0.003833 0.755960 0.816799 \n", "records_retention_policy_v2 0.000000 1.000000 1.000000 \n", "tot_rev 234562.000000 NaN NaN \n", "total_revenue_logged 0.000000 14.792014 15.604092 \n", "whistleblower_policy_v2 0.000000 1.000000 1.000000 \n", "\n", " 75% max \n", "2011_data 0.000000 0.000000e+00 \n", "2016_data 1.000000 1.000000e+00 \n", "SOX_policies 3.000000 3.000000e+00 \n", "SOX_policies_all_binary 1.000000 1.000000e+00 \n", "SOX_policies_binary 1.000000 1.000000e+00 \n", "age 52.000000 1.080000e+02 \n", "category_Animals 0.000000 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000 1.000000e+00 \n", "category_Community Development 0.000000 1.000000e+00 \n", "category_Education 0.000000 1.000000e+00 \n", "category_Environment 0.000000 1.000000e+00 \n", "category_Health 0.000000 1.000000e+00 \n", "category_Human Services 0.000000 1.000000e+00 \n", "category_Human and Civil Rights 0.000000 1.000000e+00 \n", "category_International 0.000000 1.000000e+00 \n", "category_Religion 0.000000 
1.000000e+00 \n", "category_Research and Public Policy 0.000000 1.000000e+00 \n", "complexity 5.000000 8.000000e+00 \n", "complexity_2011 NaN 3.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000 1.000000e+00 \n", "donor_advisory 0.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000 1.000000e+00 \n", "donor_advisory_2016 0.000000 1.000000e+00 \n", "program_efficiency 0.869141 9.971109e-01 \n", "records_retention_policy_v2 1.000000 1.000000e+00 \n", "tot_rev NaN 2.974134e+09 \n", "total_revenue_logged 16.576035 2.196787e+01 \n", "whistleblower_policy_v2 1.000000 1.000000e+00 " ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Delete all rows except the *2011_data* and *2016_data* rows" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 2011_data 2016_data EIN FYE Form 990 FYE SOX_policies \\\n", "50709 0.0 1.0 010202467 FY2014 2014-12 3.0 \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "50709 1.0 1.0 62.0 \n", "\n", " category category_Animals \\\n", "50709 Research and Public Policy 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "50709 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "50709 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "50709 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "50709 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity complexity_2011 \\\n", "50709 1.0 6.0 NaN \n", "\n", " conflict_of_interest_policy_v2 donor_advisory \\\n", "50709 1.0 0.0 \n", "\n", " donor_advisory_2011_to_2016 donor_advisory_2016 org_id \\\n", "50709 0.0 0.0 5954 \n", "\n", " program_efficiency ratings_system records_retention_policy_v2 state \\\n", "50709 0.794457 CN 2.1 1.0 ME \n", "\n", " tot_rev total_revenue_logged whistleblower_policy_v2 \n", "50709 NaN 16.377993 1.0 " ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[:1]" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df_2011_orgs_mod['2011_data'] = df_2011_orgs_mod['2011_data'].astype('int')\n", "#df_2011_orgs_mod['2016_data'] = df_2011_orgs_mod['2016_data'].astype('int')" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69850\n", "9720\n", "4863\n", "4857\n", "0\n" ] } ], "source": [ "print len(df_2011_orgs_mod)\n", "print len(df_2011_orgs_mod[(df_2011_orgs_mod['2011_data']==1) | (df_2011_orgs_mod['2016_data']==1)])\n", "print len(df_2011_orgs_mod[df_2011_orgs_mod['2011_data']==1])\n", "print 
len(df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1])\n", "print len(df_2011_orgs_mod[(df_2011_orgs_mod['2011_data']==1) & (df_2011_orgs_mod['2016_data']==1)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Save version with all rows" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df_2011_orgs_mod = pd.read_pickle('df_2011_orgs_mod_v1.pkl')\n", "df_2011_orgs_mod.to_pickle('df_2011_orgs_mod_v1.pkl')" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9720\n" ] } ], "source": [ "df_2011_orgs_mod = df_2011_orgs_mod[(df_2011_orgs_mod['2011_data']==1) | (df_2011_orgs_mod['2016_data']==1)]\n", "print len(df_2011_orgs_mod)" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "float64\n", "1.0 4863\n", "0.0 4857\n", "Name: 2011_data, dtype: int64 \n", "\n", "float64\n", "0.0 4863\n", "1.0 4857\n", "Name: 2016_data, dtype: int64\n" ] } ], "source": [ "print df_2011_orgs_mod['2011_data'].dtype\n", "print df_2011_orgs_mod['2011_data'].value_counts(), '\\n'\n", "print df_2011_orgs_mod['2016_data'].dtype\n", "print df_2011_orgs_mod['2016_data'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create new variables\n", "\n", "merged_firm_day['Number of Ticker Mentions [t-1]'] = merged_firm_day['Number of Ticker Mentions'].unstack().shift(1).stack()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Get rid of duplicates first." ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 2011_data 2016_data EIN FYE Form 990 FYE SOX_policies \\\n", "37354 1.0 0.0 131624041 FY2009 2009-12 2.0 \n", "22391 1.0 0.0 316027287 FY2009 2009-06 3.0 \n", "22839 1.0 0.0 330068583 FY2010 2010-06 3.0 \n", "74544 1.0 0.0 341787585 FY2009 2009-12 3.0 \n", "22575 1.0 0.0 510082499 FY2009 2009-12 2.0 \n", "28224 1.0 0.0 521219783 FY2010 2010-06 3.0 \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "37354 0.0 1.0 66.0 \n", "22391 1.0 1.0 90.0 \n", "22839 1.0 1.0 31.0 \n", "74544 1.0 1.0 21.0 \n", "22575 0.0 1.0 58.0 \n", "28224 1.0 1.0 35.0 \n", "\n", " category category_Animals \\\n", "37354 Animals 1.0 \n", "22391 Community Development 0.0 \n", "22839 Health 0.0 \n", "74544 Religion 0.0 \n", "22575 Animals 1.0 \n", "28224 Human Services 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "37354 0.0 0.0 \n", "22391 0.0 1.0 \n", "22839 0.0 0.0 \n", "74544 0.0 0.0 \n", "22575 0.0 0.0 \n", "28224 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "37354 0.0 0.0 0.0 \n", "22391 0.0 0.0 0.0 \n", "22839 0.0 0.0 1.0 \n", "74544 0.0 0.0 0.0 \n", "22575 0.0 0.0 0.0 \n", "28224 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "37354 0.0 0.0 \n", "22391 0.0 0.0 \n", "22839 0.0 0.0 \n", "74544 0.0 0.0 \n", "22575 0.0 0.0 \n", "28224 1.0 0.0 \n", "\n", " category_International category_Religion \\\n", "37354 0.0 0.0 \n", "22391 0.0 0.0 \n", "22839 0.0 0.0 \n", "74544 0.0 1.0 \n", "22575 0.0 0.0 \n", "28224 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity complexity_2011 \\\n", "37354 0.0 0.0 2.0 \n", "22391 0.0 0.0 2.0 \n", "22839 0.0 0.0 3.0 \n", "74544 0.0 0.0 2.0 \n", "22575 0.0 0.0 3.0 \n", "28224 0.0 0.0 2.0 \n", "\n", " conflict_of_interest_policy_v2 donor_advisory \\\n", "37354 1.0 0.0 \n", "22391 1.0 0.0 \n", "22839 1.0 0.0 \n", "74544 1.0 0.0 \n", "22575 1.0 0.0 \n", "28224 1.0 0.0 \n", "\n", " 
donor_advisory_2011_to_2016 donor_advisory_2016 org_id \\\n", "37354 0.0 0.0 5991 \n", "22391 0.0 0.0 5200 \n", "22839 0.0 0.0 12458 \n", "74544 0.0 0.0 7809 \n", "22575 0.0 0.0 8957 \n", "28224 0.0 0.0 5414 \n", "\n", " program_efficiency ratings_system records_retention_policy_v2 state \\\n", "37354 0.938895 CN 1.0 1.0 NY \n", "22391 0.871144 CN 1.0 1.0 OH \n", "22839 0.766880 CN 1.0 1.0 CA \n", "74544 0.947708 CN 1.0 1.0 OH \n", "22575 0.780487 CN 1.0 0.0 DE \n", "28224 0.950350 CN 1.0 1.0 VA \n", "\n", " tot_rev total_revenue_logged whistleblower_policy_v2 \n", "37354 NaN 15.341268 0.0 \n", "22391 16554258.0 16.622154 1.0 \n", "22839 NaN 16.333330 1.0 \n", "74544 NaN 15.661930 1.0 \n", "22575 NaN 14.447682 1.0 \n", "28224 20116402.0 16.817046 1.0 " ] }, "execution_count": 178, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod.duplicated(['EIN', 'FYE'])]" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 2011_data 2016_data EIN FYE Form 990 FYE SOX_policies \\\n", "37346 0.0 1.0 131624041 FY2014 2014-12 3.0 \n", "37352 1.0 0.0 131624041 FY2009 2009-12 2.0 \n", "37354 1.0 0.0 131624041 FY2009 2009-12 2.0 \n", "22384 0.0 1.0 316027287 FY2014 2014-06 3.0 \n", "22390 1.0 0.0 316027287 FY2009 2009-06 3.0 \n", "22391 1.0 0.0 316027287 FY2009 2009-06 3.0 \n", "22831 0.0 1.0 330068583 FY2015 2015-06 3.0 \n", "22838 1.0 0.0 330068583 FY2010 2010-06 3.0 \n", "22839 1.0 0.0 330068583 FY2010 2010-06 3.0 \n", "74537 0.0 1.0 341787585 FY2014 2014-12 3.0 \n", "74543 1.0 0.0 341787585 FY2009 2009-12 3.0 \n", "74544 1.0 0.0 341787585 FY2009 2009-12 3.0 \n", "22568 0.0 1.0 510082499 FY2014 2014-12 3.0 \n", "22574 1.0 0.0 510082499 FY2009 2009-12 2.0 \n", "22575 1.0 0.0 510082499 FY2009 2009-12 2.0 \n", "28217 0.0 1.0 521219783 FY2015 2015-06 3.0 \n", "28223 1.0 0.0 521219783 FY2010 2010-06 3.0 \n", "28224 1.0 0.0 521219783 FY2010 2010-06 3.0 \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "37346 1.0 1.0 66.0 \n", "37352 0.0 1.0 66.0 \n", "37354 0.0 1.0 66.0 \n", "22384 1.0 1.0 90.0 \n", "22390 1.0 1.0 90.0 \n", "22391 1.0 1.0 90.0 \n", "22831 1.0 1.0 31.0 \n", "22838 1.0 1.0 31.0 \n", "22839 1.0 1.0 31.0 \n", "74537 1.0 1.0 21.0 \n", "74543 1.0 1.0 21.0 \n", "74544 1.0 1.0 21.0 \n", "22568 1.0 1.0 58.0 \n", "22574 0.0 1.0 58.0 \n", "22575 0.0 1.0 58.0 \n", "28217 1.0 1.0 35.0 \n", "28223 1.0 1.0 35.0 \n", "28224 1.0 1.0 35.0 \n", "\n", " category category_Animals \\\n", "37346 Animals 1.0 \n", "37352 Animals 1.0 \n", "37354 Animals 1.0 \n", "22384 Community Development 0.0 \n", "22390 Community Development 0.0 \n", "22391 Community Development 0.0 \n", "22831 Health 0.0 \n", "22838 Health 0.0 \n", "22839 Health 0.0 \n", "74537 Religion 0.0 \n", "74543 Religion 0.0 \n", "74544 Religion 0.0 \n", "22568 Animals 1.0 \n", "22574 Animals 1.0 \n", "22575 Animals 1.0 \n", "28217 Human Services 0.0 \n", "28223 Human Services 0.0 \n", "28224 Human 
Services 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "37346 0.0 0.0 \n", "37352 0.0 0.0 \n", "37354 0.0 0.0 \n", "22384 0.0 1.0 \n", "22390 0.0 1.0 \n", "22391 0.0 1.0 \n", "22831 0.0 0.0 \n", "22838 0.0 0.0 \n", "22839 0.0 0.0 \n", "74537 0.0 0.0 \n", "74543 0.0 0.0 \n", "74544 0.0 0.0 \n", "22568 0.0 0.0 \n", "22574 0.0 0.0 \n", "22575 0.0 0.0 \n", "28217 0.0 0.0 \n", "28223 0.0 0.0 \n", "28224 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "37346 0.0 0.0 0.0 \n", "37352 0.0 0.0 0.0 \n", "37354 0.0 0.0 0.0 \n", "22384 0.0 0.0 0.0 \n", "22390 0.0 0.0 0.0 \n", "22391 0.0 0.0 0.0 \n", "22831 0.0 0.0 1.0 \n", "22838 0.0 0.0 1.0 \n", "22839 0.0 0.0 1.0 \n", "74537 0.0 0.0 0.0 \n", "74543 0.0 0.0 0.0 \n", "74544 0.0 0.0 0.0 \n", "22568 0.0 0.0 0.0 \n", "22574 0.0 0.0 0.0 \n", "22575 0.0 0.0 0.0 \n", "28217 0.0 0.0 0.0 \n", "28223 0.0 0.0 0.0 \n", "28224 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "37346 0.0 0.0 \n", "37352 0.0 0.0 \n", "37354 0.0 0.0 \n", "22384 0.0 0.0 \n", "22390 0.0 0.0 \n", "22391 0.0 0.0 \n", "22831 0.0 0.0 \n", "22838 0.0 0.0 \n", "22839 0.0 0.0 \n", "74537 0.0 0.0 \n", "74543 0.0 0.0 \n", "74544 0.0 0.0 \n", "22568 0.0 0.0 \n", "22574 0.0 0.0 \n", "22575 0.0 0.0 \n", "28217 1.0 0.0 \n", "28223 1.0 0.0 \n", "28224 1.0 0.0 \n", "\n", " category_International category_Religion \\\n", "37346 0.0 0.0 \n", "37352 0.0 0.0 \n", "37354 0.0 0.0 \n", "22384 0.0 0.0 \n", "22390 0.0 0.0 \n", "22391 0.0 0.0 \n", "22831 0.0 0.0 \n", "22838 0.0 0.0 \n", "22839 0.0 0.0 \n", "74537 0.0 1.0 \n", "74543 0.0 1.0 \n", "74544 0.0 1.0 \n", "22568 0.0 0.0 \n", "22574 0.0 0.0 \n", "22575 0.0 0.0 \n", "28217 0.0 0.0 \n", "28223 0.0 0.0 \n", "28224 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity complexity_2011 \\\n", "37346 0.0 3.0 NaN \n", "37352 0.0 0.0 2.0 \n", "37354 0.0 0.0 2.0 \n", "22384 0.0 4.0 NaN \n", "22390 0.0 0.0 
2.0 \n", "22391 0.0 0.0 2.0 \n", "22831 0.0 5.0 NaN \n", "22838 0.0 0.0 3.0 \n", "22839 0.0 0.0 3.0 \n", "74537 0.0 2.0 NaN \n", "74543 0.0 0.0 2.0 \n", "74544 0.0 0.0 2.0 \n", "22568 0.0 5.0 NaN \n", "22574 0.0 0.0 3.0 \n", "22575 0.0 0.0 3.0 \n", "28217 0.0 4.0 NaN \n", "28223 0.0 0.0 2.0 \n", "28224 0.0 0.0 2.0 \n", "\n", " conflict_of_interest_policy_v2 donor_advisory \\\n", "37346 1.0 0.0 \n", "37352 1.0 0.0 \n", "37354 1.0 0.0 \n", "22384 1.0 0.0 \n", "22390 1.0 0.0 \n", "22391 1.0 0.0 \n", "22831 1.0 0.0 \n", "22838 1.0 0.0 \n", "22839 1.0 0.0 \n", "74537 1.0 0.0 \n", "74543 1.0 0.0 \n", "74544 1.0 0.0 \n", "22568 1.0 0.0 \n", "22574 1.0 0.0 \n", "22575 1.0 0.0 \n", "28217 1.0 0.0 \n", "28223 1.0 0.0 \n", "28224 1.0 0.0 \n", "\n", " donor_advisory_2011_to_2016 donor_advisory_2016 org_id \\\n", "37346 0.0 0.0 5991 \n", "37352 0.0 0.0 5991 \n", "37354 0.0 0.0 5991 \n", "22384 0.0 0.0 5200 \n", "22390 0.0 0.0 5200 \n", "22391 0.0 0.0 5200 \n", "22831 0.0 0.0 12458 \n", "22838 0.0 0.0 12458 \n", "22839 0.0 0.0 12458 \n", "74537 0.0 0.0 7809 \n", "74543 0.0 0.0 7809 \n", "74544 0.0 0.0 7809 \n", "22568 0.0 0.0 8957 \n", "22574 0.0 0.0 8957 \n", "22575 0.0 0.0 8957 \n", "28217 0.0 0.0 5414 \n", "28223 0.0 0.0 5414 \n", "28224 0.0 0.0 5414 \n", "\n", " program_efficiency ratings_system records_retention_policy_v2 state \\\n", "37346 0.899889 CN 2.1 1.0 NY \n", "37352 0.938895 CN 2.0 1.0 NY \n", "37354 0.938895 CN 1.0 1.0 NY \n", "22384 0.872482 CN 2.1 1.0 OH \n", "22390 0.871144 CN 2.0 1.0 OH \n", "22391 0.871144 CN 1.0 1.0 OH \n", "22831 0.824054 CN 2.1 1.0 CA \n", "22838 0.766880 CN 2.0 1.0 CA \n", "22839 0.766880 CN 1.0 1.0 CA \n", "74537 0.823612 CN 2.1 1.0 OH \n", "74543 0.947708 CN 2.0 1.0 OH \n", "74544 0.947708 CN 1.0 1.0 OH \n", "22568 0.708162 CN 2.1 1.0 DE \n", "22574 0.780487 CN 2.0 0.0 DE \n", "22575 0.780487 CN 1.0 0.0 DE \n", "28217 0.949818 CN 2.1 1.0 VA \n", "28223 0.950350 CN 2.0 1.0 VA \n", "28224 0.950350 CN 1.0 1.0 VA \n", "\n", " tot_rev 
total_revenue_logged whistleblower_policy_v2 \n", "37346 NaN 15.761739 1.0 \n", "37352 NaN 15.341268 0.0 \n", "37354 NaN 15.341268 0.0 \n", "22384 47097987.0 17.667741 1.0 \n", "22390 16554258.0 16.622154 1.0 \n", "22391 16554258.0 16.622154 1.0 \n", "22831 NaN 17.051095 1.0 \n", "22838 NaN 16.333330 1.0 \n", "22839 NaN 16.333330 1.0 \n", "74537 NaN 16.266468 1.0 \n", "74543 NaN 15.661930 1.0 \n", "74544 NaN 15.661930 1.0 \n", "22568 NaN 14.259286 1.0 \n", "22574 NaN 14.447682 1.0 \n", "22575 NaN 14.447682 1.0 \n", "28217 NaN 17.331674 1.0 \n", "28223 20116402.0 16.817046 1.0 \n", "28224 20116402.0 16.817046 1.0 " ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['EIN'].isin(['131624041','316027287','330068583','341787585',\n", " '510082499','521219783'])]#[:6]" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Int64Index([50709, 50715, 40348, 40354, 60873, 60879, 7525, 7531, 46508,\n", " 46514,\n", " ...\n", " 34584, 34590, 34556, 34561, 40545, 40551, 34495, 34501, 53168,\n", " 53174],\n", " dtype='int64', length=9720)" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.index" ] }, { "cell_type": "code", "execution_count": 188, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "9720\n", "9714\n" ] } ], "source": [ "# Drop the second row of each duplicated EIN/FYE pair (the six CN 1.0 copies)\n", "print(len(df_2011_orgs_mod))\n", "df_2011_orgs_mod = df_2011_orgs_mod.drop([37354, 22391, 22839, 74544, 22575, 28224])\n",
"print(len(df_2011_orgs_mod))" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [2011_data, 2016_data, EIN, FYE, Form 990 FYE, SOX_policies, SOX_policies_all_binary, SOX_policies_binary, age, category, category_Animals, category_Arts, Culture, Humanities, category_Community Development, category_Education, category_Environment, category_Health, category_Human Services, category_Human and Civil Rights, category_International, category_Religion, category_Research and Public Policy, complexity, complexity_2011, conflict_of_interest_policy_v2, donor_advisory, donor_advisory_2011_to_2016, donor_advisory_2016, org_id, program_efficiency, ratings_system, records_retention_policy_v2, state, tot_rev, total_revenue_logged, whistleblower_policy_v2]\n", "Index: []" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod.duplicated(['EIN', 'FYE'])]" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df_2011_orgs_mod.set_index(['FYE', 'EIN'], inplace=True)" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df_2011_orgs_mod = df_2011_orgs_mod.set_index(['EIN'])" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod = df_2011_orgs_mod.reset_index()" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df_2011_orgs_mod = df_2011_orgs_mod.set_index(['EIN', '2016_data'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### This is the right way to set the index: a (`2016_data`, `EIN`) MultiIndex, which the unstack/shift/stack lag code below needs in this level order" 
] }, { "cell_type": "code", "execution_count": 264, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod = df_2011_orgs_mod.set_index(['2016_data', 'EIN'])" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FYE index 2011_data Form 990 FYE SOX_policies \\\n", "2016_data EIN \n", "1.0 010202467 FY2014 0 0.0 2014-12 3.0 \n", "0.0 010202467 FY2009 1 1.0 2009-12 3.0 \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "2016_data EIN \n", "1.0 010202467 1.0 1.0 62.0 \n", "0.0 010202467 1.0 1.0 62.0 \n", "\n", " category category_Animals \\\n", "2016_data EIN \n", "1.0 010202467 Research and Public Policy 0.0 \n", "0.0 010202467 Research and Public Policy 0.0 \n", "\n", " category_Arts, Culture, Humanities \\\n", "2016_data EIN \n", "1.0 010202467 0.0 \n", "0.0 010202467 0.0 \n", "\n", " category_Community Development category_Education \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Environment category_Health \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity \\\n", "2016_data EIN \n", "1.0 010202467 1.0 6.0 \n", "0.0 010202467 1.0 0.0 \n", "\n", " complexity_2011 conflict_of_interest_policy_v2 \\\n", "2016_data EIN \n", "1.0 010202467 NaN 1.0 \n", "0.0 010202467 3.0 1.0 \n", "\n", " donor_advisory donor_advisory_2011_to_2016 \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " donor_advisory_2016 org_id program_efficiency \\\n", "2016_data EIN \n", "1.0 010202467 0.0 5954 0.794457 \n", "0.0 010202467 0.0 5954 0.788895 \n", "\n", " ratings_system records_retention_policy_v2 state \\\n", "2016_data EIN \n", "1.0 010202467 CN 2.1 1.0 ME \n", "0.0 010202467 CN 2.0 1.0 ME \n", "\n", " tot_rev total_revenue_logged whistleblower_policy_v2 \\\n", "2016_data EIN \n", "1.0 010202467 NaN 16.377993 
1.0 \n", "0.0 010202467 8432154.0 15.947563 1.0 \n", "\n", " SOX_policies [t-1] \n", "2016_data EIN \n", "1.0 010202467 3.0 \n", "0.0 010202467 NaN " ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[:2]" ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(1.0, '010202467')" ] }, "execution_count": 241, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.index[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create lagged variables" ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_2011_orgs_mod['SOX_policies [t-1]'] = df_2011_orgs_mod['SOX_policies'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['SOX_policies_binary [t-1]'] = df_2011_orgs_mod['SOX_policies_binary'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['SOX_policies_all_binary [t-1]'] = df_2011_orgs_mod['SOX_policies_all_binary'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['whistleblower_policy_v2 [t-1]'] = df_2011_orgs_mod['whistleblower_policy_v2'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['records_retention_policy_v2 [t-1]'] = df_2011_orgs_mod['records_retention_policy_v2'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['conflict_of_interest_policy_v2 [t-1]'] = df_2011_orgs_mod['conflict_of_interest_policy_v2'].unstack().shift(1).stack()" ] }, { "cell_type": "code", "execution_count": 260, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " SOX_policies SOX_policies [t-1]\n", "2016_data EIN \n", "1.0 990261283 3.0 3.0\n", "0.0 990261283 3.0 NaN\n", "1.0 990266733 3.0 3.0\n", "0.0 990266733 3.0 NaN" ] }, "execution_count": 260, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[['SOX_policies', 'SOX_policies [t-1]']].tail(4)" ] }, { "cell_type": "code", "execution_count": 265, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod['complexity_2011 [t-1]'] = df_2011_orgs_mod['complexity_2011'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['total_revenue_logged [t-1]'] = df_2011_orgs_mod['total_revenue_logged'].unstack().shift(1).stack()\n", "df_2011_orgs_mod['program_efficiency [t-1]'] = df_2011_orgs_mod['program_efficiency'].unstack().shift(1).stack()" ] }, { "cell_type": "code", "execution_count": 266, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " FYE index 2011_data Form 990 FYE SOX_policies \\\n", "2016_data EIN \n", "1.0 010202467 FY2014 0 0.0 2014-12 3.0 \n", "0.0 010202467 FY2009 1 1.0 2009-12 3.0 \n", "\n", " SOX_policies_all_binary SOX_policies_binary age \\\n", "2016_data EIN \n", "1.0 010202467 1.0 1.0 62.0 \n", "0.0 010202467 1.0 1.0 62.0 \n", "\n", " category category_Animals \\\n", "2016_data EIN \n", "1.0 010202467 Research and Public Policy 0.0 \n", "0.0 010202467 Research and Public Policy 0.0 \n", "\n", " category_Arts, Culture, Humanities \\\n", "2016_data EIN \n", "1.0 010202467 0.0 \n", "0.0 010202467 0.0 \n", "\n", " category_Community Development category_Education \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Environment category_Health \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " category_Research and Public Policy complexity \\\n", "2016_data EIN \n", "1.0 010202467 1.0 6.0 \n", "0.0 010202467 1.0 0.0 \n", "\n", " complexity_2011 conflict_of_interest_policy_v2 \\\n", "2016_data EIN \n", "1.0 010202467 NaN 1.0 \n", "0.0 010202467 3.0 1.0 \n", "\n", " donor_advisory donor_advisory_2011_to_2016 \\\n", "2016_data EIN \n", "1.0 010202467 0.0 0.0 \n", "0.0 010202467 0.0 0.0 \n", "\n", " donor_advisory_2016 org_id program_efficiency \\\n", "2016_data EIN \n", "1.0 010202467 0.0 5954 0.794457 \n", "0.0 010202467 0.0 5954 0.788895 \n", "\n", " ratings_system records_retention_policy_v2 state \\\n", "2016_data EIN \n", "1.0 010202467 CN 2.1 1.0 ME \n", "0.0 010202467 CN 2.0 1.0 ME \n", "\n", " tot_rev total_revenue_logged whistleblower_policy_v2 \\\n", "2016_data EIN \n", "1.0 010202467 NaN 16.377993 
1.0 \n", "0.0 010202467 8432154.0 15.947563 1.0 \n", "\n", " SOX_policies [t-1] SOX_policies_binary [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 3.0 1.0 \n", "0.0 010202467 NaN NaN \n", "\n", " SOX_policies_all_binary [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 1.0 \n", "0.0 010202467 NaN \n", "\n", " whistleblower_policy_v2 [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 1.0 \n", "0.0 010202467 NaN \n", "\n", " records_retention_policy_v2 [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 1.0 \n", "0.0 010202467 NaN \n", "\n", " conflict_of_interest_policy_v2 [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 1.0 \n", "0.0 010202467 NaN \n", "\n", " complexity_2011 [t-1] total_revenue_logged [t-1] \\\n", "2016_data EIN \n", "1.0 010202467 3.0 15.947563 \n", "0.0 010202467 NaN NaN \n", "\n", " program_efficiency [t-1] \n", "2016_data EIN \n", "1.0 010202467 0.788895 \n", "0.0 010202467 NaN " ] }, "execution_count": 266, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[:2]" ] }, { "cell_type": "code", "execution_count": 267, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 4832.000000\n", "mean 2.532906\n", "std 0.870022\n", "min 0.000000\n", "25% NaN\n", "50% NaN\n", "75% NaN\n", "max 3.000000\n", "Name: SOX_policies, dtype: float64" ] }, "execution_count": 267, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['2011_data']==1]['SOX_policies'].describe()" ] }, { "cell_type": "code", "execution_count": 268, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 9689.000000\n", "mean 2.702136\n", "std 0.724866\n", "min 0.000000\n", "25% NaN\n", "50% NaN\n", "75% NaN\n", "max 3.000000\n", "Name: SOX_policies, dtype: float64" ] }, "execution_count": 268, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod['SOX_policies'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"##### 25 of the 4,857 orgs are missing 2011 SOX data and Total Revenues, etc." ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "4857" ] }, "execution_count": 256, "metadata": {}, "output_type": "execute_result" } ], "source": [ "9689-4832" ] }, { "cell_type": "code", "execution_count": 269, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "index 9714.0 4.856500e+03 2.804335e+03 \n", "2011_data 9714.0 5.000000e-01 5.000257e-01 \n", "SOX_policies 9689.0 2.702136e+00 7.248655e-01 \n", "SOX_policies_all_binary 9689.0 8.269171e-01 3.783384e-01 \n", "SOX_policies_binary 9689.0 9.666632e-01 1.795237e-01 \n", "age 9714.0 4.003521e+01 1.922589e+01 \n", "category_Animals 9714.0 7.700226e-02 2.666088e-01 \n", "category_Arts, Culture, Humanities 9714.0 1.393865e-01 3.463672e-01 \n", "category_Community Development 9714.0 7.967881e-02 2.708093e-01 \n", "category_Education 9714.0 5.826642e-02 2.342586e-01 \n", "category_Environment 9714.0 6.588429e-02 2.480925e-01 \n", "category_Health 9714.0 1.192094e-01 3.240514e-01 \n", "category_Human Services 9714.0 2.476838e-01 4.316894e-01 \n", "category_Human and Civil Rights 9714.0 3.747169e-02 1.899244e-01 \n", "category_International 9714.0 8.935557e-02 2.852710e-01 \n", "category_Religion 9714.0 6.094297e-02 2.392380e-01 \n", "category_Research and Public Policy 9714.0 2.511839e-02 1.564927e-01 \n", "complexity 9714.0 2.011530e+00 2.223039e+00 \n", "complexity_2011 4868.0 2.464051e+00 5.153707e-01 \n", "conflict_of_interest_policy_v2 9689.0 9.578904e-01 2.008496e-01 \n", "donor_advisory 9666.0 7.345334e-03 8.539400e-02 \n", "donor_advisory_2011_to_2016 9714.0 2.223595e-02 1.474576e-01 \n", "donor_advisory_2016 9714.0 9.676755e-03 9.789843e-02 \n", "program_efficiency 9689.0 8.043326e-01 1.065316e-01 \n", "records_retention_policy_v2 9689.0 8.702652e-01 3.360287e-01 \n", "tot_rev 1848.0 5.005152e+07 1.509947e+08 \n", "total_revenue_logged 9689.0 1.561172e+01 1.531531e+00 \n", "whistleblower_policy_v2 9689.0 8.739808e-01 3.318881e-01 \n", "SOX_policies [t-1] 4832.0 2.532906e+00 8.700219e-01 \n", "SOX_policies_binary [t-1] 4832.0 9.472268e-01 2.236035e-01 \n", "SOX_policies_all_binary [t-1] 4832.0 7.340646e-01 4.418758e-01 \n", "whistleblower_policy_v2 [t-1] 4832.0 7.998758e-01 4.001345e-01 \n", "records_retention_policy_v2 
[t-1] 4832.0 7.994619e-01 4.004444e-01 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 9.335679e-01 2.490617e-01 \n", "complexity_2011 [t-1] 4827.0 2.466957e+00 5.144976e-01 \n", "total_revenue_logged [t-1] 4832.0 1.546122e+01 1.655440e+00 \n", "program_efficiency [t-1] 4832.0 8.046025e-01 1.055736e-01 \n", "\n", " min 25% 50% 75% \\\n", "index 0.000000e+00 2428.25 4856.5 7284.75 \n", "2011_data 0.000000e+00 0.00 0.5 1.00 \n", "SOX_policies 0.000000e+00 NaN NaN NaN \n", "SOX_policies_all_binary 0.000000e+00 NaN NaN NaN \n", "SOX_policies_binary 0.000000e+00 NaN NaN NaN \n", "age 0.000000e+00 25.00 35.0 52.00 \n", "category_Animals 0.000000e+00 0.00 0.0 0.00 \n", "category_Arts, Culture, Humanities 0.000000e+00 0.00 0.0 0.00 \n", "category_Community Development 0.000000e+00 0.00 0.0 0.00 \n", "category_Education 0.000000e+00 0.00 0.0 0.00 \n", "category_Environment 0.000000e+00 0.00 0.0 0.00 \n", "category_Health 0.000000e+00 0.00 0.0 0.00 \n", "category_Human Services 0.000000e+00 0.00 0.0 0.00 \n", "category_Human and Civil Rights 0.000000e+00 0.00 0.0 0.00 \n", "category_International 0.000000e+00 0.00 0.0 0.00 \n", "category_Religion 0.000000e+00 0.00 0.0 0.00 \n", "category_Research and Public Policy 0.000000e+00 0.00 0.0 0.00 \n", "complexity 0.000000e+00 0.00 0.0 4.00 \n", "complexity_2011 1.000000e+00 NaN NaN NaN \n", "conflict_of_interest_policy_v2 0.000000e+00 NaN NaN NaN \n", "donor_advisory 0.000000e+00 NaN NaN NaN \n", "donor_advisory_2011_to_2016 0.000000e+00 0.00 0.0 0.00 \n", "donor_advisory_2016 0.000000e+00 0.00 0.0 0.00 \n", "program_efficiency 3.833359e-03 NaN NaN NaN \n", "records_retention_policy_v2 0.000000e+00 NaN NaN NaN \n", "tot_rev -4.263887e+07 NaN NaN NaN \n", "total_revenue_logged 0.000000e+00 NaN NaN NaN \n", "whistleblower_policy_v2 0.000000e+00 NaN NaN NaN \n", "SOX_policies [t-1] 0.000000e+00 NaN NaN NaN \n", "SOX_policies_binary [t-1] 0.000000e+00 NaN NaN NaN \n", "SOX_policies_all_binary [t-1] 0.000000e+00 NaN NaN NaN \n", 
"whistleblower_policy_v2 [t-1] 0.000000e+00 NaN NaN NaN \n", "records_retention_policy_v2 [t-1] 0.000000e+00 NaN NaN NaN \n", "conflict_of_interest_policy_v2 [t-1] 0.000000e+00 NaN NaN NaN \n", "complexity_2011 [t-1] 1.000000e+00 NaN NaN NaN \n", "total_revenue_logged [t-1] 0.000000e+00 NaN NaN NaN \n", "program_efficiency [t-1] 2.217704e-02 NaN NaN NaN \n", "\n", " max \n", "index 9.713000e+03 \n", "2011_data 1.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies_all_binary 1.000000e+00 \n", "SOX_policies_binary 1.000000e+00 \n", "age 1.080000e+02 \n", "category_Animals 1.000000e+00 \n", "category_Arts, Culture, Humanities 1.000000e+00 \n", "category_Community Development 1.000000e+00 \n", "category_Education 1.000000e+00 \n", "category_Environment 1.000000e+00 \n", "category_Health 1.000000e+00 \n", "category_Human Services 1.000000e+00 \n", "category_Human and Civil Rights 1.000000e+00 \n", "category_International 1.000000e+00 \n", "category_Religion 1.000000e+00 \n", "category_Research and Public Policy 1.000000e+00 \n", "complexity 8.000000e+00 \n", "complexity_2011 3.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000e+00 \n", "donor_advisory 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 \n", "program_efficiency 9.976872e-01 \n", "records_retention_policy_v2 1.000000e+00 \n", "tot_rev 3.587230e+09 \n", "total_revenue_logged 2.200080e+01 \n", "whistleblower_policy_v2 1.000000e+00 \n", "SOX_policies [t-1] 3.000000e+00 \n", "SOX_policies_binary [t-1] 1.000000e+00 \n", "SOX_policies_all_binary [t-1] 1.000000e+00 \n", "whistleblower_policy_v2 [t-1] 1.000000e+00 \n", "records_retention_policy_v2 [t-1] 1.000000e+00 \n", "conflict_of_interest_policy_v2 [t-1] 1.000000e+00 \n", "complexity_2011 [t-1] 3.000000e+00 \n", "total_revenue_logged [t-1] 2.200080e+01 \n", "program_efficiency [t-1] 9.976872e-01 " ] }, "execution_count": 269, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"df_2011_orgs_mod.describe().T" ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "24\n" ] } ], "source": [ "print len(df_2011_orgs_mod[(df_2011_orgs_mod['2011_data']==1) \n", " & (df_2011_orgs_mod['donor_advisory']==1)])\n", "#['SOX_policies'].describe()" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.0 9620\n", "1.0 94\n", "Name: donor_advisory_2016, dtype: int64" ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod['donor_advisory_2016'].value_counts()" ] }, { "cell_type": "code", "execution_count": 270, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod = df_2011_orgs_mod.reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Save DF" ] }, { "cell_type": "code", "execution_count": 275, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#df_2011_orgs_mod.to_pickle('df_2011_orgs_mod.pkl')\n", "df_2011_orgs_mod = pd.read_pickle('df_2011_orgs_mod.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Re-arrange and rename columns" ] }, { "cell_type": "code", "execution_count": 276, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['2016_data', 'EIN', 'FYE', 'index', '2011_data', 'Form 990 FYE', 'SOX_policies', 'SOX_policies_all_binary', 'SOX_policies_binary', 'age', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'complexity', 'complexity_2011', 'conflict_of_interest_policy_v2', 'donor_advisory', 'donor_advisory_2011_to_2016', 'donor_advisory_2016', 'org_id', 'program_efficiency', 'ratings_system', 'records_retention_policy_v2', 'state', 'tot_rev', 'total_revenue_logged', 'whistleblower_policy_v2', 'SOX_policies [t-1]', 'SOX_policies_binary [t-1]', 'SOX_policies_all_binary [t-1]', 'whistleblower_policy_v2 [t-1]', 'records_retention_policy_v2 [t-1]', 'conflict_of_interest_policy_v2 [t-1]', 'complexity_2011 [t-1]', 'total_revenue_logged [t-1]', 'program_efficiency [t-1]']\n" ] } ], "source": [ "print df_2011_orgs_mod.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 281, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " EIN org_id FYE 2011_data 2016_data Form 990 FYE ratings_system \\\n", "0 010202467 5954 FY2014 0.0 1.0 2014-12 CN 2.1 \n", "1 010202467 5954 FY2009 1.0 0.0 2009-12 CN 2.0 \n", "2 010211513 3916 FY2014 0.0 1.0 2014-12 CN 2.1 \n", "3 010211513 3916 FY2010 1.0 0.0 2010-05 CN 2.0 \n", "\n", " donor_advisory donor_advisory_2011_to_2016 donor_advisory_2016 \\\n", "0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 \n", "2 0.0 0.0 0.0 \n", "3 0.0 0.0 0.0 \n", "\n", " SOX_policies SOX_policies [t-1] SOX_policies_binary \\\n", "0 3.0 3.0 1.0 \n", "1 3.0 NaN 1.0 \n", "2 3.0 3.0 1.0 \n", "3 3.0 NaN 1.0 \n", "\n", " SOX_policies_binary [t-1] SOX_policies_all_binary \\\n", "0 1.0 1.0 \n", "1 NaN 1.0 \n", "2 1.0 1.0 \n", "3 NaN 1.0 \n", "\n", " SOX_policies_all_binary [t-1] conflict_of_interest_policy_v2 \\\n", "0 1.0 1.0 \n", "1 NaN 1.0 \n", "2 1.0 1.0 \n", "3 NaN 1.0 \n", "\n", " conflict_of_interest_policy_v2 [t-1] whistleblower_policy_v2 \\\n", "0 1.0 1.0 \n", "1 NaN 1.0 \n", "2 1.0 1.0 \n", "3 NaN 1.0 \n", "\n", " whistleblower_policy_v2 [t-1] records_retention_policy_v2 \\\n", "0 1.0 1.0 \n", "1 NaN 1.0 \n", "2 1.0 1.0 \n", "3 NaN 1.0 \n", "\n", " records_retention_policy_v2 [t-1] program_efficiency \\\n", "0 1.0 0.794457 \n", "1 NaN 0.788895 \n", "2 1.0 0.833296 \n", "3 NaN 0.858851 \n", "\n", " program_efficiency [t-1] total_revenue_logged total_revenue_logged [t-1] \\\n", "0 0.788895 16.377993 15.947563 \n", "1 NaN 15.947563 NaN \n", "2 0.858851 19.490857 19.115237 \n", "3 NaN 19.115237 NaN \n", "\n", " complexity complexity_2011 complexity_2011 [t-1] age state \\\n", "0 6.0 NaN 3.0 62.0 ME \n", "1 0.0 3.0 NaN 62.0 ME \n", "2 5.0 NaN 3.0 66.0 ME \n", "3 0.0 3.0 NaN 66.0 ME \n", "\n", " category category_Animals \\\n", "0 Research and Public Policy 0.0 \n", "1 Research and Public Policy 0.0 \n", "2 Health 0.0 \n", "3 Health 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 
0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "0 0.0 0.0 0.0 \n", "1 0.0 0.0 0.0 \n", "2 0.0 0.0 1.0 \n", "3 0.0 0.0 1.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "0 1.0 \n", "1 1.0 \n", "2 0.0 \n", "3 0.0 " ] }, "execution_count": 281, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#'index', 'tot_rev', \n", "df_2011_orgs_mod = df_2011_orgs_mod[['EIN', 'org_id', 'FYE', '2011_data', '2016_data', 'Form 990 FYE', 'ratings_system', \n", " 'donor_advisory', 'donor_advisory_2011_to_2016', 'donor_advisory_2016',\n", " 'SOX_policies', 'SOX_policies [t-1]', 'SOX_policies_binary', 'SOX_policies_binary [t-1]',\n", " 'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]',\n", " 'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]',\n", " 'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]',\n", " 'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]', \n", " 'program_efficiency', 'program_efficiency [t-1]',\n", " 'total_revenue_logged', 'total_revenue_logged [t-1]', \n", " 'complexity', 'complexity_2011', 'complexity_2011 [t-1]', \n", " 'age', 'state', 'category', \n", " 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy'\n", " ]]\n", "df_2011_orgs_mod[:4]" ] }, { "cell_type": "code", "execution_count": 280, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std \\\n", "2011_data 4857.0 0.000000e+00 0.000000e+00 \n", "2016_data 4857.0 1.000000e+00 0.000000e+00 \n", "donor_advisory 4857.0 9.676755e-03 9.790347e-02 \n", "donor_advisory_2011_to_2016 4857.0 2.223595e-02 1.474652e-01 \n", "donor_advisory_2016 4857.0 9.676755e-03 9.790347e-02 \n", "SOX_policies 4857.0 2.870496e+00 4.882352e-01 \n", "SOX_policies [t-1] 4832.0 2.532906e+00 8.700219e-01 \n", "SOX_policies_binary 4857.0 9.859996e-01 1.175042e-01 \n", "SOX_policies_binary [t-1] 4832.0 9.472268e-01 2.236035e-01 \n", "SOX_policies_all_binary 4857.0 9.192917e-01 2.724146e-01 \n", "SOX_policies_all_binary [t-1] 4832.0 7.340646e-01 4.418758e-01 \n", "conflict_of_interest_policy_v2 4857.0 9.820877e-01 1.326464e-01 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 9.335679e-01 2.490617e-01 \n", "whistleblower_policy_v2 4857.0 9.477043e-01 2.226455e-01 \n", "whistleblower_policy_v2 [t-1] 4832.0 7.998758e-01 4.001345e-01 \n", "records_retention_policy_v2 4857.0 9.407041e-01 2.362019e-01 \n", "records_retention_policy_v2 [t-1] 4832.0 7.994619e-01 4.004444e-01 \n", "program_efficiency 4857.0 8.040641e-01 1.074864e-01 \n", "program_efficiency [t-1] 4832.0 8.046025e-01 1.055736e-01 \n", "total_revenue_logged 4857.0 1.576144e+01 1.381395e+00 \n", "total_revenue_logged [t-1] 4832.0 1.546122e+01 1.655440e+00 \n", "tot_rev 593.0 6.462455e+07 1.746480e+08 \n", "complexity 4857.0 4.023060e+00 1.338153e+00 \n", "complexity_2011 41.0 2.121951e+00 5.096627e-01 \n", "complexity_2011 [t-1] 4827.0 2.466957e+00 5.144976e-01 \n", "age 4857.0 4.003521e+01 1.922688e+01 \n", "category_Animals 4857.0 7.700226e-02 2.666225e-01 \n", "category_Arts, Culture, Humanities 4857.0 1.393865e-01 3.463850e-01 \n", "category_Community Development 4857.0 7.967881e-02 2.708232e-01 \n", "category_Education 4857.0 5.826642e-02 2.342707e-01 \n", "category_Environment 4857.0 6.588429e-02 2.481053e-01 \n", "category_Health 4857.0 1.192094e-01 3.240681e-01 \n", "category_Human 
Services 4857.0 2.476838e-01 4.317116e-01 \n", "category_Human and Civil Rights 4857.0 3.747169e-02 1.899342e-01 \n", "category_International 4857.0 8.935557e-02 2.852857e-01 \n", "category_Religion 4857.0 6.094297e-02 2.392503e-01 \n", "category_Research and Public Policy 4857.0 2.511839e-02 1.565008e-01 \n", "\n", " min 25% 50% \\\n", "2011_data 0.000000 0.000000 0.000000 \n", "2016_data 1.000000 1.000000 1.000000 \n", "donor_advisory 0.000000 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 0.000000 \n", "SOX_policies 0.000000 3.000000 3.000000 \n", "SOX_policies [t-1] 0.000000 NaN NaN \n", "SOX_policies_binary 0.000000 1.000000 1.000000 \n", "SOX_policies_binary [t-1] 0.000000 NaN NaN \n", "SOX_policies_all_binary 0.000000 1.000000 1.000000 \n", "SOX_policies_all_binary [t-1] 0.000000 NaN NaN \n", "conflict_of_interest_policy_v2 0.000000 1.000000 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] 0.000000 NaN NaN \n", "whistleblower_policy_v2 0.000000 1.000000 1.000000 \n", "whistleblower_policy_v2 [t-1] 0.000000 NaN NaN \n", "records_retention_policy_v2 0.000000 1.000000 1.000000 \n", "records_retention_policy_v2 [t-1] 0.000000 NaN NaN \n", "program_efficiency 0.003833 0.755960 0.816799 \n", "program_efficiency [t-1] 0.022177 NaN NaN \n", "total_revenue_logged 0.000000 14.792014 15.604092 \n", "total_revenue_logged [t-1] 0.000000 NaN NaN \n", "tot_rev 234562.000000 NaN NaN \n", "complexity 0.000000 3.000000 4.000000 \n", "complexity_2011 1.000000 NaN NaN \n", "complexity_2011 [t-1] 1.000000 NaN NaN \n", "age 0.000000 25.000000 35.000000 \n", "category_Animals 0.000000 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 0.000000 \n", 
"category_Human Services 0.000000 0.000000 0.000000 \n", "category_Human and Civil Rights 0.000000 0.000000 0.000000 \n", "category_International 0.000000 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 0.000000 \n", "\n", " 75% max \n", "2011_data 0.000000 0.000000e+00 \n", "2016_data 1.000000 1.000000e+00 \n", "donor_advisory 0.000000 1.000000e+00 \n", "donor_advisory_2011_to_2016 0.000000 1.000000e+00 \n", "donor_advisory_2016 0.000000 1.000000e+00 \n", "SOX_policies 3.000000 3.000000e+00 \n", "SOX_policies [t-1] NaN 3.000000e+00 \n", "SOX_policies_binary 1.000000 1.000000e+00 \n", "SOX_policies_binary [t-1] NaN 1.000000e+00 \n", "SOX_policies_all_binary 1.000000 1.000000e+00 \n", "SOX_policies_all_binary [t-1] NaN 1.000000e+00 \n", "conflict_of_interest_policy_v2 1.000000 1.000000e+00 \n", "conflict_of_interest_policy_v2 [t-1] NaN 1.000000e+00 \n", "whistleblower_policy_v2 1.000000 1.000000e+00 \n", "whistleblower_policy_v2 [t-1] NaN 1.000000e+00 \n", "records_retention_policy_v2 1.000000 1.000000e+00 \n", "records_retention_policy_v2 [t-1] NaN 1.000000e+00 \n", "program_efficiency 0.869141 9.971109e-01 \n", "program_efficiency [t-1] NaN 9.976872e-01 \n", "total_revenue_logged 16.576035 2.196787e+01 \n", "total_revenue_logged [t-1] NaN 2.200080e+01 \n", "tot_rev NaN 2.974134e+09 \n", "complexity 5.000000 8.000000e+00 \n", "complexity_2011 NaN 3.000000e+00 \n", "complexity_2011 [t-1] NaN 3.000000e+00 \n", "age 52.000000 1.080000e+02 \n", "category_Animals 0.000000 1.000000e+00 \n", "category_Arts, Culture, Humanities 0.000000 1.000000e+00 \n", "category_Community Development 0.000000 1.000000e+00 \n", "category_Education 0.000000 1.000000e+00 \n", "category_Environment 0.000000 1.000000e+00 \n", "category_Health 0.000000 1.000000e+00 \n", "category_Human Services 0.000000 1.000000e+00 \n", "category_Human and Civil Rights 0.000000 1.000000e+00 \n", "category_International 0.000000 
1.000000e+00 \n", "category_Religion 0.000000 1.000000e+00 \n", "category_Research and Public Policy 0.000000 1.000000e+00 " ] }, "execution_count": 280, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1].describe().T" ] }, { "cell_type": "code", "execution_count": 279, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 2011_data | 4857.0 | 1.000000e+00 | 0.000000e+00 | 1.000000e+00 | 1.0 | 1.0 | 1.0 | 1.000000e+00 |
| 2016_data | 4857.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 0.000000e+00 |
| donor_advisory | 4809.0 | 4.990643e-03 | 7.047531e-02 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| donor_advisory_2011_to_2016 | 4857.0 | 2.223595e-02 | 1.474652e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| donor_advisory_2016 | 4857.0 | 9.676755e-03 | 9.790347e-02 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| SOX_policies | 4832.0 | 2.532906e+00 | 8.700219e-01 | 0.000000e+00 | NaN | NaN | NaN | 3.000000e+00 |
| SOX_policies [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SOX_policies_binary | 4832.0 | 9.472268e-01 | 2.236035e-01 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| SOX_policies_binary [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SOX_policies_all_binary | 4832.0 | 7.340646e-01 | 4.418758e-01 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| SOX_policies_all_binary [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| conflict_of_interest_policy_v2 | 4832.0 | 9.335679e-01 | 2.490617e-01 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| conflict_of_interest_policy_v2 [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| whistleblower_policy_v2 | 4832.0 | 7.998758e-01 | 4.001345e-01 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| whistleblower_policy_v2 [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| records_retention_policy_v2 | 4832.0 | 7.994619e-01 | 4.004444e-01 | 0.000000e+00 | NaN | NaN | NaN | 1.000000e+00 |
| records_retention_policy_v2 [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| program_efficiency | 4832.0 | 8.046025e-01 | 1.055736e-01 | 2.217704e-02 | NaN | NaN | NaN | 9.976872e-01 |
| program_efficiency [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| total_revenue_logged | 4832.0 | 1.546122e+01 | 1.655440e+00 | 0.000000e+00 | NaN | NaN | NaN | 2.200080e+01 |
| total_revenue_logged [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| tot_rev | 1255.0 | 4.316562e+07 | 1.379616e+08 | -4.263887e+07 | NaN | NaN | NaN | 3.587230e+09 |
| complexity | 4857.0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 0.000000e+00 |
| complexity_2011 | 4827.0 | 2.466957e+00 | 5.144976e-01 | 1.000000e+00 | NaN | NaN | NaN | 3.000000e+00 |
| complexity_2011 [t-1] | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| age | 4857.0 | 4.003521e+01 | 1.922688e+01 | 0.000000e+00 | 25.0 | 35.0 | 52.0 | 1.080000e+02 |
| category_Animals | 4857.0 | 7.700226e-02 | 2.666225e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Arts, Culture, Humanities | 4857.0 | 1.393865e-01 | 3.463850e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Community Development | 4857.0 | 7.967881e-02 | 2.708232e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Education | 4857.0 | 5.826642e-02 | 2.342707e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Environment | 4857.0 | 6.588429e-02 | 2.481053e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Health | 4857.0 | 1.192094e-01 | 3.240681e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Human Services | 4857.0 | 2.476838e-01 | 4.317116e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Human and Civil Rights | 4857.0 | 3.747169e-02 | 1.899342e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_International | 4857.0 | 8.935557e-02 | 2.852857e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Religion | 4857.0 | 6.094297e-02 | 2.392503e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
| category_Research and Public Policy | 4857.0 | 2.511839e-02 | 1.565008e-01 | 0.000000e+00 | 0.0 | 0.0 | 0.0 | 1.000000e+00 |
\n", "
" ], "text/plain": [ " count mean std \\\n", "2011_data 4857.0 1.000000e+00 0.000000e+00 \n", "2016_data 4857.0 0.000000e+00 0.000000e+00 \n", "donor_advisory 4809.0 4.990643e-03 7.047531e-02 \n", "donor_advisory_2011_to_2016 4857.0 2.223595e-02 1.474652e-01 \n", "donor_advisory_2016 4857.0 9.676755e-03 9.790347e-02 \n", "SOX_policies 4832.0 2.532906e+00 8.700219e-01 \n", "SOX_policies [t-1] 0.0 NaN NaN \n", "SOX_policies_binary 4832.0 9.472268e-01 2.236035e-01 \n", "SOX_policies_binary [t-1] 0.0 NaN NaN \n", "SOX_policies_all_binary 4832.0 7.340646e-01 4.418758e-01 \n", "SOX_policies_all_binary [t-1] 0.0 NaN NaN \n", "conflict_of_interest_policy_v2 4832.0 9.335679e-01 2.490617e-01 \n", "conflict_of_interest_policy_v2 [t-1] 0.0 NaN NaN \n", "whistleblower_policy_v2 4832.0 7.998758e-01 4.001345e-01 \n", "whistleblower_policy_v2 [t-1] 0.0 NaN NaN \n", "records_retention_policy_v2 4832.0 7.994619e-01 4.004444e-01 \n", "records_retention_policy_v2 [t-1] 0.0 NaN NaN \n", "program_efficiency 4832.0 8.046025e-01 1.055736e-01 \n", "program_efficiency [t-1] 0.0 NaN NaN \n", "total_revenue_logged 4832.0 1.546122e+01 1.655440e+00 \n", "total_revenue_logged [t-1] 0.0 NaN NaN \n", "tot_rev 1255.0 4.316562e+07 1.379616e+08 \n", "complexity 4857.0 0.000000e+00 0.000000e+00 \n", "complexity_2011 4827.0 2.466957e+00 5.144976e-01 \n", "complexity_2011 [t-1] 0.0 NaN NaN \n", "age 4857.0 4.003521e+01 1.922688e+01 \n", "category_Animals 4857.0 7.700226e-02 2.666225e-01 \n", "category_Arts, Culture, Humanities 4857.0 1.393865e-01 3.463850e-01 \n", "category_Community Development 4857.0 7.967881e-02 2.708232e-01 \n", "category_Education 4857.0 5.826642e-02 2.342707e-01 \n", "category_Environment 4857.0 6.588429e-02 2.481053e-01 \n", "category_Health 4857.0 1.192094e-01 3.240681e-01 \n", "category_Human Services 4857.0 2.476838e-01 4.317116e-01 \n", "category_Human and Civil Rights 4857.0 3.747169e-02 1.899342e-01 \n", "category_International 4857.0 8.935557e-02 2.852857e-01 \n", 
"category_Religion 4857.0 6.094297e-02 2.392503e-01 \n", "category_Research and Public Policy 4857.0 2.511839e-02 1.565008e-01 \n", "\n", " min 25% 50% 75% \\\n", "2011_data 1.000000e+00 1.0 1.0 1.0 \n", "2016_data 0.000000e+00 0.0 0.0 0.0 \n", "donor_advisory 0.000000e+00 NaN NaN NaN \n", "donor_advisory_2011_to_2016 0.000000e+00 0.0 0.0 0.0 \n", "donor_advisory_2016 0.000000e+00 0.0 0.0 0.0 \n", "SOX_policies 0.000000e+00 NaN NaN NaN \n", "SOX_policies [t-1] NaN NaN NaN NaN \n", "SOX_policies_binary 0.000000e+00 NaN NaN NaN \n", "SOX_policies_binary [t-1] NaN NaN NaN NaN \n", "SOX_policies_all_binary 0.000000e+00 NaN NaN NaN \n", "SOX_policies_all_binary [t-1] NaN NaN NaN NaN \n", "conflict_of_interest_policy_v2 0.000000e+00 NaN NaN NaN \n", "conflict_of_interest_policy_v2 [t-1] NaN NaN NaN NaN \n", "whistleblower_policy_v2 0.000000e+00 NaN NaN NaN \n", "whistleblower_policy_v2 [t-1] NaN NaN NaN NaN \n", "records_retention_policy_v2 0.000000e+00 NaN NaN NaN \n", "records_retention_policy_v2 [t-1] NaN NaN NaN NaN \n", "program_efficiency 2.217704e-02 NaN NaN NaN \n", "program_efficiency [t-1] NaN NaN NaN NaN \n", "total_revenue_logged 0.000000e+00 NaN NaN NaN \n", "total_revenue_logged [t-1] NaN NaN NaN NaN \n", "tot_rev -4.263887e+07 NaN NaN NaN \n", "complexity 0.000000e+00 0.0 0.0 0.0 \n", "complexity_2011 1.000000e+00 NaN NaN NaN \n", "complexity_2011 [t-1] NaN NaN NaN NaN \n", "age 0.000000e+00 25.0 35.0 52.0 \n", "category_Animals 0.000000e+00 0.0 0.0 0.0 \n", "category_Arts, Culture, Humanities 0.000000e+00 0.0 0.0 0.0 \n", "category_Community Development 0.000000e+00 0.0 0.0 0.0 \n", "category_Education 0.000000e+00 0.0 0.0 0.0 \n", "category_Environment 0.000000e+00 0.0 0.0 0.0 \n", "category_Health 0.000000e+00 0.0 0.0 0.0 \n", "category_Human Services 0.000000e+00 0.0 0.0 0.0 \n", "category_Human and Civil Rights 0.000000e+00 0.0 0.0 0.0 \n", "category_International 0.000000e+00 0.0 0.0 0.0 \n", "category_Religion 0.000000e+00 0.0 0.0 0.0 \n", 
"category_Research and Public Policy 0.000000e+00 0.0 0.0 0.0 \n", "\n", " max \n", "2011_data 1.000000e+00 \n", "2016_data 0.000000e+00 \n", "donor_advisory 1.000000e+00 \n", "donor_advisory_2011_to_2016 1.000000e+00 \n", "donor_advisory_2016 1.000000e+00 \n", "SOX_policies 3.000000e+00 \n", "SOX_policies [t-1] NaN \n", "SOX_policies_binary 1.000000e+00 \n", "SOX_policies_binary [t-1] NaN \n", "SOX_policies_all_binary 1.000000e+00 \n", "SOX_policies_all_binary [t-1] NaN \n", "conflict_of_interest_policy_v2 1.000000e+00 \n", "conflict_of_interest_policy_v2 [t-1] NaN \n", "whistleblower_policy_v2 1.000000e+00 \n", "whistleblower_policy_v2 [t-1] NaN \n", "records_retention_policy_v2 1.000000e+00 \n", "records_retention_policy_v2 [t-1] NaN \n", "program_efficiency 9.976872e-01 \n", "program_efficiency [t-1] NaN \n", "total_revenue_logged 2.200080e+01 \n", "total_revenue_logged [t-1] NaN \n", "tot_rev 3.587230e+09 \n", "complexity 0.000000e+00 \n", "complexity_2011 3.000000e+00 \n", "complexity_2011 [t-1] NaN \n", "age 1.080000e+02 \n", "category_Animals 1.000000e+00 \n", "category_Arts, Culture, Humanities 1.000000e+00 \n", "category_Community Development 1.000000e+00 \n", "category_Education 1.000000e+00 \n", "category_Environment 1.000000e+00 \n", "category_Health 1.000000e+00 \n", "category_Human Services 1.000000e+00 \n", "category_Human and Civil Rights 1.000000e+00 \n", "category_International 1.000000e+00 \n", "category_Religion 1.000000e+00 \n", "category_Research and Public Policy 1.000000e+00 " ] }, "execution_count": 279, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['2011_data']==1].describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Save the DataFrame" ] }, { "cell_type": "code", "execution_count": 282, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod.to_pickle('df_2011_orgs_mod (2011 and 2016 rows).pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Limit data to the 2016 rows" ] }, { "cell_type": "code", "execution_count": 284, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4857\n", "4857\n" ] } ], "source": [ "print(len(df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1]))\n", "df_2011_orgs_mod = df_2011_orgs_mod[df_2011_orgs_mod['2016_data']==1]\n", "print(len(df_2011_orgs_mod))" ] }, { "cell_type": "code", "execution_count": 285, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| 2011_data | 4857.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2016_data | 4857.0 | 1.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| donor_advisory | 4857.0 | 0.009677 | 0.097903 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| donor_advisory_2011_to_2016 | 4857.0 | 0.022236 | 0.147465 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| donor_advisory_2016 | 4857.0 | 0.009677 | 0.097903 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| SOX_policies | 4857.0 | 2.870496 | 0.488235 | 0.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 |
| SOX_policies [t-1] | 4832.0 | 2.532906 | 0.870022 | 0.000000 | NaN | NaN | NaN | 3.000000 |
| SOX_policies_binary | 4857.0 | 0.986000 | 0.117504 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| SOX_policies_binary [t-1] | 4832.0 | 0.947227 | 0.223603 | 0.000000 | NaN | NaN | NaN | 1.000000 |
| SOX_policies_all_binary | 4857.0 | 0.919292 | 0.272415 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| SOX_policies_all_binary [t-1] | 4832.0 | 0.734065 | 0.441876 | 0.000000 | NaN | NaN | NaN | 1.000000 |
| conflict_of_interest_policy_v2 | 4857.0 | 0.982088 | 0.132646 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| conflict_of_interest_policy_v2 [t-1] | 4832.0 | 0.933568 | 0.249062 | 0.000000 | NaN | NaN | NaN | 1.000000 |
| whistleblower_policy_v2 | 4857.0 | 0.947704 | 0.222646 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| whistleblower_policy_v2 [t-1] | 4832.0 | 0.799876 | 0.400135 | 0.000000 | NaN | NaN | NaN | 1.000000 |
| records_retention_policy_v2 | 4857.0 | 0.940704 | 0.236202 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| records_retention_policy_v2 [t-1] | 4832.0 | 0.799462 | 0.400444 | 0.000000 | NaN | NaN | NaN | 1.000000 |
| program_efficiency | 4857.0 | 0.804064 | 0.107486 | 0.003833 | 0.755960 | 0.816799 | 0.869141 | 0.997111 |
| program_efficiency [t-1] | 4832.0 | 0.804602 | 0.105574 | 0.022177 | NaN | NaN | NaN | 0.997687 |
| total_revenue_logged | 4857.0 | 15.761442 | 1.381395 | 0.000000 | 14.792014 | 15.604092 | 16.576035 | 21.967868 |
| total_revenue_logged [t-1] | 4832.0 | 15.461217 | 1.655440 | 0.000000 | NaN | NaN | NaN | 22.000798 |
| complexity | 4857.0 | 4.023060 | 1.338153 | 0.000000 | 3.000000 | 4.000000 | 5.000000 | 8.000000 |
| complexity_2011 | 41.0 | 2.121951 | 0.509663 | 1.000000 | NaN | NaN | NaN | 3.000000 |
| complexity_2011 [t-1] | 4827.0 | 2.466957 | 0.514498 | 1.000000 | NaN | NaN | NaN | 3.000000 |
| age | 4857.0 | 40.035207 | 19.226876 | 0.000000 | 25.000000 | 35.000000 | 52.000000 | 108.000000 |
| category_Animals | 4857.0 | 0.077002 | 0.266622 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Arts, Culture, Humanities | 4857.0 | 0.139386 | 0.346385 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Community Development | 4857.0 | 0.079679 | 0.270823 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Education | 4857.0 | 0.058266 | 0.234271 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Environment | 4857.0 | 0.065884 | 0.248105 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Health | 4857.0 | 0.119209 | 0.324068 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Human Services | 4857.0 | 0.247684 | 0.431712 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Human and Civil Rights | 4857.0 | 0.037472 | 0.189934 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_International | 4857.0 | 0.089356 | 0.285286 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Religion | 4857.0 | 0.060943 | 0.239250 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| category_Research and Public Policy | 4857.0 | 0.025118 | 0.156501 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
\n", "
" ], "text/plain": [ " count mean std min \\\n", "2011_data 4857.0 0.000000 0.000000 0.000000 \n", "2016_data 4857.0 1.000000 0.000000 1.000000 \n", "donor_advisory 4857.0 0.009677 0.097903 0.000000 \n", "donor_advisory_2011_to_2016 4857.0 0.022236 0.147465 0.000000 \n", "donor_advisory_2016 4857.0 0.009677 0.097903 0.000000 \n", "SOX_policies 4857.0 2.870496 0.488235 0.000000 \n", "SOX_policies [t-1] 4832.0 2.532906 0.870022 0.000000 \n", "SOX_policies_binary 4857.0 0.986000 0.117504 0.000000 \n", "SOX_policies_binary [t-1] 4832.0 0.947227 0.223603 0.000000 \n", "SOX_policies_all_binary 4857.0 0.919292 0.272415 0.000000 \n", "SOX_policies_all_binary [t-1] 4832.0 0.734065 0.441876 0.000000 \n", "conflict_of_interest_policy_v2 4857.0 0.982088 0.132646 0.000000 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 0.933568 0.249062 0.000000 \n", "whistleblower_policy_v2 4857.0 0.947704 0.222646 0.000000 \n", "whistleblower_policy_v2 [t-1] 4832.0 0.799876 0.400135 0.000000 \n", "records_retention_policy_v2 4857.0 0.940704 0.236202 0.000000 \n", "records_retention_policy_v2 [t-1] 4832.0 0.799462 0.400444 0.000000 \n", "program_efficiency 4857.0 0.804064 0.107486 0.003833 \n", "program_efficiency [t-1] 4832.0 0.804602 0.105574 0.022177 \n", "total_revenue_logged 4857.0 15.761442 1.381395 0.000000 \n", "total_revenue_logged [t-1] 4832.0 15.461217 1.655440 0.000000 \n", "complexity 4857.0 4.023060 1.338153 0.000000 \n", "complexity_2011 41.0 2.121951 0.509663 1.000000 \n", "complexity_2011 [t-1] 4827.0 2.466957 0.514498 1.000000 \n", "age 4857.0 40.035207 19.226876 0.000000 \n", "category_Animals 4857.0 0.077002 0.266622 0.000000 \n", "category_Arts, Culture, Humanities 4857.0 0.139386 0.346385 0.000000 \n", "category_Community Development 4857.0 0.079679 0.270823 0.000000 \n", "category_Education 4857.0 0.058266 0.234271 0.000000 \n", "category_Environment 4857.0 0.065884 0.248105 0.000000 \n", "category_Health 4857.0 0.119209 0.324068 0.000000 \n", "category_Human Services 
4857.0 0.247684 0.431712 0.000000 \n", "category_Human and Civil Rights 4857.0 0.037472 0.189934 0.000000 \n", "category_International 4857.0 0.089356 0.285286 0.000000 \n", "category_Religion 4857.0 0.060943 0.239250 0.000000 \n", "category_Research and Public Policy 4857.0 0.025118 0.156501 0.000000 \n", "\n", " 25% 50% 75% \\\n", "2011_data 0.000000 0.000000 0.000000 \n", "2016_data 1.000000 1.000000 1.000000 \n", "donor_advisory 0.000000 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 0.000000 \n", "SOX_policies 3.000000 3.000000 3.000000 \n", "SOX_policies [t-1] NaN NaN NaN \n", "SOX_policies_binary 1.000000 1.000000 1.000000 \n", "SOX_policies_binary [t-1] NaN NaN NaN \n", "SOX_policies_all_binary 1.000000 1.000000 1.000000 \n", "SOX_policies_all_binary [t-1] NaN NaN NaN \n", "conflict_of_interest_policy_v2 1.000000 1.000000 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] NaN NaN NaN \n", "whistleblower_policy_v2 1.000000 1.000000 1.000000 \n", "whistleblower_policy_v2 [t-1] NaN NaN NaN \n", "records_retention_policy_v2 1.000000 1.000000 1.000000 \n", "records_retention_policy_v2 [t-1] NaN NaN NaN \n", "program_efficiency 0.755960 0.816799 0.869141 \n", "program_efficiency [t-1] NaN NaN NaN \n", "total_revenue_logged 14.792014 15.604092 16.576035 \n", "total_revenue_logged [t-1] NaN NaN NaN \n", "complexity 3.000000 4.000000 5.000000 \n", "complexity_2011 NaN NaN NaN \n", "complexity_2011 [t-1] NaN NaN NaN \n", "age 25.000000 35.000000 52.000000 \n", "category_Animals 0.000000 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 0.000000 \n", "category_Human Services 0.000000 0.000000 0.000000 \n", "category_Human and Civil Rights 0.000000 
0.000000 0.000000 \n", "category_International 0.000000 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 0.000000 \n", "\n", " max \n", "2011_data 0.000000 \n", "2016_data 1.000000 \n", "donor_advisory 1.000000 \n", "donor_advisory_2011_to_2016 1.000000 \n", "donor_advisory_2016 1.000000 \n", "SOX_policies 3.000000 \n", "SOX_policies [t-1] 3.000000 \n", "SOX_policies_binary 1.000000 \n", "SOX_policies_binary [t-1] 1.000000 \n", "SOX_policies_all_binary 1.000000 \n", "SOX_policies_all_binary [t-1] 1.000000 \n", "conflict_of_interest_policy_v2 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] 1.000000 \n", "whistleblower_policy_v2 1.000000 \n", "whistleblower_policy_v2 [t-1] 1.000000 \n", "records_retention_policy_v2 1.000000 \n", "records_retention_policy_v2 [t-1] 1.000000 \n", "program_efficiency 0.997111 \n", "program_efficiency [t-1] 0.997687 \n", "total_revenue_logged 21.967868 \n", "total_revenue_logged [t-1] 22.000798 \n", "complexity 8.000000 \n", "complexity_2011 3.000000 \n", "complexity_2011 [t-1] 3.000000 \n", "age 108.000000 \n", "category_Animals 1.000000 \n", "category_Arts, Culture, Humanities 1.000000 \n", "category_Community Development 1.000000 \n", "category_Education 1.000000 \n", "category_Environment 1.000000 \n", "category_Health 1.000000 \n", "category_Human Services 1.000000 \n", "category_Human and Civil Rights 1.000000 \n", "category_International 1.000000 \n", "category_Religion 1.000000 \n", "category_Research and Public Policy 1.000000 " ] }, "execution_count": 285, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Some of the orgs are missing *only* complexity_2011 values." ] }, { "cell_type": "code", "execution_count": 287, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30\n" ] }, { "data": { "text/html": [ "
\n", "
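| | EIN | org_id | FYE | 2011_data | 2016_data | Form 990 FYE | ratings_system | donor_advisory | donor_advisory_2011_to_2016 | donor_advisory_2016 | SOX_policies | SOX_policies [t-1] | SOX_policies_binary | SOX_policies_binary [t-1] | SOX_policies_all_binary | SOX_policies_all_binary [t-1] | conflict_of_interest_policy_v2 | conflict_of_interest_policy_v2 [t-1] | whistleblower_policy_v2 | whistleblower_policy_v2 [t-1] | records_retention_policy_v2 | records_retention_policy_v2 [t-1] | program_efficiency | program_efficiency [t-1] | total_revenue_logged | total_revenue_logged [t-1] | complexity | complexity_2011 | complexity_2011 [t-1] | age | state | category | category_Animals | category_Arts, Culture, Humanities | category_Community Development | category_Education | category_Environment | category_Health | category_Human Services | category_Human and Civil Rights | category_International | category_Religion | category_Research and Public Policy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 346 | 043314346 | 10166 | FY2013 | 0.0 | 1.0 | 2013-12 | CN 2.1 | 0.0 | 1.0 | 0.0 | 3.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 0.870865 | NaN | 13.549098 | NaN | 2.0 | NaN | NaN | 8.0 | MA | Health | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 450 | 060685118 | 4902 | FY2015 | 0.0 | 1.0 | 2015-01 | CN 2.1 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.756958 | 0.909693 | 14.546968 | 14.490185 | 6.0 | NaN | NaN | 78.0 | CT | Arts, Culture, Humanities | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1192 | 133132741 | 8762 | FY2014 | 0.0 | 1.0 | 2014-12 | CN 2.1 | 0.0 | 1.0 | 0.0 | 3.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 0.668603 | NaN | 14.782318 | NaN | 2.0 | NaN | NaN | 33.0 | NY | Animals | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1938 | 221576300 | 7655 | FY2014 | 0.0 | 1.0 | 2014-12 | CN 2.1 | 0.0 | 1.0 | 0.0 | 3.0 | 3.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.707359 | 0.892657 | 16.447470 | 19.160054 | 5.0 | NaN | NaN | 51.0 | TX | Human Services | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2152 | 223746050 | 11555 | FY2014 | 0.0 | 1.0 | 2014-12 | CN 2.1 | 0.0 | 1.0 | 0.0 | 3.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 1.0 | NaN | 0.473208 | NaN | 17.364142 | NaN | 2.0 | NaN | NaN | 15.0 | NJ | Human Services | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |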
\n", "
" ], "text/plain": [ " EIN org_id FYE 2011_data 2016_data Form 990 FYE \\\n", "346 043314346 10166 FY2013 0.0 1.0 2013-12 \n", "450 060685118 4902 FY2015 0.0 1.0 2015-01 \n", "1192 133132741 8762 FY2014 0.0 1.0 2014-12 \n", "1938 221576300 7655 FY2014 0.0 1.0 2014-12 \n", "2152 223746050 11555 FY2014 0.0 1.0 2014-12 \n", "\n", " ratings_system donor_advisory donor_advisory_2011_to_2016 \\\n", "346 CN 2.1 0.0 1.0 \n", "450 CN 2.1 0.0 1.0 \n", "1192 CN 2.1 0.0 1.0 \n", "1938 CN 2.1 0.0 1.0 \n", "2152 CN 2.1 0.0 1.0 \n", "\n", " donor_advisory_2016 SOX_policies SOX_policies [t-1] \\\n", "346 0.0 3.0 NaN \n", "450 0.0 1.0 1.0 \n", "1192 0.0 3.0 NaN \n", "1938 0.0 3.0 3.0 \n", "2152 0.0 3.0 NaN \n", "\n", " SOX_policies_binary SOX_policies_binary [t-1] SOX_policies_all_binary \\\n", "346 1.0 NaN 1.0 \n", "450 1.0 1.0 0.0 \n", "1192 1.0 NaN 1.0 \n", "1938 1.0 1.0 1.0 \n", "2152 1.0 NaN 1.0 \n", "\n", " SOX_policies_all_binary [t-1] conflict_of_interest_policy_v2 \\\n", "346 NaN 1.0 \n", "450 0.0 1.0 \n", "1192 NaN 1.0 \n", "1938 1.0 1.0 \n", "2152 NaN 1.0 \n", "\n", " conflict_of_interest_policy_v2 [t-1] whistleblower_policy_v2 \\\n", "346 NaN 1.0 \n", "450 1.0 0.0 \n", "1192 NaN 1.0 \n", "1938 1.0 1.0 \n", "2152 NaN 1.0 \n", "\n", " whistleblower_policy_v2 [t-1] records_retention_policy_v2 \\\n", "346 NaN 1.0 \n", "450 0.0 0.0 \n", "1192 NaN 1.0 \n", "1938 1.0 1.0 \n", "2152 NaN 1.0 \n", "\n", " records_retention_policy_v2 [t-1] program_efficiency \\\n", "346 NaN 0.870865 \n", "450 0.0 0.756958 \n", "1192 NaN 0.668603 \n", "1938 1.0 0.707359 \n", "2152 NaN 0.473208 \n", "\n", " program_efficiency [t-1] total_revenue_logged \\\n", "346 NaN 13.549098 \n", "450 0.909693 14.546968 \n", "1192 NaN 14.782318 \n", "1938 0.892657 16.447470 \n", "2152 NaN 17.364142 \n", "\n", " total_revenue_logged [t-1] complexity complexity_2011 \\\n", "346 NaN 2.0 NaN \n", "450 14.490185 6.0 NaN \n", "1192 NaN 2.0 NaN \n", "1938 19.160054 5.0 NaN \n", "2152 NaN 2.0 NaN \n", "\n", " 
complexity_2011 [t-1] age state category \\\n", "346 NaN 8.0 MA Health \n", "450 NaN 78.0 CT Arts, Culture, Humanities \n", "1192 NaN 33.0 NY Animals \n", "1938 NaN 51.0 TX Human Services \n", "2152 NaN 15.0 NJ Human Services \n", "\n", " category_Animals category_Arts, Culture, Humanities \\\n", "346 0.0 0.0 \n", "450 0.0 1.0 \n", "1192 1.0 0.0 \n", "1938 0.0 0.0 \n", "2152 0.0 0.0 \n", "\n", " category_Community Development category_Education \\\n", "346 0.0 0.0 \n", "450 0.0 0.0 \n", "1192 0.0 0.0 \n", "1938 0.0 0.0 \n", "2152 0.0 0.0 \n", "\n", " category_Environment category_Health category_Human Services \\\n", "346 0.0 1.0 0.0 \n", "450 0.0 0.0 0.0 \n", "1192 0.0 0.0 0.0 \n", "1938 0.0 0.0 1.0 \n", "2152 0.0 0.0 1.0 \n", "\n", " category_Human and Civil Rights category_International \\\n", "346 0.0 0.0 \n", "450 0.0 0.0 \n", "1192 0.0 0.0 \n", "1938 0.0 0.0 \n", "2152 0.0 0.0 \n", "\n", " category_Religion category_Research and Public Policy \n", "346 0.0 0.0 \n", "450 0.0 0.0 \n", "1192 0.0 0.0 \n", "1938 0.0 0.0 \n", "2152 0.0 0.0 " ] }, "execution_count": 287, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print len(df_2011_orgs_mod[df_2011_orgs_mod['complexity_2011 [t-1]'].isnull()])\n", "df_2011_orgs_mod[df_2011_orgs_mod['complexity_2011 [t-1]'].isnull()][:5]" ] }, { "cell_type": "code", "execution_count": 291, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "30\n", "0\n" ] } ], "source": [ "# Count rows missing the 2011 complexity value, fill them from the current 'complexity' column, then re-count\n", "print len(df_2011_orgs_mod[df_2011_orgs_mod['complexity_2011 [t-1]'].isnull()])\n", "df_2011_orgs_mod['complexity_2011 [t-1]'] = np.where(df_2011_orgs_mod['complexity_2011 [t-1]'].isnull(),\n", " df_2011_orgs_mod['complexity'],\n", " df_2011_orgs_mod['complexity_2011 [t-1]'])\n", "print len(df_2011_orgs_mod[df_2011_orgs_mod['complexity_2011 [t-1]'].isnull()])" ] }, { "cell_type": "code", "execution_count": 292, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "2011_data 4857.0 0.000000 0.000000 0.000000 \n", "2016_data 4857.0 1.000000 0.000000 1.000000 \n", "donor_advisory 4857.0 0.009677 0.097903 0.000000 \n", "donor_advisory_2011_to_2016 4857.0 0.022236 0.147465 0.000000 \n", "donor_advisory_2016 4857.0 0.009677 0.097903 0.000000 \n", "SOX_policies 4857.0 2.870496 0.488235 0.000000 \n", "SOX_policies [t-1] 4832.0 2.532906 0.870022 0.000000 \n", "SOX_policies_binary 4857.0 0.986000 0.117504 0.000000 \n", "SOX_policies_binary [t-1] 4832.0 0.947227 0.223603 0.000000 \n", "SOX_policies_all_binary 4857.0 0.919292 0.272415 0.000000 \n", "SOX_policies_all_binary [t-1] 4832.0 0.734065 0.441876 0.000000 \n", "conflict_of_interest_policy_v2 4857.0 0.982088 0.132646 0.000000 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 0.933568 0.249062 0.000000 \n", "whistleblower_policy_v2 4857.0 0.947704 0.222646 0.000000 \n", "whistleblower_policy_v2 [t-1] 4832.0 0.799876 0.400135 0.000000 \n", "records_retention_policy_v2 4857.0 0.940704 0.236202 0.000000 \n", "records_retention_policy_v2 [t-1] 4832.0 0.799462 0.400444 0.000000 \n", "program_efficiency 4857.0 0.804064 0.107486 0.003833 \n", "program_efficiency [t-1] 4832.0 0.804602 0.105574 0.022177 \n", "total_revenue_logged 4857.0 15.761442 1.381395 0.000000 \n", "total_revenue_logged [t-1] 4832.0 15.461217 1.655440 0.000000 \n", "complexity 4857.0 4.023060 1.338153 0.000000 \n", "complexity_2011 41.0 2.121951 0.509663 1.000000 \n", "complexity_2011 [t-1] 4857.0 2.470043 0.536150 0.000000 \n", "age 4857.0 40.035207 19.226876 0.000000 \n", "category_Animals 4857.0 0.077002 0.266622 0.000000 \n", "category_Arts, Culture, Humanities 4857.0 0.139386 0.346385 0.000000 \n", "category_Community Development 4857.0 0.079679 0.270823 0.000000 \n", "category_Education 4857.0 0.058266 0.234271 0.000000 \n", "category_Environment 4857.0 0.065884 0.248105 0.000000 \n", "category_Health 4857.0 0.119209 0.324068 0.000000 \n", "category_Human Services 
4857.0 0.247684 0.431712 0.000000 \n", "category_Human and Civil Rights 4857.0 0.037472 0.189934 0.000000 \n", "category_International 4857.0 0.089356 0.285286 0.000000 \n", "category_Religion 4857.0 0.060943 0.239250 0.000000 \n", "category_Research and Public Policy 4857.0 0.025118 0.156501 0.000000 \n", "\n", " 25% 50% 75% \\\n", "2011_data 0.000000 0.000000 0.000000 \n", "2016_data 1.000000 1.000000 1.000000 \n", "donor_advisory 0.000000 0.000000 0.000000 \n", "donor_advisory_2011_to_2016 0.000000 0.000000 0.000000 \n", "donor_advisory_2016 0.000000 0.000000 0.000000 \n", "SOX_policies 3.000000 3.000000 3.000000 \n", "SOX_policies [t-1] NaN NaN NaN \n", "SOX_policies_binary 1.000000 1.000000 1.000000 \n", "SOX_policies_binary [t-1] NaN NaN NaN \n", "SOX_policies_all_binary 1.000000 1.000000 1.000000 \n", "SOX_policies_all_binary [t-1] NaN NaN NaN \n", "conflict_of_interest_policy_v2 1.000000 1.000000 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] NaN NaN NaN \n", "whistleblower_policy_v2 1.000000 1.000000 1.000000 \n", "whistleblower_policy_v2 [t-1] NaN NaN NaN \n", "records_retention_policy_v2 1.000000 1.000000 1.000000 \n", "records_retention_policy_v2 [t-1] NaN NaN NaN \n", "program_efficiency 0.755960 0.816799 0.869141 \n", "program_efficiency [t-1] NaN NaN NaN \n", "total_revenue_logged 14.792014 15.604092 16.576035 \n", "total_revenue_logged [t-1] NaN NaN NaN \n", "complexity 3.000000 4.000000 5.000000 \n", "complexity_2011 NaN NaN NaN \n", "complexity_2011 [t-1] 2.000000 2.000000 3.000000 \n", "age 25.000000 35.000000 52.000000 \n", "category_Animals 0.000000 0.000000 0.000000 \n", "category_Arts, Culture, Humanities 0.000000 0.000000 0.000000 \n", "category_Community Development 0.000000 0.000000 0.000000 \n", "category_Education 0.000000 0.000000 0.000000 \n", "category_Environment 0.000000 0.000000 0.000000 \n", "category_Health 0.000000 0.000000 0.000000 \n", "category_Human Services 0.000000 0.000000 0.000000 \n", "category_Human and Civil 
Rights 0.000000 0.000000 0.000000 \n", "category_International 0.000000 0.000000 0.000000 \n", "category_Religion 0.000000 0.000000 0.000000 \n", "category_Research and Public Policy 0.000000 0.000000 0.000000 \n", "\n", " max \n", "2011_data 0.000000 \n", "2016_data 1.000000 \n", "donor_advisory 1.000000 \n", "donor_advisory_2011_to_2016 1.000000 \n", "donor_advisory_2016 1.000000 \n", "SOX_policies 3.000000 \n", "SOX_policies [t-1] 3.000000 \n", "SOX_policies_binary 1.000000 \n", "SOX_policies_binary [t-1] 1.000000 \n", "SOX_policies_all_binary 1.000000 \n", "SOX_policies_all_binary [t-1] 1.000000 \n", "conflict_of_interest_policy_v2 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] 1.000000 \n", "whistleblower_policy_v2 1.000000 \n", "whistleblower_policy_v2 [t-1] 1.000000 \n", "records_retention_policy_v2 1.000000 \n", "records_retention_policy_v2 [t-1] 1.000000 \n", "program_efficiency 0.997111 \n", "program_efficiency [t-1] 0.997687 \n", "total_revenue_logged 21.967868 \n", "total_revenue_logged [t-1] 22.000798 \n", "complexity 8.000000 \n", "complexity_2011 3.000000 \n", "complexity_2011 [t-1] 6.000000 \n", "age 108.000000 \n", "category_Animals 1.000000 \n", "category_Arts, Culture, Humanities 1.000000 \n", "category_Community Development 1.000000 \n", "category_Education 1.000000 \n", "category_Environment 1.000000 \n", "category_Health 1.000000 \n", "category_Human Services 1.000000 \n", "category_Human and Civil Rights 1.000000 \n", "category_International 1.000000 \n", "category_Religion 1.000000 \n", "category_Research and Public Policy 1.000000 " ] }, "execution_count": 292, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.describe().T" ] }, { "cell_type": "code", "execution_count": 293, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "47\n", "108\n", "47\n" ] } ], "source": [ "print len(df_2011_orgs_mod[df_2011_orgs_mod['donor_advisory']==1])\n", "print 
len(df_2011_orgs_mod[df_2011_orgs_mod['donor_advisory_2011_to_2016']==1])\n", "print len(df_2011_orgs_mod[df_2011_orgs_mod['donor_advisory_2016']==1])" ] }, { "cell_type": "code", "execution_count": 294, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df_2011_orgs_mod[df_2011_orgs_mod['donor_advisory']==1][:5]" ] }, { "cell_type": "code", "execution_count": 295, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'org_id', 'FYE', '2011_data', '2016_data', 'Form 990 FYE', 'ratings_system', 'donor_advisory', 'donor_advisory_2011_to_2016', 'donor_advisory_2016', 'SOX_policies', 'SOX_policies [t-1]', 'SOX_policies_binary', 'SOX_policies_binary [t-1]', 'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]', 'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]', 'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]', 'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]', 'program_efficiency', 'program_efficiency [t-1]', 'total_revenue_logged', 'total_revenue_logged [t-1]', 'complexity', 'complexity_2011', 'complexity_2011 [t-1]', 'age', 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df_2011_orgs_mod.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 296, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " EIN org_id FYE 2011_data 2016_data Form 990 FYE ratings_system \\\n", "0 010202467 5954 FY2014 0.0 1.0 2014-12 CN 2.1 \n", "\n", " donor_advisory donor_advisory_2011_to_2016 donor_advisory_2016 \\\n", "0 0.0 0.0 0.0 \n", "\n", " SOX_policies SOX_policies [t-1] SOX_policies_binary \\\n", "0 3.0 3.0 1.0 \n", "\n", " SOX_policies_binary [t-1] SOX_policies_all_binary \\\n", "0 1.0 1.0 \n", "\n", " SOX_policies_all_binary [t-1] conflict_of_interest_policy_v2 \\\n", "0 1.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 [t-1] whistleblower_policy_v2 \\\n", "0 1.0 1.0 \n", "\n", " whistleblower_policy_v2 [t-1] records_retention_policy_v2 \\\n", "0 1.0 1.0 \n", "\n", " records_retention_policy_v2 [t-1] program_efficiency \\\n", "0 1.0 0.794457 \n", "\n", " program_efficiency [t-1] total_revenue_logged total_revenue_logged [t-1] \\\n", "0 0.788895 16.377993 15.947563 \n", "\n", " complexity complexity_2011 complexity_2011 [t-1] age state \\\n", "0 6.0 NaN 3.0 62.0 ME \n", "\n", " category category_Animals \\\n", "0 Research and Public Policy 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "0 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "0 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "0 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "0 0.0 0.0 \n", "\n", " category_Research and Public Policy \n", "0 1.0 " ] }, "execution_count": 296, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[:1]" ] }, { "cell_type": "code", "execution_count": 297, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " EIN org_id donor_advisory_2011_to_2016 donor_advisory_2016 \\\n", "0 010202467 5954 0.0 0.0 \n", "2 010211513 3916 0.0 0.0 \n", "4 010212442 7736 0.0 0.0 \n", "6 010215910 10965 0.0 0.0 \n", "\n", " SOX_policies SOX_policies [t-1] SOX_policies_binary \\\n", "0 3.0 3.0 1.0 \n", "2 3.0 3.0 1.0 \n", "4 3.0 3.0 1.0 \n", "6 2.0 1.0 1.0 \n", "\n", " SOX_policies_binary [t-1] SOX_policies_all_binary \\\n", "0 1.0 1.0 \n", "2 1.0 1.0 \n", "4 1.0 1.0 \n", "6 1.0 0.0 \n", "\n", " SOX_policies_all_binary [t-1] conflict_of_interest_policy_v2 \\\n", "0 1.0 1.0 \n", "2 1.0 1.0 \n", "4 1.0 1.0 \n", "6 0.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 [t-1] whistleblower_policy_v2 \\\n", "0 1.0 1.0 \n", "2 1.0 1.0 \n", "4 1.0 1.0 \n", "6 1.0 1.0 \n", "\n", " whistleblower_policy_v2 [t-1] records_retention_policy_v2 \\\n", "0 1.0 1.0 \n", "2 1.0 1.0 \n", "4 1.0 1.0 \n", "6 0.0 0.0 \n", "\n", " records_retention_policy_v2 [t-1] program_efficiency [t-1] \\\n", "0 1.0 0.788895 \n", "2 1.0 0.858851 \n", "4 1.0 0.918651 \n", "6 0.0 0.714058 \n", "\n", " total_revenue_logged [t-1] complexity_2011 [t-1] age state \\\n", "0 15.947563 3.0 62.0 ME \n", "2 19.115237 3.0 66.0 ME \n", "4 15.498073 2.0 70.0 ME \n", "6 13.853119 2.0 39.0 ME \n", "\n", " category category_Animals \\\n", "0 Research and Public Policy 0.0 \n", "2 Health 0.0 \n", "4 Human Services 0.0 \n", "6 Animals 1.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "0 0.0 0.0 \n", "2 0.0 0.0 \n", "4 0.0 0.0 \n", "6 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "0 0.0 0.0 0.0 \n", "2 0.0 0.0 1.0 \n", "4 0.0 0.0 0.0 \n", "6 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human and Civil Rights \\\n", "0 0.0 0.0 \n", "2 0.0 0.0 \n", "4 1.0 0.0 \n", "6 0.0 0.0 \n", "\n", " category_International category_Religion \\\n", "0 0.0 0.0 \n", "2 0.0 0.0 \n", "4 0.0 0.0 \n", "6 0.0 0.0 \n", "\n", " category_Research and Public Policy 
\n", "0 1.0 \n", "2 0.0 \n", "4 0.0 \n", "6 0.0 " ] }, "execution_count": 297, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#'2011_data', '2016_data', 'FYE', 'Form 990 FYE', 'ratings_system', 'donor_advisory', \n", "#'complexity', 'complexity_2011',\n", "#'program_efficiency', 'total_revenue_logged', \n", "df_2011_orgs_mod = df_2011_orgs_mod[['EIN', 'org_id', \n", " 'donor_advisory_2011_to_2016', 'donor_advisory_2016',\n", " 'SOX_policies', 'SOX_policies [t-1]', 'SOX_policies_binary', 'SOX_policies_binary [t-1]',\n", " 'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]',\n", " 'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]',\n", " 'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]',\n", " 'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]', \n", " 'program_efficiency [t-1]',\n", " 'total_revenue_logged [t-1]', \n", " 'complexity_2011 [t-1]', \n", " 'age', 'state', 'category', \n", " 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', \n", " 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', \n", " 'category_Human and Civil Rights', 'category_International', 'category_Religion', \n", " 'category_Research and Public Policy'\n", " ]]\n", "df_2011_orgs_mod[:4]" ] }, { "cell_type": "code", "execution_count": 298, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "donor_advisory_2011_to_2016 4857.0 0.022236 0.147465 0.000000 \n", "donor_advisory_2016 4857.0 0.009677 0.097903 0.000000 \n", "SOX_policies 4857.0 2.870496 0.488235 0.000000 \n", "SOX_policies [t-1] 4832.0 2.532906 0.870022 0.000000 \n", "SOX_policies_binary 4857.0 0.986000 0.117504 0.000000 \n", "SOX_policies_binary [t-1] 4832.0 0.947227 0.223603 0.000000 \n", "SOX_policies_all_binary 4857.0 0.919292 0.272415 0.000000 \n", "SOX_policies_all_binary [t-1] 4832.0 0.734065 0.441876 0.000000 \n", "conflict_of_interest_policy_v2 4857.0 0.982088 0.132646 0.000000 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 0.933568 0.249062 0.000000 \n", "whistleblower_policy_v2 4857.0 0.947704 0.222646 0.000000 \n", "whistleblower_policy_v2 [t-1] 4832.0 0.799876 0.400135 0.000000 \n", "records_retention_policy_v2 4857.0 0.940704 0.236202 0.000000 \n", "records_retention_policy_v2 [t-1] 4832.0 0.799462 0.400444 0.000000 \n", "program_efficiency [t-1] 4832.0 0.804602 0.105574 0.022177 \n", "total_revenue_logged [t-1] 4832.0 15.461217 1.655440 0.000000 \n", "complexity_2011 [t-1] 4857.0 2.470043 0.536150 0.000000 \n", "age 4857.0 40.035207 19.226876 0.000000 \n", "category_Animals 4857.0 0.077002 0.266622 0.000000 \n", "category_Arts, Culture, Humanities 4857.0 0.139386 0.346385 0.000000 \n", "category_Community Development 4857.0 0.079679 0.270823 0.000000 \n", "category_Education 4857.0 0.058266 0.234271 0.000000 \n", "category_Environment 4857.0 0.065884 0.248105 0.000000 \n", "category_Health 4857.0 0.119209 0.324068 0.000000 \n", "category_Human Services 4857.0 0.247684 0.431712 0.000000 \n", "category_Human and Civil Rights 4857.0 0.037472 0.189934 0.000000 \n", "category_International 4857.0 0.089356 0.285286 0.000000 \n", "category_Religion 4857.0 0.060943 0.239250 0.000000 \n", "category_Research and Public Policy 4857.0 0.025118 0.156501 0.000000 \n", "\n", " 25% 50% 75% max \n", "donor_advisory_2011_to_2016 0.0 0.0 0.0 
1.000000 \n", "donor_advisory_2016 0.0 0.0 0.0 1.000000 \n", "SOX_policies 3.0 3.0 3.0 3.000000 \n", "SOX_policies [t-1] NaN NaN NaN 3.000000 \n", "SOX_policies_binary 1.0 1.0 1.0 1.000000 \n", "SOX_policies_binary [t-1] NaN NaN NaN 1.000000 \n", "SOX_policies_all_binary 1.0 1.0 1.0 1.000000 \n", "SOX_policies_all_binary [t-1] NaN NaN NaN 1.000000 \n", "conflict_of_interest_policy_v2 1.0 1.0 1.0 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "whistleblower_policy_v2 1.0 1.0 1.0 1.000000 \n", "whistleblower_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "records_retention_policy_v2 1.0 1.0 1.0 1.000000 \n", "records_retention_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "program_efficiency [t-1] NaN NaN NaN 0.997687 \n", "total_revenue_logged [t-1] NaN NaN NaN 22.000798 \n", "complexity_2011 [t-1] 2.0 2.0 3.0 6.000000 \n", "age 25.0 35.0 52.0 108.000000 \n", "category_Animals 0.0 0.0 0.0 1.000000 \n", "category_Arts, Culture, Humanities 0.0 0.0 0.0 1.000000 \n", "category_Community Development 0.0 0.0 0.0 1.000000 \n", "category_Education 0.0 0.0 0.0 1.000000 \n", "category_Environment 0.0 0.0 0.0 1.000000 \n", "category_Health 0.0 0.0 0.0 1.000000 \n", "category_Human Services 0.0 0.0 0.0 1.000000 \n", "category_Human and Civil Rights 0.0 0.0 0.0 1.000000 \n", "category_International 0.0 0.0 0.0 1.000000 \n", "category_Religion 0.0 0.0 0.0 1.000000 \n", "category_Research and Public Policy 0.0 0.0 0.0 1.000000 " ] }, "execution_count": 298, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Save DF" ] }, { "cell_type": "code", "execution_count": 300, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod.to_pickle('df_2011_orgs_mod_v3 (single year combined).pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate Change Variables" ] }, { "cell_type": "code", "execution_count": 301, 
"metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'org_id', 'donor_advisory_2011_to_2016', 'donor_advisory_2016', 'SOX_policies', 'SOX_policies [t-1]', 'SOX_policies_binary', 'SOX_policies_binary [t-1]', 'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]', 'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]', 'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]', 'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]', 'program_efficiency [t-1]', 'total_revenue_logged [t-1]', 'complexity_2011 [t-1]', 'age', 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy']\n" ] } ], "source": [ "print df_2011_orgs_mod.columns.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Number of New SOX Policies Added" ] }, { "cell_type": "code", "execution_count": 303, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ " 0.0 3789\n", " 1.0 494\n", " 2.0 356\n", " 3.0 158\n", "-1.0 29\n", "-2.0 4\n", "-3.0 2\n", "Name: number_of_SOX_policies_added, dtype: int64" ] }, "execution_count": 303, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod['number_of_SOX_policies_added'] = df_2011_orgs_mod['SOX_policies'] - \\\n", " df_2011_orgs_mod['SOX_policies [t-1]'] \n", "df_2011_orgs_mod['number_of_SOX_policies_added'].value_counts() " ] }, { "cell_type": "code", "execution_count": 304, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>EIN</th><th>org_id</th><th>donor_advisory_2011_to_2016</th><th>donor_advisory_2016</th><th>SOX_policies</th><th>SOX_policies [t-1]</th><th>SOX_policies_binary</th><th>SOX_policies_binary [t-1]</th><th>SOX_policies_all_binary</th><th>SOX_policies_all_binary [t-1]</th><th>conflict_of_interest_policy_v2</th><th>conflict_of_interest_policy_v2 [t-1]</th><th>whistleblower_policy_v2</th><th>whistleblower_policy_v2 [t-1]</th><th>records_retention_policy_v2</th><th>records_retention_policy_v2 [t-1]</th><th>program_efficiency [t-1]</th><th>total_revenue_logged [t-1]</th><th>complexity_2011 [t-1]</th><th>age</th><th>state</th><th>category</th><th>category_Animals</th><th>category_Arts, Culture, Humanities</th><th>category_Community Development</th><th>category_Education</th><th>category_Environment</th><th>category_Health</th><th>category_Human Services</th><th>category_Human and Civil Rights</th><th>category_International</th><th>category_Religion</th><th>category_Research and Public Policy</th><th>number_of_SOX_policies_added</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>688</th><td>116101487</td><td>6711</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.825044</td><td>14.165464</td><td>2.0</td><td>49.0</td><td>NY</td><td>Animals</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>868</th><td>131777413</td><td>8926</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.622676</td><td>13.901675</td><td>2.0</td><td>47.0</td><td>NY</td><td>Health</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>1274</th><td>133355315</td><td>11013</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.777093</td><td>14.162282</td><td>3.0</td><td>21.0</td><td>NY</td><td>Education</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>1450</th><td>133740640</td><td>7770</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.888804</td><td>15.686579</td><td>1.0</td><td>21.0</td><td>NY</td><td>Religion</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>1492</th><td>133843122</td><td>10492</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.835784</td><td>14.668943</td><td>2.0</td><td>20.0</td><td>NY</td><td>Human and Civil Rights</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>2052</th><td>222624532</td><td>4060</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.942271</td><td>15.689819</td><td>2.0</td><td>31.0</td><td>CA</td><td>International</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>2058</th><td>222664361</td><td>10613</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.888819</td><td>14.206729</td><td>2.0</td><td>31.0</td><td>NJ</td><td>Religion</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>2288</th><td>232139831</td><td>9410</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.729658</td><td>14.744867</td><td>2.0</td><td>36.0</td><td>PA</td><td>Education</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>2534</th><td>237159172</td><td>6081</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.690904</td><td>14.643419</td><td>3.0</td><td>44.0</td><td>TN</td><td>Environment</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>2648</th><td>237284092</td><td>6856</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.712476</td><td>12.951969</td><td>2.0</td><td>43.0</td><td>DC</td><td>Research and Public Policy</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>-1.0</td></tr>\n",
"<tr><th>2972</th><td>300335420</td><td>12801</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.755513</td><td>15.011579</td><td>2.0</td><td>10.0</td><td>DC</td><td>International</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>3052</th><td>311005792</td><td>6896</td><td>0.0</td><td>0.0</td><td>1.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.811307</td><td>15.122492</td><td>3.0</td><td>35.0</td><td>IN</td><td>Environment</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-2.0</td></tr>\n",
"<tr><th>3516</th><td>362167725</td><td>3311</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.886055</td><td>19.362213</td><td>3.0</td><td>91.0</td><td>IL</td><td>Education</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>3564</th><td>362217981</td><td>3248</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.607419</td><td>17.620558</td><td>3.0</td><td>63.0</td><td>DC</td><td>Health</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>4046</th><td>382882823</td><td>9504</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.605817</td><td>13.624309</td><td>3.0</td><td>27.0</td><td>MI</td><td>Human and Civil Rights</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>4132</th><td>390921093</td><td>8035</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.843165</td><td>15.532696</td><td>3.0</td><td>60.0</td><td>WI</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>4202</th><td>391862290</td><td>5322</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.903209</td><td>14.377565</td><td>2.0</td><td>19.0</td><td>WI</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>4914</th><td>520953609</td><td>7844</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.744055</td><td>14.484888</td><td>2.0</td><td>44.0</td><td>MD</td><td>Health</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>5134</th><td>521309876</td><td>10228</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.817903</td><td>14.137953</td><td>2.0</td><td>33.0</td><td>CA</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>5554</th><td>530162440</td><td>5475</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.747056</td><td>15.276766</td><td>2.0</td><td>79.0</td><td>DC</td><td>Animals</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>6074</th><td>576000192</td><td>10474</td><td>0.0</td><td>0.0</td><td>1.0</td><td>2.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.824266</td><td>14.763764</td><td>1.0</td><td>70.0</td><td>SC</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>6166</th><td>581303476</td><td>5521</td><td>0.0</td><td>0.0</td><td>0.0</td><td>3.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.837946</td><td>14.053488</td><td>2.0</td><td>38.0</td><td>GA</td><td>Religion</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>-3.0</td></tr>\n",
"<tr><th>6316</th><td>581974410</td><td>5542</td><td>0.0</td><td>0.0</td><td>1.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.939039</td><td>14.759248</td><td>3.0</td><td>24.0</td><td>GA</td><td>Animals</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-2.0</td></tr>\n",
"<tr><th>6900</th><td>620670972</td><td>7251</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.838641</td><td>16.149066</td><td>2.0</td><td>52.0</td><td>TN</td><td>Human Services</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>7084</th><td>650746714</td><td>10379</td><td>1.0</td><td>0.0</td><td>1.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.921851</td><td>14.482382</td><td>2.0</td><td>18.0</td><td>FL</td><td>Human Services</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-2.0</td></tr>\n",
"<tr><th>7232</th><td>721297795</td><td>8308</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.103705</td><td>14.096109</td><td>2.0</td><td>20.0</td><td>LA</td><td>Health</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>7274</th><td>731026057</td><td>7290</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.646123</td><td>13.234811</td><td>2.0</td><td>39.0</td><td>NC</td><td>Religion</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>7364</th><td>741469465</td><td>5939</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.893999</td><td>16.009059</td><td>2.0</td><td>58.0</td><td>TX</td><td>Human Services</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>8130</th><td>850437114</td><td>7369</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.781989</td><td>16.114940</td><td>2.0</td><td>20.0</td><td>NM</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>8442</th><td>911508191</td><td>3231</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.884811</td><td>15.933201</td><td>3.0</td><td>20.0</td><td>WA</td><td>Education</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>8638</th><td>931009305</td><td>7417</td><td>0.0</td><td>0.0</td><td>0.0</td><td>3.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.771693</td><td>15.436684</td><td>3.0</td><td>26.0</td><td>OR</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-3.0</td></tr>\n",
"<tr><th>8836</th><td>942297746</td><td>9775</td><td>0.0</td><td>0.0</td><td>1.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.852074</td><td>14.254370</td><td>3.0</td><td>41.0</td><td>CA</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-2.0</td></tr>\n",
"<tr><th>8904</th><td>942722663</td><td>8872</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.735738</td><td>14.010770</td><td>2.0</td><td>35.0</td><td>CA</td><td>Health</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>9362</th><td>952834871</td><td>9106</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.755592</td><td>16.233629</td><td>3.0</td><td>43.0</td><td>CA</td><td>Arts, Culture, Humanities</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"<tr><th>9444</th><td>953557056</td><td>9100</td><td>0.0</td><td>0.0</td><td>2.0</td><td>3.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>0.997039</td><td>17.686593</td><td>3.0</td><td>33.0</td><td>CA</td><td>Human Services</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>-1.0</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " EIN org_id donor_advisory_2011_to_2016 donor_advisory_2016 \\\n", "688 116101487 6711 0.0 0.0 \n", "868 131777413 8926 0.0 0.0 \n", "1274 133355315 11013 0.0 0.0 \n", "1450 133740640 7770 0.0 0.0 \n", "1492 133843122 10492 0.0 0.0 \n", "2052 222624532 4060 0.0 0.0 \n", "2058 222664361 10613 0.0 0.0 \n", "2288 232139831 9410 0.0 0.0 \n", "2534 237159172 6081 0.0 0.0 \n", "2648 237284092 6856 0.0 0.0 \n", "2972 300335420 12801 0.0 0.0 \n", "3052 311005792 6896 0.0 0.0 \n", "3516 362167725 3311 0.0 0.0 \n", "3564 362217981 3248 0.0 0.0 \n", "4046 382882823 9504 0.0 0.0 \n", "4132 390921093 8035 0.0 0.0 \n", "4202 391862290 5322 0.0 0.0 \n", "4914 520953609 7844 0.0 0.0 \n", "5134 521309876 10228 0.0 0.0 \n", "5554 530162440 5475 0.0 0.0 \n", "6074 576000192 10474 0.0 0.0 \n", "6166 581303476 5521 0.0 0.0 \n", "6316 581974410 5542 0.0 0.0 \n", "6900 620670972 7251 0.0 0.0 \n", "7084 650746714 10379 1.0 0.0 \n", "7232 721297795 8308 0.0 0.0 \n", "7274 731026057 7290 0.0 0.0 \n", "7364 741469465 5939 0.0 0.0 \n", "8130 850437114 7369 0.0 0.0 \n", "8442 911508191 3231 0.0 0.0 \n", "8638 931009305 7417 0.0 0.0 \n", "8836 942297746 9775 0.0 0.0 \n", "8904 942722663 8872 0.0 0.0 \n", "9362 952834871 9106 0.0 0.0 \n", "9444 953557056 9100 0.0 0.0 \n", "\n", " SOX_policies SOX_policies [t-1] SOX_policies_binary \\\n", "688 1.0 2.0 1.0 \n", "868 0.0 1.0 0.0 \n", "1274 2.0 3.0 1.0 \n", "1450 0.0 1.0 0.0 \n", "1492 1.0 2.0 1.0 \n", "2052 2.0 3.0 1.0 \n", "2058 1.0 2.0 1.0 \n", "2288 2.0 3.0 1.0 \n", "2534 1.0 2.0 1.0 \n", "2648 2.0 3.0 1.0 \n", "2972 2.0 3.0 1.0 \n", "3052 1.0 3.0 1.0 \n", "3516 2.0 3.0 1.0 \n", "3564 2.0 3.0 1.0 \n", "4046 1.0 2.0 1.0 \n", "4132 2.0 3.0 1.0 \n", "4202 2.0 3.0 1.0 \n", "4914 2.0 3.0 1.0 \n", "5134 2.0 3.0 1.0 \n", "5554 2.0 3.0 1.0 \n", "6074 1.0 2.0 1.0 \n", "6166 0.0 3.0 0.0 \n", "6316 1.0 3.0 1.0 \n", "6900 2.0 3.0 1.0 \n", "7084 1.0 3.0 1.0 \n", "7232 0.0 1.0 0.0 \n", "7274 2.0 3.0 1.0 \n", "7364 2.0 3.0 1.0 \n", "8130 
2.0 3.0 1.0 \n", "8442 2.0 3.0 1.0 \n", "8638 0.0 3.0 0.0 \n", "8836 1.0 3.0 1.0 \n", "8904 2.0 3.0 1.0 \n", "9362 2.0 3.0 1.0 \n", "9444 2.0 3.0 1.0 \n", "\n", " SOX_policies_binary [t-1] SOX_policies_all_binary \\\n", "688 1.0 0.0 \n", "868 1.0 0.0 \n", "1274 1.0 0.0 \n", "1450 1.0 0.0 \n", "1492 1.0 0.0 \n", "2052 1.0 0.0 \n", "2058 1.0 0.0 \n", "2288 1.0 0.0 \n", "2534 1.0 0.0 \n", "2648 1.0 0.0 \n", "2972 1.0 0.0 \n", "3052 1.0 0.0 \n", "3516 1.0 0.0 \n", "3564 1.0 0.0 \n", "4046 1.0 0.0 \n", "4132 1.0 0.0 \n", "4202 1.0 0.0 \n", "4914 1.0 0.0 \n", "5134 1.0 0.0 \n", "5554 1.0 0.0 \n", "6074 1.0 0.0 \n", "6166 1.0 0.0 \n", "6316 1.0 0.0 \n", "6900 1.0 0.0 \n", "7084 1.0 0.0 \n", "7232 1.0 0.0 \n", "7274 1.0 0.0 \n", "7364 1.0 0.0 \n", "8130 1.0 0.0 \n", "8442 1.0 0.0 \n", "8638 1.0 0.0 \n", "8836 1.0 0.0 \n", "8904 1.0 0.0 \n", "9362 1.0 0.0 \n", "9444 1.0 0.0 \n", "\n", " SOX_policies_all_binary [t-1] conflict_of_interest_policy_v2 \\\n", "688 0.0 0.0 \n", "868 0.0 0.0 \n", "1274 1.0 1.0 \n", "1450 0.0 0.0 \n", "1492 0.0 0.0 \n", "2052 1.0 1.0 \n", "2058 0.0 0.0 \n", "2288 1.0 1.0 \n", "2534 0.0 1.0 \n", "2648 1.0 1.0 \n", "2972 1.0 1.0 \n", "3052 1.0 1.0 \n", "3516 1.0 1.0 \n", "3564 1.0 0.0 \n", "4046 0.0 1.0 \n", "4132 1.0 1.0 \n", "4202 1.0 1.0 \n", "4914 1.0 1.0 \n", "5134 1.0 1.0 \n", "5554 1.0 1.0 \n", "6074 0.0 1.0 \n", "6166 1.0 0.0 \n", "6316 1.0 1.0 \n", "6900 1.0 1.0 \n", "7084 1.0 0.0 \n", "7232 0.0 0.0 \n", "7274 1.0 1.0 \n", "7364 1.0 1.0 \n", "8130 1.0 1.0 \n", "8442 1.0 1.0 \n", "8638 1.0 0.0 \n", "8836 1.0 1.0 \n", "8904 1.0 1.0 \n", "9362 1.0 1.0 \n", "9444 1.0 1.0 \n", "\n", " conflict_of_interest_policy_v2 [t-1] whistleblower_policy_v2 \\\n", "688 0.0 0.0 \n", "868 1.0 0.0 \n", "1274 1.0 1.0 \n", "1450 1.0 0.0 \n", "1492 1.0 0.0 \n", "2052 1.0 1.0 \n", "2058 1.0 1.0 \n", "2288 1.0 0.0 \n", "2534 1.0 0.0 \n", "2648 1.0 0.0 \n", "2972 1.0 1.0 \n", "3052 1.0 0.0 \n", "3516 1.0 1.0 \n", "3564 1.0 1.0 \n", "4046 1.0 0.0 \n", "4132 1.0 0.0 \n", 
"4202 1.0 0.0 \n", "4914 1.0 0.0 \n", "5134 1.0 1.0 \n", "5554 1.0 1.0 \n", "6074 1.0 0.0 \n", "6166 1.0 0.0 \n", "6316 1.0 0.0 \n", "6900 1.0 1.0 \n", "7084 1.0 0.0 \n", "7232 1.0 0.0 \n", "7274 1.0 1.0 \n", "7364 1.0 0.0 \n", "8130 1.0 0.0 \n", "8442 1.0 0.0 \n", "8638 1.0 0.0 \n", "8836 1.0 0.0 \n", "8904 1.0 1.0 \n", "9362 1.0 0.0 \n", "9444 1.0 0.0 \n", "\n", " whistleblower_policy_v2 [t-1] records_retention_policy_v2 \\\n", "688 1.0 1.0 \n", "868 0.0 0.0 \n", "1274 1.0 0.0 \n", "1450 0.0 0.0 \n", "1492 0.0 1.0 \n", "2052 1.0 0.0 \n", "2058 0.0 0.0 \n", "2288 1.0 1.0 \n", "2534 0.0 0.0 \n", "2648 1.0 1.0 \n", "2972 1.0 0.0 \n", "3052 1.0 0.0 \n", "3516 1.0 0.0 \n", "3564 1.0 1.0 \n", "4046 0.0 0.0 \n", "4132 1.0 1.0 \n", "4202 1.0 1.0 \n", "4914 1.0 1.0 \n", "5134 1.0 0.0 \n", "5554 1.0 0.0 \n", "6074 0.0 0.0 \n", "6166 1.0 0.0 \n", "6316 1.0 0.0 \n", "6900 1.0 0.0 \n", "7084 1.0 1.0 \n", "7232 0.0 0.0 \n", "7274 1.0 0.0 \n", "7364 1.0 1.0 \n", "8130 1.0 1.0 \n", "8442 1.0 1.0 \n", "8638 1.0 0.0 \n", "8836 1.0 0.0 \n", "8904 1.0 0.0 \n", "9362 1.0 1.0 \n", "9444 1.0 1.0 \n", "\n", " records_retention_policy_v2 [t-1] program_efficiency [t-1] \\\n", "688 1.0 0.825044 \n", "868 0.0 0.622676 \n", "1274 1.0 0.777093 \n", "1450 0.0 0.888804 \n", "1492 1.0 0.835784 \n", "2052 1.0 0.942271 \n", "2058 1.0 0.888819 \n", "2288 1.0 0.729658 \n", "2534 1.0 0.690904 \n", "2648 1.0 0.712476 \n", "2972 1.0 0.755513 \n", "3052 1.0 0.811307 \n", "3516 1.0 0.886055 \n", "3564 1.0 0.607419 \n", "4046 1.0 0.605817 \n", "4132 1.0 0.843165 \n", "4202 1.0 0.903209 \n", "4914 1.0 0.744055 \n", "5134 1.0 0.817903 \n", "5554 1.0 0.747056 \n", "6074 1.0 0.824266 \n", "6166 1.0 0.837946 \n", "6316 1.0 0.939039 \n", "6900 1.0 0.838641 \n", "7084 1.0 0.921851 \n", "7232 0.0 0.103705 \n", "7274 1.0 0.646123 \n", "7364 1.0 0.893999 \n", "8130 1.0 0.781989 \n", "8442 1.0 0.884811 \n", "8638 1.0 0.771693 \n", "8836 1.0 0.852074 \n", "8904 1.0 0.735738 \n", "9362 1.0 0.755592 \n", "9444 1.0 
0.997039 \n", "\n", " total_revenue_logged [t-1] complexity_2011 [t-1] age state \\\n", "688 14.165464 2.0 49.0 NY \n", "868 13.901675 2.0 47.0 NY \n", "1274 14.162282 3.0 21.0 NY \n", "1450 15.686579 1.0 21.0 NY \n", "1492 14.668943 2.0 20.0 NY \n", "2052 15.689819 2.0 31.0 CA \n", "2058 14.206729 2.0 31.0 NJ \n", "2288 14.744867 2.0 36.0 PA \n", "2534 14.643419 3.0 44.0 TN \n", "2648 12.951969 2.0 43.0 DC \n", "2972 15.011579 2.0 10.0 DC \n", "3052 15.122492 3.0 35.0 IN \n", "3516 19.362213 3.0 91.0 IL \n", "3564 17.620558 3.0 63.0 DC \n", "4046 13.624309 3.0 27.0 MI \n", "4132 15.532696 3.0 60.0 WI \n", "4202 14.377565 2.0 19.0 WI \n", "4914 14.484888 2.0 44.0 MD \n", "5134 14.137953 2.0 33.0 CA \n", "5554 15.276766 2.0 79.0 DC \n", "6074 14.763764 1.0 70.0 SC \n", "6166 14.053488 2.0 38.0 GA \n", "6316 14.759248 3.0 24.0 GA \n", "6900 16.149066 2.0 52.0 TN \n", "7084 14.482382 2.0 18.0 FL \n", "7232 14.096109 2.0 20.0 LA \n", "7274 13.234811 2.0 39.0 NC \n", "7364 16.009059 2.0 58.0 TX \n", "8130 16.114940 2.0 20.0 NM \n", "8442 15.933201 3.0 20.0 WA \n", "8638 15.436684 3.0 26.0 OR \n", "8836 14.254370 3.0 41.0 CA \n", "8904 14.010770 2.0 35.0 CA \n", "9362 16.233629 3.0 43.0 CA \n", "9444 17.686593 3.0 33.0 CA \n", "\n", " category category_Animals \\\n", "688 Animals 1.0 \n", "868 Health 0.0 \n", "1274 Education 0.0 \n", "1450 Religion 0.0 \n", "1492 Human and Civil Rights 0.0 \n", "2052 International 0.0 \n", "2058 Religion 0.0 \n", "2288 Education 0.0 \n", "2534 Environment 0.0 \n", "2648 Research and Public Policy 0.0 \n", "2972 International 0.0 \n", "3052 Environment 0.0 \n", "3516 Education 0.0 \n", "3564 Health 0.0 \n", "4046 Human and Civil Rights 0.0 \n", "4132 Arts, Culture, Humanities 0.0 \n", "4202 Arts, Culture, Humanities 0.0 \n", "4914 Health 0.0 \n", "5134 Arts, Culture, Humanities 0.0 \n", "5554 Animals 1.0 \n", "6074 Arts, Culture, Humanities 0.0 \n", "6166 Religion 0.0 \n", "6316 Animals 1.0 \n", "6900 Human Services 0.0 \n", "7084 Human 
Services 0.0 \n", "7232 Health 0.0 \n", "7274 Religion 0.0 \n", "7364 Human Services 0.0 \n", "8130 Arts, Culture, Humanities 0.0 \n", "8442 Education 0.0 \n", "8638 Arts, Culture, Humanities 0.0 \n", "8836 Arts, Culture, Humanities 0.0 \n", "8904 Health 0.0 \n", "9362 Arts, Culture, Humanities 0.0 \n", "9444 Human Services 0.0 \n", "\n", " category_Arts, Culture, Humanities category_Community Development \\\n", "688 0.0 0.0 \n", "868 0.0 0.0 \n", "1274 0.0 0.0 \n", "1450 0.0 0.0 \n", "1492 0.0 0.0 \n", "2052 0.0 0.0 \n", "2058 0.0 0.0 \n", "2288 0.0 0.0 \n", "2534 0.0 0.0 \n", "2648 0.0 0.0 \n", "2972 0.0 0.0 \n", "3052 0.0 0.0 \n", "3516 0.0 0.0 \n", "3564 0.0 0.0 \n", "4046 0.0 0.0 \n", "4132 1.0 0.0 \n", "4202 1.0 0.0 \n", "4914 0.0 0.0 \n", "5134 1.0 0.0 \n", "5554 0.0 0.0 \n", "6074 1.0 0.0 \n", "6166 0.0 0.0 \n", "6316 0.0 0.0 \n", "6900 0.0 0.0 \n", "7084 0.0 0.0 \n", "7232 0.0 0.0 \n", "7274 0.0 0.0 \n", "7364 0.0 0.0 \n", "8130 1.0 0.0 \n", "8442 0.0 0.0 \n", "8638 1.0 0.0 \n", "8836 1.0 0.0 \n", "8904 0.0 0.0 \n", "9362 1.0 0.0 \n", "9444 0.0 0.0 \n", "\n", " category_Education category_Environment category_Health \\\n", "688 0.0 0.0 0.0 \n", "868 0.0 0.0 1.0 \n", "1274 1.0 0.0 0.0 \n", "1450 0.0 0.0 0.0 \n", "1492 0.0 0.0 0.0 \n", "2052 0.0 0.0 0.0 \n", "2058 0.0 0.0 0.0 \n", "2288 1.0 0.0 0.0 \n", "2534 0.0 1.0 0.0 \n", "2648 0.0 0.0 0.0 \n", "2972 0.0 0.0 0.0 \n", "3052 0.0 1.0 0.0 \n", "3516 1.0 0.0 0.0 \n", "3564 0.0 0.0 1.0 \n", "4046 0.0 0.0 0.0 \n", "4132 0.0 0.0 0.0 \n", "4202 0.0 0.0 0.0 \n", "4914 0.0 0.0 1.0 \n", "5134 0.0 0.0 0.0 \n", "5554 0.0 0.0 0.0 \n", "6074 0.0 0.0 0.0 \n", "6166 0.0 0.0 0.0 \n", "6316 0.0 0.0 0.0 \n", "6900 0.0 0.0 0.0 \n", "7084 0.0 0.0 0.0 \n", "7232 0.0 0.0 1.0 \n", "7274 0.0 0.0 0.0 \n", "7364 0.0 0.0 0.0 \n", "8130 0.0 0.0 0.0 \n", "8442 1.0 0.0 0.0 \n", "8638 0.0 0.0 0.0 \n", "8836 0.0 0.0 0.0 \n", "8904 0.0 0.0 1.0 \n", "9362 0.0 0.0 0.0 \n", "9444 0.0 0.0 0.0 \n", "\n", " category_Human Services category_Human 
and Civil Rights \\\n", "688 0.0 0.0 \n", "868 0.0 0.0 \n", "1274 0.0 0.0 \n", "1450 0.0 0.0 \n", "1492 0.0 1.0 \n", "2052 0.0 0.0 \n", "2058 0.0 0.0 \n", "2288 0.0 0.0 \n", "2534 0.0 0.0 \n", "2648 0.0 0.0 \n", "2972 0.0 0.0 \n", "3052 0.0 0.0 \n", "3516 0.0 0.0 \n", "3564 0.0 0.0 \n", "4046 0.0 1.0 \n", "4132 0.0 0.0 \n", "4202 0.0 0.0 \n", "4914 0.0 0.0 \n", "5134 0.0 0.0 \n", "5554 0.0 0.0 \n", "6074 0.0 0.0 \n", "6166 0.0 0.0 \n", "6316 0.0 0.0 \n", "6900 1.0 0.0 \n", "7084 1.0 0.0 \n", "7232 0.0 0.0 \n", "7274 0.0 0.0 \n", "7364 1.0 0.0 \n", "8130 0.0 0.0 \n", "8442 0.0 0.0 \n", "8638 0.0 0.0 \n", "8836 0.0 0.0 \n", "8904 0.0 0.0 \n", "9362 0.0 0.0 \n", "9444 1.0 0.0 \n", "\n", " category_International category_Religion \\\n", "688 0.0 0.0 \n", "868 0.0 0.0 \n", "1274 0.0 0.0 \n", "1450 0.0 1.0 \n", "1492 0.0 0.0 \n", "2052 1.0 0.0 \n", "2058 0.0 1.0 \n", "2288 0.0 0.0 \n", "2534 0.0 0.0 \n", "2648 0.0 0.0 \n", "2972 1.0 0.0 \n", "3052 0.0 0.0 \n", "3516 0.0 0.0 \n", "3564 0.0 0.0 \n", "4046 0.0 0.0 \n", "4132 0.0 0.0 \n", "4202 0.0 0.0 \n", "4914 0.0 0.0 \n", "5134 0.0 0.0 \n", "5554 0.0 0.0 \n", "6074 0.0 0.0 \n", "6166 0.0 1.0 \n", "6316 0.0 0.0 \n", "6900 0.0 0.0 \n", "7084 0.0 0.0 \n", "7232 0.0 0.0 \n", "7274 0.0 1.0 \n", "7364 0.0 0.0 \n", "8130 0.0 0.0 \n", "8442 0.0 0.0 \n", "8638 0.0 0.0 \n", "8836 0.0 0.0 \n", "8904 0.0 0.0 \n", "9362 0.0 0.0 \n", "9444 0.0 0.0 \n", "\n", " category_Research and Public Policy number_of_SOX_policies_added \n", "688 0.0 -1.0 \n", "868 0.0 -1.0 \n", "1274 0.0 -1.0 \n", "1450 0.0 -1.0 \n", "1492 0.0 -1.0 \n", "2052 0.0 -1.0 \n", "2058 0.0 -1.0 \n", "2288 0.0 -1.0 \n", "2534 0.0 -1.0 \n", "2648 1.0 -1.0 \n", "2972 0.0 -1.0 \n", "3052 0.0 -2.0 \n", "3516 0.0 -1.0 \n", "3564 0.0 -1.0 \n", "4046 0.0 -1.0 \n", "4132 0.0 -1.0 \n", "4202 0.0 -1.0 \n", "4914 0.0 -1.0 \n", "5134 0.0 -1.0 \n", "5554 0.0 -1.0 \n", "6074 0.0 -1.0 \n", "6166 0.0 -3.0 \n", "6316 0.0 -2.0 \n", "6900 0.0 -1.0 \n", "7084 0.0 -2.0 \n", "7232 0.0 -1.0 
\n", "7274 0.0 -1.0 \n", "7364 0.0 -1.0 \n", "8130 0.0 -1.0 \n", "8442 0.0 -1.0 \n", "8638 0.0 -3.0 \n", "8836 0.0 -2.0 \n", "8904 0.0 -1.0 \n", "9362 0.0 -1.0 \n", "9444 0.0 -1.0 " ] }, "execution_count": 304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod[df_2011_orgs_mod['number_of_SOX_policies_added']<0]" ] }, { "cell_type": "code", "execution_count": 311, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>EIN</th><th>org_id</th><th>FYE</th><th>2011_data</th><th>2016_data</th><th>SOX_policies</th><th>whistleblower_policy</th><th>records_retention_policy</th><th>conflict_of_interest_policy</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>5236</th><td>116101487</td><td>6711</td><td>FY2014</td><td>0.0</td><td>1.0</td><td>1.0</td><td>_gfx_/icons/checkboxX.gif</td><td>_gfx_/icons/checked.gif</td><td>_gfx_/icons/checkboxX.gif</td></tr>\n",
"<tr><th>5237</th><td>116101487</td><td>6711</td><td>FY2013</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5238</th><td>116101487</td><td>6711</td><td>FY2012</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5239</th><td>116101487</td><td>6711</td><td>FY2011</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5240</th><td>116101487</td><td>6711</td><td>FY2010</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5241</th><td>116101487</td><td>6711</td><td>FY2009</td><td>1.0</td><td>0.0</td><td>2.0</td><td>yes</td><td>yes</td><td>NO</td></tr>\n",
"<tr><th>5242</th><td>116101487</td><td>6711</td><td>FY2009</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5243</th><td>116101487</td><td>6711</td><td>FY2009</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5244</th><td>116101487</td><td>6711</td><td>FY2008</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5245</th><td>116101487</td><td>6711</td><td>FY2007</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5246</th><td>116101487</td><td>6711</td><td>FY2006</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5247</th><td>116101487</td><td>6711</td><td>FY2005</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5248</th><td>116101487</td><td>6711</td><td>FY2004</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5249</th><td>116101487</td><td>6711</td><td>FY2003</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"<tr><th>5250</th><td>116101487</td><td>6711</td><td>FY2002</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
"</tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " EIN org_id FYE 2011_data 2016_data SOX_policies \\\n", "5236 116101487 6711 FY2014 0.0 1.0 1.0 \n", "5237 116101487 6711 FY2013 0.0 0.0 NaN \n", "5238 116101487 6711 FY2012 0.0 0.0 NaN \n", "5239 116101487 6711 FY2011 0.0 0.0 NaN \n", "5240 116101487 6711 FY2010 0.0 0.0 NaN \n", "5241 116101487 6711 FY2009 1.0 0.0 2.0 \n", "5242 116101487 6711 FY2009 0.0 0.0 NaN \n", "5243 116101487 6711 FY2009 0.0 0.0 NaN \n", "5244 116101487 6711 FY2008 0.0 0.0 NaN \n", "5245 116101487 6711 FY2007 0.0 0.0 NaN \n", "5246 116101487 6711 FY2006 0.0 0.0 NaN \n", "5247 116101487 6711 FY2005 0.0 0.0 NaN \n", "5248 116101487 6711 FY2004 0.0 0.0 NaN \n", "5249 116101487 6711 FY2003 0.0 0.0 NaN \n", "5250 116101487 6711 FY2002 0.0 0.0 NaN \n", "\n", " whistleblower_policy records_retention_policy \\\n", "5236 _gfx_/icons/checkboxX.gif _gfx_/icons/checked.gif \n", "5237 NaN NaN \n", "5238 NaN NaN \n", "5239 NaN NaN \n", "5240 NaN NaN \n", "5241 yes yes \n", "5242 NaN NaN \n", "5243 NaN NaN \n", "5244 NaN NaN \n", "5245 NaN NaN \n", "5246 NaN NaN \n", "5247 NaN NaN \n", "5248 NaN NaN \n", "5249 NaN NaN \n", "5250 NaN NaN \n", "\n", " conflict_of_interest_policy \n", "5236 _gfx_/icons/checkboxX.gif \n", "5237 NaN \n", "5238 NaN \n", "5239 NaN \n", "5240 NaN \n", "5241 NO \n", "5242 NaN \n", "5243 NaN \n", "5244 NaN \n", "5245 NaN \n", "5246 NaN \n", "5247 NaN \n", "5248 NaN \n", "5249 NaN \n", "5250 NaN " ] }, "execution_count": 311, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['EIN']=='116101487'][['EIN', 'org_id', 'FYE', '2011_data', '2016_data', 'SOX_policies', \n", " 'whistleblower_policy', 'records_retention_policy',\n", " 'conflict_of_interest_policy']]" ] }, { "cell_type": "code", "execution_count": 312, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.0 3789\n", " 1.0 494\n", " 2.0 356\n", " 3.0 158\n", "-1.0 29\n", "-2.0 4\n", "-3.0 2\n", "Name: number_of_SOX_policies_added, 
dtype: int64 \n", "\n", "0.0 3824\n", "1.0 494\n", "2.0 356\n", "3.0 158\n", "Name: number_of_SOX_policies_added, dtype: int64\n" ] } ], "source": [ "print df_2011_orgs_mod['number_of_SOX_policies_added'].value_counts(), '\\n'\n", "df_2011_orgs_mod['number_of_SOX_policies_added'] = np.where(df_2011_orgs_mod['number_of_SOX_policies_added']<0,\n", " 0, df_2011_orgs_mod['number_of_SOX_policies_added'])\n", "print df_2011_orgs_mod['number_of_SOX_policies_added'].value_counts() " ] }, { "cell_type": "code", "execution_count": 343, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 3849\n", "1 1008\n", "Name: any_SOX_policies_added, dtype: int64 \n", "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "number_of_SOX_policies_added 0.0 1.0 2.0 3.0\n", "any_SOX_policies_added \n", "0 3824 0 0 0\n", "1 0 494 356 158" ] }, "execution_count": 343, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod['any_SOX_policies_added'] = np.nan \n", "df_2011_orgs_mod['any_SOX_policies_added'] = np.where(df_2011_orgs_mod['number_of_SOX_policies_added']>0,\n", " 1, 0)\n", "print df_2011_orgs_mod['any_SOX_policies_added'].value_counts(), '\\n' \n", "pd.crosstab(df_2011_orgs_mod['any_SOX_policies_added'], df_2011_orgs_mod['number_of_SOX_policies_added'])" ] }, { "cell_type": "code", "execution_count": 314, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 4795\n", "1 62\n", "Name: always_no_SOX, dtype: int64 \n", "\n" ] } ], "source": [ "# Flag orgs with zero SOX policies in both 2011 ([t-1]) and 2016\n", "df_2011_orgs_mod['always_no_SOX'] = np.where( ( (df_2011_orgs_mod['SOX_policies_binary']==0)\n", " & (df_2011_orgs_mod['SOX_policies_binary [t-1]']==0)),\n", " 1, 0)\n", "print df_2011_orgs_mod['always_no_SOX'].value_counts(), '\\n'\n", "#df_2011_orgs_mod['any_SOX_policies_added'] = np.where(df_2011_orgs_mod['any_SOX_policies_added']<0,\n", "# 0, df_2011_orgs_mod['any_SOX_policies_added'])\n", "#print df_2011_orgs_mod['any_SOX_policies_added'].value_counts() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Of the 255 orgs with zero SOX policies in 2011, 62 still had none in 2016; the other 193 added at least one SOX policy." ] }, { "cell_type": "code", "execution_count": 315, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "SOX_policies_binary 0.0 1.0\n", "always_no_SOX \n", "0 6 4789\n", "1 62 0" ] }, "execution_count": 315, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['SOX_policies_binary'])" ] }, { "cell_type": "code", "execution_count": 316, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "SOX_policies_binary [t-1] 0.0 1.0\n", "always_no_SOX \n", "0 193 4577\n", "1 62 0" ] }, "execution_count": 316, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['SOX_policies_binary [t-1]'])" ] }, { "cell_type": "code", "execution_count": 319, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "always_no_SOX \n", "0 4759 36\n", "1 51 11" ] }, "execution_count": 319, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 320, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 4572\n", "0 285\n", "Name: always_SOX, dtype: int64 \n", "\n" ] } ], "source": [ "df_2011_orgs_mod['always_SOX'] = np.where( ( (df_2011_orgs_mod['SOX_policies_binary']==1)\n", " & (df_2011_orgs_mod['SOX_policies_binary [t-1]']==1)),\n", " 1, 0)\n", "print df_2011_orgs_mod['always_SOX'].value_counts(), '\\n'\n", "#df_2011_orgs_mod['any_SOX_policies_added'] = np.where(df_2011_orgs_mod['any_SOX_policies_added']<0,\n", "# 0, df_2011_orgs_mod['any_SOX_policies_added'])\n", "#print df_2011_orgs_mod['any_SOX_policies_added'].value_counts() " ] }, { "cell_type": "code", "execution_count": 321, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "always_SOX 0 1\n", "always_no_SOX \n", "0 223 4572\n", "1 62 0" ] }, "execution_count": 321, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['always_SOX'])" ] }, { "cell_type": "code", "execution_count": 323, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4857\n", "4857\n" ] } ], "source": [ "# Sanity check: the three groups in the crosstab should sum to the number of orgs\n", "print 62+4572+223\n", "print len(df_2011_orgs_mod)" ] }, { "cell_type": "code", "execution_count": 344, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 4664\n", "1 193\n", "Name: added_SOX_previously_none, dtype: int64 \n", "\n" ] } ], "source": [ "df_2011_orgs_mod['added_SOX_previously_none'] = np.where( (df_2011_orgs_mod['SOX_policies_binary']>\n", " df_2011_orgs_mod['SOX_policies_binary [t-1]']),\n", " 1, 0)\n", "print df_2011_orgs_mod['added_SOX_previously_none'].value_counts(), '\\n'" ] }, { "cell_type": "code", "execution_count": 346, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#df_2011_orgs_mod = df_2011_orgs_mod.drop('added_SOX', 1)" ] }, { "cell_type": "code", "execution_count": 328, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "always_no_SOX \n", "0 4759 36\n", "1 51 11" ] }, "execution_count": 328, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 330, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import division" ] }, { "cell_type": "code", "execution_count": 334, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.177419354839\n", "0.0\n", "0.00682669015635\n" ] } ], "source": [ "print 11/62\n", "print 0/193\n", "print 31/4541" ] }, { "cell_type": "code", "execution_count": 347, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "added_SOX_previously_none \n", "0 4617 47\n", "1 193 0" ] }, "execution_count": 347, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['added_SOX_previously_none'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 333, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "always_SOX \n", "0 269 16\n", "1 4541 31" ] }, "execution_count": 333, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 352, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 4699\n", "1 158\n", "Name: from_zero_to_3_SOX, dtype: int64 \n", "\n" ] } ], "source": [ "df_2011_orgs_mod['from_zero_to_3_SOX'] = np.where( ((df_2011_orgs_mod['SOX_policies_binary [t-1]']==0) &\n", " (df_2011_orgs_mod['number_of_SOX_policies_added']==3)),\n", " 1, 0)\n", "print df_2011_orgs_mod['from_zero_to_3_SOX'].value_counts(), '\\n'" ] }, { "cell_type": "code", "execution_count": 353, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "number_of_SOX_policies_added 0.0 1.0 2.0 3.0\n", "from_zero_to_3_SOX \n", "0 3824 494 356 0\n", "1 0 0 0 158" ] }, "execution_count": 353, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['from_zero_to_3_SOX'], df_2011_orgs_mod['number_of_SOX_policies_added'])" ] }, { "cell_type": "code", "execution_count": 357, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "from_zero_to_3_SOX \n", "0 4652 47\n", "1 158 0" ] }, "execution_count": 357, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['from_zero_to_3_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 358, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "added_SOX_previously_none \n", "0 4617 47\n", "1 193 0" ] }, "execution_count": 358, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['added_SOX_previously_none'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 361, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "always_no_SOX \n", "0 4759 36\n", "1 51 11" ] }, "execution_count": 361, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_no_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": 359, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "donor_advisory_2016 0.0 1.0\n", "always_SOX \n", "0 269 16\n", "1 4541 31" ] }, "execution_count": 359, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(df_2011_orgs_mod['always_SOX'], df_2011_orgs_mod['donor_advisory_2016'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#'SOX_policies_binary', 'SOX_policies_binary [t-1]', \n", "#'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]', \n", "#'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]', \n", "#'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]', \n", "#'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]'," ] }, { "cell_type": "code", "execution_count": 336, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4857\n" ] } ], "source": [ "print len(df_2011_orgs_mod)" ] }, { "cell_type": "code", "execution_count": 354, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "donor_advisory_2011_to_2016 4857.0 0.022236 0.147465 0.000000 \n", "donor_advisory_2016 4857.0 0.009677 0.097903 0.000000 \n", "SOX_policies 4857.0 2.870496 0.488235 0.000000 \n", "SOX_policies [t-1] 4832.0 2.532906 0.870022 0.000000 \n", "SOX_policies_binary 4857.0 0.986000 0.117504 0.000000 \n", "SOX_policies_binary [t-1] 4832.0 0.947227 0.223603 0.000000 \n", "SOX_policies_all_binary 4857.0 0.919292 0.272415 0.000000 \n", "SOX_policies_all_binary [t-1] 4832.0 0.734065 0.441876 0.000000 \n", "conflict_of_interest_policy_v2 4857.0 0.982088 0.132646 0.000000 \n", "conflict_of_interest_policy_v2 [t-1] 4832.0 0.933568 0.249062 0.000000 \n", "whistleblower_policy_v2 4857.0 0.947704 0.222646 0.000000 \n", "whistleblower_policy_v2 [t-1] 4832.0 0.799876 0.400135 0.000000 \n", "records_retention_policy_v2 4857.0 0.940704 0.236202 0.000000 \n", "records_retention_policy_v2 [t-1] 4832.0 0.799462 0.400444 0.000000 \n", "program_efficiency [t-1] 4832.0 0.804602 0.105574 0.022177 \n", "total_revenue_logged [t-1] 4832.0 15.461217 1.655440 0.000000 \n", "complexity_2011 [t-1] 4857.0 2.470043 0.536150 0.000000 \n", "age 4857.0 40.035207 19.226876 0.000000 \n", "category_Animals 4857.0 0.077002 0.266622 0.000000 \n", "category_Arts, Culture, Humanities 4857.0 0.139386 0.346385 0.000000 \n", "category_Community Development 4857.0 0.079679 0.270823 0.000000 \n", "category_Education 4857.0 0.058266 0.234271 0.000000 \n", "category_Environment 4857.0 0.065884 0.248105 0.000000 \n", "category_Health 4857.0 0.119209 0.324068 0.000000 \n", "category_Human Services 4857.0 0.247684 0.431712 0.000000 \n", "category_Human and Civil Rights 4857.0 0.037472 0.189934 0.000000 \n", "category_International 4857.0 0.089356 0.285286 0.000000 \n", "category_Religion 4857.0 0.060943 0.239250 0.000000 \n", "category_Research and Public Policy 4857.0 0.025118 0.156501 0.000000 \n", "number_of_SOX_policies_added 4832.0 0.347682 0.755288 0.000000 \n", 
"any_SOX_policies_added 4857.0 0.207536 0.405584 0.000000 \n", "always_no_SOX 4857.0 0.012765 0.112271 0.000000 \n", "always_SOX 4857.0 0.941322 0.235046 0.000000 \n", "added_SOX_previously_none 4857.0 0.039736 0.195359 0.000000 \n", "from_zero_to_3_SOX 4857.0 0.032530 0.177422 0.000000 \n", "\n", " 25% 50% 75% max \n", "donor_advisory_2011_to_2016 0.0 0.0 0.0 1.000000 \n", "donor_advisory_2016 0.0 0.0 0.0 1.000000 \n", "SOX_policies 3.0 3.0 3.0 3.000000 \n", "SOX_policies [t-1] NaN NaN NaN 3.000000 \n", "SOX_policies_binary 1.0 1.0 1.0 1.000000 \n", "SOX_policies_binary [t-1] NaN NaN NaN 1.000000 \n", "SOX_policies_all_binary 1.0 1.0 1.0 1.000000 \n", "SOX_policies_all_binary [t-1] NaN NaN NaN 1.000000 \n", "conflict_of_interest_policy_v2 1.0 1.0 1.0 1.000000 \n", "conflict_of_interest_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "whistleblower_policy_v2 1.0 1.0 1.0 1.000000 \n", "whistleblower_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "records_retention_policy_v2 1.0 1.0 1.0 1.000000 \n", "records_retention_policy_v2 [t-1] NaN NaN NaN 1.000000 \n", "program_efficiency [t-1] NaN NaN NaN 0.997687 \n", "total_revenue_logged [t-1] NaN NaN NaN 22.000798 \n", "complexity_2011 [t-1] 2.0 2.0 3.0 6.000000 \n", "age 25.0 35.0 52.0 108.000000 \n", "category_Animals 0.0 0.0 0.0 1.000000 \n", "category_Arts, Culture, Humanities 0.0 0.0 0.0 1.000000 \n", "category_Community Development 0.0 0.0 0.0 1.000000 \n", "category_Education 0.0 0.0 0.0 1.000000 \n", "category_Environment 0.0 0.0 0.0 1.000000 \n", "category_Health 0.0 0.0 0.0 1.000000 \n", "category_Human Services 0.0 0.0 0.0 1.000000 \n", "category_Human and Civil Rights 0.0 0.0 0.0 1.000000 \n", "category_International 0.0 0.0 0.0 1.000000 \n", "category_Religion 0.0 0.0 0.0 1.000000 \n", "category_Research and Public Policy 0.0 0.0 0.0 1.000000 \n", "number_of_SOX_policies_added NaN NaN NaN 3.000000 \n", "any_SOX_policies_added 0.0 0.0 0.0 1.000000 \n", "always_no_SOX 0.0 0.0 0.0 1.000000 \n", "always_SOX 1.0 1.0 1.0 
1.000000 \n", "added_SOX_previously_none 0.0 0.0 0.0 1.000000 \n", "from_zero_to_3_SOX 0.0 0.0 0.0 1.000000 " ] }, "execution_count": 354, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_2011_orgs_mod.describe().T" ] }, { "cell_type": "code", "execution_count": 355, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['EIN', 'org_id', 'donor_advisory_2011_to_2016', 'donor_advisory_2016', 'SOX_policies', 'SOX_policies [t-1]', 'SOX_policies_binary', 'SOX_policies_binary [t-1]', 'SOX_policies_all_binary', 'SOX_policies_all_binary [t-1]', 'conflict_of_interest_policy_v2', 'conflict_of_interest_policy_v2 [t-1]', 'whistleblower_policy_v2', 'whistleblower_policy_v2 [t-1]', 'records_retention_policy_v2', 'records_retention_policy_v2 [t-1]', 'program_efficiency [t-1]', 'total_revenue_logged [t-1]', 'complexity_2011 [t-1]', 'age', 'state', 'category', 'category_Animals', 'category_Arts, Culture, Humanities', 'category_Community Development', 'category_Education', 'category_Environment', 'category_Health', 'category_Human Services', 'category_Human and Civil Rights', 'category_International', 'category_Religion', 'category_Research and Public Policy', 'number_of_SOX_policies_added', 'any_SOX_policies_added', 'always_no_SOX', 'always_SOX', 'added_SOX_previously_none', 'from_zero_to_3_SOX']\n" ] } ], "source": [ "print df_2011_orgs_mod.columns.tolist()" ] }, { "cell_type": "code", "execution_count": 356, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df_2011_orgs_mod.to_pickle('Test 5 data.pkl')\n", "df_2011_orgs_mod.to_excel('Test 5 data.xls')" ] }, { "cell_type": "code", "execution_count": 363, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df.to_pickle('Final Merged CN Dataset (85,401 obs).pkl')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true 
}, "source": [ "# Miscellaneous Code Below This" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NOTE: Newly added 990 rows still need org_id, age, category, and state added" ] }, { "cell_type": "code", "execution_count": 1207, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "133552154 201412\n", "http://990s.foundationcenter.org/990_pdf_archive/133/133552154/133552154_201412_990.pdf\n", "581925867 201412\n", "http://990s.foundationcenter.org/990_pdf_archive/581/581925867/581925867_201412_990.pdf\n" ] } ], "source": [ "for index, row in df[(df['org_id'].isin(org_ids_2011)) & (df['2016_data']==1) & (df['donor_advisory']==1)][:2].iterrows():\n", "    #url = 'http://990s.foundationcenter.org/990_pdf_archive/043/043314346/043314346_201312_990.pdf'\n", "    EIN = row['EIN']\n", "    if row['Form 990 FYE']!='current': \n", "        # row['Form 990 FYE'] is a plain string here, so call replace() on it directly\n", "        fye = row['Form 990 FYE'].replace('_', '')\n", "    else:\n", "        fye = '201412'\n", "    URL_extension = EIN + '_' + fye\n", "    print EIN, fye\n", "    url = 'http://990s.foundationcenter.org/990_pdf_archive/%s/%s/%s_990.pdf' % (EIN[:3], EIN, URL_extension)\n", "    print url" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Save Lists of EINs to download" ] }, { "cell_type": "code", "execution_count": 1218, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "321\n", "271\n", "321\n", "321\n" ] } ], "source": [ "print len(df[(df['2016_data']==1) & (df['donor_advisory']==1)])\n", "print len(set(df[(df['2016_data']==1) & (df['donor_advisory']==1)]))\n", "print len(list(set(df[(df['2016_data']==1) & (df['donor_advisory']==1)]['EIN'].tolist())))\n", "advisories_2016 = list(set(df[(df['2016_data']==1) & 
(df['donor_advisory']==1)]['EIN'].tolist()))\n", "print len(advisories_2016)\n", "with open('2016 donor advisory EINs.json', 'w') as f:\n", "    json.dump(advisories_2016, f)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }