{ "metadata": { "name": "", "signature": "sha256:6229c37254101afd61a6b0dad6f7ff7006f2f0bd7904d70e47f55b256cc2270e" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis: Scores for Georgia child placing agencies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Georgia [grades child placing agencies on a 100-point scale](https://www.gascore.com/documents/RBWO_Provider_Profile_Guide.pdf). (Due to \"incentive credits,\" it's possible to score above 100.) The state provided BuzzFeed News with reports covering the previous 10 fiscal quarters: FY2013-Q1 through FY2015-Q2." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parsing the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Georgia reports come as PDFs, which BuzzFeed News then converted to XML documents using [pdftohtml](http://linux.die.net/man/1/pdftohtml). The section of code below parses the XML documents to extract key details from each provider's summary page." 
] }, { "cell_type": "code", "collapsed": false, "input": [
"import pandas as pd\n",
"import lxml.html"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [
"report_paths = [\n",
"    \"../reports/RBWO_FY2015_Provider_Profile_Guide.xml\",\n",
"    \"../reports/RBWO_FY2014_Provider_Profile_Guide.xml\",\n",
"    \"../reports/RBWO_FY2013_Provider_Profile_Guide.xml\"\n",
"]"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [
"class Page(object):\n",
"    # Wraps one <page> element of the pdftohtml XML output.\n",
"    def __init__(self, el):\n",
"        self.el = el\n",
"        self.page_num = int(el.attrib[\"number\"])\n",
"\n",
"    def get_text_el(self, search_string):\n",
"        # Return the first <text> element containing search_string, or None.\n",
"        matching = [ sub_el for sub_el in self.el.cssselect(\"text\")\n",
"            if search_string in sub_el.text_content() ]\n",
"        if len(matching):\n",
"            return matching[0]\n",
"        else:\n",
"            return None\n",
"\n",
"    @property\n",
"    def year(self):\n",
"        return self.get_text_el(\"(FY\").text_content().split(\"FY\")[1][:2]\n",
"\n",
"    @property\n",
"    def is_profile_page(self):\n",
"        return \"SHINES Resource ID\" in self.el.text_content()\n",
"\n",
"    @property\n",
"    def provider_name(self):\n",
"        prev = self.get_text_el(\"RBWO Provider Profile\")\n",
"        return prev.getnext().text_content()\n",
"\n",
"    @property\n",
"    def total_children(self):\n",
"        label = self.get_text_el(\"Total Children:\")\n",
"        return int(label.getnext().text_content())\n",
"\n",
"    @property\n",
"    def vendor_id(self):\n",
"        label = self.get_text_el(\"Vendor ID:\")\n",
"        return int(label.text_content().split(\" \")[-1])\n",
"\n",
"    @property\n",
"    def license_type(self):\n",
"        # The value is either on the label's own line or in the next element.\n",
"        label = self.get_text_el(\"Type:\")\n",
"        label_text = label.text_content().strip()\n",
"        label_split = label_text.split(\": \")\n",
"        if len(label_split) > 1: return label_split[1]\n",
"        return label.getnext().text_content().strip()\n",
"\n",
"    def extract_score_from_el(self, el):\n",
"        # Parse the percentage after the non-breaking space; None if absent.\n",
"        text = el.text_content()\n",
"        if \"N/A\" in text: return None\n",
"        split = text.split(u\"\\xa0\")\n",
"        pct = split[1].strip()\n",
"        if not pct: return None\n",
"        return float(pct[:-1])\n",
"\n",
"    @property\n",
"    def scores(self):\n",
"        # The four elements after the \"(FY\" label hold the quarterly scores.\n",
"        year = self.year\n",
"        el = self.get_text_el(\"(FY\")\n",
"        _scores = []\n",
"        for i in range(4):\n",
"            el = el.getnext()\n",
"            quarter = \"FY{0}Q{1}\".format(year, i+1)\n",
"            _scores.append({\n",
"                \"quarter\": quarter,\n",
"                \"score\": self.extract_score_from_el(el)\n",
"            })\n",
"        return _scores"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [
"class Report(object):\n",
"    # Parses one fiscal-year XML report into a dataframe of provider profiles.\n",
"    def __init__(self, path):\n",
"        with open(path) as f:\n",
"            self.dom = lxml.html.fromstring(f.read())\n",
"        self.pages = list(map(Page, self.dom.cssselect(\"page\")))\n",
"        self.profile_pages = list(filter(lambda p: p.is_profile_page, self.pages))\n",
"        self.providers = pd.DataFrame({\n",
"            \"vendor_id\": p.vendor_id,\n",
"            \"provider_name\": p.provider_name,\n",
"            \"license_type\": p.license_type,\n",
"            \"total_children\": p.total_children,\n",
"            \"scores\": p.scores,\n",
"            \"year\": \"FY\" + p.year\n",
"        } for p in self.profile_pages)"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [
"reports = list(map(Report, report_paths))"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [
"providers = pd.concat(r.providers for r in reports)"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregating the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we place the scores into their own dataframe, so that we can compute aggregate statistics." 
] }, { "cell_type": "code", "collapsed": false, "input": [
"# One row per provider per quarter; quarters without a score are dropped.\n",
"scores = pd.concat([ pd.DataFrame({\n",
"    \"vendor_id\": vendor_id,\n",
"    \"provider_name\": provider_name,\n",
"    \"quarter\": score[\"quarter\"],\n",
"    \"score\": score[\"score\"]\n",
"} for score in provider_scores) for ix, vendor_id, provider_name, provider_scores\n",
"    in providers[[ \"vendor_id\", \"provider_name\", \"scores\" ]].itertuples() ])\\\n",
"    .dropna()\\\n",
"    .reset_index(drop=True)"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [
"license_types = providers[[\"license_type\", \"vendor_id\"]].drop_duplicates().set_index(\"vendor_id\")"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we separate out child placing agencies (CPAs) from child caring institutions (CCIs):" ] }, { "cell_type": "code", "collapsed": false, "input": [
"cpas = license_types[license_types[\"license_type\"] == \"CPA\"].copy()"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [
"score_by_vendor = scores.groupby(\"vendor_id\")\n",
"aggregate_scores = pd.DataFrame({\n",
"    \"avg_score\": score_by_vendor[\"score\"].mean().round(2),\n",
"    \"high_score\": score_by_vendor[\"score\"].max(),\n",
"    \"low_score\": score_by_vendor[\"score\"].min(),\n",
"    \"n_quarters\": score_by_vendor[\"score\"].size(),\n",
"    \"provider_name\": score_by_vendor[\"provider_name\"].first()\n",
"})"
], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ranking the providers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we rank all CPAs that received scores in all 10 quarters, by average score:" ] }, { "cell_type": "code", "collapsed": false, "input": [
"ranked = cpas.join(aggregate_scores).sort_values(\"avg_score\")\n", 
"all_quarters = ranked[ranked[\"n_quarters\"] == 10].copy()\n", "all_quarters[\"avg_score_rank\"] = all_quarters[\"avg_score\"].rank(ascending=False)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "main_cols = [ \"provider_name\", \"n_quarters\", \"avg_score\", \"avg_score_rank\" ]\n", "all_quarters[main_cols]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>provider_name</th><th>n_quarters</th><th>avg_score</th><th>avg_score_rank</th></tr>\n",
"    <tr><th>vendor_id</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>35509</th><td>New Horizons Initiatives, Inc. (968)</td><td>10</td><td>45.04</td><td>54</td></tr>\n",
"    <tr><th>35508</th><td>New Horizons Community Services</td><td>10</td><td>74.25</td><td>53</td></tr>\n",
"    <tr><th>35443</th><td>Laurel Heights Hospital -Universal</td><td>10</td><td>78.81</td><td>52</td></tr>\n",
"    <tr><th>35249</th><td>Bethany Christian Services Atlanta (573)</td><td>10</td><td>80.75</td><td>51</td></tr>\n",
"    <tr><th>35497</th><td>Mentor Network Mentor Athens (734)</td><td>10</td><td>81.58</td><td>50</td></tr>\n",
"    <tr><th>40080</th><td>New Beginnings, Life Changing</td><td>10</td><td>82.55</td><td>49</td></tr>\n",
"    <tr><th>84761</th><td>National Youth Advocate Program</td><td>10</td><td>82.71</td><td>48</td></tr>\n",
"    <tr><th>35219</th><td>All God's Children (861)</td><td>10</td><td>82.80</td><td>47</td></tr>\n",
"    <tr><th>35387</th><td>National Youth Advocate Program</td><td>10</td><td>83.44</td><td>46</td></tr>\n",
"    <tr><th>108643</th><td>Elks Aidmore Children's Center Child Placing A...</td><td>10</td><td>84.10</td><td>45</td></tr>\n",
"    <tr><th>35493</th><td>Mentor Network Mentor Atlanta (736)</td><td>10</td><td>84.32</td><td>44</td></tr>\n",
"    <tr><th>35296</th><td>Creative Community Services (612)</td><td>10</td><td>84.33</td><td>43</td></tr>\n",
"    <tr><th>35494</th><td>Mentor Network Mentor Savannah (742)</td><td>10</td><td>84.79</td><td>42</td></tr>\n",
"    <tr><th>35248</th><td>Bethany Christian Services Columbus</td><td>10</td><td>85.09</td><td>41</td></tr>\n",
"    <tr><th>44182</th><td>Universal Health Services of</td><td>10</td><td>85.70</td><td>40</td></tr>\n",
"    <tr><th>53071</th><td>Morningstar Children and Family</td><td>10</td><td>85.74</td><td>39</td></tr>\n",
"    <tr><th>99720</th><td>Benchmark Family Services, Inc</td><td>10</td><td>86.56</td><td>38</td></tr>\n",
"    <tr><th>84514</th><td>ENA, Inc., dba NECCO (formerly GA</td><td>10</td><td>87.60</td><td>37</td></tr>\n",
"    <tr><th>99719</th><td>Benchmark Family Services, Inc</td><td>10</td><td>87.66</td><td>36</td></tr>\n",
"    <tr><th>35495</th><td>Mentor Network Mentor Augusta</td><td>10</td><td>88.26</td><td>35</td></tr>\n",
"    <tr><th>84513</th><td>ENA, Inc., dba NECCO (formerly GA</td><td>10</td><td>88.49</td><td>34</td></tr>\n",
"    <tr><th>35505</th><td>Neighbor to Family Fulton County (774)</td><td>10</td><td>88.70</td><td>33</td></tr>\n",
"    <tr><th>35498</th><td>Mentor Network Mentor Albany (733)</td><td>10</td><td>89.82</td><td>32</td></tr>\n",
"    <tr><th>35611</th><td>Twin Cedars Youth Services Foster</td><td>10</td><td>90.29</td><td>31</td></tr>\n",
"    <tr><th>35451</th><td>Lutheran Services of Georgia Lutheran of Atlanta</td><td>10</td><td>90.95</td><td>30</td></tr>\n",
"    <tr><th>35384</th><td>Meritan, Inc. d/b/a Meritan Stepping Stones</td><td>10</td><td>91.11</td><td>29</td></tr>\n",
"    <tr><th>35496</th><td>Mentor Network Mentor Macon (740)</td><td>10</td><td>91.27</td><td>28</td></tr>\n",
"    <tr><th>35335</th><td>Families First Foster Care Program</td><td>10</td><td>91.50</td><td>27</td></tr>\n",
"    <tr><th>84510</th><td>ENA, Inc., dba NECCO (formerly GA</td><td>10</td><td>91.86</td><td>26</td></tr>\n",
"    <tr><th>35448</th><td>Lookout Mountain Community</td><td>10</td><td>92.18</td><td>25</td></tr>\n",
"    <tr><th>35450</th><td>Lutheran Services of Georgia</td><td>10</td><td>92.27</td><td>24</td></tr>\n",
"    <tr><th>35452</th><td>Lutheran Services of Georgia</td><td>10</td><td>92.62</td><td>23</td></tr>\n",
"    <tr><th>35446</th><td>Lighthouse Therapeutic Foster Care</td><td>10</td><td>93.10</td><td>22</td></tr>\n",
"    <tr><th>35415</th><td>Hillside Connections Program (700)</td><td>10</td><td>93.49</td><td>21</td></tr>\n",
"    <tr><th>35506</th><td>Neighbor to Family Richmond</td><td>10</td><td>93.98</td><td>20</td></tr>\n",
"    <tr><th>35385</th><td>Meritan, Inc. d/b/a Meritan Stepping Stones Macon</td><td>10</td><td>94.53</td><td>19</td></tr>\n",
"    <tr><th>62037</th><td>Lutheran Services of Georgia</td><td>10</td><td>95.16</td><td>18</td></tr>\n",
"    <tr><th>82494</th><td>Faithbridge Foster Care Atlanta (974)</td><td>10</td><td>95.63</td><td>17</td></tr>\n",
"    <tr><th>40245</th><td>Trinity J and D, LLC Trinity J and D,</td><td>10</td><td>96.24</td><td>16</td></tr>\n",
"    <tr><th>35502</th><td>Neighbor to Family Douglas County</td><td>10</td><td>96.50</td><td>15</td></tr>\n",
"    <tr><th>89583</th><td>Neighbor to Family Chatham County</td><td>10</td><td>96.85</td><td>14</td></tr>\n",
"    <tr><th>35356</th><td>Georgia Agape (655)</td><td>10</td><td>97.57</td><td>13</td></tr>\n",
"    <tr><th>62038</th><td>Neighbor to Family Henry County</td><td>10</td><td>97.83</td><td>12</td></tr>\n",
"    <tr><th>35504</th><td>Neighbor to Family Gwinnett County</td><td>10</td><td>98.36</td><td>11</td></tr>\n",
"    <tr><th>35378</th><td>Georgia Parent Support Network (670)</td><td>10</td><td>98.54</td><td>10</td></tr>\n",
"    <tr><th>35503</th><td>Neighbor to Family Dekalb County</td><td>10</td><td>99.84</td><td>9</td></tr>\n",
"    <tr><th>40276</th><td>Giving Children A Chance of Georgia</td><td>10</td><td>99.95</td><td>8</td></tr>\n",
"    <tr><th>84512</th><td>ENA, Inc., dba NECCO (formerly GA</td><td>10</td><td>100.31</td><td>7</td></tr>\n",
"    <tr><th>35305</th><td>Devereux GA Treatment Network</td><td>10</td><td>100.42</td><td>6</td></tr>\n",
"    <tr><th>35275</th><td>Choices for Life Of GA Valdosta (943)</td><td>10</td><td>101.28</td><td>5</td></tr>\n",
"    <tr><th>35292</th><td>Community Connections (586)</td><td>10</td><td>101.55</td><td>4</td></tr>\n",
"    <tr><th>35485</th><td>Murphy-Harpst Children's Centers</td><td>10</td><td>102.10</td><td>3</td></tr>\n",
"    <tr><th>66182</th><td>Georgia Baptist Children's Home &amp;</td><td>10</td><td>102.72</td><td>2</td></tr>\n",
"    <tr><th>45624</th><td>United Methodist Children Home of the North GA...</td><td>10</td><td>103.21</td><td>1</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ " provider_name n_quarters \\\n", "vendor_id \n", "35509 New Horizons Initiatives, Inc. (968) 10 \n", "35508 New Horizons Community Services 10 \n", "35443 Laurel Heights Hospital -Universal 10 \n", "35249 Bethany Christian Services Atlanta (573) 10 \n", "35497 Mentor Network Mentor Athens (734) 10 \n", "40080 New Beginnings, Life Changing 10 \n", "84761 National Youth Advocate Program 10 \n", "35219 All God's Children (861) 10 \n", "35387 National Youth Advocate Program 10 \n", "108643 Elks Aidmore Children's Center Child Placing A... 10 \n", "35493 Mentor Network Mentor Atlanta (736) 10 \n", "35296 Creative Community Services (612) 10 \n", "35494 Mentor Network Mentor Savannah (742) 10 \n", "35248 Bethany Christian Services Columbus 10 \n", "44182 Universal Health Services of 10 \n", "53071 Morningstar Children and Family 10 \n", "99720 Benchmark Family Services, Inc 10 \n", "84514 ENA, Inc., dba NECCO (formerly GA 10 \n", "99719 Benchmark Family Services, Inc 10 \n", "35495 Mentor Network Mentor Augusta 10 \n", "84513 ENA, Inc., dba NECCO (formerly GA 10 \n", "35505 Neighbor to Family Fulton County (774) 10 \n", "35498 Mentor Network Mentor Albany (733) 10 \n", "35611 Twin Cedars Youth Services Foster 10 \n", "35451 Lutheran Services of Georgia Lutheran of Atlanta 10 \n", "35384 Meritan, Inc. d/b/a Meritan Stepping Stones 10 \n", "35496 Mentor Network Mentor Macon (740) 10 \n", "35335 Families First Foster Care Program 10 \n", "84510 ENA, Inc., dba NECCO (formerly GA 10 \n", "35448 Lookout Mountain Community 10 \n", "35450 Lutheran Services of Georgia 10 \n", "35452 Lutheran Services of Georgia 10 \n", "35446 Lighthouse Therapeutic Foster Care 10 \n", "35415 Hillside Connections Program (700) 10 \n", "35506 Neighbor to Family Richmond 10 \n", "35385 Meritan, Inc. 
d/b/a Meritan Stepping Stones Macon 10 \n", "62037 Lutheran Services of Georgia 10 \n", "82494 Faithbridge Foster Care Atlanta (974) 10 \n", "40245 Trinity J and D, LLC Trinity J and D, 10 \n", "35502 Neighbor to Family Douglas County 10 \n", "89583 Neighbor to Family Chatham County 10 \n", "35356 Georgia Agape (655) 10 \n", "62038 Neighbor to Family Henry County 10 \n", "35504 Neighbor to Family Gwinnett County 10 \n", "35378 Georgia Parent Support Network (670) 10 \n", "35503 Neighbor to Family Dekalb County 10 \n", "40276 Giving Children A Chance of Georgia 10 \n", "84512 ENA, Inc., dba NECCO (formerly GA 10 \n", "35305 Devereux GA Treatment Network 10 \n", "35275 Choices for Life Of GA Valdosta (943) 10 \n", "35292 Community Connections (586) 10 \n", "35485 Murphy-Harpst Children's Centers 10 \n", "66182 Georgia Baptist Children's Home & 10 \n", "45624 United Methodist Children Home of the North GA... 10 \n", "\n", " avg_score avg_score_rank \n", "vendor_id \n", "35509 45.04 54 \n", "35508 74.25 53 \n", "35443 78.81 52 \n", "35249 80.75 51 \n", "35497 81.58 50 \n", "40080 82.55 49 \n", "84761 82.71 48 \n", "35219 82.80 47 \n", "35387 83.44 46 \n", "108643 84.10 45 \n", "35493 84.32 44 \n", "35296 84.33 43 \n", "35494 84.79 42 \n", "35248 85.09 41 \n", "44182 85.70 40 \n", "53071 85.74 39 \n", "99720 86.56 38 \n", "84514 87.60 37 \n", "99719 87.66 36 \n", "35495 88.26 35 \n", "84513 88.49 34 \n", "35505 88.70 33 \n", "35498 89.82 32 \n", "35611 90.29 31 \n", "35451 90.95 30 \n", "35384 91.11 29 \n", "35496 91.27 28 \n", "35335 91.50 27 \n", "84510 91.86 26 \n", "35448 92.18 25 \n", "35450 92.27 24 \n", "35452 92.62 23 \n", "35446 93.10 22 \n", "35415 93.49 21 \n", "35506 93.98 20 \n", "35385 94.53 19 \n", "62037 95.16 18 \n", "82494 95.63 17 \n", "40245 96.24 16 \n", "35502 96.50 15 \n", "89583 96.85 14 \n", "35356 97.57 13 \n", "62038 97.83 12 \n", "35504 98.36 11 \n", "35378 98.54 10 \n", "35503 99.84 9 \n", "40276 99.95 8 \n", "84512 100.31 7 \n", "35305 
100.42 6 \n", "35275 101.28 5 \n", "35292 101.55 4 \n", "35485 102.10 3 \n", "66182 102.72 2 \n", "45624 103.21 1 " ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For ease of reference, here are Mentor's scores and ranks alone:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "all_quarters[all_quarters[\"provider_name\"].apply(lambda x: \"Mentor\" in x)][main_cols]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr><th></th><th>provider_name</th><th>n_quarters</th><th>avg_score</th><th>avg_score_rank</th></tr>\n",
"    <tr><th>vendor_id</th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr><th>35497</th><td>Mentor Network Mentor Athens (734)</td><td>10</td><td>81.58</td><td>50</td></tr>\n",
"    <tr><th>35493</th><td>Mentor Network Mentor Atlanta (736)</td><td>10</td><td>84.32</td><td>44</td></tr>\n",
"    <tr><th>35494</th><td>Mentor Network Mentor Savannah (742)</td><td>10</td><td>84.79</td><td>42</td></tr>\n",
"    <tr><th>35495</th><td>Mentor Network Mentor Augusta</td><td>10</td><td>88.26</td><td>35</td></tr>\n",
"    <tr><th>35498</th><td>Mentor Network Mentor Albany (733)</td><td>10</td><td>89.82</td><td>32</td></tr>\n",
"    <tr><th>35496</th><td>Mentor Network Mentor Macon (740)</td><td>10</td><td>91.27</td><td>28</td></tr>\n",
"  </tbody>\n", "</table>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ " provider_name n_quarters avg_score \\\n", "vendor_id \n", "35497 Mentor Network Mentor Athens (734) 10 81.58 \n", "35493 Mentor Network Mentor Atlanta (736) 10 84.32 \n", "35494 Mentor Network Mentor Savannah (742) 10 84.79 \n", "35495 Mentor Network Mentor Augusta 10 88.26 \n", "35498 Mentor Network Mentor Albany (733) 10 89.82 \n", "35496 Mentor Network Mentor Macon (740) 10 91.27 \n", "\n", " avg_score_rank \n", "vendor_id \n", "35497 50 \n", "35493 44 \n", "35494 42 \n", "35495 35 \n", "35498 32 \n", "35496 28 " ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "---\n", "\n", "---" ] } ], "metadata": {} } ] }