{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Good afternoon!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`In this class we'll learn how to slice and dice our data using pandas, a Python package created specifically for data analysis.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`We'll be doing all this in Jupyter Notebooks, a supremely nifty, handy-dandy tool for writing, experimenting with, and editing your code.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`'A' creates a new cell above, 'B' creates a new cell below. Double 'D' deletes a cell. 'Option+Enter' runs a cell and creates a new one. What you need might pop up in the dropdown. And finally, select a cell without being in the cell and press 'M' to switch to Markdown.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`To use pandas, we have to tell Python to pull the bundle of tools into this specific python session. At the time same time, we call it 'pd,' because that allows us to call on those tools without having to write out 'pandas' every single time.`" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Let's look at our data.`\n", "1. An Excel file listing three kinds of old-folks homes in Oregon: nursing homes, residential care facilities, and assisted living facilities in Oregon.\n", "2. A csv file listing each substantiated case of abuse or neglect at each facility over 5 years." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`What we want to do:`\n", "\n", "1. Filter facilities\n", "2. Sort facilities\n", "3. Get summary stats on facilities (means and medians)\n", "4. Calculate complaints per facility (grouping and joining)\n", "5. Find worst/best facilities (more sorting)\n", "6. See which of the three categories of facilities has the most violations. (grouping with summary stats)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## But first!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`... we have to get the data into this thing.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`How? Simple.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Create a DATAFRAME using the 'read_csv' tool that comes with pandas.`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('data/facilities.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`You've now created a pandas dataframe. To see what it looks like, show the first five rows of the variable 'df' that you just created.`" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
0385008NF96.0Presbyterian Community Care Center1085 N Oregon StOntario, OR 97914NaNPresbyterian Nursing Home, Inc.Presbyterian Nursing Home, Inc.
1385010NF159.0Laurelhurst Village Rehabilitation Center3060 SE Stark StPortland, OR 97214NaNLaurelhurst Operations, LLCLaurelhurst Operations, LLC
2385015NF128.0Regency Gresham Nursing & Rehabilitation Center5905 SE Powell Valley RdGresham, OR 97080NaNRegency Gresham Nursing & Rehabilitation Cente...Regency Pacific Management, LLC
3385018NF98.0Providence Benedictine Nursing Center540 South Main StMt. Angel, OR 97362NaNProvidence Health & Services - OregonProvidence Health & Services - Oregon
4385024NF91.0Avamere Health Services of Rogue Valley625 Stevens StMedford, OR 97504NaNMedford Operations, LLCMedford Operations, LLC
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "0 385008 NF 96.0 Presbyterian Community Care Center \n", "1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n", "2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n", "3 385018 NF 98.0 Providence Benedictine Nursing Center \n", "4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "0 1085 N Oregon St Ontario, OR 97914 NaN \n", "1 3060 SE Stark St Portland, OR 97214 NaN \n", "2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n", "3 540 South Main St Mt. Angel, OR 97362 NaN \n", "4 625 Stevens St Medford, OR 97504 NaN \n", "\n", " owner \\\n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Gresham Nursing & Rehabilitation Cente... \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC \n", "\n", " operator \n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Pacific Management, LLC \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC " ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`***The '.head()' selects the first five rows of a dataframe. We'll be using these throughout the notebook because selecting an entire dataframe takes up a lot of space.***`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## BUT FIRST... THE BASICS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`1. Select a column:`" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fac_type
0NF
1NF
2NF
3NF
4NF
\n", "
" ], "text/plain": [ " fac_type\n", "0 NF\n", "1 NF\n", "2 NF\n", "3 NF\n", "4 NF" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['fac_type']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Select two columns:`" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fac_typecapacity
0NF96.0
1NF159.0
2NF128.0
3NF98.0
4NF91.0
\n", "
" ], "text/plain": [ " fac_type capacity\n", "0 NF 96.0\n", "1 NF 159.0\n", "2 NF 128.0\n", "3 NF 98.0\n", "4 NF 91.0" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['fac_type','capacity']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`count your rows`" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "facid 644\n", "fac_type 642\n", "capacity 642\n", "fac_name 644\n", "fac_address 644\n", "city_state_zip 644\n", "Unnamed: 6 0\n", "owner 644\n", "operator 644\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. FILTERING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Filtering dataframes is a funky business.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`First, you create a list of 'Trues' and 'Falses' (a boolean array) based on the fields that interest you, like so:`" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 True\n", "3 True\n", "4 True\n", "5 True\n", "6 True\n", "7 True\n", "8 True\n", "9 True\n", "10 True\n", "11 True\n", "12 True\n", "13 True\n", "14 True\n", "15 True\n", "16 True\n", "17 True\n", "18 True\n", "19 True\n", "20 True\n", "21 True\n", "22 True\n", "23 True\n", "24 True\n", "25 True\n", "26 True\n", "27 True\n", "28 True\n", "29 True\n", " ... \n", "614 False\n", "615 False\n", "616 False\n", "617 False\n", "618 False\n", "619 False\n", "620 False\n", "621 False\n", "622 False\n", "623 False\n", "624 False\n", "625 False\n", "626 False\n", "627 False\n", "628 False\n", "629 False\n", "630 False\n", "631 False\n", "632 False\n", "633 False\n", "634 False\n", "635 False\n", "636 False\n", "637 False\n", "638 False\n", "639 False\n", "640 False\n", "641 False\n", "642 False\n", "643 False\n", "Name: fac_type, Length: 644, dtype: bool" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['fac_type']=='NF'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Then, you run that list against the dataframe. Pandas will spit back every row where there's a True. Here's what it looks like:`" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
0385008NF96.0Presbyterian Community Care Center1085 N Oregon StOntario, OR 97914NaNPresbyterian Nursing Home, Inc.Presbyterian Nursing Home, Inc.
1385010NF159.0Laurelhurst Village Rehabilitation Center3060 SE Stark StPortland, OR 97214NaNLaurelhurst Operations, LLCLaurelhurst Operations, LLC
2385015NF128.0Regency Gresham Nursing & Rehabilitation Center5905 SE Powell Valley RdGresham, OR 97080NaNRegency Gresham Nursing & Rehabilitation Cente...Regency Pacific Management, LLC
3385018NF98.0Providence Benedictine Nursing Center540 South Main StMt. Angel, OR 97362NaNProvidence Health & Services - OregonProvidence Health & Services - Oregon
4385024NF91.0Avamere Health Services of Rogue Valley625 Stevens StMedford, OR 97504NaNMedford Operations, LLCMedford Operations, LLC
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "0 385008 NF 96.0 Presbyterian Community Care Center \n", "1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n", "2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n", "3 385018 NF 98.0 Providence Benedictine Nursing Center \n", "4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "0 1085 N Oregon St Ontario, OR 97914 NaN \n", "1 3060 SE Stark St Portland, OR 97214 NaN \n", "2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n", "3 540 South Main St Mt. Angel, OR 97362 NaN \n", "4 625 Stevens St Medford, OR 97504 NaN \n", "\n", " owner \\\n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Gresham Nursing & Rehabilitation Cente... \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC \n", "\n", " operator \n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Pacific Management, LLC \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['fac_type']=='NF'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Here's what filtering on multiple fields looks like. 'AND' is '&,' and 'OR' is '|'. Just make sure to put each argument inside parentheses.`" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
0385008NF96.0Presbyterian Community Care Center1085 N Oregon StOntario, OR 97914NaNPresbyterian Nursing Home, Inc.Presbyterian Nursing Home, Inc.
1385010NF159.0Laurelhurst Village Rehabilitation Center3060 SE Stark StPortland, OR 97214NaNLaurelhurst Operations, LLCLaurelhurst Operations, LLC
2385015NF128.0Regency Gresham Nursing & Rehabilitation Center5905 SE Powell Valley RdGresham, OR 97080NaNRegency Gresham Nursing & Rehabilitation Cente...Regency Pacific Management, LLC
3385018NF98.0Providence Benedictine Nursing Center540 South Main StMt. Angel, OR 97362NaNProvidence Health & Services - OregonProvidence Health & Services - Oregon
4385024NF91.0Avamere Health Services of Rogue Valley625 Stevens StMedford, OR 97504NaNMedford Operations, LLCMedford Operations, LLC
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "0 385008 NF 96.0 Presbyterian Community Care Center \n", "1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n", "2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n", "3 385018 NF 98.0 Providence Benedictine Nursing Center \n", "4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "0 1085 N Oregon St Ontario, OR 97914 NaN \n", "1 3060 SE Stark St Portland, OR 97214 NaN \n", "2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n", "3 540 South Main St Mt. Angel, OR 97362 NaN \n", "4 625 Stevens St Medford, OR 97504 NaN \n", "\n", " owner \\\n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Gresham Nursing & Rehabilitation Cente... \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC \n", "\n", " operator \n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Pacific Management, LLC \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['fac_type']=='NF') & (df['capacity']>10)].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`How do you count the number of results you get when you filter?`" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "135" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['fac_type']=='NF') & (df['capacity']>10)].count()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. SORTING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Piece of cake.`" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
110385265NF5.0Mary's Woods at Marylhurst17360 Holy Names DriveLake Oswego, OR 97034NaNMary's Woods at Marylhurst, Inc.Mary's Woods at Marylhurst, Inc.
67385200NF6.0Willamette View Health Center13145 SE River RdMilwaukie, OR 97222NaNWillamette View, Inc. dbaWillamette View, Inc.
20750R062RCF7.0Hubbard Residential Care Facility647 Junction RdGlendale, OR 97442NaNNorma Ann HubbardNorma Ann Hubbard
33950R386RCF7.0Thanksgiving House184 N 2nd StSt. Helens, OR 97051NaNCecile MoldenCecile Molden
31750R364RCF8.0Summit Springs Village MCU120 S. Church St.Condon, OR 97823NaNSummit Springs Village CorporationSummit Springs Village Corporation
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "110 385265 NF 5.0 Mary's Woods at Marylhurst \n", "67 385200 NF 6.0 Willamette View Health Center \n", "207 50R062 RCF 7.0 Hubbard Residential Care Facility \n", "339 50R386 RCF 7.0 Thanksgiving House \n", "317 50R364 RCF 8.0 Summit Springs Village MCU \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "110 17360 Holy Names Drive Lake Oswego, OR 97034 NaN \n", "67 13145 SE River Rd Milwaukie, OR 97222 NaN \n", "207 647 Junction Rd Glendale, OR 97442 NaN \n", "339 184 N 2nd St St. Helens, OR 97051 NaN \n", "317 120 S. Church St. Condon, OR 97823 NaN \n", "\n", " owner operator \n", "110 Mary's Woods at Marylhurst, Inc. Mary's Woods at Marylhurst, Inc. \n", "67 Willamette View, Inc. dba Willamette View, Inc. \n", "207 Norma Ann Hubbard Norma Ann Hubbard \n", "339 Cecile Molden Cecile Molden \n", "317 Summit Springs Village Corporation Summit Springs Village Corporation " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('capacity').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`If you like bigger numbers...`" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
93385240NF214.0Marian Estates390 Church StSublimity, OR 97385NaNErnmaur, Inc.Marian Estates Support Services
25950R293RCF186.0Miramont Pointe11520 SE Sunnyside RdClackamas, OR 97015NaNMP, LLCMP, LLC
57170M080ALF180.0Rose Schnitzer Manor6140 SW Boundary StPortland, OR 97221NaNRobison Jewish HomeRobison Jewish Home
20385112NF180.0West Hills Health & Rehabilitation Center5701 SW Multnomah BlvdPortland, OR 97219NaNWest Hills Convalescent Center Limited Partner...West Hills Convalescent Center Limited Partner...
50385166NF165.0Maryville Nursing Home14645 SW Farmington RdBeaverton, OR 97007NaNSisters of St. Mary of Oregon Maryville Corp.Sisters of St. Mary of Oregon Maryville Corp.
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "93 385240 NF 214.0 Marian Estates \n", "259 50R293 RCF 186.0 Miramont Pointe \n", "571 70M080 ALF 180.0 Rose Schnitzer Manor \n", "20 385112 NF 180.0 West Hills Health & Rehabilitation Center \n", "50 385166 NF 165.0 Maryville Nursing Home \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "93 390 Church St Sublimity, OR 97385 NaN \n", "259 11520 SE Sunnyside Rd Clackamas, OR 97015 NaN \n", "571 6140 SW Boundary St Portland, OR 97221 NaN \n", "20 5701 SW Multnomah Blvd Portland, OR 97219 NaN \n", "50 14645 SW Farmington Rd Beaverton, OR 97007 NaN \n", "\n", " owner \\\n", "93 Ernmaur, Inc. \n", "259 MP, LLC \n", "571 Robison Jewish Home \n", "20 West Hills Convalescent Center Limited Partner... \n", "50 Sisters of St. Mary of Oregon Maryville Corp. \n", "\n", " operator \n", "93 Marian Estates Support Services \n", "259 MP, LLC \n", "571 Robison Jewish Home \n", "20 West Hills Convalescent Center Limited Partner... \n", "50 Sisters of St. Mary of Oregon Maryville Corp. " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sort_values('capacity', ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. SUMMARY STATS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Simply a breeze.`" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "capacity 57.61838\n", "Unnamed: 6 NaN\n", "dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mean()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "capacity 53.0\n", "Unnamed: 6 NaN\n", "dtype: float64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. GROUPING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`The concept behind grouping is a little gnarly (I think).`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`First, you split the dataframe up into chunks, creating a grouped object. Like so:`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('fac_type')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Then, you tell pandas what you want to do to that group of slices. Let's get a count:`" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidcapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperator
fac_type
ALF2212212212212210221221
NF1371371371371370137137
RCF2842832842842840284284
\n", "
" ], "text/plain": [ " facid capacity fac_name fac_address city_state_zip Unnamed: 6 \\\n", "fac_type \n", "ALF 221 221 221 221 221 0 \n", "NF 137 137 137 137 137 0 \n", "RCF 284 283 284 284 284 0 \n", "\n", " owner operator \n", "fac_type \n", "ALF 221 221 \n", "NF 137 137 \n", "RCF 284 284 " ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('fac_type').count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`OK, so we get a count of each type of facility. As you see, it gives us a result for each column that it can. It'll do the same for other kinds of calculations.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Let's get a mean.`" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
capacityUnnamed: 6
fac_type
ALF66.610860NaN
NF81.759124NaN
RCF38.759717NaN
\n", "
" ], "text/plain": [ " capacity Unnamed: 6\n", "fac_type \n", "ALF 66.610860 NaN\n", "NF 81.759124 NaN\n", "RCF 38.759717 NaN" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('fac_type').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`And a sum.`" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
capacityUnnamed: 6
fac_type
ALF14721.0NaN
NF11201.0NaN
RCF10969.0NaN
\n", "
" ], "text/plain": [ " capacity Unnamed: 6\n", "fac_type \n", "ALF 14721.0 NaN\n", "NF 11201.0 NaN\n", "RCF 10969.0 NaN" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('fac_type').sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# SWITCHING GEARS: COMPLAINTS DATA" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "df2 = pd.read_xlsx('data/complaints.xlsx')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
complaint_idfacility_idfacility_typeincident_datenotesseverityfineFacility Invest Results AbuseFacility Invest Results RuleType Of Abuse
0OT105179A385008NF8/31/2010RV reported asking staff to change him/her pri...2.00.0Not SubstantiatedSubstantiatedNaN
1OT105179B385008NF8/31/2010RV reported staff answered his/her call light,...2.00.0Not SubstantiatedSubstantiatedNaN
2OT105179C385008NF8/31/2010RV reported an unknown \"not RV's regular staff...2.00.0Not SubstantiatedSubstantiatedNaN
3OR0000656000385008NF12/21/2010Resident 1 was admitted with multiple diagnose...3.00.0SubstantiatedSubstantiatedNeglect
4OT105397385008NF9/17/2010RV was admitted 9/17/10 with multiple diagnose...2.00.0Not SubstantiatedSubstantiatedNaN
\n", "
" ], "text/plain": [ " complaint_id facility_id facility_type incident_date \\\n", "0 OT105179A 385008 NF 8/31/2010 \n", "1 OT105179B 385008 NF 8/31/2010 \n", "2 OT105179C 385008 NF 8/31/2010 \n", "3 OR0000656000 385008 NF 12/21/2010 \n", "4 OT105397 385008 NF 9/17/2010 \n", "\n", " notes severity fine \\\n", "0 RV reported asking staff to change him/her pri... 2.0 0.0 \n", "1 RV reported staff answered his/her call light,... 2.0 0.0 \n", "2 RV reported an unknown \"not RV's regular staff... 2.0 0.0 \n", "3 Resident 1 was admitted with multiple diagnose... 3.0 0.0 \n", "4 RV was admitted 9/17/10 with multiple diagnose... 2.0 0.0 \n", "\n", " Facility Invest Results Abuse Facility Invest Results Rule Type Of Abuse \n", "0 Not Substantiated Substantiated NaN \n", "1 Not Substantiated Substantiated NaN \n", "2 Not Substantiated Substantiated NaN \n", "3 Substantiated Substantiated Neglect \n", "4 Not Substantiated Substantiated NaN " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.head()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
complaint_idfacility_typeincident_datenotesseverityfineFacility Invest Results AbuseFacility Invest Results RuleType Of Abuse
facility_id
385008999997993
385010888887884
38501517171717171117174
3850181717171717517175
385024404040404028404012
\n", "
" ], "text/plain": [ " complaint_id facility_type incident_date notes severity \\\n", "facility_id \n", "385008 9 9 9 9 9 \n", "385010 8 8 8 8 8 \n", "385015 17 17 17 17 17 \n", "385018 17 17 17 17 17 \n", "385024 40 40 40 40 40 \n", "\n", " fine Facility Invest Results Abuse \\\n", "facility_id \n", "385008 7 9 \n", "385010 7 8 \n", "385015 11 17 \n", "385018 5 17 \n", "385024 28 40 \n", "\n", " Facility Invest Results Rule Type Of Abuse \n", "facility_id \n", "385008 9 3 \n", "385010 8 4 \n", "385015 17 4 \n", "385018 17 5 \n", "385024 40 12 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.groupby('facility_id').count().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`We don't need all these columns because they show more or less the thing. So we select the index column and the one with unique IDs for each complaint.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`(You'll notice that facility_id became the index in the grouping. We undo that by doing 'reset_index.')`" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facility_idcomplaint_id
03850089
13850108
238501517
338501817
438502440
\n", "
" ], "text/plain": [ " facility_id complaint_id\n", "0 385008 9\n", "1 385010 8\n", "2 385015 17\n", "3 385018 17\n", "4 385024 40" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.groupby('facility_id').count().reset_index()[['facility_id','complaint_id']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Let's create a new dataframe.`" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "fac_comp = df2.groupby('facility_id').count().reset_index()[['facility_id','complaint_id']]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facility_idcomplaint_id
03850089
13850108
238501517
338501817
438502440
\n", "
" ], "text/plain": [ " facility_id complaint_id\n", "0 385008 9\n", "1 385010 8\n", "2 385015 17\n", "3 385018 17\n", "4 385024 40" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fac_comp.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. JOINING" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`OK, so now we've got one dataframe with a list of facilities and their characteristics, and another with the number of substantiated complaints per facility.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`If we want to do any Journalism with the data, we need to combine the two dataframes. For those familiar with SQL, this is called joining.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`The variables in pandas joining are pretty similar to the ones in SQL: you tell pandas which fields to join on and whether it's a left, right, inner, or outer join.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`The syntax is as follows:`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`dataframe1.merge(dataframe2,left_on='dataframe1.column',right_on='dataframe2.column',how='left')`" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperatorfacility_idcomplaint_id
0385008NF96.0Presbyterian Community Care Center1085 N Oregon StOntario, OR 97914NaNPresbyterian Nursing Home, Inc.Presbyterian Nursing Home, Inc.3850089.0
1385010NF159.0Laurelhurst Village Rehabilitation Center3060 SE Stark StPortland, OR 97214NaNLaurelhurst Operations, LLCLaurelhurst Operations, LLC3850108.0
2385015NF128.0Regency Gresham Nursing & Rehabilitation Center5905 SE Powell Valley RdGresham, OR 97080NaNRegency Gresham Nursing & Rehabilitation Cente...Regency Pacific Management, LLC38501517.0
3385018NF98.0Providence Benedictine Nursing Center540 South Main StMt. Angel, OR 97362NaNProvidence Health & Services - OregonProvidence Health & Services - Oregon38501817.0
4385024NF91.0Avamere Health Services of Rogue Valley625 Stevens StMedford, OR 97504NaNMedford Operations, LLCMedford Operations, LLC38502440.0
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "0 385008 NF 96.0 Presbyterian Community Care Center \n", "1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n", "2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n", "3 385018 NF 98.0 Providence Benedictine Nursing Center \n", "4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "0 1085 N Oregon St Ontario, OR 97914 NaN \n", "1 3060 SE Stark St Portland, OR 97214 NaN \n", "2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n", "3 540 South Main St Mt. Angel, OR 97362 NaN \n", "4 625 Stevens St Medford, OR 97504 NaN \n", "\n", " owner \\\n", "0 Presbyterian Nursing Home, Inc. \n", "1 Laurelhurst Operations, LLC \n", "2 Regency Gresham Nursing & Rehabilitation Cente... \n", "3 Providence Health & Services - Oregon \n", "4 Medford Operations, LLC \n", "\n", " operator facility_id complaint_id \n", "0 Presbyterian Nursing Home, Inc. 385008 9.0 \n", "1 Laurelhurst Operations, LLC 385010 8.0 \n", "2 Regency Pacific Management, LLC 385015 17.0 \n", "3 Providence Health & Services - Oregon 385018 17.0 \n", "4 Medford Operations, LLC 385024 40.0 " ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.merge(fac_comp, left_on='facid',right_on='facility_id',how='left').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Recall the beauty of being able to easily create a new dataframe:`" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "m1 = df.merge(fac_comp, left_on='facid',right_on='facility_id',how='left')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Check which facilities had no complaints (filter for null).`" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "null_count = m1[m1.complaint_id.isnull()]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "77" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "null_count.count()[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Create a new column with an overall complaint rate (complaints per bed). `" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`(NB: Only the df['field'] syntax works when creating a new column.)`" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "m1['comp_rate'] = m1.complaint_id/m1.capacity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cool. Let's see the 5 facilities with the highest complaint rates.`" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
facidfac_typecapacityfac_namefac_addresscity_state_zipUnnamed: 6owneroperatorfacility_idcomplaint_idcomp_rate
3935MA170RCF37.0Brookdale McMinnville Westside320 SW Hill RoadMcMinnville, OR 97128NaNBrookdale Senior Living Communities, Inc.Brookdale Senior Living Communities, Inc.5MA17059.01.594595
4035MA233RCF30.0Ashley Manor - Roseburg427 SE Ramp St.Roseburg, OR 97470NaNAshley Manor LLCAshley Manor LLC5MA23347.01.566667
3905MA161RCF32.0Skylark Memory Care950 Skylark PlaceAshland, OR 97520NaNAshland Assisted Living, LLCMission Senior Living, LLC5MA16148.01.500000
32050R367RCF48.0Arbor Oaks Terrace Memory Care317 Werth Blvd.Newberg, OR 97132NaNNewberg Memory Associates, LLCFrontier Management, LLC50R36771.01.479167
3985MA215RCF55.0Baycrest Memory Care955 Kentucky AvenueCoos Bay, OR 97420NaNBay Area Properties, LLCRadiant Senior Living, Inc.5MA21577.01.400000
\n", "
" ], "text/plain": [ " facid fac_type capacity fac_name \\\n", "393 5MA170 RCF 37.0 Brookdale McMinnville Westside \n", "403 5MA233 RCF 30.0 Ashley Manor - Roseburg \n", "390 5MA161 RCF 32.0 Skylark Memory Care \n", "320 50R367 RCF 48.0 Arbor Oaks Terrace Memory Care \n", "398 5MA215 RCF 55.0 Baycrest Memory Care \n", "\n", " fac_address city_state_zip Unnamed: 6 \\\n", "393 320 SW Hill Road McMinnville, OR 97128 NaN \n", "403 427 SE Ramp St. Roseburg, OR 97470 NaN \n", "390 950 Skylark Place Ashland, OR 97520 NaN \n", "320 317 Werth Blvd. Newberg, OR 97132 NaN \n", "398 955 Kentucky Avenue Coos Bay, OR 97420 NaN \n", "\n", " owner \\\n", "393 Brookdale Senior Living Communities, Inc. \n", "403 Ashley Manor LLC \n", "390 Ashland Assisted Living, LLC \n", "320 Newberg Memory Associates, LLC \n", "398 Bay Area Properties, LLC \n", "\n", " operator facility_id complaint_id \\\n", "393 Brookdale Senior Living Communities, Inc. 5MA170 59.0 \n", "403 Ashley Manor LLC 5MA233 47.0 \n", "390 Mission Senior Living, LLC 5MA161 48.0 \n", "320 Frontier Management, LLC 50R367 71.0 \n", "398 Radiant Senior Living, Inc. 5MA215 77.0 \n", "\n", " comp_rate \n", "393 1.594595 \n", "403 1.566667 \n", "390 1.500000 \n", "320 1.479167 \n", "398 1.400000 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m1.sort_values('comp_rate',ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Cool Stuff.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`The 'fac_type' field tells you if a facility is a nursing home, a residential care facility, or an assisted living facility. Each one is subject to different sets of regulations so who knows, maybe there's a difference.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`We've already gone over the thing we'll need to use to do this. Does anybody remember?`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`That's right, grouping!`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. AND NOW WE PUT IT TOGETHER..." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
capacityUnnamed: 6complaint_idcomp_rate
fac_type
ALF66.610860NaN10.1951220.154733
NF81.759124NaN14.5920000.178521
RCF38.759717NaN10.5169490.290260
\n", "
" ], "text/plain": [ " capacity Unnamed: 6 complaint_id comp_rate\n", "fac_type \n", "ALF 66.610860 NaN 10.195122 0.154733\n", "NF 81.759124 NaN 14.592000 0.178521\n", "RCF 38.759717 NaN 10.516949 0.290260" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m1.groupby('fac_type').mean()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mean_comp_rate = m1.groupby('fac_type').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Wow, RCFs are super high! Wonder what could be going on there...`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Final final step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`What if you want to show your results to someone not as techno-magical as you? Why, simply export your results into a csv. Decide where you're gonna output your dataframe and what you're gonna call it.`" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mean_comp_rate.to_csv('output/mean_comp_rate.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# General words of wisdom" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "`Stackoverflow is your friend.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Google is your friend.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Documentation is your friend.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`News Nerdery's 'help me' channel is your friend.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Your friends are your friends.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Make a google spreadsheet where you save snippets of code you're likely to reuse.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`If you need to, find a good wall to bang your head against. There is no shame in that.`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Never give up. (within reason.)`" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }