{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Good afternoon!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`In this class we'll learn how to slice and dice our data using pandas, a Python package created specifically for data analysis.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`We'll be doing all this in Jupyter Notebooks, a supremely nifty, handy-dandy tool for writing, experimenting with, and editing your code.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`'A' creates a new cell above, 'B' creates a new cell below. Double 'D' deletes a cell. 'Option+Enter' runs a cell and creates a new one. What you need might pop up in the dropdown. And finally, select a cell without being in the cell and press 'M' to switch to Markdown.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`To use pandas, we have to tell Python to pull the bundle of tools into this specific python session. At the time same time, we call it 'pd,' because that allows us to call on those tools without having to write out 'pandas' every single time.`"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Let's look at our data.`\n",
"1. An Excel file listing three kinds of old-folks homes in Oregon: nursing homes, residential care facilities, and assisted living facilities in Oregon.\n",
"2. A csv file listing each substantiated case of abuse or neglect at each facility over 5 years."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`What we want to do:`\n",
"\n",
"1. Filter facilities\n",
"2. Sort facilities\n",
"3. Get summary stats on facilities (means and medians)\n",
"4. Calculate complaints per facility (grouping and joining)\n",
"5. Find worst/best facilities (more sorting)\n",
"6. See which of the three categories of facilities has the most violations. (grouping with summary stats)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## But first!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`... we have to get the data into this thing.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`How? Simple.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Create a DATAFRAME using the 'read_csv' tool that comes with pandas.`"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.read_csv('data/facilities.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`You've now created a pandas dataframe. To see what it looks like, show the first five rows of the variable 'df' that you just created.`"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" NF | \n",
" 96.0 | \n",
" Presbyterian Community Care Center | \n",
" 1085 N Oregon St | \n",
" Ontario, OR 97914 | \n",
" NaN | \n",
" Presbyterian Nursing Home, Inc. | \n",
" Presbyterian Nursing Home, Inc. | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" NF | \n",
" 159.0 | \n",
" Laurelhurst Village Rehabilitation Center | \n",
" 3060 SE Stark St | \n",
" Portland, OR 97214 | \n",
" NaN | \n",
" Laurelhurst Operations, LLC | \n",
" Laurelhurst Operations, LLC | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" NF | \n",
" 128.0 | \n",
" Regency Gresham Nursing & Rehabilitation Center | \n",
" 5905 SE Powell Valley Rd | \n",
" Gresham, OR 97080 | \n",
" NaN | \n",
" Regency Gresham Nursing & Rehabilitation Cente... | \n",
" Regency Pacific Management, LLC | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" NF | \n",
" 98.0 | \n",
" Providence Benedictine Nursing Center | \n",
" 540 South Main St | \n",
" Mt. Angel, OR 97362 | \n",
" NaN | \n",
" Providence Health & Services - Oregon | \n",
" Providence Health & Services - Oregon | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" NF | \n",
" 91.0 | \n",
" Avamere Health Services of Rogue Valley | \n",
" 625 Stevens St | \n",
" Medford, OR 97504 | \n",
" NaN | \n",
" Medford Operations, LLC | \n",
" Medford Operations, LLC | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"0 385008 NF 96.0 Presbyterian Community Care Center \n",
"1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n",
"2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n",
"3 385018 NF 98.0 Providence Benedictine Nursing Center \n",
"4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"0 1085 N Oregon St Ontario, OR 97914 NaN \n",
"1 3060 SE Stark St Portland, OR 97214 NaN \n",
"2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n",
"3 540 South Main St Mt. Angel, OR 97362 NaN \n",
"4 625 Stevens St Medford, OR 97504 NaN \n",
"\n",
" owner \\\n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Gresham Nursing & Rehabilitation Cente... \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC \n",
"\n",
" operator \n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Pacific Management, LLC \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC "
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`***The '.head()' selects the first five rows of a dataframe. We'll be using these throughout the notebook because selecting an entire dataframe takes up a lot of space.***`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BUT FIRST... THE BASICS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`1. Select a column:`"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" fac_type | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" NF | \n",
"
\n",
" \n",
" | 1 | \n",
" NF | \n",
"
\n",
" \n",
" | 2 | \n",
" NF | \n",
"
\n",
" \n",
" | 3 | \n",
" NF | \n",
"
\n",
" \n",
" | 4 | \n",
" NF | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" fac_type\n",
"0 NF\n",
"1 NF\n",
"2 NF\n",
"3 NF\n",
"4 NF"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['fac_type']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Select two columns:`"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" fac_type | \n",
" capacity | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" NF | \n",
" 96.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" NF | \n",
" 159.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" NF | \n",
" 128.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" NF | \n",
" 98.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" NF | \n",
" 91.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" fac_type capacity\n",
"0 NF 96.0\n",
"1 NF 159.0\n",
"2 NF 128.0\n",
"3 NF 98.0\n",
"4 NF 91.0"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['fac_type','capacity']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`count your rows`"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"facid 644\n",
"fac_type 642\n",
"capacity 642\n",
"fac_name 644\n",
"fac_address 644\n",
"city_state_zip 644\n",
"Unnamed: 6 0\n",
"owner 644\n",
"operator 644\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. FILTERING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Filtering dataframes is a funky business.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`First, you create a list of 'Trues' and 'Falses' (a boolean array) based on the fields that interest you, like so:`"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
"5 True\n",
"6 True\n",
"7 True\n",
"8 True\n",
"9 True\n",
"10 True\n",
"11 True\n",
"12 True\n",
"13 True\n",
"14 True\n",
"15 True\n",
"16 True\n",
"17 True\n",
"18 True\n",
"19 True\n",
"20 True\n",
"21 True\n",
"22 True\n",
"23 True\n",
"24 True\n",
"25 True\n",
"26 True\n",
"27 True\n",
"28 True\n",
"29 True\n",
" ... \n",
"614 False\n",
"615 False\n",
"616 False\n",
"617 False\n",
"618 False\n",
"619 False\n",
"620 False\n",
"621 False\n",
"622 False\n",
"623 False\n",
"624 False\n",
"625 False\n",
"626 False\n",
"627 False\n",
"628 False\n",
"629 False\n",
"630 False\n",
"631 False\n",
"632 False\n",
"633 False\n",
"634 False\n",
"635 False\n",
"636 False\n",
"637 False\n",
"638 False\n",
"639 False\n",
"640 False\n",
"641 False\n",
"642 False\n",
"643 False\n",
"Name: fac_type, Length: 644, dtype: bool"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['fac_type']=='NF'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Then, you run that list against the dataframe. Pandas will spit back every row where there's a True. Here's what it looks like:`"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" NF | \n",
" 96.0 | \n",
" Presbyterian Community Care Center | \n",
" 1085 N Oregon St | \n",
" Ontario, OR 97914 | \n",
" NaN | \n",
" Presbyterian Nursing Home, Inc. | \n",
" Presbyterian Nursing Home, Inc. | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" NF | \n",
" 159.0 | \n",
" Laurelhurst Village Rehabilitation Center | \n",
" 3060 SE Stark St | \n",
" Portland, OR 97214 | \n",
" NaN | \n",
" Laurelhurst Operations, LLC | \n",
" Laurelhurst Operations, LLC | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" NF | \n",
" 128.0 | \n",
" Regency Gresham Nursing & Rehabilitation Center | \n",
" 5905 SE Powell Valley Rd | \n",
" Gresham, OR 97080 | \n",
" NaN | \n",
" Regency Gresham Nursing & Rehabilitation Cente... | \n",
" Regency Pacific Management, LLC | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" NF | \n",
" 98.0 | \n",
" Providence Benedictine Nursing Center | \n",
" 540 South Main St | \n",
" Mt. Angel, OR 97362 | \n",
" NaN | \n",
" Providence Health & Services - Oregon | \n",
" Providence Health & Services - Oregon | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" NF | \n",
" 91.0 | \n",
" Avamere Health Services of Rogue Valley | \n",
" 625 Stevens St | \n",
" Medford, OR 97504 | \n",
" NaN | \n",
" Medford Operations, LLC | \n",
" Medford Operations, LLC | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"0 385008 NF 96.0 Presbyterian Community Care Center \n",
"1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n",
"2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n",
"3 385018 NF 98.0 Providence Benedictine Nursing Center \n",
"4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"0 1085 N Oregon St Ontario, OR 97914 NaN \n",
"1 3060 SE Stark St Portland, OR 97214 NaN \n",
"2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n",
"3 540 South Main St Mt. Angel, OR 97362 NaN \n",
"4 625 Stevens St Medford, OR 97504 NaN \n",
"\n",
" owner \\\n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Gresham Nursing & Rehabilitation Cente... \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC \n",
"\n",
" operator \n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Pacific Management, LLC \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['fac_type']=='NF'].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Here's what filtering on multiple fields looks like. 'AND' is '&,' and 'OR' is '|'. Just make sure to put each argument inside parentheses.`"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" NF | \n",
" 96.0 | \n",
" Presbyterian Community Care Center | \n",
" 1085 N Oregon St | \n",
" Ontario, OR 97914 | \n",
" NaN | \n",
" Presbyterian Nursing Home, Inc. | \n",
" Presbyterian Nursing Home, Inc. | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" NF | \n",
" 159.0 | \n",
" Laurelhurst Village Rehabilitation Center | \n",
" 3060 SE Stark St | \n",
" Portland, OR 97214 | \n",
" NaN | \n",
" Laurelhurst Operations, LLC | \n",
" Laurelhurst Operations, LLC | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" NF | \n",
" 128.0 | \n",
" Regency Gresham Nursing & Rehabilitation Center | \n",
" 5905 SE Powell Valley Rd | \n",
" Gresham, OR 97080 | \n",
" NaN | \n",
" Regency Gresham Nursing & Rehabilitation Cente... | \n",
" Regency Pacific Management, LLC | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" NF | \n",
" 98.0 | \n",
" Providence Benedictine Nursing Center | \n",
" 540 South Main St | \n",
" Mt. Angel, OR 97362 | \n",
" NaN | \n",
" Providence Health & Services - Oregon | \n",
" Providence Health & Services - Oregon | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" NF | \n",
" 91.0 | \n",
" Avamere Health Services of Rogue Valley | \n",
" 625 Stevens St | \n",
" Medford, OR 97504 | \n",
" NaN | \n",
" Medford Operations, LLC | \n",
" Medford Operations, LLC | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"0 385008 NF 96.0 Presbyterian Community Care Center \n",
"1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n",
"2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n",
"3 385018 NF 98.0 Providence Benedictine Nursing Center \n",
"4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"0 1085 N Oregon St Ontario, OR 97914 NaN \n",
"1 3060 SE Stark St Portland, OR 97214 NaN \n",
"2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n",
"3 540 South Main St Mt. Angel, OR 97362 NaN \n",
"4 625 Stevens St Medford, OR 97504 NaN \n",
"\n",
" owner \\\n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Gresham Nursing & Rehabilitation Cente... \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC \n",
"\n",
" operator \n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Pacific Management, LLC \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['fac_type']=='NF') & (df['capacity']>10)].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`How do you count the number of results you get when you filter?`"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"135"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[(df['fac_type']=='NF') & (df['capacity']>10)].count()[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. SORTING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Piece of cake.`"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" \n",
" \n",
" | 110 | \n",
" 385265 | \n",
" NF | \n",
" 5.0 | \n",
" Mary's Woods at Marylhurst | \n",
" 17360 Holy Names Drive | \n",
" Lake Oswego, OR 97034 | \n",
" NaN | \n",
" Mary's Woods at Marylhurst, Inc. | \n",
" Mary's Woods at Marylhurst, Inc. | \n",
"
\n",
" \n",
" | 67 | \n",
" 385200 | \n",
" NF | \n",
" 6.0 | \n",
" Willamette View Health Center | \n",
" 13145 SE River Rd | \n",
" Milwaukie, OR 97222 | \n",
" NaN | \n",
" Willamette View, Inc. dba | \n",
" Willamette View, Inc. | \n",
"
\n",
" \n",
" | 207 | \n",
" 50R062 | \n",
" RCF | \n",
" 7.0 | \n",
" Hubbard Residential Care Facility | \n",
" 647 Junction Rd | \n",
" Glendale, OR 97442 | \n",
" NaN | \n",
" Norma Ann Hubbard | \n",
" Norma Ann Hubbard | \n",
"
\n",
" \n",
" | 339 | \n",
" 50R386 | \n",
" RCF | \n",
" 7.0 | \n",
" Thanksgiving House | \n",
" 184 N 2nd St | \n",
" St. Helens, OR 97051 | \n",
" NaN | \n",
" Cecile Molden | \n",
" Cecile Molden | \n",
"
\n",
" \n",
" | 317 | \n",
" 50R364 | \n",
" RCF | \n",
" 8.0 | \n",
" Summit Springs Village MCU | \n",
" 120 S. Church St. | \n",
" Condon, OR 97823 | \n",
" NaN | \n",
" Summit Springs Village Corporation | \n",
" Summit Springs Village Corporation | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"110 385265 NF 5.0 Mary's Woods at Marylhurst \n",
"67 385200 NF 6.0 Willamette View Health Center \n",
"207 50R062 RCF 7.0 Hubbard Residential Care Facility \n",
"339 50R386 RCF 7.0 Thanksgiving House \n",
"317 50R364 RCF 8.0 Summit Springs Village MCU \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"110 17360 Holy Names Drive Lake Oswego, OR 97034 NaN \n",
"67 13145 SE River Rd Milwaukie, OR 97222 NaN \n",
"207 647 Junction Rd Glendale, OR 97442 NaN \n",
"339 184 N 2nd St St. Helens, OR 97051 NaN \n",
"317 120 S. Church St. Condon, OR 97823 NaN \n",
"\n",
" owner operator \n",
"110 Mary's Woods at Marylhurst, Inc. Mary's Woods at Marylhurst, Inc. \n",
"67 Willamette View, Inc. dba Willamette View, Inc. \n",
"207 Norma Ann Hubbard Norma Ann Hubbard \n",
"339 Cecile Molden Cecile Molden \n",
"317 Summit Springs Village Corporation Summit Springs Village Corporation "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values('capacity').head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`If you like bigger numbers...`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" \n",
" \n",
" | 93 | \n",
" 385240 | \n",
" NF | \n",
" 214.0 | \n",
" Marian Estates | \n",
" 390 Church St | \n",
" Sublimity, OR 97385 | \n",
" NaN | \n",
" Ernmaur, Inc. | \n",
" Marian Estates Support Services | \n",
"
\n",
" \n",
" | 259 | \n",
" 50R293 | \n",
" RCF | \n",
" 186.0 | \n",
" Miramont Pointe | \n",
" 11520 SE Sunnyside Rd | \n",
" Clackamas, OR 97015 | \n",
" NaN | \n",
" MP, LLC | \n",
" MP, LLC | \n",
"
\n",
" \n",
" | 571 | \n",
" 70M080 | \n",
" ALF | \n",
" 180.0 | \n",
" Rose Schnitzer Manor | \n",
" 6140 SW Boundary St | \n",
" Portland, OR 97221 | \n",
" NaN | \n",
" Robison Jewish Home | \n",
" Robison Jewish Home | \n",
"
\n",
" \n",
" | 20 | \n",
" 385112 | \n",
" NF | \n",
" 180.0 | \n",
" West Hills Health & Rehabilitation Center | \n",
" 5701 SW Multnomah Blvd | \n",
" Portland, OR 97219 | \n",
" NaN | \n",
" West Hills Convalescent Center Limited Partner... | \n",
" West Hills Convalescent Center Limited Partner... | \n",
"
\n",
" \n",
" | 50 | \n",
" 385166 | \n",
" NF | \n",
" 165.0 | \n",
" Maryville Nursing Home | \n",
" 14645 SW Farmington Rd | \n",
" Beaverton, OR 97007 | \n",
" NaN | \n",
" Sisters of St. Mary of Oregon Maryville Corp. | \n",
" Sisters of St. Mary of Oregon Maryville Corp. | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"93 385240 NF 214.0 Marian Estates \n",
"259 50R293 RCF 186.0 Miramont Pointe \n",
"571 70M080 ALF 180.0 Rose Schnitzer Manor \n",
"20 385112 NF 180.0 West Hills Health & Rehabilitation Center \n",
"50 385166 NF 165.0 Maryville Nursing Home \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"93 390 Church St Sublimity, OR 97385 NaN \n",
"259 11520 SE Sunnyside Rd Clackamas, OR 97015 NaN \n",
"571 6140 SW Boundary St Portland, OR 97221 NaN \n",
"20 5701 SW Multnomah Blvd Portland, OR 97219 NaN \n",
"50 14645 SW Farmington Rd Beaverton, OR 97007 NaN \n",
"\n",
" owner \\\n",
"93 Ernmaur, Inc. \n",
"259 MP, LLC \n",
"571 Robison Jewish Home \n",
"20 West Hills Convalescent Center Limited Partner... \n",
"50 Sisters of St. Mary of Oregon Maryville Corp. \n",
"\n",
" operator \n",
"93 Marian Estates Support Services \n",
"259 MP, LLC \n",
"571 Robison Jewish Home \n",
"20 West Hills Convalescent Center Limited Partner... \n",
"50 Sisters of St. Mary of Oregon Maryville Corp. "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values('capacity', ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. SUMMARY STATS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Simply a breeze.`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"capacity 57.61838\n",
"Unnamed: 6 NaN\n",
"dtype: float64"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mean()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"capacity 53.0\n",
"Unnamed: 6 NaN\n",
"dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.median()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. GROUPING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`The concept behind grouping is a little gnarly (I think).`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`First, you split the dataframe up into chunks, creating a grouped object. Like so:`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('fac_type')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Then, you tell pandas what you want to do to that group of slices. Let's get a count:`"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
"
\n",
" \n",
" | fac_type | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | ALF | \n",
" 221 | \n",
" 221 | \n",
" 221 | \n",
" 221 | \n",
" 221 | \n",
" 0 | \n",
" 221 | \n",
" 221 | \n",
"
\n",
" \n",
" | NF | \n",
" 137 | \n",
" 137 | \n",
" 137 | \n",
" 137 | \n",
" 137 | \n",
" 0 | \n",
" 137 | \n",
" 137 | \n",
"
\n",
" \n",
" | RCF | \n",
" 284 | \n",
" 283 | \n",
" 284 | \n",
" 284 | \n",
" 284 | \n",
" 0 | \n",
" 284 | \n",
" 284 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid capacity fac_name fac_address city_state_zip Unnamed: 6 \\\n",
"fac_type \n",
"ALF 221 221 221 221 221 0 \n",
"NF 137 137 137 137 137 0 \n",
"RCF 284 283 284 284 284 0 \n",
"\n",
" owner operator \n",
"fac_type \n",
"ALF 221 221 \n",
"NF 137 137 \n",
"RCF 284 284 "
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('fac_type').count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`OK, so we get a count of each type of facility. As you see, it gives us a result for each column that it can. It'll do the same for other kinds of calculations.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Let's get a mean.`"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" capacity | \n",
" Unnamed: 6 | \n",
"
\n",
" \n",
" | fac_type | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | ALF | \n",
" 66.610860 | \n",
" NaN | \n",
"
\n",
" \n",
" | NF | \n",
" 81.759124 | \n",
" NaN | \n",
"
\n",
" \n",
" | RCF | \n",
" 38.759717 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" capacity Unnamed: 6\n",
"fac_type \n",
"ALF 66.610860 NaN\n",
"NF 81.759124 NaN\n",
"RCF 38.759717 NaN"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('fac_type').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`And a sum.`"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" capacity | \n",
" Unnamed: 6 | \n",
"
\n",
" \n",
" | fac_type | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | ALF | \n",
" 14721.0 | \n",
" NaN | \n",
"
\n",
" \n",
" | NF | \n",
" 11201.0 | \n",
" NaN | \n",
"
\n",
" \n",
" | RCF | \n",
" 10969.0 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" capacity Unnamed: 6\n",
"fac_type \n",
"ALF 14721.0 NaN\n",
"NF 11201.0 NaN\n",
"RCF 10969.0 NaN"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.groupby('fac_type').sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# SWITCHING GEARS: COMPLAINTS DATA"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"df2 = pd.read_xlsx('data/complaints.xlsx')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" complaint_id | \n",
" facility_id | \n",
" facility_type | \n",
" incident_date | \n",
" notes | \n",
" severity | \n",
" fine | \n",
" Facility Invest Results Abuse | \n",
" Facility Invest Results Rule | \n",
" Type Of Abuse | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" OT105179A | \n",
" 385008 | \n",
" NF | \n",
" 8/31/2010 | \n",
" RV reported asking staff to change him/her pri... | \n",
" 2.0 | \n",
" 0.0 | \n",
" Not Substantiated | \n",
" Substantiated | \n",
" NaN | \n",
"
\n",
" \n",
" | 1 | \n",
" OT105179B | \n",
" 385008 | \n",
" NF | \n",
" 8/31/2010 | \n",
" RV reported staff answered his/her call light,... | \n",
" 2.0 | \n",
" 0.0 | \n",
" Not Substantiated | \n",
" Substantiated | \n",
" NaN | \n",
"
\n",
" \n",
" | 2 | \n",
" OT105179C | \n",
" 385008 | \n",
" NF | \n",
" 8/31/2010 | \n",
" RV reported an unknown \"not RV's regular staff... | \n",
" 2.0 | \n",
" 0.0 | \n",
" Not Substantiated | \n",
" Substantiated | \n",
" NaN | \n",
"
\n",
" \n",
" | 3 | \n",
" OR0000656000 | \n",
" 385008 | \n",
" NF | \n",
" 12/21/2010 | \n",
" Resident 1 was admitted with multiple diagnose... | \n",
" 3.0 | \n",
" 0.0 | \n",
" Substantiated | \n",
" Substantiated | \n",
" Neglect | \n",
"
\n",
" \n",
" | 4 | \n",
" OT105397 | \n",
" 385008 | \n",
" NF | \n",
" 9/17/2010 | \n",
" RV was admitted 9/17/10 with multiple diagnose... | \n",
" 2.0 | \n",
" 0.0 | \n",
" Not Substantiated | \n",
" Substantiated | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" complaint_id facility_id facility_type incident_date \\\n",
"0 OT105179A 385008 NF 8/31/2010 \n",
"1 OT105179B 385008 NF 8/31/2010 \n",
"2 OT105179C 385008 NF 8/31/2010 \n",
"3 OR0000656000 385008 NF 12/21/2010 \n",
"4 OT105397 385008 NF 9/17/2010 \n",
"\n",
" notes severity fine \\\n",
"0 RV reported asking staff to change him/her pri... 2.0 0.0 \n",
"1 RV reported staff answered his/her call light,... 2.0 0.0 \n",
"2 RV reported an unknown \"not RV's regular staff... 2.0 0.0 \n",
"3 Resident 1 was admitted with multiple diagnose... 3.0 0.0 \n",
"4 RV was admitted 9/17/10 with multiple diagnose... 2.0 0.0 \n",
"\n",
" Facility Invest Results Abuse Facility Invest Results Rule Type Of Abuse \n",
"0 Not Substantiated Substantiated NaN \n",
"1 Not Substantiated Substantiated NaN \n",
"2 Not Substantiated Substantiated NaN \n",
"3 Substantiated Substantiated Neglect \n",
"4 Not Substantiated Substantiated NaN "
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.head()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" complaint_id | \n",
" facility_type | \n",
" incident_date | \n",
" notes | \n",
" severity | \n",
" fine | \n",
" Facility Invest Results Abuse | \n",
" Facility Invest Results Rule | \n",
" Type Of Abuse | \n",
"
\n",
" \n",
" | facility_id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | 385008 | \n",
" 9 | \n",
" 9 | \n",
" 9 | \n",
" 9 | \n",
" 9 | \n",
" 7 | \n",
" 9 | \n",
" 9 | \n",
" 3 | \n",
"
\n",
" \n",
" | 385010 | \n",
" 8 | \n",
" 8 | \n",
" 8 | \n",
" 8 | \n",
" 8 | \n",
" 7 | \n",
" 8 | \n",
" 8 | \n",
" 4 | \n",
"
\n",
" \n",
" | 385015 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 11 | \n",
" 17 | \n",
" 17 | \n",
" 4 | \n",
"
\n",
" \n",
" | 385018 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 17 | \n",
" 5 | \n",
" 17 | \n",
" 17 | \n",
" 5 | \n",
"
\n",
" \n",
" | 385024 | \n",
" 40 | \n",
" 40 | \n",
" 40 | \n",
" 40 | \n",
" 40 | \n",
" 28 | \n",
" 40 | \n",
" 40 | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" complaint_id facility_type incident_date notes severity \\\n",
"facility_id \n",
"385008 9 9 9 9 9 \n",
"385010 8 8 8 8 8 \n",
"385015 17 17 17 17 17 \n",
"385018 17 17 17 17 17 \n",
"385024 40 40 40 40 40 \n",
"\n",
" fine Facility Invest Results Abuse \\\n",
"facility_id \n",
"385008 7 9 \n",
"385010 7 8 \n",
"385015 11 17 \n",
"385018 5 17 \n",
"385024 28 40 \n",
"\n",
" Facility Invest Results Rule Type Of Abuse \n",
"facility_id \n",
"385008 9 3 \n",
"385010 8 4 \n",
"385015 17 4 \n",
"385018 17 5 \n",
"385024 40 12 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.groupby('facility_id').count().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`We don't need all these columns because they show more or less the thing. So we select the index column and the one with unique IDs for each complaint.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`(You'll notice that facility_id became the index in the grouping. We undo that by doing 'reset_index.')`"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facility_id | \n",
" complaint_id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" 9 | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" 8 | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" 17 | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" 17 | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facility_id complaint_id\n",
"0 385008 9\n",
"1 385010 8\n",
"2 385015 17\n",
"3 385018 17\n",
"4 385024 40"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2.groupby('facility_id').count().reset_index()[['facility_id','complaint_id']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Let's create a new dataframe.`"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fac_comp = df2.groupby('facility_id').count().reset_index()[['facility_id','complaint_id']]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facility_id | \n",
" complaint_id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" 9 | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" 8 | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" 17 | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" 17 | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" 40 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facility_id complaint_id\n",
"0 385008 9\n",
"1 385010 8\n",
"2 385015 17\n",
"3 385018 17\n",
"4 385024 40"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fac_comp.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. JOINING"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`OK, so now we've got one dataframe with a list of facilities and their characteristics, and another with the number of substantiated complaints per facility.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`If we want to do any Journalism with the data, we need to combine the two dataframes. For those familiar with SQL, this is called joining.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`The variables in pandas joining are pretty similar to the ones in SQL: you tell pandas which fields to join on and whether it's a left, right, inner, or outer join.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`The syntax is as follows:`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`dataframe1.merge(dataframe2,left_on='dataframe1.column',right_on='dataframe2.column',how='left')`"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
" facility_id | \n",
" complaint_id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 385008 | \n",
" NF | \n",
" 96.0 | \n",
" Presbyterian Community Care Center | \n",
" 1085 N Oregon St | \n",
" Ontario, OR 97914 | \n",
" NaN | \n",
" Presbyterian Nursing Home, Inc. | \n",
" Presbyterian Nursing Home, Inc. | \n",
" 385008 | \n",
" 9.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 385010 | \n",
" NF | \n",
" 159.0 | \n",
" Laurelhurst Village Rehabilitation Center | \n",
" 3060 SE Stark St | \n",
" Portland, OR 97214 | \n",
" NaN | \n",
" Laurelhurst Operations, LLC | \n",
" Laurelhurst Operations, LLC | \n",
" 385010 | \n",
" 8.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 385015 | \n",
" NF | \n",
" 128.0 | \n",
" Regency Gresham Nursing & Rehabilitation Center | \n",
" 5905 SE Powell Valley Rd | \n",
" Gresham, OR 97080 | \n",
" NaN | \n",
" Regency Gresham Nursing & Rehabilitation Cente... | \n",
" Regency Pacific Management, LLC | \n",
" 385015 | \n",
" 17.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 385018 | \n",
" NF | \n",
" 98.0 | \n",
" Providence Benedictine Nursing Center | \n",
" 540 South Main St | \n",
" Mt. Angel, OR 97362 | \n",
" NaN | \n",
" Providence Health & Services - Oregon | \n",
" Providence Health & Services - Oregon | \n",
" 385018 | \n",
" 17.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 385024 | \n",
" NF | \n",
" 91.0 | \n",
" Avamere Health Services of Rogue Valley | \n",
" 625 Stevens St | \n",
" Medford, OR 97504 | \n",
" NaN | \n",
" Medford Operations, LLC | \n",
" Medford Operations, LLC | \n",
" 385024 | \n",
" 40.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"0 385008 NF 96.0 Presbyterian Community Care Center \n",
"1 385010 NF 159.0 Laurelhurst Village Rehabilitation Center \n",
"2 385015 NF 128.0 Regency Gresham Nursing & Rehabilitation Center \n",
"3 385018 NF 98.0 Providence Benedictine Nursing Center \n",
"4 385024 NF 91.0 Avamere Health Services of Rogue Valley \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"0 1085 N Oregon St Ontario, OR 97914 NaN \n",
"1 3060 SE Stark St Portland, OR 97214 NaN \n",
"2 5905 SE Powell Valley Rd Gresham, OR 97080 NaN \n",
"3 540 South Main St Mt. Angel, OR 97362 NaN \n",
"4 625 Stevens St Medford, OR 97504 NaN \n",
"\n",
" owner \\\n",
"0 Presbyterian Nursing Home, Inc. \n",
"1 Laurelhurst Operations, LLC \n",
"2 Regency Gresham Nursing & Rehabilitation Cente... \n",
"3 Providence Health & Services - Oregon \n",
"4 Medford Operations, LLC \n",
"\n",
" operator facility_id complaint_id \n",
"0 Presbyterian Nursing Home, Inc. 385008 9.0 \n",
"1 Laurelhurst Operations, LLC 385010 8.0 \n",
"2 Regency Pacific Management, LLC 385015 17.0 \n",
"3 Providence Health & Services - Oregon 385018 17.0 \n",
"4 Medford Operations, LLC 385024 40.0 "
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.merge(fac_comp, left_on='facid',right_on='facility_id',how='left').head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Recall the beauty of being able to easily create a new dataframe:`"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"m1 = df.merge(fac_comp, left_on='facid',right_on='facility_id',how='left')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Check which facilities had no complaints (filter for null).`"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"null_count = m1[m1.complaint_id.isnull()]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"77"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"null_count.count()[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Create a new column with an overall complaint rate (complaints per bed). `"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`(NB: Only the df['field'] syntax works when creating a new column.)`"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"m1['comp_rate'] = m1.complaint_id/m1.capacity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cool. Let's see the 5 facilities with the highest complaint rates.`"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" facid | \n",
" fac_type | \n",
" capacity | \n",
" fac_name | \n",
" fac_address | \n",
" city_state_zip | \n",
" Unnamed: 6 | \n",
" owner | \n",
" operator | \n",
" facility_id | \n",
" complaint_id | \n",
" comp_rate | \n",
"
\n",
" \n",
" \n",
" \n",
" | 393 | \n",
" 5MA170 | \n",
" RCF | \n",
" 37.0 | \n",
" Brookdale McMinnville Westside | \n",
" 320 SW Hill Road | \n",
" McMinnville, OR 97128 | \n",
" NaN | \n",
" Brookdale Senior Living Communities, Inc. | \n",
" Brookdale Senior Living Communities, Inc. | \n",
" 5MA170 | \n",
" 59.0 | \n",
" 1.594595 | \n",
"
\n",
" \n",
" | 403 | \n",
" 5MA233 | \n",
" RCF | \n",
" 30.0 | \n",
" Ashley Manor - Roseburg | \n",
" 427 SE Ramp St. | \n",
" Roseburg, OR 97470 | \n",
" NaN | \n",
" Ashley Manor LLC | \n",
" Ashley Manor LLC | \n",
" 5MA233 | \n",
" 47.0 | \n",
" 1.566667 | \n",
"
\n",
" \n",
" | 390 | \n",
" 5MA161 | \n",
" RCF | \n",
" 32.0 | \n",
" Skylark Memory Care | \n",
" 950 Skylark Place | \n",
" Ashland, OR 97520 | \n",
" NaN | \n",
" Ashland Assisted Living, LLC | \n",
" Mission Senior Living, LLC | \n",
" 5MA161 | \n",
" 48.0 | \n",
" 1.500000 | \n",
"
\n",
" \n",
" | 320 | \n",
" 50R367 | \n",
" RCF | \n",
" 48.0 | \n",
" Arbor Oaks Terrace Memory Care | \n",
" 317 Werth Blvd. | \n",
" Newberg, OR 97132 | \n",
" NaN | \n",
" Newberg Memory Associates, LLC | \n",
" Frontier Management, LLC | \n",
" 50R367 | \n",
" 71.0 | \n",
" 1.479167 | \n",
"
\n",
" \n",
" | 398 | \n",
" 5MA215 | \n",
" RCF | \n",
" 55.0 | \n",
" Baycrest Memory Care | \n",
" 955 Kentucky Avenue | \n",
" Coos Bay, OR 97420 | \n",
" NaN | \n",
" Bay Area Properties, LLC | \n",
" Radiant Senior Living, Inc. | \n",
" 5MA215 | \n",
" 77.0 | \n",
" 1.400000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" facid fac_type capacity fac_name \\\n",
"393 5MA170 RCF 37.0 Brookdale McMinnville Westside \n",
"403 5MA233 RCF 30.0 Ashley Manor - Roseburg \n",
"390 5MA161 RCF 32.0 Skylark Memory Care \n",
"320 50R367 RCF 48.0 Arbor Oaks Terrace Memory Care \n",
"398 5MA215 RCF 55.0 Baycrest Memory Care \n",
"\n",
" fac_address city_state_zip Unnamed: 6 \\\n",
"393 320 SW Hill Road McMinnville, OR 97128 NaN \n",
"403 427 SE Ramp St. Roseburg, OR 97470 NaN \n",
"390 950 Skylark Place Ashland, OR 97520 NaN \n",
"320 317 Werth Blvd. Newberg, OR 97132 NaN \n",
"398 955 Kentucky Avenue Coos Bay, OR 97420 NaN \n",
"\n",
" owner \\\n",
"393 Brookdale Senior Living Communities, Inc. \n",
"403 Ashley Manor LLC \n",
"390 Ashland Assisted Living, LLC \n",
"320 Newberg Memory Associates, LLC \n",
"398 Bay Area Properties, LLC \n",
"\n",
" operator facility_id complaint_id \\\n",
"393 Brookdale Senior Living Communities, Inc. 5MA170 59.0 \n",
"403 Ashley Manor LLC 5MA233 47.0 \n",
"390 Mission Senior Living, LLC 5MA161 48.0 \n",
"320 Frontier Management, LLC 50R367 71.0 \n",
"398 Radiant Senior Living, Inc. 5MA215 77.0 \n",
"\n",
" comp_rate \n",
"393 1.594595 \n",
"403 1.566667 \n",
"390 1.500000 \n",
"320 1.479167 \n",
"398 1.400000 "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.sort_values('comp_rate',ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Cool Stuff.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`The 'fac_type' field tells you if a facility is a nursing home, a residential care facility, or an assisted living facility. Each one is subject to different sets of regulations so who knows, maybe there's a difference.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`We've already gone over the thing we'll need to use to do this. Does anybody remember?`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`That's right, grouping!`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. AND NOW WE PUT IT TOGETHER..."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" capacity | \n",
" Unnamed: 6 | \n",
" complaint_id | \n",
" comp_rate | \n",
"
\n",
" \n",
" | fac_type | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | ALF | \n",
" 66.610860 | \n",
" NaN | \n",
" 10.195122 | \n",
" 0.154733 | \n",
"
\n",
" \n",
" | NF | \n",
" 81.759124 | \n",
" NaN | \n",
" 14.592000 | \n",
" 0.178521 | \n",
"
\n",
" \n",
" | RCF | \n",
" 38.759717 | \n",
" NaN | \n",
" 10.516949 | \n",
" 0.290260 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" capacity Unnamed: 6 complaint_id comp_rate\n",
"fac_type \n",
"ALF 66.610860 NaN 10.195122 0.154733\n",
"NF 81.759124 NaN 14.592000 0.178521\n",
"RCF 38.759717 NaN 10.516949 0.290260"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"m1.groupby('fac_type').mean()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mean_comp_rate = m1.groupby('fac_type').mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Wow, RCFs are super high! Wonder what could be going on there...`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Final final step"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`What if you want to show your results to someone not as techno-magical as you? Why, simply export your results into a csv. Decide where you're gonna output your dataframe and what you're gonna call it.`"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mean_comp_rate.to_csv('output/mean_comp_rate.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# General words of wisdom"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"`Stackoverflow is your friend.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Google is your friend.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Documentation is your friend.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`News Nerdery's 'help me' channel is your friend.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Your friends are your friends.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Make a google spreadsheet where you save snippets of code you're likely to reuse.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`If you need to, find a good wall to bang your head against. There is no shame in that.`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Never give up. (within reason.)`"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}