{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#Computing occupancy statistics with Python - Part 2 of 3#\n",
"In the second part of this series, we will use Python to compute summary occupancy statistics (such as means and percentiles) by time of day, day of week, and patient category (recall that this example is from a hospital short stay unit - go back to Part 1 for all of the background info). Computation of percentiles by one or more grouping fields is a pain using tools like Excel, Access and SQL Server. With Python+pandas it's easy.\n",
"\n",
"You can find the data and the `.ipynb` file in my [hselab-tutorials](https://github.com/misken/hselab-tutorials) github repo. Clone or download a zip."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##Preliminaries##"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the end of Part 1 of this tutorial series, we ended up with a csv file called bydate_shortstay_csv.csv. Let's read it in and take a look at it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"## Read sample data set and convert string dates to datetimes\n",
"bydate_df = pd.read_csv('data/bydate_shortstay_csv.csv',parse_dates=['datetime'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bydate_df.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" category | \n",
" datetime | \n",
" arrivals | \n",
" binofday | \n",
" binofweek | \n",
" dayofweek | \n",
" departures | \n",
" occupancy | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" IVT | \n",
" 1996-01-02 00:00:00 | \n",
" 0 | \n",
" 0 | \n",
" 48 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" IVT | \n",
" 1996-01-02 00:30:00 | \n",
" 0 | \n",
" 1 | \n",
" 49 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 2 | \n",
" IVT | \n",
" 1996-01-02 01:00:00 | \n",
" 0 | \n",
" 2 | \n",
" 50 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 3 | \n",
" IVT | \n",
" 1996-01-02 01:30:00 | \n",
" 0 | \n",
" 3 | \n",
" 51 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 4 | \n",
" IVT | \n",
" 1996-01-02 02:00:00 | \n",
" 0 | \n",
" 4 | \n",
" 52 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
" category datetime arrivals binofday binofweek dayofweek departures \\\n",
"0 IVT 1996-01-02 00:00:00 0 0 48 1 0 \n",
"1 IVT 1996-01-02 00:30:00 0 1 49 1 0 \n",
"2 IVT 1996-01-02 01:00:00 0 2 50 1 0 \n",
"3 IVT 1996-01-02 01:30:00 0 3 51 1 0 \n",
"4 IVT 1996-01-02 02:00:00 0 4 52 1 0 \n",
"\n",
" occupancy \n",
"0 0 \n",
"1 0 \n",
"2 0 \n",
"3 0 \n",
"4 0 "
]
}
],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bydate_df[1320:1350]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" category | \n",
" datetime | \n",
" arrivals | \n",
" binofday | \n",
" binofweek | \n",
" dayofweek | \n",
" departures | \n",
" occupancy | \n",
"
\n",
" \n",
" \n",
" \n",
" 1320 | \n",
" IVT | \n",
" 1996-01-29 12:00:00 | \n",
" 9 | \n",
" 24 | \n",
" 24 | \n",
" 0 | \n",
" 9 | \n",
" 21.266667 | \n",
"
\n",
" \n",
" 1321 | \n",
" IVT | \n",
" 1996-01-29 12:30:00 | \n",
" 9 | \n",
" 25 | \n",
" 25 | \n",
" 0 | \n",
" 6 | \n",
" 22.333333 | \n",
"
\n",
" \n",
" 1322 | \n",
" IVT | \n",
" 1996-01-29 13:00:00 | \n",
" 12 | \n",
" 26 | \n",
" 26 | \n",
" 0 | \n",
" 12 | \n",
" 22.266667 | \n",
"
\n",
" \n",
" 1323 | \n",
" IVT | \n",
" 1996-01-29 13:30:00 | \n",
" 8 | \n",
" 27 | \n",
" 27 | \n",
" 0 | \n",
" 9 | \n",
" 23.100000 | \n",
"
\n",
" \n",
" 1324 | \n",
" IVT | \n",
" 1996-01-29 14:00:00 | \n",
" 8 | \n",
" 28 | \n",
" 28 | \n",
" 0 | \n",
" 6 | \n",
" 22.933333 | \n",
"
\n",
" \n",
" 1325 | \n",
" IVT | \n",
" 1996-01-29 14:30:00 | \n",
" 9 | \n",
" 29 | \n",
" 29 | \n",
" 0 | \n",
" 11 | \n",
" 22.300000 | \n",
"
\n",
" \n",
" 1326 | \n",
" IVT | \n",
" 1996-01-29 15:00:00 | \n",
" 7 | \n",
" 30 | \n",
" 30 | \n",
" 0 | \n",
" 6 | \n",
" 23.900000 | \n",
"
\n",
" \n",
" 1327 | \n",
" IVT | \n",
" 1996-01-29 15:30:00 | \n",
" 6 | \n",
" 31 | \n",
" 31 | \n",
" 0 | \n",
" 9 | \n",
" 22.600000 | \n",
"
\n",
" \n",
" 1328 | \n",
" IVT | \n",
" 1996-01-29 16:00:00 | \n",
" 9 | \n",
" 32 | \n",
" 32 | \n",
" 0 | \n",
" 11 | \n",
" 19.700000 | \n",
"
\n",
" \n",
" 1329 | \n",
" IVT | \n",
" 1996-01-29 16:30:00 | \n",
" 7 | \n",
" 33 | \n",
" 33 | \n",
" 0 | \n",
" 4 | \n",
" 19.533333 | \n",
"
\n",
" \n",
" 1330 | \n",
" IVT | \n",
" 1996-01-29 17:00:00 | \n",
" 5 | \n",
" 34 | \n",
" 34 | \n",
" 0 | \n",
" 9 | \n",
" 18.033333 | \n",
"
\n",
" \n",
" 1331 | \n",
" IVT | \n",
" 1996-01-29 17:30:00 | \n",
" 2 | \n",
" 35 | \n",
" 35 | \n",
" 0 | \n",
" 9 | \n",
" 13.600000 | \n",
"
\n",
" \n",
" 1332 | \n",
" IVT | \n",
" 1996-01-29 18:00:00 | \n",
" 3 | \n",
" 36 | \n",
" 36 | \n",
" 0 | \n",
" 5 | \n",
" 9.566667 | \n",
"
\n",
" \n",
" 1333 | \n",
" IVT | \n",
" 1996-01-29 18:30:00 | \n",
" 2 | \n",
" 37 | \n",
" 37 | \n",
" 0 | \n",
" 4 | \n",
" 7.166667 | \n",
"
\n",
" \n",
" 1334 | \n",
" IVT | \n",
" 1996-01-29 19:00:00 | \n",
" 1 | \n",
" 38 | \n",
" 38 | \n",
" 0 | \n",
" 3 | \n",
" 5.833333 | \n",
"
\n",
" \n",
" 1335 | \n",
" IVT | \n",
" 1996-01-29 19:30:00 | \n",
" 3 | \n",
" 39 | \n",
" 39 | \n",
" 0 | \n",
" 2 | \n",
" 6.366667 | \n",
"
\n",
" \n",
" 1336 | \n",
" IVT | \n",
" 1996-01-29 20:00:00 | \n",
" 0 | \n",
" 40 | \n",
" 40 | \n",
" 0 | \n",
" 3 | \n",
" 4.833333 | \n",
"
\n",
" \n",
" 1337 | \n",
" IVT | \n",
" 1996-01-29 20:30:00 | \n",
" 0 | \n",
" 41 | \n",
" 41 | \n",
" 0 | \n",
" 3 | \n",
" 1.000000 | \n",
"
\n",
" \n",
" 1338 | \n",
" IVT | \n",
" 1996-01-29 21:00:00 | \n",
" 0 | \n",
" 42 | \n",
" 42 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1339 | \n",
" IVT | \n",
" 1996-01-29 21:30:00 | \n",
" 0 | \n",
" 43 | \n",
" 43 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1340 | \n",
" IVT | \n",
" 1996-01-29 22:00:00 | \n",
" 0 | \n",
" 44 | \n",
" 44 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1341 | \n",
" IVT | \n",
" 1996-01-29 22:30:00 | \n",
" 0 | \n",
" 45 | \n",
" 45 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1342 | \n",
" IVT | \n",
" 1996-01-29 23:00:00 | \n",
" 0 | \n",
" 46 | \n",
" 46 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1343 | \n",
" IVT | \n",
" 1996-01-29 23:30:00 | \n",
" 0 | \n",
" 47 | \n",
" 47 | \n",
" 0 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1344 | \n",
" IVT | \n",
" 1996-01-30 00:00:00 | \n",
" 0 | \n",
" 0 | \n",
" 48 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1345 | \n",
" IVT | \n",
" 1996-01-30 00:30:00 | \n",
" 0 | \n",
" 1 | \n",
" 49 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1346 | \n",
" IVT | \n",
" 1996-01-30 01:00:00 | \n",
" 0 | \n",
" 2 | \n",
" 50 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1347 | \n",
" IVT | \n",
" 1996-01-30 01:30:00 | \n",
" 0 | \n",
" 3 | \n",
" 51 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1348 | \n",
" IVT | \n",
" 1996-01-30 02:00:00 | \n",
" 0 | \n",
" 4 | \n",
" 52 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 1349 | \n",
" IVT | \n",
" 1996-01-30 02:30:00 | \n",
" 0 | \n",
" 5 | \n",
" 53 | \n",
" 1 | \n",
" 0 | \n",
" 0.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
" category datetime arrivals binofday binofweek dayofweek \\\n",
"1320 IVT 1996-01-29 12:00:00 9 24 24 0 \n",
"1321 IVT 1996-01-29 12:30:00 9 25 25 0 \n",
"1322 IVT 1996-01-29 13:00:00 12 26 26 0 \n",
"1323 IVT 1996-01-29 13:30:00 8 27 27 0 \n",
"1324 IVT 1996-01-29 14:00:00 8 28 28 0 \n",
"1325 IVT 1996-01-29 14:30:00 9 29 29 0 \n",
"1326 IVT 1996-01-29 15:00:00 7 30 30 0 \n",
"1327 IVT 1996-01-29 15:30:00 6 31 31 0 \n",
"1328 IVT 1996-01-29 16:00:00 9 32 32 0 \n",
"1329 IVT 1996-01-29 16:30:00 7 33 33 0 \n",
"1330 IVT 1996-01-29 17:00:00 5 34 34 0 \n",
"1331 IVT 1996-01-29 17:30:00 2 35 35 0 \n",
"1332 IVT 1996-01-29 18:00:00 3 36 36 0 \n",
"1333 IVT 1996-01-29 18:30:00 2 37 37 0 \n",
"1334 IVT 1996-01-29 19:00:00 1 38 38 0 \n",
"1335 IVT 1996-01-29 19:30:00 3 39 39 0 \n",
"1336 IVT 1996-01-29 20:00:00 0 40 40 0 \n",
"1337 IVT 1996-01-29 20:30:00 0 41 41 0 \n",
"1338 IVT 1996-01-29 21:00:00 0 42 42 0 \n",
"1339 IVT 1996-01-29 21:30:00 0 43 43 0 \n",
"1340 IVT 1996-01-29 22:00:00 0 44 44 0 \n",
"1341 IVT 1996-01-29 22:30:00 0 45 45 0 \n",
"1342 IVT 1996-01-29 23:00:00 0 46 46 0 \n",
"1343 IVT 1996-01-29 23:30:00 0 47 47 0 \n",
"1344 IVT 1996-01-30 00:00:00 0 0 48 1 \n",
"1345 IVT 1996-01-30 00:30:00 0 1 49 1 \n",
"1346 IVT 1996-01-30 01:00:00 0 2 50 1 \n",
"1347 IVT 1996-01-30 01:30:00 0 3 51 1 \n",
"1348 IVT 1996-01-30 02:00:00 0 4 52 1 \n",
"1349 IVT 1996-01-30 02:30:00 0 5 53 1 \n",
"\n",
" departures occupancy \n",
"1320 9 21.266667 \n",
"1321 6 22.333333 \n",
"1322 12 22.266667 \n",
"1323 9 23.100000 \n",
"1324 6 22.933333 \n",
"1325 11 22.300000 \n",
"1326 6 23.900000 \n",
"1327 9 22.600000 \n",
"1328 11 19.700000 \n",
"1329 4 19.533333 \n",
"1330 9 18.033333 \n",
"1331 9 13.600000 \n",
"1332 5 9.566667 \n",
"1333 4 7.166667 \n",
"1334 3 5.833333 \n",
"1335 2 6.366667 \n",
"1336 3 4.833333 \n",
"1337 3 1.000000 \n",
"1338 0 0.000000 \n",
"1339 0 0.000000 \n",
"1340 0 0.000000 \n",
"1341 0 0.000000 \n",
"1342 0 0.000000 \n",
"1343 0 0.000000 \n",
"1344 0 0.000000 \n",
"1345 0 0.000000 \n",
"1346 0 0.000000 \n",
"1347 0 0.000000 \n",
"1348 0 0.000000 \n",
"1349 0 0.000000 "
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With this data frame we can compute all kinds of interesting summary statistics by category, by day of week and time of day. To facilitate this type of \"group by\" analysis, **pandas** takes what is known as the Split-Apply-Combine approach. The [pandas documentation has a nice discussion](http://pandas.pydata.org/pandas-docs/dev/groupby.html) of this. To really understand split-apply-combine, [check out the article](http://www.jstatsoft.org/v40/i01) by [Hadley Wickham](http://had.co.nz/) who created the **plyr** package for [R](http://www.r-project.org/). I also created a tutorial on [Getting started with Python (with pandas and matplotlib) for group by analysis](http://hselab.org/machinery/content/getting-started-python-pandas-and-matplotlib-group-analysis) that covers some of the basics. A [companion tutorial shows how to do the same analysis using R](http://hselab.org/machinery/content/getting-started-r-plyr-and-ggplot2-group-analysis) instead of Python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pandas provides a `GroupBy` object to facilitate computing aggregate statistics by grouping fields. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Create a GroupBy object for the summary stats \n",
"bydate_dfgrp1 = bydate_df.groupby(['category','binofweek'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Having a group by object makes it easy to compute statistics such as the mean of all of the fields other than the grouping fields.\n",
"# You'll see that the result is simply another DataFrame.\n",
"bydate_dfgrp1.mean()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"\n",
"MultiIndex: 2016 entries, (ART, 0.0) to (Total, 335.0)\n",
"Data columns:\n",
"arrivals 2016 non-null values\n",
"binofday 2016 non-null values\n",
"dayofweek 2016 non-null values\n",
"departures 2016 non-null values\n",
"occupancy 2016 non-null values\n",
"dtypes: float64(5)"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Let's explore some of the means.\n",
"bydate_dfgrp1.mean()[100:120]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" arrivals | \n",
" binofday | \n",
" dayofweek | \n",
" departures | \n",
" occupancy | \n",
"
\n",
" \n",
" category | \n",
" binofweek | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" ART | \n",
" 100 | \n",
" 0.000000 | \n",
" 4 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 101 | \n",
" 0.000000 | \n",
" 5 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 102 | \n",
" 0.000000 | \n",
" 6 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 103 | \n",
" 0.000000 | \n",
" 7 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 104 | \n",
" 0.000000 | \n",
" 8 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 105 | \n",
" 0.000000 | \n",
" 9 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 106 | \n",
" 0.000000 | \n",
" 10 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 107 | \n",
" 1.538462 | \n",
" 11 | \n",
" 2 | \n",
" 0.000000 | \n",
" 0.782051 | \n",
"
\n",
" \n",
" 108 | \n",
" 1.769231 | \n",
" 12 | \n",
" 2 | \n",
" 0.000000 | \n",
" 2.361538 | \n",
"
\n",
" \n",
" 109 | \n",
" 3.384615 | \n",
" 13 | \n",
" 2 | \n",
" 0.000000 | \n",
" 5.058974 | \n",
"
\n",
" \n",
" 110 | \n",
" 1.769231 | \n",
" 14 | \n",
" 2 | \n",
" 0.000000 | \n",
" 7.674359 | \n",
"
\n",
" \n",
" 111 | \n",
" 1.538462 | \n",
" 15 | \n",
" 2 | \n",
" 3.076923 | \n",
" 7.584615 | \n",
"
\n",
" \n",
" 112 | \n",
" 1.692308 | \n",
" 16 | \n",
" 2 | \n",
" 3.384615 | \n",
" 5.225641 | \n",
"
\n",
" \n",
" 113 | \n",
" 1.692308 | \n",
" 17 | \n",
" 2 | \n",
" 1.538462 | \n",
" 5.300000 | \n",
"
\n",
" \n",
" 114 | \n",
" 2.153846 | \n",
" 18 | \n",
" 2 | \n",
" 1.923077 | \n",
" 5.282051 | \n",
"
\n",
" \n",
" 115 | \n",
" 1.846154 | \n",
" 19 | \n",
" 2 | \n",
" 1.923077 | \n",
" 5.412821 | \n",
"
\n",
" \n",
" 116 | \n",
" 1.153846 | \n",
" 20 | \n",
" 2 | \n",
" 1.923077 | \n",
" 5.228205 | \n",
"
\n",
" \n",
" 117 | \n",
" 1.461538 | \n",
" 21 | \n",
" 2 | \n",
" 1.230769 | \n",
" 4.800000 | \n",
"
\n",
" \n",
" 118 | \n",
" 1.692308 | \n",
" 22 | \n",
" 2 | \n",
" 2.000000 | \n",
" 4.764103 | \n",
"
\n",
" \n",
" 119 | \n",
" 2.153846 | \n",
" 23 | \n",
" 2 | \n",
" 1.461538 | \n",
" 5.064103 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
" arrivals binofday dayofweek departures occupancy\n",
"category binofweek \n",
"ART 100 0.000000 4 2 0.000000 0.000000\n",
" 101 0.000000 5 2 0.000000 0.000000\n",
" 102 0.000000 6 2 0.000000 0.000000\n",
" 103 0.000000 7 2 0.000000 0.000000\n",
" 104 0.000000 8 2 0.000000 0.000000\n",
" 105 0.000000 9 2 0.000000 0.000000\n",
" 106 0.000000 10 2 0.000000 0.000000\n",
" 107 1.538462 11 2 0.000000 0.782051\n",
" 108 1.769231 12 2 0.000000 2.361538\n",
" 109 3.384615 13 2 0.000000 5.058974\n",
" 110 1.769231 14 2 0.000000 7.674359\n",
" 111 1.538462 15 2 3.076923 7.584615\n",
" 112 1.692308 16 2 3.384615 5.225641\n",
" 113 1.692308 17 2 1.538462 5.300000\n",
" 114 2.153846 18 2 1.923077 5.282051\n",
" 115 1.846154 19 2 1.923077 5.412821\n",
" 116 1.153846 20 2 1.923077 5.228205\n",
" 117 1.461538 21 2 1.230769 4.800000\n",
" 118 1.692308 22 2 2.000000 4.764103\n",
" 119 2.153846 23 2 1.461538 5.064103"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we've seen how the a `GroupBy` object works, let's see how we can compute a whole bunch of summary statistics at once. Specifically we want to compute the mean, standard deviation, min, max and several percentiles. First let's create a slightly different `GroupBy` object."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"bydate_dfgrp2 = bydate_df.groupby(['category','dayofweek','binofday'])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's define a function that will return a bunch of statistics in a dictionary for a column of data."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def get_occstats(group, stub=''):\n",
" return {stub+'count': group.count(), stub+'mean': group.mean(), \n",
" stub+'min': group.min(),\n",
" stub+'max': group.max(), 'stdev': group.std(), \n",
" stub+'p50': group.quantile(0.5), stub+'p55': group.quantile(0.55),\n",
" stub+'p60': group.quantile(0.6), stub+'p65': group.quantile(0.65),\n",
" stub+'p70': group.quantile(0.7), stub+'p75': group.quantile(0.75),\n",
" stub+'p80': group.quantile(0.8), stub+'p85': group.quantile(0.85),\n",
" stub+'p90': group.quantile(0.9), stub+'p95': group.quantile(0.95),\n",
" stub+'p975': group.quantile(0.975), \n",
" stub+'p99': group.quantile(0.99)}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the `apply` function to apply the `get_occstats()` function to a data series. We'll create separate output data series for occupancy, arrivals and departures. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"occ_stats = bydate_dfgrp2['occupancy'].apply(get_occstats)\n",
"arr_stats = bydate_dfgrp2['arrivals'].apply(get_occstats)\n",
"dep_stats = bydate_dfgrp2['departures'].apply(get_occstats)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, what is `occ_stats`?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(occ_stats)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"pandas.core.series.Series"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's a pandas `Series` object. What does its index look like?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"occ_stats.index"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 22,
"text": [
"MultiIndex\n",
"[(ART, 0.0, 0.0, count), (ART, 0.0, 0.0, max), (ART, 0.0, 0.0, mean), (ART, 0.0, 0.0, min), (ART, 0.0, 0.0, p50), (ART, 0.0, 0.0, p55), (ART, 0.0, 0.0, p60), (ART, 0.0, 0.0, p65), (ART, 0.0, 0.0, p70), (ART, 0.0, 0.0, p75), (ART, 0.0, 0.0, p80), (ART, 0.0, 0.0, p85), (ART, 0.0, 0.0, p90), (ART, 0.0, 0.0, p95), (ART, 0.0, 0.0, p975), (ART, 0.0, 0.0, p99), (ART, 0.0, 0.0, stdev), (ART, 0.0, 1.0, count), (ART, 0.0, 1.0, max), (ART, 0.0, 1.0, mean), (ART, 0.0, 1.0, min), (ART, 0.0, 1.0, p50), (ART, 0.0, 1.0, p55), (ART, 0.0, 1.0, p60), (ART, 0.0, 1.0, p65), (ART, 0.0, 1.0, p70), (ART, 0.0, 1.0, p75), (ART, 0.0, 1.0, p80), (ART, 0.0, 1.0, p85), (ART, 0.0, 1.0, p90), (ART, 0.0, 1.0, p95), (ART, 0.0, 1.0, p975), (ART, 0.0, 1.0, p99), (ART, 0.0, 1.0, stdev), (ART, 0.0, 2.0, count), (ART, 0.0, 2.0, max), (ART, 0.0, 2.0, mean), (ART, 0.0, 2.0, min), (ART, 0.0, 2.0, p50), (ART, 0.0, 2.0, p55), (ART, 0.0, 2.0, p60), (ART, 0.0, 2.0, p65), (ART, 0.0, 2.0, p70), (ART, 0.0, 2.0, p75), (ART, 0.0, 2.0, p80), (ART, 0.0, 2.0, p85), (ART, 0.0, 2.0, p90), (ART, 0.0, 2.0, p95), (ART, 0.0, 2.0, p975), (ART, 0.0, 2.0, p99), (Total, 6.0, 45.0, max), (Total, 6.0, 45.0, mean), (Total, 6.0, 45.0, min), (Total, 6.0, 45.0, p50), (Total, 6.0, 45.0, p55), (Total, 6.0, 45.0, p60), (Total, 6.0, 45.0, p65), (Total, 6.0, 45.0, p70), (Total, 6.0, 45.0, p75), (Total, 6.0, 45.0, p80), (Total, 6.0, 45.0, p85), (Total, 6.0, 45.0, p90), (Total, 6.0, 45.0, p95), (Total, 6.0, 45.0, p975), (Total, 6.0, 45.0, p99), (Total, 6.0, 45.0, stdev), (Total, 6.0, 46.0, count), (Total, 6.0, 46.0, max), (Total, 6.0, 46.0, mean), (Total, 6.0, 46.0, min), (Total, 6.0, 46.0, p50), (Total, 6.0, 46.0, p55), (Total, 6.0, 46.0, p60), (Total, 6.0, 46.0, p65), (Total, 6.0, 46.0, p70), (Total, 6.0, 46.0, p75), (Total, 6.0, 46.0, p80), (Total, 6.0, 46.0, p85), (Total, 6.0, 46.0, p90), (Total, 6.0, 46.0, p95), (Total, 6.0, 46.0, p975), (Total, 6.0, 46.0, p99), (Total, 6.0, 46.0, stdev), (Total, 6.0, 47.0, count), (Total, 6.0, 47.0, max), (Total, 6.0, 47.0, mean), (Total, 6.0, 47.0, min), (Total, 6.0, 47.0, p50), (Total, 6.0, 47.0, p55), (Total, 6.0, 47.0, p60), (Total, 6.0, 47.0, p65), (Total, 6.0, 47.0, p70), (Total, 6.0, 47.0, p75), (Total, 6.0, 47.0, p80), (Total, 6.0, 47.0, p85), (Total, 6.0, 47.0, p90), (Total, 6.0, 47.0, p95), (Total, 6.0, 47.0, p975), (Total, 6.0, 47.0, p99), (Total, 6.0, 47.0, stdev)]"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice it's a `MultiIndex` with 4 levels: category, dayofweek, binofday, statistic. It would be nice to \"un-pivot\" the statistic from the index and have it correspond to a set of columns. That's what `unstack()` will do. It will leave us with a `DataFrame` with all of the statistics as columns and a 3 level multi-index of category, dayofweek and binofday. Perfect for plotting."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"occ_stats.unstack()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"\n",
"MultiIndex: 2016 entries, (ART, 0.0, 0.0) to (Total, 6.0, 47.0)\n",
"Data columns:\n",
"count 2016 non-null values\n",
"max 2016 non-null values\n",
"mean 2016 non-null values\n",
"min 2016 non-null values\n",
"p50 2016 non-null values\n",
"p55 2016 non-null values\n",
"p60 2016 non-null values\n",
"p65 2016 non-null values\n",
"p70 2016 non-null values\n",
"p75 2016 non-null values\n",
"p80 2016 non-null values\n",
"p85 2016 non-null values\n",
"p90 2016 non-null values\n",
"p95 2016 non-null values\n",
"p975 2016 non-null values\n",
"p99 2016 non-null values\n",
"stdev 2016 non-null values\n",
"dtypes: float64(17)"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"occ_stats_summary = occ_stats.unstack()\n",
"arr_stats_summary = arr_stats.unstack()\n",
"dep_stats_summary = dep_stats.unstack()\n",
"\n",
"occ_stats_summary[200:220] # Let's peek into the middle of the table."
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" | \n",
" | \n",
" count | \n",
" max | \n",
" mean | \n",
" min | \n",
" p50 | \n",
" p55 | \n",
" p60 | \n",
" p65 | \n",
" p70 | \n",
" p75 | \n",
" p80 | \n",
" p85 | \n",
" p90 | \n",
" p95 | \n",
" p975 | \n",
" p99 | \n",
" stdev | \n",
"
\n",
" \n",
" category | \n",
" dayofweek | \n",
" binofday | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" ART | \n",
" 4 | \n",
" 8 | \n",
" 13 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 9 | \n",
" 13 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 10 | \n",
" 13 | \n",
" 0.133333 | \n",
" 0.010256 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.053333 | \n",
" 0.093333 | \n",
" 0.117333 | \n",
" 0.036980 | \n",
"
\n",
" \n",
" 11 | \n",
" 13 | \n",
" 1.600000 | \n",
" 0.325641 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.060000 | \n",
" 0.180000 | \n",
" 0.420000 | \n",
" 0.553333 | \n",
" 0.633333 | \n",
" 0.653333 | \n",
" 0.680000 | \n",
" 0.720000 | \n",
" 1.080000 | \n",
" 1.340000 | \n",
" 1.496000 | \n",
" 0.485179 | \n",
"
\n",
" \n",
" 12 | \n",
" 13 | \n",
" 4.500000 | \n",
" 1.800000 | \n",
" 0.000000 | \n",
" 1.966667 | \n",
" 2.026667 | \n",
" 2.146667 | \n",
" 2.386667 | \n",
" 2.493333 | \n",
" 2.533333 | \n",
" 2.713333 | \n",
" 2.886667 | \n",
" 3.046667 | \n",
" 3.660000 | \n",
" 4.080000 | \n",
" 4.332000 | \n",
" 1.290923 | \n",
"
\n",
" \n",
" 13 | \n",
" 13 | \n",
" 7.966667 | \n",
" 5.143590 | \n",
" 2.266667 | \n",
" 4.933333 | \n",
" 5.073333 | \n",
" 5.226667 | \n",
" 5.406667 | \n",
" 5.840000 | \n",
" 6.400000 | \n",
" 6.480000 | \n",
" 6.740000 | \n",
" 7.360000 | \n",
" 7.726667 | \n",
" 7.846667 | \n",
" 7.918667 | \n",
" 1.619007 | \n",
"
\n",
" \n",
" 14 | \n",
" 13 | \n",
" 11.166667 | \n",
" 8.497436 | \n",
" 6.266667 | \n",
" 7.966667 | \n",
" 8.366667 | \n",
" 8.733333 | \n",
" 9.033333 | \n",
" 9.226667 | \n",
" 9.366667 | \n",
" 9.466667 | \n",
" 9.700000 | \n",
" 10.200000 | \n",
" 10.686667 | \n",
" 10.926667 | \n",
" 11.070667 | \n",
" 1.368661 | \n",
"
\n",
" \n",
" 15 | \n",
" 13 | \n",
" 10.733333 | \n",
" 8.707692 | \n",
" 5.166667 | \n",
" 9.200000 | \n",
" 9.480000 | \n",
" 9.666667 | \n",
" 9.666667 | \n",
" 9.680000 | \n",
" 9.700000 | \n",
" 9.860000 | \n",
" 10.040000 | \n",
" 10.260000 | \n",
" 10.493333 | \n",
" 10.613333 | \n",
" 10.685333 | \n",
" 1.572987 | \n",
"
\n",
" \n",
" 16 | \n",
" 13 | \n",
" 8.666667 | \n",
" 6.046154 | \n",
" 2.833333 | \n",
" 6.533333 | \n",
" 6.613333 | \n",
" 6.680000 | \n",
" 6.720000 | \n",
" 6.880000 | \n",
" 7.100000 | \n",
" 7.480000 | \n",
" 7.866667 | \n",
" 8.266667 | \n",
" 8.506667 | \n",
" 8.586667 | \n",
" 8.634667 | \n",
" 1.833815 | \n",
"
\n",
" \n",
" 17 | \n",
" 13 | \n",
" 9.066667 | \n",
" 6.279487 | \n",
" 3.500000 | \n",
" 6.333333 | \n",
" 6.353333 | \n",
" 6.480000 | \n",
" 6.820000 | \n",
" 6.960000 | \n",
" 7.000000 | \n",
" 7.060000 | \n",
" 7.473333 | \n",
" 8.593333 | \n",
" 9.006667 | \n",
" 9.036667 | \n",
" 9.054667 | \n",
" 1.587074 | \n",
"
\n",
" \n",
" 18 | \n",
" 13 | \n",
" 8.800000 | \n",
" 6.310256 | \n",
" 4.000000 | \n",
" 6.500000 | \n",
" 6.760000 | \n",
" 6.966667 | \n",
" 7.066667 | \n",
" 7.220000 | \n",
" 7.400000 | \n",
" 7.620000 | \n",
" 7.800000 | \n",
" 7.900000 | \n",
" 8.280000 | \n",
" 8.540000 | \n",
" 8.696000 | \n",
" 1.491729 | \n",
"
\n",
" \n",
" 19 | \n",
" 13 | \n",
" 7.566667 | \n",
" 6.074359 | \n",
" 3.833333 | \n",
" 5.966667 | \n",
" 6.466667 | \n",
" 6.840000 | \n",
" 6.960000 | \n",
" 7.053333 | \n",
" 7.133333 | \n",
" 7.193333 | \n",
" 7.280000 | \n",
" 7.420000 | \n",
" 7.506667 | \n",
" 7.536667 | \n",
" 7.554667 | \n",
" 1.215955 | \n",
"
\n",
" \n",
" 20 | \n",
" 13 | \n",
" 9.500000 | \n",
" 5.784615 | \n",
" 2.533333 | \n",
" 5.533333 | \n",
" 5.873333 | \n",
" 6.153333 | \n",
" 6.313333 | \n",
" 6.473333 | \n",
" 6.633333 | \n",
" 6.753333 | \n",
" 7.013333 | \n",
" 7.553333 | \n",
" 8.440000 | \n",
" 8.970000 | \n",
" 9.288000 | \n",
" 1.736142 | \n",
"
\n",
" \n",
" 21 | \n",
" 13 | \n",
" 9.900000 | \n",
" 6.271795 | \n",
" 3.833333 | \n",
" 5.700000 | \n",
" 5.780000 | \n",
" 5.860000 | \n",
" 5.940000 | \n",
" 6.566667 | \n",
" 7.466667 | \n",
" 7.646667 | \n",
" 7.946667 | \n",
" 8.486667 | \n",
" 9.160000 | \n",
" 9.530000 | \n",
" 9.752000 | \n",
" 1.718303 | \n",
"
\n",
" \n",
" 22 | \n",
" 13 | \n",
" 9.000000 | \n",
" 6.343590 | \n",
" 2.633333 | \n",
" 6.633333 | \n",
" 6.673333 | \n",
" 6.793333 | \n",
" 7.073333 | \n",
" 7.246667 | \n",
" 7.366667 | \n",
" 7.526667 | \n",
" 7.646667 | \n",
" 7.686667 | \n",
" 8.220000 | \n",
" 8.610000 | \n",
" 8.844000 | \n",
" 1.634092 | \n",
"
\n",
" \n",
" 23 | \n",
" 13 | \n",
" 9.100000 | \n",
" 5.692308 | \n",
" 3.533333 | \n",
" 5.100000 | \n",
" 5.440000 | \n",
" 5.806667 | \n",
" 6.226667 | \n",
" 6.660000 | \n",
" 7.100000 | \n",
" 7.300000 | \n",
" 7.446667 | \n",
" 7.486667 | \n",
" 8.140000 | \n",
" 8.620000 | \n",
" 8.908000 | \n",
" 1.697337 | \n",
"
\n",
" \n",
" 24 | \n",
" 13 | \n",
" 8.900000 | \n",
" 5.592308 | \n",
" 2.800000 | \n",
" 5.433333 | \n",
" 5.453333 | \n",
" 5.566667 | \n",
" 5.866667 | \n",
" 6.113333 | \n",
" 6.333333 | \n",
" 6.633333 | \n",
" 6.993333 | \n",
" 7.473333 | \n",
" 8.140000 | \n",
" 8.520000 | \n",
" 8.748000 | \n",
" 1.593921 | \n",
"
\n",
" \n",
" 25 | \n",
" 13 | \n",
" 8.566667 | \n",
" 5.282051 | \n",
" 3.000000 | \n",
" 5.166667 | \n",
" 5.226667 | \n",
" 5.340000 | \n",
" 5.560000 | \n",
" 5.753333 | \n",
" 5.933333 | \n",
" 6.173333 | \n",
" 6.486667 | \n",
" 6.946667 | \n",
" 7.686667 | \n",
" 8.126667 | \n",
" 8.390667 | \n",
" 1.492674 | \n",
"
\n",
" \n",
" 26 | \n",
" 13 | \n",
" 8.333333 | \n",
" 4.779487 | \n",
" 2.300000 | \n",
" 4.800000 | \n",
" 4.920000 | \n",
" 5.106667 | \n",
" 5.426667 | \n",
" 5.533333 | \n",
" 5.533333 | \n",
" 6.013333 | \n",
" 6.400000 | \n",
" 6.600000 | \n",
" 7.333333 | \n",
" 7.833333 | \n",
" 8.133333 | \n",
" 1.769627 | \n",
"
\n",
" \n",
" 27 | \n",
" 13 | \n",
" 8.066667 | \n",
" 4.817949 | \n",
" 2.266667 | \n",
" 4.833333 | \n",
" 4.933333 | \n",
" 5.073333 | \n",
" 5.293333 | \n",
" 5.393333 | \n",
" 5.433333 | \n",
" 5.653333 | \n",
" 5.906667 | \n",
" 6.226667 | \n",
" 7.026667 | \n",
" 7.546667 | \n",
" 7.858667 | \n",
" 1.493295 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 26,
"text": [
" count max mean min p50 \\\n",
"category dayofweek binofday \n",
"ART 4 8 13 0.000000 0.000000 0.000000 0.000000 \n",
" 9 13 0.000000 0.000000 0.000000 0.000000 \n",
" 10 13 0.133333 0.010256 0.000000 0.000000 \n",
" 11 13 1.600000 0.325641 0.000000 0.000000 \n",
" 12 13 4.500000 1.800000 0.000000 1.966667 \n",
" 13 13 7.966667 5.143590 2.266667 4.933333 \n",
" 14 13 11.166667 8.497436 6.266667 7.966667 \n",
" 15 13 10.733333 8.707692 5.166667 9.200000 \n",
" 16 13 8.666667 6.046154 2.833333 6.533333 \n",
" 17 13 9.066667 6.279487 3.500000 6.333333 \n",
" 18 13 8.800000 6.310256 4.000000 6.500000 \n",
" 19 13 7.566667 6.074359 3.833333 5.966667 \n",
" 20 13 9.500000 5.784615 2.533333 5.533333 \n",
" 21 13 9.900000 6.271795 3.833333 5.700000 \n",
" 22 13 9.000000 6.343590 2.633333 6.633333 \n",
" 23 13 9.100000 5.692308 3.533333 5.100000 \n",
" 24 13 8.900000 5.592308 2.800000 5.433333 \n",
" 25 13 8.566667 5.282051 3.000000 5.166667 \n",
" 26 13 8.333333 4.779487 2.300000 4.800000 \n",
" 27 13 8.066667 4.817949 2.266667 4.833333 \n",
"\n",
" p55 p60 p65 p70 p75 \\\n",
"category dayofweek binofday \n",
"ART 4 8 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
" 9 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
" 10 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
" 11 0.060000 0.180000 0.420000 0.553333 0.633333 \n",
" 12 2.026667 2.146667 2.386667 2.493333 2.533333 \n",
" 13 5.073333 5.226667 5.406667 5.840000 6.400000 \n",
" 14 8.366667 8.733333 9.033333 9.226667 9.366667 \n",
" 15 9.480000 9.666667 9.666667 9.680000 9.700000 \n",
" 16 6.613333 6.680000 6.720000 6.880000 7.100000 \n",
" 17 6.353333 6.480000 6.820000 6.960000 7.000000 \n",
" 18 6.760000 6.966667 7.066667 7.220000 7.400000 \n",
" 19 6.466667 6.840000 6.960000 7.053333 7.133333 \n",
" 20 5.873333 6.153333 6.313333 6.473333 6.633333 \n",
" 21 5.780000 5.860000 5.940000 6.566667 7.466667 \n",
" 22 6.673333 6.793333 7.073333 7.246667 7.366667 \n",
" 23 5.440000 5.806667 6.226667 6.660000 7.100000 \n",
" 24 5.453333 5.566667 5.866667 6.113333 6.333333 \n",
" 25 5.226667 5.340000 5.560000 5.753333 5.933333 \n",
" 26 4.920000 5.106667 5.426667 5.533333 5.533333 \n",
" 27 4.933333 5.073333 5.293333 5.393333 5.433333 \n",
"\n",
" p80 p85 p90 p95 p975 \\\n",
"category dayofweek binofday \n",
"ART 4 8 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
" 9 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
" 10 0.000000 0.000000 0.000000 0.053333 0.093333 \n",
" 11 0.653333 0.680000 0.720000 1.080000 1.340000 \n",
" 12 2.713333 2.886667 3.046667 3.660000 4.080000 \n",
" 13 6.480000 6.740000 7.360000 7.726667 7.846667 \n",
" 14 9.466667 9.700000 10.200000 10.686667 10.926667 \n",
" 15 9.860000 10.040000 10.260000 10.493333 10.613333 \n",
" 16 7.480000 7.866667 8.266667 8.506667 8.586667 \n",
" 17 7.060000 7.473333 8.593333 9.006667 9.036667 \n",
" 18 7.620000 7.800000 7.900000 8.280000 8.540000 \n",
" 19 7.193333 7.280000 7.420000 7.506667 7.536667 \n",
" 20 6.753333 7.013333 7.553333 8.440000 8.970000 \n",
" 21 7.646667 7.946667 8.486667 9.160000 9.530000 \n",
" 22 7.526667 7.646667 7.686667 8.220000 8.610000 \n",
" 23 7.300000 7.446667 7.486667 8.140000 8.620000 \n",
" 24 6.633333 6.993333 7.473333 8.140000 8.520000 \n",
" 25 6.173333 6.486667 6.946667 7.686667 8.126667 \n",
" 26 6.013333 6.400000 6.600000 7.333333 7.833333 \n",
" 27 5.653333 5.906667 6.226667 7.026667 7.546667 \n",
"\n",
" p99 stdev \n",
"category dayofweek binofday \n",
"ART 4 8 0.000000 0.000000 \n",
" 9 0.000000 0.000000 \n",
" 10 0.117333 0.036980 \n",
" 11 1.496000 0.485179 \n",
" 12 4.332000 1.290923 \n",
" 13 7.918667 1.619007 \n",
" 14 11.070667 1.368661 \n",
" 15 10.685333 1.572987 \n",
" 16 8.634667 1.833815 \n",
" 17 9.054667 1.587074 \n",
" 18 8.696000 1.491729 \n",
" 19 7.554667 1.215955 \n",
" 20 9.288000 1.736142 \n",
" 21 9.752000 1.718303 \n",
" 22 8.844000 1.634092 \n",
" 23 8.908000 1.697337 \n",
" 24 8.748000 1.593921 \n",
" 25 8.390667 1.492674 \n",
" 26 8.133333 1.769627 \n",
" 27 7.858667 1.493295 "
]
}
],
"prompt_number": 26
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wouldn't it be nice if Excel Pivot Tables could produce the output above? Why can't they? Because they can't do things like percentiles (or other custom aggregate functions). I love spreadsheets. I teach spreadsheet modeling. However, I find myself using either Python+pandas+matplotlib or R+plyr+ggplot2 more and more frequently for things I used to do in Excel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's fire these guys out to csv files so we can check them out and maybe play with them in spreadsheet. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"occ_stats_summary.to_csv('occ_stats_summary.csv')\n",
"arr_stats_summary.to_csv('arr_stats_summary.csv')\n",
"dep_stats_summary.to_csv('dep_stats_summary.csv')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 27
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The real reason I exported them to csv was to make it easy to read these results back in for Part 3 of this series of tutorials. In Part 3, we'll create some plots using matplotlib based on these summary statistics."
]
}
],
"metadata": {}
}
]
}