{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Occupancy plots with Python and ggplot2 via rmagic\n", "===================================================\n", "\n", "## Background ##\n", "\n", "In two previous posts ([Part 1](http://hselab.org/occupancy-analysis-with-python-pandas-part-1-create-by-date-data-frame.html) and [Part 2](http://hselab.org/occupancy-analysis-with-python-pandas-part-2-compute-occupancy-summary-stats.html)), I showed how to do occupancy analysis with the beginnings of a Python-based version of [Hillmaker](http://hillmaker.sourceforge.net/). The example is based on data from a hospital short-stay unit.\n", "\n", "In this short tutorial, I'll show how to use `rmagic` from within an IPython notebook so that we can make occupancy plots using `ggplot2`. In particular, we want to create a grid of occupancy histograms with one grid axis being patient type and the other axis being day of week. We want to make sure that the plots are ordered Sunday, Monday, ..., Saturday and NOT in alphabetical order (Friday, Monday, ..., Wednesday).\n", "\n", "The first part of such an analysis leads to a table we call the \"by datetime\" table. At the end of Part 1 of this tutorial series, we ended up with a CSV file called `bydate_shortstay_csv.csv`. Let's read it in and take a look at it.\n", "\n", "You can find the data and the `.ipynb` file in my [hselab-tutorials](https://github.com/misken/hselab-tutorials) GitHub repo. 
Clone or download a zip.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "# Use pandas' default matplotlib style and widen the DataFrame display\n", "pd.set_option('display.mpl_style', 'default')\n", "pd.set_option('display.width', 5000)  # replaces the deprecated display.line_width\n", "pd.set_option('display.max_columns', 60)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "## Read sample data set and convert string dates to datetimes\n", "bydatetime_df = pd.read_csv('data/bydate_shortstay_csv.csv', parse_dates=['datetime'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "bydatetime_df.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "

|   | category | datetime | arrivals | binofday | binofweek | dayofweek | departures | occupancy |
|---|---|---|---|---|---|---|---|---|
| 0 | IVT | 1996-01-07 00:00:00 | 0 | 0 | 288 | 6 | 0 | 0.0 |
| 1 | IVT | 1996-01-07 00:30:00 | 0 | 1 | 289 | 6 | 1 | 0.5 |
| 2 | IVT | 1996-01-07 01:00:00 | 0 | 2 | 290 | 6 | 0 | 0.0 |
| 3 | IVT | 1996-01-07 01:30:00 | 0 | 3 | 291 | 6 | 0 | 0.0 |
| 4 | IVT | 1996-01-07 02:00:00 | 0 | 4 | 292 | 6 | 0 | 0.0 |

5 rows × 8 columns
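
For plotting by day of week, the half-hourly rows in the by-datetime table need to be rolled up to one row per patient type per day, with a day-of-week label that sorts Sunday through Saturday rather than alphabetically. The following is only a minimal pandas sketch of one way to do that, not necessarily the approach used in this notebook. It assumes the roll-up is a simple daily mean of `occupancy`, uses a tiny made-up stand-in for `bydatetime_df`, and the names `daily`, `pandas_day_names`, and `sunday_first` are hypothetical.

```python
import pandas as pd

# Tiny stand-in for bydatetime_df: same column names as the table above,
# but made-up occupancy values (assumption, for illustration only).
bydatetime_df = pd.DataFrame({
    "category": ["IVT"] * 4,
    "datetime": pd.to_datetime([
        "1996-01-07 00:00", "1996-01-07 00:30",
        "1996-01-08 00:00", "1996-01-08 00:30",
    ]),
    "occupancy": [0.0, 0.5, 1.0, 2.0],
})

# Average occupancy per category and calendar day.
# dt.normalize() drops the time of day, so all bins of one day group together.
daily = (
    bydatetime_df
    .groupby(["category", bydatetime_df["datetime"].dt.normalize()])["occupancy"]
    .mean()
    .to_frame()
)

# pandas numbers days Monday=0 ... Sunday=6, which matches dayofweek=6
# for Sunday 1996-01-07 in the table above.
pandas_day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sunday_first = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

dates = daily.index.get_level_values("datetime")
labels = [pandas_day_names[d] for d in dates.dayofweek]

# An ordered categorical sorts Sun, Mon, ..., Sat instead of alphabetically.
daily["dayofweek"] = pd.Categorical(labels, categories=sunday_first, ordered=True)

print(daily)
```

An ordered categorical plays the same role in pandas that an ordered factor plays in R, so the same idea carries over when the data is handed to `ggplot2`.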

| category | datetime | occupancy | dayofweek |
|---|---|---|---|
| ART | 1996-01-07 | 0.000000 | Sun |
|  | 1996-01-08 | 1.732639 | Mon |
|  | 1996-01-09 | 1.532639 | Tue |
|  | 1996-01-10 | 1.541667 | Wed |
|  | 1996-01-11 | 1.872222 | Thu |

5 rows × 2 columns

| category | datetime | occupancy | dayofweek |
|---|---|---|---|
| Total | 1996-03-27 | 15.801389 | Wed |
|  | 1996-03-28 | 13.538194 | Thu |
|  | 1996-03-29 | 18.146528 | Fri |
|  | 1996-03-30 | 1.436111 | Sat |
|  | 1996-03-31 | 0.227083 | Sun |

5 rows × 2 columns