{ "metadata": { "name": "", "signature": "sha256:5d97033b118e50c5926742af5a112c4b5c73f865f5142feafb535c3462f32184" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Breakout: Data Exploration and Visualization" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Start with our normal batch of imports and settings\n", "from __future__ import print_function, division\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import seaborn as sns; sns.set()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load the Data\n", "\n", "1. Download the files:\n", " - https://www.dropbox.com/s/74vc6yqguz4z0rx/maleVisitsToPhysician.csv\n", " - https://www.dropbox.com/s/5w4l7fvus79wvew/femaleVisitsToPhysician.csv\n", "2. Load the female and male data separately (try the ``index_col`` argument to set the index to the first column)\n", "3. Combine them into a single dataframe using ``pd.concat``" ] }, { "cell_type": "code", "collapsed": false, "input": [ "females = pd.read_csv('data/femaleVisitsToPhysician.csv')\n", "males = pd.read_csv('data/maleVisitsToPhysician.csv')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Visualize the Data\n", "\n", "For each gender, the data shows the per capita consultations by age and year.\n", "Use ``pd.pivot_table`` and plot the data.\n", "\n", "Also, as you create these plots, experiment with ``sns.set_palette`` to get a color scheme which helps convey the information you're interested in.\n", "\n", "1. Use a pivot table to index the data by age and gender\n", "2. Plot age vs per capita visits for females, one line per year\n", "3. 
Plot age vs per capita visits for males, one line per year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Effect of the 2010 Copayment Elimination\n", "\n", "The copayment for GP visits was eliminated in 2010. Let's see whether there is any indication that this affected the rate of visits.\n", "\n", "1. Add a column to the data called ``with_copay``, which is True if the year is prior to 2010, and False otherwise\n", "2. Use a pivot table to plot the mean visits per capita for the years with a copay and without (one plot each for men and women)\n", "3. Plot the percentage increase in per capita visits as a function of age. What age ranges did the copay change most affect?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Tracing Trends in a Single Generation\n", "\n", "Let's try to pull some information out of the data that's not obviously available.\n", "\n", "Notice that the ``age`` column and the ``year`` column are intertwined: by subtracting the age from the year, we can find the birth year of the group of people recorded.\n", "\n", "1. Create a new column in the data containing the birth year.\n", "2. Plot the **population** by birth year, with a different color line for each observation year. What does this tell you about immigration and deaths in the population?\n", "3. Plot the **per capita visits** by birth year, with a different color line for each observation year. Are there any generations which have consistently more or consistently fewer visits than those in adjacent years?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Bonus: Exploring Titanic Survivors\n", "\n", "If you finish the above tasks, try this more open-ended exploration on a different dataset.\n", "\n", "Seaborn includes a dataset representing the individuals who were on board the ill-fated maiden voyage of the *Titanic*. 
It has information about their age, gender, class, fare paid, the deck their quarters were on, whether they were traveling with someone, and whether they survived.\n", "\n", "This is a fairly open-ended exploration, but try answering these questions:\n", "\n", "1. Did age influence chances of survival?\n", "2. Did gender influence chances of survival?\n", "3. Did wealth (measured by class or by fare paid) influence chances of survival?\n", "4. Did the deck the person was on influence chances of survival?\n", "\n", "See what sort of interesting relationships you can find between the various pieces of data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# load the titanic data\n", "titanic = sns.load_dataset('titanic')" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }