{ "metadata": { "name": "", "signature": "sha256:31726dcb3e50d60f7f50e0f3c3cf5f115af474ecdd9024a68de8f8b4b972a66d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hands-on with Pandas\n", "\n", "This notebook will walk you through some exercises to get practice using Pandas for data manipulation.\n", "\n", "As you use this, feel free to make ample use of the [Pandas Documentation](http://pandas.pydata.org), the [Pandas StackOverflow Channel](http://stackoverflow.com/questions/tagged/pandas), and your favorite search engine. For example, if you search phrases like [\"Pandas sum all columns\"](https://www.google.com/search?q=pandas+sum+all+columns), you're very likely to find an answer to the question you have in mind.\n", "\n", "Also, if it comes down to it, note that solutions are available in the Git repository." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Start with our normal batch of imports and settings\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Following is optional: set plotting styles\n", "import seaborn; seaborn.set()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diving Deeper into Baby Names\n", "\n", "In the lecture, we looked at the US Social Security Baby Names data. Here let's dive a little bit deeper into this.\n", "\n", "Try to do the following with the Baby Names data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0. Load the baby names data\n", "\n", "(Here you can copy the code from the other notebook; make sure you understand what it's doing!)" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Find your own name within the data. 
\n", "\n", "- How many babies per year are born with your name?\n", "- What *fraction* of births each year have your name?\n", "\n", "Note: there are multiple ways to do this, but the first part will use *masking* and *pivot tables*, while the second part might also throw in a *groupby*." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Find names which have switched genders.\n", "\n", "This is a bit tricky: you might be tempted to use a ``groupby`` and ``apply`` over the multiple indices ``['year', 'gender', 'name']``, but if you try this you'll find that it's **very** computationally intensive.\n", "\n", "I'd suggest doing the following:\n", "\n", "- Use a pivot table to find the total number of births for each name before some early date (say, 1920) and after some later date (say, 1980).\n", "- Compute the percentage of males for each name within those groups.\n", "- Use masking to find which names have transitioned from a low percentage to a high percentage, and vice versa.\n", "\n", "Is a name more likely to transition from female to male, or from male to female?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }