{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 3a: Practical Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This is a quiz given in Roger Peng's [Coursera](https://www.coursera.org) class [Computing for Data Analysis](https://www.coursera.org/course/compdata)._\n", "\n", "_Sourced from [Research Computing MeetUp's](https://github.com/ResearchComputing/Meetup-Fall-2013) Python course._\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import os\n", "\n", "data = pd.read_csv(os.path.join('data', 'ozone.csv'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "print(data.head())" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " Ozone Solar.R Wind Temp Month Day\n", "0 41 190 7.4 67 5 1\n", "1 36 118 8.0 72 5 2\n", "2 12 149 12.6 74 5 3\n", "3 18 313 11.5 62 5 4\n", "4 NaN NaN 14.3 56 5 5\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the column names of the dataset to the screen, one column name per line." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the first 2 rows of the data frame and print them to the console. What does the output\n", "look like?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many observations (i.e. rows) are in this data frame?" 
] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the last 2 rows of the data frame and print them to the console. What does the output\n", "look like?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the value of Ozone in the 47th row?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many missing values are in the Ozone column of this data frame?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA)\n", "from this calculation." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values\n", "are above 90. What is the mean of Solar.R in this subset?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is the mean of \"Temp\" when \"Month\" is equal to 6?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What was the maximum ozone value in the month of May (i.e. Month = 5)?" 
] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 3b: Functions with Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kaggle has a nice challenge based on Titanic passenger data:\n", "\n", "\"One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.\"\n", "\n", "We'll use the 892-line training set (one header line plus 891 passenger rows) from [Kaggle](http://www.kaggle.com/c/titanic-gettingStarted/data), located in the data directory of this lesson.\n", " \n", " " ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os\n", "import pandas as pd\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "a=pd.read_csv('./data/titanic.csv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read the data\n", "\n", "- Read with `pandas.read_csv`\n", "- Verify that you get `891` rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What percent of the people survived?\n", "\n", "- Yes, they survived: `d.survived == 1`" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Function\n", "\n", "Write a function that returns a dictionary of the number of `survived`, `not survived`, and `unknown`. 
Here is the example function call:\n", "```python\n", "print(titanic_function(data))\n", "\n", "{'survived': 342, 'not survived': 549}\n", "\n", "```\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "\n", " " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What percent of males survived? Females?" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next Steps\n", "\n", "**Recommended Resources**\n", "\n", "Name | Description\n", "--- | ---\n", "[Official Pandas Tutorials](http://pandas.pydata.org/pandas-docs/stable/tutorials.html) | Wes & Company's selection of tutorials and lectures\n", "[Julia Evans' Pandas Cookbook](https://github.com/jvns/pandas-cookbook) | Great resource with examples from weather, bikes and 311 calls\n", "[Learn Pandas Tutorials](https://bitbucket.org/hrojas/learn-pandas) | A great series of Pandas tutorials from Dave Rojas\n", "[Research Computing Python Data PYNBs](https://github.com/ResearchComputing/Meetup-Fall-2013/tree/master/python) | A super awesome set of python notebooks from a meetup-based course exclusively devoted to pandas" ] } ], "metadata": {} } ] }