{ "metadata": { "name": "", "signature": "sha256:0b1b41366c7c7ffcb008c269b4e553d138a67c8264dfe961448c82f4e3d11c30" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# pandas Data Structures\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 53 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Series 101" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Series is a one-dimensional labeled array (similar to R's vectors)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a Series of the number of flooding reports" ] }, { "cell_type": "code", "collapsed": false, "input": [ "floodingReports = pd.Series([5, 6, 2, 9, 12])\n", "floodingReports" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "0 5\n", "1 6\n", "2 2\n", "3 9\n", "4 12\n", "dtype: int64" ] } ], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the first column of numbers (0 to 4) is the index."
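, "\n", "
The default index noted above can be checked directly. This is a minimal sketch, assuming only that pandas is installed; the names `reports`, `index_labels`, and `third_value` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical Series with the same values as floodingReports
reports = pd.Series([5, 6, 2, 9, 12])

# With no index argument, pandas labels the values 0, 1, 2, ...
index_labels = list(reports.index)
print(index_labels)

# Those labels are what square-bracket lookup uses
third_value = reports[2]
print(third_value)
```

Because the labels are ordinary integers here, looking up label 2 returns the third value in the Series.
"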
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set county names as the index of the floodingReports Series" ] }, { "cell_type": "code", "collapsed": false, "input": [ "floodingReports = pd.Series([5, 6, 2, 9, 12], index=['Cochise County', 'Pima County', 'Santa Cruz County', 'Maricopa County', 'Yuma County'])\n", "floodingReports" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 55, "text": [ "Cochise County 5\n", "Pima County 6\n", "Santa Cruz County 2\n", "Maricopa County 9\n", "Yuma County 12\n", "dtype: int64" ] } ], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the number of flooding reports in Cochise County" ] }, { "cell_type": "code", "collapsed": false, "input": [ "floodingReports['Cochise County']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 56, "text": [ "5" ] } ], "prompt_number": 56 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the counties with more than 6 flooding reports" ] }, { "cell_type": "code", "collapsed": false, "input": [ "floodingReports[floodingReports > 6]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 57, "text": [ "Maricopa County 9\n", "Yuma County 12\n", "dtype: int64" ] } ], "prompt_number": 57 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a pandas Series from a dictionary\n", "\n", "Note: when you do this, the dict's keys become the Series's index. Older pandas versions, such as the one used here, also sort the keys, which is why the output order differs from the dict's order." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create a dictionary\n", "fireReports_dict = {'Cochise County': 12, 'Pima County': 342, 'Santa Cruz County': 13, 'Maricopa County': 42, 'Yuma County': 52}\n", "\n", "# Convert the dictionary into a pd.Series, and view it\n", "fireReports = pd.Series(fireReports_dict)\n", "fireReports" ], "language": "python", "metadata": {}, 
"outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 58, "text": [ "Cochise County 12\n", "Maricopa County 42\n", "Pima County 342\n", "Santa Cruz County 13\n", "Yuma County 52\n", "dtype: int64" ] } ], "prompt_number": 58 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Change the index of a Series to shorter names" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Rename each label instead of assigning a new list, so every value keeps its county\n", "fireReports = fireReports.rename(lambda name: name.replace(' County', ''))\n", "fireReports" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 63, "text": [ "Cochise 12\n", "Maricopa 42\n", "Pima 342\n", "Santa Cruz 13\n", "Yuma 52\n", "dtype: int64" ] } ], "prompt_number": 63 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DataFrame 101\n", "\n", "A DataFrame is a two-dimensional table with labeled rows and columns, similar to R's data frames." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a DataFrame from a dict of equal-length lists or NumPy arrays" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = {'county': ['Cochise', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'],\n", "        'year': [2012, 2012, 2013, 2014, 2014],\n", "        'reports': [4, 24, 31, 2, 3]}\n", "df = pd.DataFrame(data)\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>county</th><th>reports</th><th>year</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Cochise</td><td>4</td><td>2012</td></tr>\n",
"<tr><th>1</th><td>Pima</td><td>24</td><td>2012</td></tr>\n",
"<tr><th>2</th><td>Santa Cruz</td><td>31</td><td>2013</td></tr>\n",
"<tr><th>3</th><td>Maricopa</td><td>2</td><td>2014</td></tr>\n",
"<tr><th>4</th><td>Yuma</td><td>3</td><td>2014</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 66, "text": [ " county reports year\n", "0 Cochise 4 2012\n", "1 Pima 24 2012\n", "2 Santa Cruz 31 2013\n", "3 Maricopa 2 2014\n", "4 Yuma 3 2014\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 66 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the order of the columns using the `columns` parameter" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dfColumnOrdered = pd.DataFrame(data, columns=['county', 'year', 'reports'])\n", "dfColumnOrdered" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>county</th><th>year</th><th>reports</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Cochise</td><td>2012</td><td>4</td></tr>\n",
"<tr><th>1</th><td>Pima</td><td>2012</td><td>24</td></tr>\n",
"<tr><th>2</th><td>Santa Cruz</td><td>2013</td><td>31</td></tr>\n",
"<tr><th>3</th><td>Maricopa</td><td>2014</td><td>2</td></tr>\n",
"<tr><th>4</th><td>Yuma</td><td>2014</td><td>3</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 95, "text": [ " county year reports\n", "0 Cochise 2012 4\n", "1 Pima 2012 24\n", "2 Santa Cruz 2013 31\n", "3 Maricopa 2014 2\n", "4 Yuma 2014 3\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 95 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add a column" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# The new Series is aligned to the DataFrame's index (0 to 4)\n", "dfColumnOrdered['newsCoverage'] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])\n", "dfColumnOrdered" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>county</th><th>year</th><th>reports</th><th>newsCoverage</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Cochise</td><td>2012</td><td>4</td><td>42.3</td></tr>\n",
"<tr><th>1</th><td>Pima</td><td>2012</td><td>24</td><td>92.1</td></tr>\n",
"<tr><th>2</th><td>Santa Cruz</td><td>2013</td><td>31</td><td>12.2</td></tr>\n",
"<tr><th>3</th><td>Maricopa</td><td>2014</td><td>2</td><td>39.3</td></tr>\n",
"<tr><th>4</th><td>Yuma</td><td>2014</td><td>3</td><td>30.2</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 4 columns</p>\n",
"</div>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 101, "text": [ " county year reports newsCoverage\n", "0 Cochise 2012 4 42.3\n", "1 Pima 2012 24 92.1\n", "2 Santa Cruz 2013 31 12.2\n", "3 Maricopa 2014 2 39.3\n", "4 Yuma 2014 3 30.2\n", "\n", "[5 rows x 4 columns]" ] } ], "prompt_number": 101 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delete a column" ] }, { "cell_type": "code", "collapsed": false, "input": [ "del dfColumnOrdered['newsCoverage']\n", "dfColumnOrdered" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>county</th><th>year</th><th>reports</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Cochise</td><td>2012</td><td>4</td></tr>\n",
"<tr><th>1</th><td>Pima</td><td>2012</td><td>24</td></tr>\n",
"<tr><th>2</th><td>Santa Cruz</td><td>2013</td><td>31</td></tr>\n",
"<tr><th>3</th><td>Maricopa</td><td>2014</td><td>2</td></tr>\n",
"<tr><th>4</th><td>Yuma</td><td>2014</td><td>3</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 102, "text": [ " county year reports\n", "0 Cochise 2012 4\n", "1 Pima 2012 24\n", "2 Santa Cruz 2013 31\n", "3 Maricopa 2014 2\n", "4 Yuma 2014 3\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 102 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transpose the DataFrame" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dfColumnOrdered.T" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
"<thead><tr><th></th><th>0</th><th>1</th><th>2</th><th>3</th><th>4</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>county</th><td>Cochise</td><td>Pima</td><td>Santa Cruz</td><td>Maricopa</td><td>Yuma</td></tr>\n",
"<tr><th>year</th><td>2012</td><td>2012</td><td>2013</td><td>2014</td><td>2014</td></tr>\n",
"<tr><th>reports</th><td>4</td><td>24</td><td>31</td><td>2</td><td>3</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 5 columns</p>\n",
"</div>" ], "metadata": {}, "output_type": "pyout", "prompt_number": 104, "text": [ " 0 1 2 3 4\n", "county Cochise Pima Santa Cruz Maricopa Yuma\n", "year 2012 2012 2013 2014 2014\n", "reports 4 24 31 2 3\n", "\n", "[3 rows x 5 columns]" ] } ], "prompt_number": 104 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }