{
 "metadata": {
  "name": "",
  "signature": "sha256:0b1b41366c7c7ffcb008c269b4e553d138a67c8264dfe961448c82f4e3d11c30"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# pandas Data Structures\n",
      "\n",
      "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
      "- **Date:** -\n",
      "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
      "- **Note:**"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Import modules"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 53
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Series 101"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Series are one-dimensional arrays (like R's vectors)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a series of the number of floodingReports"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "floodingReports = pd.Series([5, 6, 2, 9, 12])\n",
      "floodingReports"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 54,
       "text": [
        "0     5\n",
        "1     6\n",
        "2     2\n",
        "3     9\n",
        "4    12\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 54
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that the first column of numbers (0 to 4) are the index."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Set county names to be the index of the floodingReports series"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "floodingReports = pd.Series([5, 6, 2, 9, 12], index=['Cochise County', 'Pima County', 'Santa Cruz County', 'Maricopa County', 'Yuma County'])\n",
      "floodingReports"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 55,
       "text": [
        "Cochise County        5\n",
        "Pima County           6\n",
        "Santa Cruz County     2\n",
        "Maricopa County       9\n",
        "Yuma County          12\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 55
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### View the number of floodingReports in Cochise County"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "floodingReports['Cochise County']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 56,
       "text": [
        "5"
       ]
      }
     ],
     "prompt_number": 56
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### View the counties with  more than 6 flooding reports"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "floodingReports[floodingReports > 6]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 57,
       "text": [
        "Maricopa County     9\n",
        "Yuma County        12\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 57
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a pandas series from a dictionary\n",
      "\n",
      "Note: when you do this, the dict's key's will become the series's index"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create a dictionary\n",
      "fireReports_dict = {'Cochise County': 12, 'Pima County': 342, 'Santa Cruz County': 13, 'Maricopa County': 42, 'Yuma County' : 52}\n",
      "\n",
      "# Convert the dictionary into a pd.Series, and view it\n",
      "fireReports = pd.Series(fireReports_dict); fireReports"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 58,
       "text": [
        "Cochise County        12\n",
        "Maricopa County       42\n",
        "Pima County          342\n",
        "Santa Cruz County     13\n",
        "Yuma County           52\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 58
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Change the index of a series to shorter names"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fireReports.index = [\"Cochice\", \"Pima\", \"Santa Cruz\", \"Maricopa\", \"Yuma\"]\n",
      "fireReports"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 63,
       "text": [
        "Cochice        12\n",
        "Pima           42\n",
        "Santa Cruz    342\n",
        "Maricopa       13\n",
        "Yuma           52\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 63
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## DataFrame 101\n",
      "\n",
      "DataFrames are like R's Dataframes"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a dataframe from a dict of equal length lists or numpy arrays"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "data = {'county': ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'], \n",
      "        'year': [2012, 2012, 2013, 2014, 2014], \n",
      "        'reports': [4, 24, 31, 2, 3]}\n",
      "df = pd.DataFrame(data)\n",
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>county</th>\n",
        "      <th>reports</th>\n",
        "      <th>year</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>    Cochice</td>\n",
        "      <td>  4</td>\n",
        "      <td> 2012</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>       Pima</td>\n",
        "      <td> 24</td>\n",
        "      <td> 2012</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Santa Cruz</td>\n",
        "      <td> 31</td>\n",
        "      <td> 2013</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td>   Maricopa</td>\n",
        "      <td>  2</td>\n",
        "      <td> 2014</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>       Yuma</td>\n",
        "      <td>  3</td>\n",
        "      <td> 2014</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 66,
       "text": [
        "       county  reports  year\n",
        "0     Cochice        4  2012\n",
        "1        Pima       24  2012\n",
        "2  Santa Cruz       31  2013\n",
        "3    Maricopa        2  2014\n",
        "4        Yuma        3  2014\n",
        "\n",
        "[5 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 66
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Set the order of the columns using the columns attribute"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dfColumnOrdered = pd.DataFrame(data, columns=['county', 'year', 'reports'])\n",
      "dfColumnOrdered"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>county</th>\n",
        "      <th>year</th>\n",
        "      <th>reports</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>    Cochice</td>\n",
        "      <td> 2012</td>\n",
        "      <td>  4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>       Pima</td>\n",
        "      <td> 2012</td>\n",
        "      <td> 24</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Santa Cruz</td>\n",
        "      <td> 2013</td>\n",
        "      <td> 31</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td>   Maricopa</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>       Yuma</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  3</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 95,
       "text": [
        "       county  year  reports\n",
        "0     Cochice  2012        4\n",
        "1        Pima  2012       24\n",
        "2  Santa Cruz  2013       31\n",
        "3    Maricopa  2014        2\n",
        "4        Yuma  2014        3\n",
        "\n",
        "[5 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 95
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Add a column"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dfColumnOrdered['newsCoverage'] = pd.Series([42.3, 92.1, 12.2, 39.3, 30.2])\n",
      "dfColumnOrdered"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>county</th>\n",
        "      <th>year</th>\n",
        "      <th>reports</th>\n",
        "      <th>newsCoverage</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>    Cochice</td>\n",
        "      <td> 2012</td>\n",
        "      <td>  4</td>\n",
        "      <td> 42.3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>       Pima</td>\n",
        "      <td> 2012</td>\n",
        "      <td> 24</td>\n",
        "      <td> 92.1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Santa Cruz</td>\n",
        "      <td> 2013</td>\n",
        "      <td> 31</td>\n",
        "      <td> 12.2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td>   Maricopa</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  2</td>\n",
        "      <td> 39.3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>       Yuma</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  3</td>\n",
        "      <td> 30.2</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 4 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 101,
       "text": [
        "       county  year  reports  newsCoverage\n",
        "0     Cochice  2012        4          42.3\n",
        "1        Pima  2012       24          92.1\n",
        "2  Santa Cruz  2013       31          12.2\n",
        "3    Maricopa  2014        2          39.3\n",
        "4        Yuma  2014        3          30.2\n",
        "\n",
        "[5 rows x 4 columns]"
       ]
      }
     ],
     "prompt_number": 101
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Delete a column"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "del dfColumnOrdered['newsCoverage']\n",
      "dfColumnOrdered"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>county</th>\n",
        "      <th>year</th>\n",
        "      <th>reports</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>    Cochice</td>\n",
        "      <td> 2012</td>\n",
        "      <td>  4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>       Pima</td>\n",
        "      <td> 2012</td>\n",
        "      <td> 24</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Santa Cruz</td>\n",
        "      <td> 2013</td>\n",
        "      <td> 31</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td>   Maricopa</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td>       Yuma</td>\n",
        "      <td> 2014</td>\n",
        "      <td>  3</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 102,
       "text": [
        "       county  year  reports\n",
        "0     Cochice  2012        4\n",
        "1        Pima  2012       24\n",
        "2  Santa Cruz  2013       31\n",
        "3    Maricopa  2014        2\n",
        "4        Yuma  2014        3\n",
        "\n",
        "[5 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 102
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Transpose the dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dfColumnOrdered.T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>0</th>\n",
        "      <th>1</th>\n",
        "      <th>2</th>\n",
        "      <th>3</th>\n",
        "      <th>4</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>county</th>\n",
        "      <td> Cochice</td>\n",
        "      <td> Pima</td>\n",
        "      <td> Santa Cruz</td>\n",
        "      <td> Maricopa</td>\n",
        "      <td> Yuma</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>year</th>\n",
        "      <td>    2012</td>\n",
        "      <td> 2012</td>\n",
        "      <td>       2013</td>\n",
        "      <td>     2014</td>\n",
        "      <td> 2014</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>reports</th>\n",
        "      <td>       4</td>\n",
        "      <td>   24</td>\n",
        "      <td>         31</td>\n",
        "      <td>        2</td>\n",
        "      <td>    3</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>3 rows \u00d7 5 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 104,
       "text": [
        "               0     1           2         3     4\n",
        "county   Cochice  Pima  Santa Cruz  Maricopa  Yuma\n",
        "year        2012  2012        2013      2014  2014\n",
        "reports        4    24          31         2     3\n",
        "\n",
        "[3 rows x 5 columns]"
       ]
      }
     ],
     "prompt_number": 104
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}