{
 "metadata": {
  "name": "",
  "signature": "sha256:0dc6444fa70583def6226d897aa5534149ac315405391aff8cd17de1ead9f116"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Applying Operations Over pandas Dataframes\n",
      "\n",
      "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
      "- **Date:** -\n",
      "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
      "- **Note:**"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Import Modules"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "import numpy as np"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 72
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], \n",
      "        'year': [2012, 2012, 2013, 2014, 2014], \n",
      "        'reports': [4, 24, 31, 2, 3],\n",
      "        'coverage': [25, 94, 57, 62, 70]}\n",
      "df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])\n",
      "df"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>coverage</th>\n",
        "      <th>name</th>\n",
        "      <th>reports</th>\n",
        "      <th>year</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>Cochice</th>\n",
        "      <td> 25</td>\n",
        "      <td> Jason</td>\n",
        "      <td>  4</td>\n",
        "      <td> 2012</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Pima</th>\n",
        "      <td> 94</td>\n",
        "      <td> Molly</td>\n",
        "      <td> 24</td>\n",
        "      <td> 2012</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Santa Cruz</th>\n",
        "      <td> 57</td>\n",
        "      <td>  Tina</td>\n",
        "      <td> 31</td>\n",
        "      <td> 2013</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Maricopa</th>\n",
        "      <td> 62</td>\n",
        "      <td>  Jake</td>\n",
        "      <td>  2</td>\n",
        "      <td> 2014</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Yuma</th>\n",
        "      <td> 70</td>\n",
        "      <td>   Amy</td>\n",
        "      <td>  3</td>\n",
        "      <td> 2014</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 4 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 73,
       "text": [
        "            coverage   name  reports  year\n",
        "Cochice           25  Jason        4  2012\n",
        "Pima              94  Molly       24  2012\n",
        "Santa Cruz        57   Tina       31  2013\n",
        "Maricopa          62   Jake        2  2014\n",
        "Yuma              70    Amy        3  2014\n",
        "\n",
        "[5 rows x 4 columns]"
       ]
      }
     ],
     "prompt_number": 73
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a capitalization lambda function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "capitalizer = lambda x: x.upper()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 74
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Apply the capitalizer function over the column 'name'\n",
      "\n",
      "apply() can apply a function along any axis of the dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df['name'].apply(capitalizer)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 75,
       "text": [
        "Cochice       JASON\n",
        "Pima          MOLLY\n",
        "Santa Cruz     TINA\n",
        "Maricopa       JAKE\n",
        "Yuma            AMY\n",
        "Name: name, dtype: object"
       ]
      }
     ],
     "prompt_number": 75
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Map the capitalizer lambda function over each element in the series 'name'\n",
      "\n",
      "map() applies an operation over each element of a series"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df['name'].map(capitalizer)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 76,
       "text": [
        "Cochice       JASON\n",
        "Pima          MOLLY\n",
        "Santa Cruz     TINA\n",
        "Maricopa       JAKE\n",
        "Yuma            AMY\n",
        "Name: name, dtype: object"
       ]
      }
     ],
     "prompt_number": 76
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Apply a square root function to every single cell in the whole data frame\n",
      "\n",
      "applymap() applies a function to every single element in the entire dataframe."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Drop the string variable so that applymap() can run\n",
      "df = df.drop('name', axis=1)\n",
      "\n",
      "# Return the square root of every cell in the dataframe\n",
      "df.applymap(np.sqrt)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>coverage</th>\n",
        "      <th>reports</th>\n",
        "      <th>year</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>Cochice</th>\n",
        "      <td> 5.000000</td>\n",
        "      <td> 2.000000</td>\n",
        "      <td> 44.855323</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Pima</th>\n",
        "      <td> 9.695360</td>\n",
        "      <td> 4.898979</td>\n",
        "      <td> 44.855323</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Santa Cruz</th>\n",
        "      <td> 7.549834</td>\n",
        "      <td> 5.567764</td>\n",
        "      <td> 44.866469</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Maricopa</th>\n",
        "      <td> 7.874008</td>\n",
        "      <td> 1.414214</td>\n",
        "      <td> 44.877611</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Yuma</th>\n",
        "      <td> 8.366600</td>\n",
        "      <td> 1.732051</td>\n",
        "      <td> 44.877611</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 77,
       "text": [
        "            coverage   reports       year\n",
        "Cochice     5.000000  2.000000  44.855323\n",
        "Pima        9.695360  4.898979  44.855323\n",
        "Santa Cruz  7.549834  5.567764  44.866469\n",
        "Maricopa    7.874008  1.414214  44.877611\n",
        "Yuma        8.366600  1.732051  44.877611\n",
        "\n",
        "[5 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 77
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Applying A Function Over A Dataframe"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Create a function that multiplies all non-strings by 100"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# create a function called times100\n",
      "def times100(x):\n",
      "    # that, if x is a string,\n",
      "    if type(x) is str:\n",
      "        # just returns it untouched\n",
      "        return x\n",
      "    # but, if not, return it multiplied by 100\n",
      "    elif x:\n",
      "        return 100 * x\n",
      "    # and leave everything else\n",
      "    else:\n",
      "        return"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 80
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Apply the times100 over every cell in the dataframe"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df.applymap(times100)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>coverage</th>\n",
        "      <th>reports</th>\n",
        "      <th>year</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>Cochice</th>\n",
        "      <td> 2500</td>\n",
        "      <td>  400</td>\n",
        "      <td> 201200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Pima</th>\n",
        "      <td> 9400</td>\n",
        "      <td> 2400</td>\n",
        "      <td> 201200</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Santa Cruz</th>\n",
        "      <td> 5700</td>\n",
        "      <td> 3100</td>\n",
        "      <td> 201300</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Maricopa</th>\n",
        "      <td> 6200</td>\n",
        "      <td>  200</td>\n",
        "      <td> 201400</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Yuma</th>\n",
        "      <td> 7000</td>\n",
        "      <td>  300</td>\n",
        "      <td> 201400</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 3 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 79,
       "text": [
        "            coverage  reports    year\n",
        "Cochice         2500      400  201200\n",
        "Pima            9400     2400  201200\n",
        "Santa Cruz      5700     3100  201300\n",
        "Maricopa        6200      200  201400\n",
        "Yuma            7000      300  201400\n",
        "\n",
        "[5 rows x 3 columns]"
       ]
      }
     ],
     "prompt_number": 79
    }
   ],
   "metadata": {}
  }
 ]
}