{ "metadata": { "name": "", "signature": "sha256:0dc6444fa70583def6226d897aa5534149ac315405391aff8cd17de1ead9f116" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying Operations Over pandas Dataframes\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 72 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], \n", " 'year': [2012, 2012, 2013, 2014, 2014], \n", " 'reports': [4, 24, 31, 2, 3],\n", " 'coverage': [25, 94, 57, 62, 70]}\n", "df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
" <thead>\n",
" <tr><th></th><th>coverage</th><th>name</th><th>reports</th><th>year</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>Cochice</th><td>25</td><td>Jason</td><td>4</td><td>2012</td></tr>\n",
" <tr><th>Pima</th><td>94</td><td>Molly</td><td>24</td><td>2012</td></tr>\n",
" <tr><th>Santa Cruz</th><td>57</td><td>Tina</td><td>31</td><td>2013</td></tr>\n",
" <tr><th>Maricopa</th><td>62</td><td>Jake</td><td>2</td><td>2014</td></tr>\n",
" <tr><th>Yuma</th><td>70</td><td>Amy</td><td>3</td><td>2014</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 4 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 73, "text": [ " coverage name reports year\n", "Cochice 25 Jason 4 2012\n", "Pima 94 Molly 24 2012\n", "Santa Cruz 57 Tina 31 2013\n", "Maricopa 62 Jake 2 2014\n", "Yuma 70 Amy 3 2014\n", "\n", "[5 rows x 4 columns]" ] } ], "prompt_number": 73 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a capitalization lambda function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "capitalizer = lambda x: x.upper()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 74 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply the capitalizer function over the column 'name'\n", "\n", "apply() can apply a function along any axis of the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df['name'].apply(capitalizer)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 75, "text": [ "Cochice JASON\n", "Pima MOLLY\n", "Santa Cruz TINA\n", "Maricopa JAKE\n", "Yuma AMY\n", "Name: name, dtype: object" ] } ], "prompt_number": 75 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Map the capitalizer lambda function over each element in the series 'name'\n", "\n", "map() applies an operation over each element of a series" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df['name'].map(capitalizer)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 76, "text": [ "Cochice JASON\n", "Pima MOLLY\n", "Santa Cruz TINA\n", "Maricopa JAKE\n", "Yuma AMY\n", "Name: name, dtype: object" ] } ], "prompt_number": 76 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply a square root function to every single cell in the whole data frame\n", "\n", "applymap() applies a function to every single element in the entire dataframe." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Drop the string variable so that applymap() can run\n", "df = df.drop('name', axis=1)\n", "\n", "# Return the square root of every cell in the dataframe\n", "df.applymap(np.sqrt)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
" <thead>\n",
" <tr><th></th><th>coverage</th><th>reports</th><th>year</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>Cochice</th><td>5.000000</td><td>2.000000</td><td>44.855323</td></tr>\n",
" <tr><th>Pima</th><td>9.695360</td><td>4.898979</td><td>44.855323</td></tr>\n",
" <tr><th>Santa Cruz</th><td>7.549834</td><td>5.567764</td><td>44.866469</td></tr>\n",
" <tr><th>Maricopa</th><td>7.874008</td><td>1.414214</td><td>44.877611</td></tr>\n",
" <tr><th>Yuma</th><td>8.366600</td><td>1.732051</td><td>44.877611</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 77, "text": [ " coverage reports year\n", "Cochice 5.000000 2.000000 44.855323\n", "Pima 9.695360 4.898979 44.855323\n", "Santa Cruz 7.549834 5.567764 44.866469\n", "Maricopa 7.874008 1.414214 44.877611\n", "Yuma 8.366600 1.732051 44.877611\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 77 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Applying A Function Over A Dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a function that multiplies all non-strings by 100" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# create a function called times100\n", "def times100(x):\n", " # that, if x is a string,\n", " if type(x) is str:\n", " # just returns it untouched\n", " return x\n", " # but, if not, return it multiplied by 100\n", " elif x:\n", " return 100 * x\n", " # and leave everything else\n", " else:\n", " return" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 80 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Apply the times100 over every cell in the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.applymap(times100)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table>\n",
" <thead>\n",
" <tr><th></th><th>coverage</th><th>reports</th><th>year</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>Cochice</th><td>2500</td><td>400</td><td>201200</td></tr>\n",
" <tr><th>Pima</th><td>9400</td><td>2400</td><td>201200</td></tr>\n",
" <tr><th>Santa Cruz</th><td>5700</td><td>3100</td><td>201300</td></tr>\n",
" <tr><th>Maricopa</th><td>6200</td><td>200</td><td>201400</td></tr>\n",
" <tr><th>Yuma</th><td>7000</td><td>300</td><td>201400</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 79, "text": [ " coverage reports year\n", "Cochice 2500 400 201200\n", "Pima 9400 2400 201200\n", "Santa Cruz 5700 3100 201300\n", "Maricopa 6200 200 201400\n", "Yuma 7000 300 201400\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 79 } ], "metadata": {} } ] }