{ "metadata": { "name": "", "signature": "sha256:69af2654f6b68061f8ee043a4413fbed27c3d5d890eaf654610db2faf95faab8" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dropping Rows And Columns In A pandas DataFrame\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Raw data: one list of values per column\n", "data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],\n", "        'year': [2012, 2012, 2013, 2014, 2014],\n", "        'reports': [4, 24, 31, 2, 3]}\n", "# Use descriptive labels, rather than integers, as the row index\n", "df = pd.DataFrame(data, index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>reports</th><th>year</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Cochice</th><td>Jason</td><td>4</td><td>2012</td></tr>\n",
"    <tr><th>Pima</th><td>Molly</td><td>24</td><td>2012</td></tr>\n",
"    <tr><th>Santa Cruz</th><td>Tina</td><td>31</td><td>2013</td></tr>\n",
"    <tr><th>Maricopa</th><td>Jake</td><td>2</td><td>2014</td></tr>\n",
"    <tr><th>Yuma</th><td>Amy</td><td>3</td><td>2014</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ " name reports year\n", "Cochice Jason 4 2012\n", "Pima Molly 24 2012\n", "Santa Cruz Tina 31 2013\n", "Maricopa Jake 2 2014\n", "Yuma Amy 3 2014\n", "\n", "[5 rows x 3 columns]" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Drop an observation (row)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.drop(['Cochice', 'Pima'])" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>reports</th><th>year</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Santa Cruz</th><td>Tina</td><td>31</td><td>2013</td></tr>\n",
"    <tr><th>Maricopa</th><td>Jake</td><td>2</td><td>2014</td></tr>\n",
"    <tr><th>Yuma</th><td>Amy</td><td>3</td><td>2014</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>3 rows \u00d7 3 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ " name reports year\n", "Santa Cruz Tina 31 2013\n", "Maricopa Jake 2 2014\n", "Yuma Amy 3 2014\n", "\n", "[3 rows x 3 columns]" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Drop a variable (column)\n", "\n", "Note: axis=1 denotes that we are referring to a column, not a row" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.drop('reports', axis=1)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>year</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Cochice</th><td>Jason</td><td>2012</td></tr>\n",
"    <tr><th>Pima</th><td>Molly</td><td>2012</td></tr>\n",
"    <tr><th>Santa Cruz</th><td>Tina</td><td>2013</td></tr>\n",
"    <tr><th>Maricopa</th><td>Jake</td><td>2014</td></tr>\n",
"    <tr><th>Yuma</th><td>Amy</td><td>2014</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 2 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ " name year\n", "Cochice Jason 2012\n", "Pima Molly 2012\n", "Santa Cruz Tina 2013\n", "Maricopa Jake 2014\n", "Yuma Amy 2014\n", "\n", "[5 rows x 2 columns]" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Drop a row if it contains a certain value (in this case, \"Tina\")\n", "\n", "Specifically: Create a new dataframe called df that includes all rows where the value of a cell in the name column does not equal \"Tina\"" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df = df[df.name != 'Tina']\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>reports</th><th>year</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>Cochice</th><td>Jason</td><td>4</td><td>2012</td></tr>\n",
"    <tr><th>Pima</th><td>Molly</td><td>24</td><td>2012</td></tr>\n",
"    <tr><th>Maricopa</th><td>Jake</td><td>2</td><td>2014</td></tr>\n",
"    <tr><th>Yuma</th><td>Amy</td><td>3</td><td>2014</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>4 rows \u00d7 3 columns</p>\n",
"</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ " name reports year\n", "Cochice Jason 4 2012\n", "Pima Molly 24 2012\n", "Maricopa Jake 2 2014\n", "Yuma Amy 3 2014\n", "\n", "[4 rows x 3 columns]" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }