{ "metadata": { "name": "", "signature": "sha256:1c6173a6f0cec1737eb8ab133f3b3570ba765d9d236a56e865e46f9c36df1cea" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Binning Data In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], \n", " 'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], \n", " 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], \n", " 'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],\n", " 'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}\n", "df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>regiment</th><th>company</th><th>name</th><th>preTestScore</th><th>postTestScore</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Nighthawks</td><td>1st</td><td>Miller</td><td>4</td><td>25</td></tr>\n",
"<tr><th>1</th><td>Nighthawks</td><td>1st</td><td>Jacobson</td><td>24</td><td>94</td></tr>\n",
"<tr><th>2</th><td>Nighthawks</td><td>2nd</td><td>Ali</td><td>31</td><td>57</td></tr>\n",
"<tr><th>3</th><td>Nighthawks</td><td>2nd</td><td>Milner</td><td>2</td><td>62</td></tr>\n",
"<tr><th>4</th><td>Dragoons</td><td>1st</td><td>Cooze</td><td>3</td><td>70</td></tr>\n",
"<tr><th>5</th><td>Dragoons</td><td>1st</td><td>Jacon</td><td>4</td><td>25</td></tr>\n",
"<tr><th>6</th><td>Dragoons</td><td>2nd</td><td>Ryaner</td><td>24</td><td>94</td></tr>\n",
"<tr><th>7</th><td>Dragoons</td><td>2nd</td><td>Sone</td><td>31</td><td>57</td></tr>\n",
"<tr><th>8</th><td>Scouts</td><td>1st</td><td>Sloan</td><td>2</td><td>62</td></tr>\n",
"<tr><th>9</th><td>Scouts</td><td>1st</td><td>Piger</td><td>3</td><td>70</td></tr>\n",
"<tr><th>10</th><td>Scouts</td><td>2nd</td><td>Riani</td><td>2</td><td>62</td></tr>\n",
"<tr><th>11</th><td>Scouts</td><td>2nd</td><td>Ali</td><td>3</td><td>70</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>12 rows \u00d7 5 columns</p>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ " regiment company name preTestScore postTestScore\n", "0 Nighthawks 1st Miller 4 25\n", "1 Nighthawks 1st Jacobson 24 94\n", "2 Nighthawks 2nd Ali 31 57\n", "3 Nighthawks 2nd Milner 2 62\n", "4 Dragoons 1st Cooze 3 70\n", "5 Dragoons 1st Jacon 4 25\n", "6 Dragoons 2nd Ryaner 24 94\n", "7 Dragoons 2nd Sone 31 57\n", "8 Scouts 1st Sloan 2 62\n", "9 Scouts 1st Piger 3 70\n", "10 Scouts 2nd Riani 2 62\n", "11 Scouts 2nd Ali 3 70\n", "\n", "[12 rows x 5 columns]" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Define bins as 0 to 25, 25 to 50, 60 to 75, 75 to 100" ] }, { "cell_type": "code", "collapsed": false, "input": [ "bins = [0, 25, 50, 75, 100]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create names for the four groups" ] }, { "cell_type": "code", "collapsed": false, "input": [ "group_names = ['Low', 'Okay', 'Good', 'Great']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cut postTestScore" ] }, { "cell_type": "code", "collapsed": false, "input": [ "categories = pd.cut(df['postTestScore'], bins, labels=group_names)\n", "df['categories'] = pd.cut(df['postTestScore'], bins, labels=group_names)\n", "categories" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ " Low\n", " Great\n", " Good\n", " Good\n", " Good\n", " Low\n", " Great\n", " Good\n", " Good\n", " Good\n", " Good\n", " Good\n", "Name: postTestScore, Levels (4): Index(['Low', 'Okay', 'Good', 'Great'], dtype=object)" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Count the number of observations which each value" ] }, { "cell_type": "code", "collapsed": false, "input": [ 
"pd.value_counts(df['categories'])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>regiment</th><th>company</th><th>name</th><th>preTestScore</th><th>postTestScore</th><th>scoresBinned</th><th>categories</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Nighthawks</td><td>1st</td><td>Miller</td><td>4</td><td>25</td><td>(0, 25]</td><td>Low</td></tr>\n",
"<tr><th>1</th><td>Nighthawks</td><td>1st</td><td>Jacobson</td><td>24</td><td>94</td><td>(75, 100]</td><td>Great</td></tr>\n",
"<tr><th>2</th><td>Nighthawks</td><td>2nd</td><td>Ali</td><td>31</td><td>57</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>3</th><td>Nighthawks</td><td>2nd</td><td>Milner</td><td>2</td><td>62</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>4</th><td>Dragoons</td><td>1st</td><td>Cooze</td><td>3</td><td>70</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>5</th><td>Dragoons</td><td>1st</td><td>Jacon</td><td>4</td><td>25</td><td>(0, 25]</td><td>Low</td></tr>\n",
"<tr><th>6</th><td>Dragoons</td><td>2nd</td><td>Ryaner</td><td>24</td><td>94</td><td>(75, 100]</td><td>Great</td></tr>\n",
"<tr><th>7</th><td>Dragoons</td><td>2nd</td><td>Sone</td><td>31</td><td>57</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>8</th><td>Scouts</td><td>1st</td><td>Sloan</td><td>2</td><td>62</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>9</th><td>Scouts</td><td>1st</td><td>Piger</td><td>3</td><td>70</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>10</th><td>Scouts</td><td>2nd</td><td>Riani</td><td>2</td><td>62</td><td>(50, 75]</td><td>Good</td></tr>\n",
"<tr><th>11</th><td>Scouts</td><td>2nd</td><td>Ali</td><td>3</td><td>70</td><td>(50, 75]</td><td>Good</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>12 rows \u00d7 7 columns</p>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 40, "text": [ " regiment company name preTestScore postTestScore scoresBinned \\\n", "0 Nighthawks 1st Miller 4 25 (0, 25] \n", "1 Nighthawks 1st Jacobson 24 94 (75, 100] \n", "2 Nighthawks 2nd Ali 31 57 (50, 75] \n", "3 Nighthawks 2nd Milner 2 62 (50, 75] \n", "4 Dragoons 1st Cooze 3 70 (50, 75] \n", "5 Dragoons 1st Jacon 4 25 (0, 25] \n", "6 Dragoons 2nd Ryaner 24 94 (75, 100] \n", "7 Dragoons 2nd Sone 31 57 (50, 75] \n", "8 Scouts 1st Sloan 2 62 (50, 75] \n", "9 Scouts 1st Piger 3 70 (50, 75] \n", "10 Scouts 2nd Riani 2 62 (50, 75] \n", "11 Scouts 2nd Ali 3 70 (50, 75] \n", "\n", " categories \n", "0 Low \n", "1 Great \n", "2 Good \n", "3 Good \n", "4 Good \n", "5 Low \n", "6 Great \n", "7 Good \n", "8 Good \n", "9 Good \n", "10 Good \n", "11 Good \n", "\n", "[12 rows x 7 columns]" ] } ], "prompt_number": 40 } ], "metadata": {} } ] }