{ "metadata": { "name": "", "signature": "sha256:fa9d1d7d71465ded90ab496c3c671a0f381b1209c89ec421b309f6e33e964df1" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hierarchical Data In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], \n", " 'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], \n", " 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], \n", " 'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],\n", " 'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}\n", "df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
regimentcompanynamepreTestScorepostTestScore
0 Nighthawks 1st Miller 4 25
1 Nighthawks 1st Jacobson 24 94
2 Nighthawks 2nd Ali 31 57
3 Nighthawks 2nd Milner 2 62
4 Dragoons 1st Cooze 3 70
5 Dragoons 1st Jacon 4 25
6 Dragoons 2nd Ryaner 24 94
7 Dragoons 2nd Sone 31 57
8 Scouts 1st Sloan 2 62
9 Scouts 1st Piger 3 70
10 Scouts 2nd Riani 2 62
11 Scouts 2nd Ali 3 70
\n", "

12 rows \u00d7 5 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 48, "text": [ " regiment company name preTestScore postTestScore\n", "0 Nighthawks 1st Miller 4 25\n", "1 Nighthawks 1st Jacobson 24 94\n", "2 Nighthawks 2nd Ali 31 57\n", "3 Nighthawks 2nd Milner 2 62\n", "4 Dragoons 1st Cooze 3 70\n", "5 Dragoons 1st Jacon 4 25\n", "6 Dragoons 2nd Ryaner 24 94\n", "7 Dragoons 2nd Sone 31 57\n", "8 Scouts 1st Sloan 2 62\n", "9 Scouts 1st Piger 3 70\n", "10 Scouts 2nd Riani 2 62\n", "11 Scouts 2nd Ali 3 70\n", "\n", "[12 rows x 5 columns]" ] } ], "prompt_number": 48 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the hierarchical index but leave the columns inplace" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.set_index(['regiment', 'company'], drop=False)\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
regimentcompanynamepreTestScorepostTestScore
0 Nighthawks 1st Miller 4 25
1 Nighthawks 1st Jacobson 24 94
2 Nighthawks 2nd Ali 31 57
3 Nighthawks 2nd Milner 2 62
4 Dragoons 1st Cooze 3 70
5 Dragoons 1st Jacon 4 25
6 Dragoons 2nd Ryaner 24 94
7 Dragoons 2nd Sone 31 57
8 Scouts 1st Sloan 2 62
9 Scouts 1st Piger 3 70
10 Scouts 2nd Riani 2 62
11 Scouts 2nd Ali 3 70
\n", "

12 rows \u00d7 5 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 49, "text": [ " regiment company name preTestScore postTestScore\n", "0 Nighthawks 1st Miller 4 25\n", "1 Nighthawks 1st Jacobson 24 94\n", "2 Nighthawks 2nd Ali 31 57\n", "3 Nighthawks 2nd Milner 2 62\n", "4 Dragoons 1st Cooze 3 70\n", "5 Dragoons 1st Jacon 4 25\n", "6 Dragoons 2nd Ryaner 24 94\n", "7 Dragoons 2nd Sone 31 57\n", "8 Scouts 1st Sloan 2 62\n", "9 Scouts 1st Piger 3 70\n", "10 Scouts 2nd Riani 2 62\n", "11 Scouts 2nd Ali 3 70\n", "\n", "[12 rows x 5 columns]" ] } ], "prompt_number": 49 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the hierarchical index to be by regiment, and then by company" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df = df.set_index(['regiment', 'company'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepreTestScorepostTestScore
regimentcompany
Nighthawks1st Miller 4 25
1st Jacobson 24 94
2nd Ali 31 57
2nd Milner 2 62
Dragoons1st Cooze 3 70
1st Jacon 4 25
2nd Ryaner 24 94
2nd Sone 31 57
Scouts1st Sloan 2 62
1st Piger 3 70
2nd Riani 2 62
2nd Ali 3 70
\n", "

12 rows \u00d7 3 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ " name preTestScore postTestScore\n", "regiment company \n", "Nighthawks 1st Miller 4 25\n", " 1st Jacobson 24 94\n", " 2nd Ali 31 57\n", " 2nd Milner 2 62\n", "Dragoons 1st Cooze 3 70\n", " 1st Jacon 4 25\n", " 2nd Ryaner 24 94\n", " 2nd Sone 31 57\n", "Scouts 1st Sloan 2 62\n", " 1st Piger 3 70\n", " 2nd Riani 2 62\n", " 2nd Ali 3 70\n", "\n", "[12 rows x 3 columns]" ] } ], "prompt_number": 50 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the index" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.index" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 51, "text": [ "MultiIndex(levels=[['Dragoons', 'Nighthawks', 'Scouts'], ['1st', '2nd']],\n", " labels=[[1, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2], [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]],\n", " names=['regiment', 'company'])" ] } ], "prompt_number": 51 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Swap the levels in the index" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.swaplevel('regiment', 'company')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepreTestScorepostTestScore
companyregiment
1stNighthawks Miller 4 25
Nighthawks Jacobson 24 94
2ndNighthawks Ali 31 57
Nighthawks Milner 2 62
1stDragoons Cooze 3 70
Dragoons Jacon 4 25
2ndDragoons Ryaner 24 94
Dragoons Sone 31 57
1stScouts Sloan 2 62
Scouts Piger 3 70
2ndScouts Riani 2 62
Scouts Ali 3 70
\n", "

12 rows \u00d7 3 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 52, "text": [ " name preTestScore postTestScore\n", "company regiment \n", "1st Nighthawks Miller 4 25\n", " Nighthawks Jacobson 24 94\n", "2nd Nighthawks Ali 31 57\n", " Nighthawks Milner 2 62\n", "1st Dragoons Cooze 3 70\n", " Dragoons Jacon 4 25\n", "2nd Dragoons Ryaner 24 94\n", " Dragoons Sone 31 57\n", "1st Scouts Sloan 2 62\n", " Scouts Piger 3 70\n", "2nd Scouts Riani 2 62\n", " Scouts Ali 3 70\n", "\n", "[12 rows x 3 columns]" ] } ], "prompt_number": 52 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sumarize the results by regiment" ] }, { "cell_type": "code", "collapsed": false, "input": [ "df.sum(level='regiment')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
preTestScorepostTestScore
regiment
Dragoons 62 246
Nighthawks 61 238
Scouts 10 264
\n", "

3 rows \u00d7 2 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 53, "text": [ " preTestScore postTestScore\n", "regiment \n", "Dragoons 62 246\n", "Nighthawks 61 238\n", "Scouts 10 264\n", "\n", "[3 rows x 2 columns]" ] } ], "prompt_number": 53 } ], "metadata": {} } ] }