{ "metadata": { "name": "", "signature": "sha256:2478bebf2e4e73e2ddd8b71682196d643a1f4146f22cc7d3c93992e74db22cc5" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Crosstabs In Pandas\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import pandas" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], \n", " 'company': ['infantry', 'infantry', 'cavalry', 'cavalry', 'infantry', 'infantry', 'cavalry', 'cavalry','infantry', 'infantry', 'cavalry', 'cavalry'], \n", " 'experience': ['veteran', 'rookie', 'veteran', 'rookie', 'veteran', 'rookie', 'veteran', 'rookie','veteran', 'rookie', 'veteran', 'rookie'],\n", " 'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], \n", " 'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],\n", " 'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}\n", "df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'experience', 'name', 'preTestScore', 'postTestScore'])\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>regiment</th><th>company</th><th>experience</th><th>name</th><th>preTestScore</th><th>postTestScore</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>Nighthawks</td><td>infantry</td><td>veteran</td><td>Miller</td><td>4</td><td>25</td></tr>\n",
" <tr><th>1</th><td>Nighthawks</td><td>infantry</td><td>rookie</td><td>Jacobson</td><td>24</td><td>94</td></tr>\n",
" <tr><th>2</th><td>Nighthawks</td><td>cavalry</td><td>veteran</td><td>Ali</td><td>31</td><td>57</td></tr>\n",
" <tr><th>3</th><td>Nighthawks</td><td>cavalry</td><td>rookie</td><td>Milner</td><td>2</td><td>62</td></tr>\n",
" <tr><th>4</th><td>Dragoons</td><td>infantry</td><td>veteran</td><td>Cooze</td><td>3</td><td>70</td></tr>\n",
" <tr><th>5</th><td>Dragoons</td><td>infantry</td><td>rookie</td><td>Jacon</td><td>4</td><td>25</td></tr>\n",
" <tr><th>6</th><td>Dragoons</td><td>cavalry</td><td>veteran</td><td>Ryaner</td><td>24</td><td>94</td></tr>\n",
" <tr><th>7</th><td>Dragoons</td><td>cavalry</td><td>rookie</td><td>Sone</td><td>31</td><td>57</td></tr>\n",
" <tr><th>8</th><td>Scouts</td><td>infantry</td><td>veteran</td><td>Sloan</td><td>2</td><td>62</td></tr>\n",
" <tr><th>9</th><td>Scouts</td><td>infantry</td><td>rookie</td><td>Piger</td><td>3</td><td>70</td></tr>\n",
" <tr><th>10</th><td>Scouts</td><td>cavalry</td><td>veteran</td><td>Riani</td><td>2</td><td>62</td></tr>\n",
" <tr><th>11</th><td>Scouts</td><td>cavalry</td><td>rookie</td><td>Ali</td><td>3</td><td>70</td></tr>\n",
" </tbody>\n", "</table>\n", "<p>12 rows \u00d7 6 columns</p>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ " regiment company experience name preTestScore postTestScore\n", "0 Nighthawks infantry veteran Miller 4 25\n", "1 Nighthawks infantry rookie Jacobson 24 94\n", "2 Nighthawks cavalry veteran Ali 31 57\n", "3 Nighthawks cavalry rookie Milner 2 62\n", "4 Dragoons infantry veteran Cooze 3 70\n", "5 Dragoons infantry rookie Jacon 4 25\n", "6 Dragoons cavalry veteran Ryaner 24 94\n", "7 Dragoons cavalry rookie Sone 31 57\n", "8 Scouts infantry veteran Sloan 2 62\n", "9 Scouts infantry rookie Piger 3 70\n", "10 Scouts cavalry veteran Riani 2 62\n", "11 Scouts cavalry rookie Ali 3 70\n", "\n", "[12 rows x 6 columns]" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a crosstab by regiment and company\n", "\n", "Count the number of observations for each combination of regiment and company. Setting `margins=True` adds the `All` row and column of totals." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.crosstab(df.regiment, df.company, margins=True)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th>company</th><th>cavalry</th><th>infantry</th><th>All</th></tr>\n", " <tr><th>regiment</th><th></th><th></th><th></th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>Dragoons</th><td>2</td><td>2</td><td>4</td></tr>\n",
" <tr><th>Nighthawks</th><td>2</td><td>2</td><td>4</td></tr>\n",
" <tr><th>Scouts</th><td>2</td><td>2</td><td>4</td></tr>\n",
" <tr><th>All</th><td>6</td><td>6</td><td>12</td></tr>\n",
" </tbody>\n", "</table>\n", "<p>4 rows \u00d7 3 columns</p>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "company cavalry infantry All\n", "regiment \n", "Dragoons 2 2 4\n", "Nighthawks 2 2 4\n", "Scouts 2 2 4\n", "All 6 6 12\n", "\n", "[4 rows x 3 columns]" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a crosstab of the number of rookie and veteran cavalry and infantry soldiers per regiment" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.crosstab([df.company, df.experience], df.regiment, margins=True)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>regiment</th><th>Dragoons</th><th>Nighthawks</th><th>Scouts</th><th>All</th></tr>\n", " <tr><th>company</th><th>experience</th><th></th><th></th><th></th><th></th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th rowspan=\"2\" valign=\"top\">cavalry</th><th>rookie</th><td>1</td><td>1</td><td>1</td><td>3</td></tr>\n",
" <tr><th>veteran</th><td>1</td><td>1</td><td>1</td><td>3</td></tr>\n",
" <tr><th rowspan=\"2\" valign=\"top\">infantry</th><th>rookie</th><td>1</td><td>1</td><td>1</td><td>3</td></tr>\n",
" <tr><th>veteran</th><td>1</td><td>1</td><td>1</td><td>3</td></tr>\n",
" <tr><th>All</th><th></th><td>4</td><td>4</td><td>4</td><td>12</td></tr>\n",
" </tbody>\n", "</table>\n", "<p>5 rows \u00d7 4 columns</p>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "regiment Dragoons Nighthawks Scouts All\n", "company experience \n", "cavalry rookie 1 1 1 3\n", " veteran 1 1 1 3\n", "infantry rookie 1 1 1 3\n", " veteran 1 1 1 3\n", "All 4 4 4 12\n", "\n", "[5 rows x 4 columns]" ] } ], "prompt_number": 13 } ], "metadata": {} } ] }