{ "metadata": { "name": "", "signature": "sha256:4d962f05be2ad6ed869861bf7b8054b775ae3499704c52747dd67bff0519352a" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Count Values In Pandas Dataframe" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Original description](http://www.math.uah.edu/stat/data/HorseKicks.html): \"The data above give the number of soilders in the Prussian cavalry killed by horse kicks, by corp membership and by year. The years are from 1875 to 1894, and there are 14 different cavalry corps: the first column corresponds to the guard corp and the other columns to corps 1 through 11, 14, and 15. The data are from Distributome project and are derived from the book by Andrews and Herzberg. The original source of the data is the classic book by von Bortkiewicz (references are given below). The data are famous because they seem to fit the Poisson model reasonably well.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import the pandas module" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create all the columns of the dataframe as series (called vectors in R)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "year = pd.Series([1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, \n", " 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894])\n", "guardCorps = pd.Series([0,2,2,1,0,0,1,1,0,3,0,2,1,0,0,1,0,1,0,1])\n", "corps1 = pd.Series([0,0,0,2,0,3,0,2,0,0,0,1,1,1,0,2,0,3,1,0])\n", "corps2 = pd.Series([0,0,0,2,0,2,0,0,1,1,0,0,2,1,1,0,0,2,0,0])\n", "corps3 = pd.Series([0,0,0,1,1,1,2,0,2,0,0,0,1,0,1,2,1,0,0,0])\n", "corps4 = pd.Series([0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,0,0])\n", "corps5 = pd.Series([0,0,0,0,2,1,0,0,1,0,0,1,0,1,1,1,1,1,1,0])\n", "corps6 = pd.Series([0,0,1,0,2,0,0,1,2,0,1,1,3,1,1,1,0,3,0,0])\n", "corps7 = pd.Series([1,0,1,0,0,0,1,0,1,1,0,0,2,0,0,2,1,0,2,0])\n", "corps8 = pd.Series([1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,1])\n", "corps9 = pd.Series([0,0,0,0,0,2,1,1,1,0,2,1,1,0,1,2,0,1,0,0])\n", "corps10 = pd.Series([0,0,1,1,0,1,0,2,0,2,0,0,0,0,2,1,3,0,1,1])\n", "corps11 = pd.Series([0,0,0,0,2,4,0,1,3,0,1,1,1,1,2,1,3,1,3,1])\n", "corps14 = pd.Series([ 1,1,2,1,1,3,0,4,0,1,0,3,2,1,0,2,1,1,0,0])\n", "corps15 = pd.Series([0,1,0,0,0,0,0,1,0,1,1,0,0,0,2,2,0,0,0,0])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dictionary variable that assigns variable names" ] }, { "cell_type": "code", "collapsed": false, "input": [ "variables = dict(guardCorps = guardCorps, corps1 = corps1, \n", " corps2 = corps2, corps3 = corps3, corps4 = corps4, \n", " corps5 = corps5, corps6 = corps6, corps7 = corps7, \n", " corps8 = corps8, corps9 = corps9, corps10 = corps10, \n", " corps11 = corps11 , corps14 = corps14, corps15 = corps15)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe and set the order of the columns using the columns attribute" ] }, { "cell_type": "code", "collapsed": false, "input": [ "horsekick = pd.DataFrame(variables, columns = ['guardCorps', \n", " 'corps1', 'corps2', \n", " 'corps3', 'corps4', \n", " 'corps5', 'corps6', \n", " 'corps7', 'corps8', \n", " 'corps9', 'corps10', \n", " 'corps11', 'corps14', \n", " 'corps15'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set the dataframe's index to be year" ] }, { "cell_type": "code", "collapsed": false, "input": [ "horsekick.index = [1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, \n", " 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### view the horekick dataframe" ] }, { "cell_type": "code", "collapsed": false, "input": [ "horsekick" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
guardCorpscorps1corps2corps3corps4corps5corps6corps7corps8corps9corps10corps11corps14corps15
1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0
1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1
1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0
1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0
1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0
1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0
1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0
1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1
1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0
1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1
1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1
1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0
1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0
1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0
1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2
1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2
1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0
1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0
1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0
1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0
\n", "

20 rows \u00d7 14 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 36, "text": [ " guardCorps corps1 corps2 corps3 corps4 corps5 corps6 corps7 \\\n", "1875 0 0 0 0 0 0 0 1 \n", "1876 2 0 0 0 1 0 0 0 \n", "1877 2 0 0 0 0 0 1 1 \n", "1878 1 2 2 1 1 0 0 0 \n", "1879 0 0 0 1 1 2 2 0 \n", "1880 0 3 2 1 1 1 0 0 \n", "1881 1 0 0 2 1 0 0 1 \n", "1882 1 2 0 0 0 0 1 0 \n", "1883 0 0 1 2 0 1 2 1 \n", "1884 3 0 1 0 0 0 0 1 \n", "1885 0 0 0 0 0 0 1 0 \n", "1886 2 1 0 0 1 1 1 0 \n", "1887 1 1 2 1 0 0 3 2 \n", "1888 0 1 1 0 0 1 1 0 \n", "1889 0 0 1 1 0 1 1 0 \n", "1890 1 2 0 2 0 1 1 2 \n", "1891 0 0 0 1 1 1 0 1 \n", "1892 1 3 2 0 1 1 3 0 \n", "1893 0 1 0 0 0 1 0 2 \n", "1894 1 0 0 0 0 0 0 0 \n", "\n", " corps8 corps9 corps10 corps11 corps14 corps15 \n", "1875 1 0 0 0 1 0 \n", "1876 0 0 0 0 1 1 \n", "1877 0 0 1 0 2 0 \n", "1878 0 0 1 0 1 0 \n", "1879 1 0 0 2 1 0 \n", "1880 0 2 1 4 3 0 \n", "1881 0 1 0 0 0 0 \n", "1882 1 1 2 1 4 1 \n", "1883 0 1 0 3 0 0 \n", "1884 0 0 2 0 1 1 \n", "1885 0 2 0 1 0 1 \n", "1886 0 1 0 1 3 0 \n", "1887 1 1 0 1 2 0 \n", "1888 0 0 0 1 1 0 \n", "1889 0 1 2 2 0 2 \n", "1890 0 2 1 1 2 2 \n", "1891 1 0 3 3 1 0 \n", "1892 1 1 0 1 1 0 \n", "1893 0 0 1 3 0 0 \n", "1894 1 0 1 1 0 0 \n", "\n", "[20 rows x 14 columns]" ] } ], "prompt_number": 36 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Count the number of times each number of deaths occurs in each regiment" ] }, { "cell_type": "code", "collapsed": false, "input": [ "result = horsekick.apply(pd.value_counts).fillna(0); result" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
guardCorpscorps1corps2corps3corps4corps5corps6corps7corps8corps9corps10corps11corps14corps15
0 9 11 12 11 12 10 9 11 13 10 10 6 6 14
1 7 4 4 6 8 9 7 6 7 7 6 8 8 4
2 3 3 4 3 0 1 2 3 0 3 3 2 3 2
3 1 2 0 0 0 0 2 0 0 0 1 3 2 0
4 0 0 0 0 0 0 0 0 0 0 0 1 1 0
\n", "

5 rows \u00d7 14 columns

\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 41, "text": [ " guardCorps corps1 corps2 corps3 corps4 corps5 corps6 corps7 corps8 \\\n", "0 9 11 12 11 12 10 9 11 13 \n", "1 7 4 4 6 8 9 7 6 7 \n", "2 3 3 4 3 0 1 2 3 0 \n", "3 1 2 0 0 0 0 2 0 0 \n", "4 0 0 0 0 0 0 0 0 0 \n", "\n", " corps9 corps10 corps11 corps14 corps15 \n", "0 10 10 6 6 14 \n", "1 7 6 8 8 4 \n", "2 3 3 2 3 2 \n", "3 0 1 3 2 0 \n", "4 0 0 1 1 0 \n", "\n", "[5 rows x 14 columns]" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Count the number of times each monthly death total appears in guardCorps" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pd.value_counts(horsekick['guardCorps'].values, sort=False)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 47, "text": [ "0 9\n", "1 7\n", "2 3\n", "3 1\n", "dtype: int64" ] } ], "prompt_number": 47 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List all the unique values in guardCorps" ] }, { "cell_type": "code", "collapsed": false, "input": [ "horsekick['guardCorps'].unique()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 50, "text": [ "array([0, 2, 1, 3])" ] } ], "prompt_number": 50 } ], "metadata": {} } ] }