{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prussian Horse Kick Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Original description](http://www.math.uah.edu/stat/data/HorseKicks.html): \"The data above give the number of soldiers in the Prussian cavalry killed by horse kicks, by corps membership and by year. The years are from 1875 to 1894, and there are 14 different cavalry corps: the first column corresponds to the guard corps and the other columns to corps 1 through 11, 14, and 15. The data are from Distributome project and are derived from the book by Andrews and Herzberg. The original source of the data is the classic book by von Bortkiewicz (references are given below). 
The data are famous because they seem to fit the Poisson model reasonably well.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import the pandas module" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create all the columns of the dataframe as series (called vectors in R)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "year = pd.Series([1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, \n", " 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894])\n", "guard_corps = pd.Series([0,2,2,1,0,0,1,1,0,3,0,2,1,0,0,1,0,1,0,1])\n", "corps_1 = pd.Series([0,0,0,2,0,3,0,2,0,0,0,1,1,1,0,2,0,3,1,0])\n", "corps_2 = pd.Series([0,0,0,2,0,2,0,0,1,1,0,0,2,1,1,0,0,2,0,0])\n", "corps_3 = pd.Series([0,0,0,1,1,1,2,0,2,0,0,0,1,0,1,2,1,0,0,0])\n", "corps_4 = pd.Series([0,1,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,0,0])\n", "corps_5 = pd.Series([0,0,0,0,2,1,0,0,1,0,0,1,0,1,1,1,1,1,1,0])\n", "corps_6 = pd.Series([0,0,1,0,2,0,0,1,2,0,1,1,3,1,1,1,0,3,0,0])\n", "corps_7 = pd.Series([1,0,1,0,0,0,1,0,1,1,0,0,2,0,0,2,1,0,2,0])\n", "corps_8 = pd.Series([1,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1,0,1])\n", "corps_9 = pd.Series([0,0,0,0,0,2,1,1,1,0,2,1,1,0,1,2,0,1,0,0])\n", "corps_10 = pd.Series([0,0,1,1,0,1,0,2,0,2,0,0,0,0,2,1,3,0,1,1])\n", "corps_11 = pd.Series([0,0,0,0,2,4,0,1,3,0,1,1,1,1,2,1,3,1,3,1])\n", "corps_14 = pd.Series([ 1,1,2,1,1,3,0,4,0,1,0,3,2,1,0,2,1,1,0,0])\n", "corps_15 = pd.Series([0,1,0,0,0,0,0,1,0,1,1,0,0,0,2,2,0,0,0,0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dictionary variable that assigns variable names" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "variables = dict(year = year, guard_corps = guard_corps, corps_1 = corps_1, \n", " corps_2 = 
corps_2, corps_3 = corps_3, corps_4 = corps_4, \n", " corps_5 = corps_5, corps_6 = corps_6, corps_7 = corps_7, \n", " corps_8 = corps_8, corps_9 = corps_9, corps_10 = corps_10, \n", " corps_11 = corps_11, corps_14 = corps_14, corps_15 = corps_15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataframe and set the order of the columns using the `columns` argument" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "horsekick_data = pd.DataFrame(variables, columns = ['year', 'guard_corps', \n", " 'corps_1', 'corps_2', \n", " 'corps_3', 'corps_4', \n", " 'corps_5', 'corps_6', \n", " 'corps_7', 'corps_8', \n", " 'corps_9', 'corps_10', \n", " 'corps_11', 'corps_14', \n", " 'corps_15'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the dataframe" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>year</th><th>guard_corps</th><th>corps_1</th><th>corps_2</th><th>corps_3</th><th>corps_4</th><th>corps_5</th><th>corps_6</th><th>corps_7</th><th>corps_8</th><th>corps_9</th><th>corps_10</th><th>corps_11</th><th>corps_14</th><th>corps_15</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1875</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>1876</td><td>2</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>1877</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>2</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>1878</td><td>1</td><td>2</td><td>2</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>1879</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>2</td><td>2</td><td>0</td><td>1</td><td>0</td><td>0</td><td>2</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>5</th><td>1880</td><td>0</td><td>3</td><td>2</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>2</td><td>1</td><td>4</td><td>3</td><td>0</td></tr>\n",
"    <tr><th>6</th><td>1881</td><td>1</td><td>0</td><td>0</td><td>2</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>7</th><td>1882</td><td>1</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td>2</td><td>1</td><td>4</td><td>1</td></tr>\n",
"    <tr><th>8</th><td>1883</td><td>0</td><td>0</td><td>1</td><td>2</td><td>0</td><td>1</td><td>2</td><td>1</td><td>0</td><td>1</td><td>0</td><td>3</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>9</th><td>1884</td><td>3</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>2</td><td>0</td><td>1</td><td>1</td></tr>\n",
"    <tr><th>10</th><td>1885</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>0</td><td>2</td><td>0</td><td>1</td><td>0</td><td>1</td></tr>\n",
"    <tr><th>11</th><td>1886</td><td>2</td><td>1</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>3</td><td>0</td></tr>\n",
"    <tr><th>12</th><td>1887</td><td>1</td><td>1</td><td>2</td><td>1</td><td>0</td><td>0</td><td>3</td><td>2</td><td>1</td><td>1</td><td>0</td><td>1</td><td>2</td><td>0</td></tr>\n",
"    <tr><th>13</th><td>1888</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>14</th><td>1889</td><td>0</td><td>0</td><td>1</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>1</td><td>2</td><td>2</td><td>0</td><td>2</td></tr>\n",
"    <tr><th>15</th><td>1890</td><td>1</td><td>2</td><td>0</td><td>2</td><td>0</td><td>1</td><td>1</td><td>2</td><td>0</td><td>2</td><td>1</td><td>1</td><td>2</td><td>2</td></tr>\n",
"    <tr><th>16</th><td>1891</td><td>0</td><td>0</td><td>0</td><td>1</td><td>1</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>3</td><td>3</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>17</th><td>1892</td><td>1</td><td>3</td><td>2</td><td>0</td><td>1</td><td>1</td><td>3</td><td>0</td><td>1</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td></tr>\n",
"    <tr><th>18</th><td>1893</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>2</td><td>0</td><td>0</td><td>1</td><td>3</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>19</th><td>1894</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " year guard_corps corps_1 corps_2 corps_3 corps_4 corps_5 corps_6 \\\n", "0 1875 0 0 0 0 0 0 0 \n", "1 1876 2 0 0 0 1 0 0 \n", "2 1877 2 0 0 0 0 0 1 \n", "3 1878 1 2 2 1 1 0 0 \n", "4 1879 0 0 0 1 1 2 2 \n", "5 1880 0 3 2 1 1 1 0 \n", "6 1881 1 0 0 2 1 0 0 \n", "7 1882 1 2 0 0 0 0 1 \n", "8 1883 0 0 1 2 0 1 2 \n", "9 1884 3 0 1 0 0 0 0 \n", "10 1885 0 0 0 0 0 0 1 \n", "11 1886 2 1 0 0 1 1 1 \n", "12 1887 1 1 2 1 0 0 3 \n", "13 1888 0 1 1 0 0 1 1 \n", "14 1889 0 0 1 1 0 1 1 \n", "15 1890 1 2 0 2 0 1 1 \n", "16 1891 0 0 0 1 1 1 0 \n", "17 1892 1 3 2 0 1 1 3 \n", "18 1893 0 1 0 0 0 1 0 \n", "19 1894 1 0 0 0 0 0 0 \n", "\n", " corps_7 corps_8 corps_9 corps_10 corps_11 corps_14 corps_15 \n", "0 1 1 0 0 0 1 0 \n", "1 0 0 0 0 0 1 1 \n", "2 1 0 0 1 0 2 0 \n", "3 0 0 0 1 0 1 0 \n", "4 0 1 0 0 2 1 0 \n", "5 0 0 2 1 4 3 0 \n", "6 1 0 1 0 0 0 0 \n", "7 0 1 1 2 1 4 1 \n", "8 1 0 1 0 3 0 0 \n", "9 1 0 0 2 0 1 1 \n", "10 0 0 2 0 1 0 1 \n", "11 0 0 1 0 1 3 0 \n", "12 2 1 1 0 1 2 0 \n", "13 0 0 0 0 1 1 0 \n", "14 0 0 1 2 2 0 2 \n", "15 2 0 2 1 1 2 2 \n", "16 1 1 0 3 3 1 0 \n", "17 0 1 1 0 1 1 0 \n", "18 2 0 0 1 3 0 0 \n", "19 0 1 0 1 1 0 0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "horsekick_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.3.5" } }, "nbformat": 4, "nbformat_minor": 0 }