{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas\n", "Pandas is a Python library build on top of NumPy that is used for cleaning, analysing and visualising data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Series\n", "Panda's `Series` class extends NumPy's `ndarray` with a labelled index. The key to using Series is to understand how to use its index." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 100\n", "1 101\n", "2 110\n", "3 111\n", "dtype: int8" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a Series with auto-generated indices\n", "pd.Series(data=[100, 101, 110, 111], dtype=np.int8) " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 100\n", "b 101\n", "c 110\n", "d 111\n", "dtype: int8" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a Series with custom indices\n", "pd.Series(data=[100, 101, 110, 111], index=['a', 'b', 'c', 'd'], dtype=np.int8) " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 100\n", "b 101\n", "c 110\n", "d 111\n", "dtype: int8" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a Series using a dictionary\n", "d = {'a' : 100, 'b': 101, 'c': 110, 'd': 111}\n", "pd.Series(data=d, dtype=np.int8) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Arithmetic" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "breakfast 400\n", "lunch 600\n", "dinner 400\n", "dtype: int16" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "day1 = pd.Series(data=[400, 600, 400], index=['breakfast', 'lunch', 'dinner'], dtype=np.int16)\n", "day1" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "breakfast 350\n", "lunch 500\n", "snack 150\n", "dtype: int16" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "day2 = pd.Series(data=[350, 500, 150], index=['breakfast', 'lunch', 'snack'], dtype=np.int16)\n", "day2" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "breakfast 750.0\n", "dinner NaN\n", "lunch 1100.0\n", "snack NaN\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that only values of matched indices are added together.\n", "day1 + day2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DataFrame\n", "A DataFrame is container for tabular data. Basically, a DataFrame is just a collection of Series that share the same index." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
w1234
x5678
y9101112
z13141516
\n", "
" ], "text/plain": [ " A B C D\n", "w 1 2 3 4\n", "x 5 6 7 8\n", "y 9 10 11 12\n", "z 13 14 15 16" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def init_df():\n", " return pd.DataFrame(data=np.arange(1,17).reshape(4,4), index='w x y z'.split(), columns='A B C D'.split())\n", "df = init_df()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating and deleting" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
w12341
x567825
y910111281
z13141516169
\n", "
" ], "text/plain": [ " A B C D E\n", "w 1 2 3 4 1\n", "x 5 6 7 8 25\n", "y 9 10 11 12 81\n", "z 13 14 15 16 169" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a new column based on another column\n", "df['E'] = df['A'] ** 2\n", "df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BCD
w234
x678
y101112
z141516
\n", "
" ], "text/plain": [ " B C D\n", "w 2 3 4\n", "x 6 7 8\n", "y 10 11 12\n", "z 14 15 16" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a new DataFrame, where certain columns are excluded.\n", "df.drop(['A', 'E'], axis=1)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
w1234
x5678
y9101112
z13141516
\n", "
" ], "text/plain": [ " A B C D\n", "w 1 2 3 4\n", "x 5 6 7 8\n", "y 9 10 11 12\n", "z 13 14 15 16" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove a column permanently\n", "df.drop('E', axis=1, inplace=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Querying" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "w 1\n", "x 5\n", "y 9\n", "z 13\n", "Name: A, dtype: int32" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select column 'A'\n", "df['A']" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that all columns are stored as Series objects\n", "type(df['A'])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AD
w14
x58
y912
z1316
\n", "
" ], "text/plain": [ " A D\n", "w 1 4\n", "x 5 8\n", "y 9 12\n", "z 13 16" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Selecting multiple columns, we get a new DataFrame object\n", "df[['A', 'D']]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 5\n", "B 6\n", "C 7\n", "D 8\n", "Name: x, dtype: int32" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a row by its label \n", "df.loc['x']" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A 1\n", "B 2\n", "C 3\n", "D 4\n", "Name: w, dtype: int32" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a row by its numerical index position\n", "df.iloc[0]" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the value of the first cell\n", "df.loc['w', 'A']" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
BC
x67
y1011
\n", "
" ], "text/plain": [ " B C\n", "x 6 7\n", "y 10 11" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a subset of the DataFrame\n", "df.loc[['x', 'y'], ['B', 'C']]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
wNaNNaNNaNNaN
xNaNNaNNaNNaN
yNaNNaN11.012.0
z13.014.015.016.0
\n", "
" ], "text/plain": [ " A B C D\n", "w NaN NaN NaN NaN\n", "x NaN NaN NaN NaN\n", "y NaN NaN 11.0 12.0\n", "z 13.0 14.0 15.0 16.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Conditional selection\n", "df[df > 10]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
wFalseFalseFalseFalse
xFalseFalseFalseFalse
yFalseFalseTrueTrue
zTrueTrueTrueTrue
\n", "
" ], "text/plain": [ " A B C D\n", "w False False False False\n", "x False False False False\n", "y False False True True\n", "z True True True True" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that the conditional selection only \n", "# returns cells whose boolean value is True\n", "# in the following DataFrame\n", "df > 10" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
y9101112
z13141516
\n", "
" ], "text/plain": [ " A B C D\n", "y 9 10 11 12\n", "z 13 14 15 16" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the rows where column A is larger or equal to 9\n", "df[df['A'] >= 9]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
y9101112
\n", "
" ], "text/plain": [ " A B C D\n", "y 9 10 11 12" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Note that we use `&` as conjunction since Python's `and` operator\n", "# can only deal with single Boolean values e.g. `True and True`\n", "df[(df['A'] >= 9) & (df['C'] == 11)]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
w1234
y9101112
z13141516
\n", "
" ], "text/plain": [ " A B C D\n", "w 1 2 3 4\n", "y 9 10 11 12\n", "z 13 14 15 16" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[(df['A'] >= 9) | (df['C'] == 3)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indicies" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
indexABCD
0w1234
1x5678
2y9101112
3z13141516
\n", "
" ], "text/plain": [ " index A B C D\n", "0 w 1 2 3 4\n", "1 x 5 6 7 8\n", "2 y 9 10 11 12\n", "3 z 13 14 15 16" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reset the index to a numerical value\n", "# Note that the old index will become\n", "# a column in our DataFrame.\n", "df.reset_index()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
Country
CA1234
DE5678
DK9101112
NO13141516
\n", "
" ], "text/plain": [ " A B C D\n", "Country \n", "CA 1 2 3 4\n", "DE 5 6 7 8\n", "DK 9 10 11 12\n", "NO 13 14 15 16" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set a new index.\n", "df['Country'] = 'CA DE DK NO'.split()\n", "df.set_index('Country')\n", "# To overrides the old index use following line instead: \n", "# df.set_index('Country', inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hierarchical indexing" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MultiIndex(levels=[['p', 'q'], [1, 2, 3]],\n", " labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],\n", " names=['outside', 'inside'])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outside = 'p p p q q q'.split()\n", "inside = [1, 2, 3, 1, 2, 3]\n", "hierarchical_index = list(zip(outside, inside))\n", "multi_index = pd.MultiIndex.from_tuples(hierarchical_index, names='outside inside'.split())\n", "multi_index" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2
outsideinside
p11.9824990.240303
2-0.744630-0.233123
3-0.127521-0.899706
q1-0.9786712.158961
20.4765060.681723
3-1.2011090.632565
\n", "
" ], "text/plain": [ " Column 1 Column 2\n", "outside inside \n", "p 1 1.982499 0.240303\n", " 2 -0.744630 -0.233123\n", " 3 -0.127521 -0.899706\n", "q 1 -0.978671 2.158961\n", " 2 0.476506 0.681723\n", " 3 -1.201109 0.632565" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(data=np.random.randn(6,2), index=multi_index, columns=['Column 1', 'Column 2'])\n", "df" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2
inside
11.9824990.240303
2-0.744630-0.233123
3-0.127521-0.899706
\n", "
" ], "text/plain": [ " Column 1 Column 2\n", "inside \n", "1 1.982499 0.240303\n", "2 -0.744630 -0.233123\n", "3 -0.127521 -0.899706" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select using the outer index\n", "df.loc['p']" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Column 1 -0.744630\n", "Column 2 -0.233123\n", "Name: 2, dtype: float64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select using the inside index\n", "df.loc['p'].loc[2]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.74462953774755614" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select a specific cell\n", "df.loc['p'].loc[2]['Column 1']" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2
OI
p11.9824990.240303
2-0.744630-0.233123
3-0.127521-0.899706
q1-0.9786712.158961
20.4765060.681723
3-1.2011090.632565
\n", "
" ], "text/plain": [ " Column 1 Column 2\n", "O I \n", "p 1 1.982499 0.240303\n", " 2 -0.744630 -0.233123\n", " 3 -0.127521 -0.899706\n", "q 1 -0.978671 2.158961\n", " 2 0.476506 0.681723\n", " 3 -1.201109 0.632565" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rename index names\n", "df.index.names = ['O', 'I']\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Cross section_ is used when we need to select data at a particular level." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2
O
p1.9824990.240303
q-0.9786712.158961
\n", "
" ], "text/plain": [ " Column 1 Column 2\n", "O \n", "p 1.982499 0.240303\n", "q -0.978671 2.158961" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select rows whose inside index is equal 1\n", "df.xs(1, level='I')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dealing with missing data" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.01.01
12.0NaN2
2NaNNaN3
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 1.0 1\n", "1 2.0 NaN 2\n", "2 NaN NaN 3" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = {'A': [1, 2, np.nan], 'B': [1, np.nan, np.nan], 'C': [1, 2, 3]}\n", "df = pd.DataFrame(d)\n", "df" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.01.01
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 1.0 1" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Drop any rows with missing values\n", "df.dropna()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.01.01
12.0NaN2
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 1.0 1\n", "1 2.0 NaN 2" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Keep only the rows with at least 2 non-na values:\n", "df.dropna(thresh=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `subset` parameter can be used to specify which columns an action should apply to instead of all columns. For instance, if we want to drop rows with missing values, `subset` specifies a list of columns to include.\n", "\n", "For instance, `df.dropna(thresh=1, subset=['A','B'])` will drop all rows with less than 1 NA value in only columns A and B(rather than all the columns to consider for thresh=1).\n", "\n", "The line `df.dropna(how=all, subset=['A','B'])` will drop all rows with all NA values in only columns A and B." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
C
01
12
23
\n", "
" ], "text/plain": [ " C\n", "0 1\n", "1 2\n", "2 3" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Drop any columns with missing values\n", "df.dropna(axis=1)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
01.01.01
12.00.02
20.00.03
\n", "
" ], "text/plain": [ " A B C\n", "0 1.0 1.0 1\n", "1 2.0 0.0 2\n", "2 0.0 0.0 3" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replace missing values\n", "df.fillna(0)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 1.5\n", "Name: A, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replace missing values with the mean of the column\n", "df['A'].fillna(value=df['A'].mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grouping" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EmployeeNameJobTitleTotalPayYear
Id
1NATHANIEL FORDGENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY567595.432011
2GARY JIMENEZCAPTAIN III (POLICE DEPARTMENT)538909.282011
3ALBERT PARDINICAPTAIN III (POLICE DEPARTMENT)335279.912011
4CHRISTOPHER CHONGWIRE ROPE CABLE MAINTENANCE MECHANIC332343.612011
5PATRICK GARDNERDEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)326373.192011
\n", "
" ], "text/plain": [ " EmployeeName JobTitle \\\n", "Id \n", "1 NATHANIEL FORD GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY \n", "2 GARY JIMENEZ CAPTAIN III (POLICE DEPARTMENT) \n", "3 ALBERT PARDINI CAPTAIN III (POLICE DEPARTMENT) \n", "4 CHRISTOPHER CHONG WIRE ROPE CABLE MAINTENANCE MECHANIC \n", "5 PATRICK GARDNER DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) \n", "\n", " TotalPay Year \n", "Id \n", "1 567595.43 2011 \n", "2 538909.28 2011 \n", "3 335279.91 2011 \n", "4 332343.61 2011 \n", "5 326373.19 2011 " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns = 'Id EmployeeName JobTitle TotalPay Year'.split()\n", "salaries_df = pd.read_csv('data/sf-salaries-subset.csv', index_col='Id', usecols=columns)\n", "salaries_df.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# Group by job title\n", "salaries_by_job_df = salaries_df.groupby('JobTitle')" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeanstdmin25%50%75%max
JobTitle
ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)1.0299494.1700NaN299494.17299494.1700299494.170299494.1700299494.17
ASSISTANT DEPUTY CHIEF II1.0316285.7400NaN316285.74316285.7400316285.740316285.7400316285.74
ASSISTANT MEDICAL EXAMINER1.0274550.2500NaN274550.25274550.2500274550.250274550.2500274550.25
BATTALION CHIEF, (FIRE DEPARTMENT)4.0295547.467517602.943941276434.22283868.8425294887.300306565.9250315981.05
CAPTAIN III (POLICE DEPARTMENT)3.0390599.3700129813.897272297608.92316444.4150335279.910437094.5950538909.28
CAPTAIN, EMERGENCYCY MEDICAL SERVICES1.0278569.2100NaN278569.21278569.2100278569.210278569.2100278569.21
CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)1.0302377.7300NaN302377.73302377.7300302377.730302377.7300302377.73
COMMANDER III, (POLICE DEPARTMENT)1.0286213.8600NaN286213.86286213.8600286213.860286213.8600286213.86
DEPARTMENT HEAD V2.0288696.02505622.340368284720.43286708.2275288696.025290683.8225292671.62
DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)1.0326373.1900NaN326373.19326373.1900326373.190326373.1900326373.19
DEPUTY DIRECTOR OF INVESTMENTS1.0307899.4600NaN307899.46307899.4600307899.460307899.4600307899.46
EXECUTIVE CONTRACT EMPLOYEE1.0294580.0200NaN294580.02294580.0200294580.020294580.0200294580.02
GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY1.0567595.4300NaN567595.43567595.4300567595.430567595.4300567595.43
WIRE ROPE CABLE MAINTENANCE MECHANIC1.0332343.6100NaN332343.61332343.6100332343.610332343.6100332343.61
\n", "
" ], "text/plain": [ " count mean \\\n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 299494.1700 \n", "ASSISTANT DEPUTY CHIEF II 1.0 316285.7400 \n", "ASSISTANT MEDICAL EXAMINER 1.0 274550.2500 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 4.0 295547.4675 \n", "CAPTAIN III (POLICE DEPARTMENT) 3.0 390599.3700 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 1.0 278569.2100 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 302377.7300 \n", "COMMANDER III, (POLICE DEPARTMENT) 1.0 286213.8600 \n", "DEPARTMENT HEAD V 2.0 288696.0250 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 1.0 326373.1900 \n", "DEPUTY DIRECTOR OF INVESTMENTS 1.0 307899.4600 \n", "EXECUTIVE CONTRACT EMPLOYEE 1.0 294580.0200 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 1.0 567595.4300 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 1.0 332343.6100 \n", "\n", " std min \\\n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) NaN 299494.17 \n", "ASSISTANT DEPUTY CHIEF II NaN 316285.74 \n", "ASSISTANT MEDICAL EXAMINER NaN 274550.25 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 17602.943941 276434.22 \n", "CAPTAIN III (POLICE DEPARTMENT) 129813.897272 297608.92 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES NaN 278569.21 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) NaN 302377.73 \n", "COMMANDER III, (POLICE DEPARTMENT) NaN 286213.86 \n", "DEPARTMENT HEAD V 5622.340368 284720.43 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) NaN 326373.19 \n", "DEPUTY DIRECTOR OF INVESTMENTS NaN 307899.46 \n", "EXECUTIVE CONTRACT EMPLOYEE NaN 294580.02 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY NaN 567595.43 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC NaN 332343.61 \n", "\n", " 25% 50% \\\n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 299494.1700 299494.170 \n", "ASSISTANT DEPUTY CHIEF II 316285.7400 316285.740 \n", "ASSISTANT MEDICAL EXAMINER 274550.2500 274550.250 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 283868.8425 294887.300 \n", "CAPTAIN III (POLICE DEPARTMENT) 316444.4150 335279.910 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 278569.2100 278569.210 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 302377.7300 302377.730 \n", "COMMANDER III, (POLICE DEPARTMENT) 286213.8600 286213.860 \n", "DEPARTMENT HEAD V 286708.2275 288696.025 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.1900 326373.190 \n", "DEPUTY DIRECTOR OF INVESTMENTS 307899.4600 307899.460 \n", "EXECUTIVE CONTRACT EMPLOYEE 294580.0200 294580.020 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.4300 567595.430 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.6100 332343.610 \n", "\n", " 75% max \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 299494.1700 299494.17 \n", "ASSISTANT DEPUTY CHIEF II 316285.7400 316285.74 \n", "ASSISTANT MEDICAL EXAMINER 274550.2500 274550.25 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 306565.9250 315981.05 \n", "CAPTAIN III (POLICE DEPARTMENT) 437094.5950 538909.28 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 278569.2100 278569.21 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 302377.7300 302377.73 \n", "COMMANDER III, (POLICE DEPARTMENT) 286213.8600 286213.86 \n", "DEPARTMENT HEAD V 290683.8225 292671.62 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.1900 326373.19 \n", "DEPUTY DIRECTOR OF INVESTMENTS 307899.4600 307899.46 \n", "EXECUTIVE CONTRACT EMPLOYEE 294580.0200 294580.02 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.4300 567595.43 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.6100 332343.61 " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get some statistics on the TotalPay column\n", "salaries_by_job_df['TotalPay'].describe()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TotalPayYear
countmeanstdmin25%50%75%maxcountmeanstdmin25%50%75%max
JobTitle
ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)1.0299494.1700NaN299494.17299494.1700299494.170299494.1700299494.171.02011.0NaN2011.02011.02011.02011.02011.0
ASSISTANT DEPUTY CHIEF II1.0316285.7400NaN316285.74316285.7400316285.740316285.7400316285.741.02011.0NaN2011.02011.02011.02011.02011.0
ASSISTANT MEDICAL EXAMINER1.0274550.2500NaN274550.25274550.2500274550.250274550.2500274550.251.02011.0NaN2011.02011.02011.02011.02011.0
BATTALION CHIEF, (FIRE DEPARTMENT)4.0295547.467517602.943941276434.22283868.8425294887.300306565.9250315981.054.02011.00.02011.02011.02011.02011.02011.0
CAPTAIN III (POLICE DEPARTMENT)3.0390599.3700129813.897272297608.92316444.4150335279.910437094.5950538909.283.02011.00.02011.02011.02011.02011.02011.0
CAPTAIN, EMERGENCYCY MEDICAL SERVICES1.0278569.2100NaN278569.21278569.2100278569.210278569.2100278569.211.02011.0NaN2011.02011.02011.02011.02011.0
CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)1.0302377.7300NaN302377.73302377.7300302377.730302377.7300302377.731.02011.0NaN2011.02011.02011.02011.02011.0
COMMANDER III, (POLICE DEPARTMENT)1.0286213.8600NaN286213.86286213.8600286213.860286213.8600286213.861.02011.0NaN2011.02011.02011.02011.02011.0
DEPARTMENT HEAD V2.0288696.02505622.340368284720.43286708.2275288696.025290683.8225292671.622.02011.00.02011.02011.02011.02011.02011.0
DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)1.0326373.1900NaN326373.19326373.1900326373.190326373.1900326373.191.02011.0NaN2011.02011.02011.02011.02011.0
DEPUTY DIRECTOR OF INVESTMENTS1.0307899.4600NaN307899.46307899.4600307899.460307899.4600307899.461.02011.0NaN2011.02011.02011.02011.02011.0
EXECUTIVE CONTRACT EMPLOYEE1.0294580.0200NaN294580.02294580.0200294580.020294580.0200294580.021.02011.0NaN2011.02011.02011.02011.02011.0
GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY1.0567595.4300NaN567595.43567595.4300567595.430567595.4300567595.431.02011.0NaN2011.02011.02011.02011.02011.0
WIRE ROPE CABLE MAINTENANCE MECHANIC1.0332343.6100NaN332343.61332343.6100332343.610332343.6100332343.611.02011.0NaN2011.02011.02011.02011.02011.0
\n", "
" ], "text/plain": [ " TotalPay \\\n", " count mean \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 299494.1700 \n", "ASSISTANT DEPUTY CHIEF II 1.0 316285.7400 \n", "ASSISTANT MEDICAL EXAMINER 1.0 274550.2500 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 4.0 295547.4675 \n", "CAPTAIN III (POLICE DEPARTMENT) 3.0 390599.3700 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 1.0 278569.2100 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 302377.7300 \n", "COMMANDER III, (POLICE DEPARTMENT) 1.0 286213.8600 \n", "DEPARTMENT HEAD V 2.0 288696.0250 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 1.0 326373.1900 \n", "DEPUTY DIRECTOR OF INVESTMENTS 1.0 307899.4600 \n", "EXECUTIVE CONTRACT EMPLOYEE 1.0 294580.0200 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 1.0 567595.4300 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 1.0 332343.6100 \n", "\n", " \\\n", " std min \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) NaN 299494.17 \n", "ASSISTANT DEPUTY CHIEF II NaN 316285.74 \n", "ASSISTANT MEDICAL EXAMINER NaN 274550.25 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 17602.943941 276434.22 \n", "CAPTAIN III (POLICE DEPARTMENT) 129813.897272 297608.92 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES NaN 278569.21 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) NaN 302377.73 \n", "COMMANDER III, (POLICE DEPARTMENT) NaN 286213.86 \n", "DEPARTMENT HEAD V 5622.340368 284720.43 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) NaN 326373.19 \n", "DEPUTY DIRECTOR OF INVESTMENTS NaN 307899.46 \n", "EXECUTIVE CONTRACT EMPLOYEE NaN 294580.02 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY NaN 567595.43 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC NaN 332343.61 \n", "\n", " \\\n", " 25% 50% \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 299494.1700 299494.170 \n", "ASSISTANT DEPUTY CHIEF II 316285.7400 316285.740 \n", "ASSISTANT MEDICAL EXAMINER 274550.2500 274550.250 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 283868.8425 294887.300 \n", "CAPTAIN III (POLICE DEPARTMENT) 316444.4150 335279.910 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 278569.2100 278569.210 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 302377.7300 302377.730 \n", "COMMANDER III, (POLICE DEPARTMENT) 286213.8600 286213.860 \n", "DEPARTMENT HEAD V 286708.2275 288696.025 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.1900 326373.190 \n", "DEPUTY DIRECTOR OF INVESTMENTS 307899.4600 307899.460 \n", "EXECUTIVE CONTRACT EMPLOYEE 294580.0200 294580.020 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.4300 567595.430 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.6100 332343.610 \n", "\n", " \\\n", " 75% max \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 299494.1700 299494.17 \n", "ASSISTANT DEPUTY CHIEF II 316285.7400 316285.74 \n", "ASSISTANT MEDICAL EXAMINER 274550.2500 274550.25 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 306565.9250 315981.05 \n", "CAPTAIN III (POLICE DEPARTMENT) 437094.5950 538909.28 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 278569.2100 278569.21 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 302377.7300 302377.73 \n", "COMMANDER III, (POLICE DEPARTMENT) 286213.8600 286213.86 \n", "DEPARTMENT HEAD V 290683.8225 292671.62 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.1900 326373.19 \n", "DEPUTY DIRECTOR OF INVESTMENTS 307899.4600 307899.46 \n", "EXECUTIVE CONTRACT EMPLOYEE 294580.0200 294580.02 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.4300 567595.43 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.6100 332343.61 \n", "\n", " Year \\\n", " count mean std min \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 2011.0 NaN 2011.0 \n", "ASSISTANT DEPUTY CHIEF II 1.0 2011.0 NaN 2011.0 \n", "ASSISTANT MEDICAL EXAMINER 1.0 2011.0 NaN 2011.0 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 4.0 2011.0 0.0 2011.0 \n", "CAPTAIN III (POLICE DEPARTMENT) 3.0 2011.0 0.0 2011.0 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 1.0 2011.0 NaN 2011.0 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1.0 2011.0 NaN 2011.0 \n", "COMMANDER III, (POLICE DEPARTMENT) 1.0 2011.0 NaN 2011.0 \n", "DEPARTMENT HEAD V 2.0 2011.0 0.0 2011.0 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 1.0 2011.0 NaN 2011.0 \n", "DEPUTY DIRECTOR OF INVESTMENTS 1.0 2011.0 NaN 2011.0 \n", "EXECUTIVE CONTRACT EMPLOYEE 1.0 2011.0 NaN 2011.0 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 1.0 2011.0 NaN 2011.0 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 1.0 2011.0 NaN 2011.0 \n", "\n", " \\\n", " 25% 50% 75% \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "ASSISTANT DEPUTY CHIEF II 2011.0 2011.0 2011.0 \n", "ASSISTANT MEDICAL EXAMINER 2011.0 2011.0 2011.0 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "CAPTAIN III (POLICE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 2011.0 2011.0 2011.0 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "COMMANDER III, (POLICE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "DEPARTMENT HEAD V 2011.0 2011.0 2011.0 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 2011.0 2011.0 2011.0 \n", "DEPUTY DIRECTOR OF INVESTMENTS 2011.0 2011.0 2011.0 \n", "EXECUTIVE CONTRACT EMPLOYEE 2011.0 2011.0 2011.0 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 2011.0 2011.0 2011.0 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 2011.0 2011.0 2011.0 \n", "\n", " \n", " max \n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 2011.0 \n", "ASSISTANT DEPUTY CHIEF II 2011.0 \n", "ASSISTANT MEDICAL EXAMINER 2011.0 \n", "BATTALION CHIEF, (FIRE DEPARTMENT) 2011.0 \n", "CAPTAIN III (POLICE DEPARTMENT) 2011.0 \n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 2011.0 \n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 2011.0 \n", "COMMANDER III, (POLICE DEPARTMENT) 2011.0 \n", "DEPARTMENT HEAD V 2011.0 \n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 2011.0 \n", "DEPUTY DIRECTOR OF INVESTMENTS 2011.0 \n", "EXECUTIVE CONTRACT EMPLOYEE 2011.0 \n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 2011.0 \n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 2011.0 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get some statistics on all numeric columns\n", "salaries_by_job_df.describe()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
JobTitleASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)ASSISTANT DEPUTY CHIEF IIASSISTANT MEDICAL EXAMINERBATTALION CHIEF, (FIRE DEPARTMENT)CAPTAIN III (POLICE DEPARTMENT)CAPTAIN, EMERGENCYCY MEDICAL SERVICESCHIEF OF DEPARTMENT, (FIRE DEPARTMENT)COMMANDER III, (POLICE DEPARTMENT)DEPARTMENT HEAD VDEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)DEPUTY DIRECTOR OF INVESTMENTSEXECUTIVE CONTRACT EMPLOYEEGENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITYWIRE ROPE CABLE MAINTENANCE MECHANIC
TotalPaycount1.001.001.004.0000003.0000001.001.001.002.0000001.001.001.001.001.00
mean299494.17316285.74274550.25295547.467500390599.370000278569.21302377.73286213.86288696.025000326373.19307899.46294580.02567595.43332343.61
stdNaNNaNNaN17602.943941129813.897272NaNNaNNaN5622.340368NaNNaNNaNNaNNaN
min299494.17316285.74274550.25276434.220000297608.920000278569.21302377.73286213.86284720.430000326373.19307899.46294580.02567595.43332343.61
25%299494.17316285.74274550.25283868.842500316444.415000278569.21302377.73286213.86286708.227500326373.19307899.46294580.02567595.43332343.61
50%299494.17316285.74274550.25294887.300000335279.910000278569.21302377.73286213.86288696.025000326373.19307899.46294580.02567595.43332343.61
75%299494.17316285.74274550.25306565.925000437094.595000278569.21302377.73286213.86290683.822500326373.19307899.46294580.02567595.43332343.61
max299494.17316285.74274550.25315981.050000538909.280000278569.21302377.73286213.86292671.620000326373.19307899.46294580.02567595.43332343.61
Yearcount1.001.001.004.0000003.0000001.001.001.002.0000001.001.001.001.001.00
mean2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
stdNaNNaNNaN0.0000000.000000NaNNaNNaN0.000000NaNNaNNaNNaNNaN
min2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
25%2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
50%2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
75%2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
max2011.002011.002011.002011.0000002011.0000002011.002011.002011.002011.0000002011.002011.002011.002011.002011.00
\n", "
" ], "text/plain": [ "JobTitle ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) \\\n", "TotalPay count 1.00 \n", " mean 299494.17 \n", " std NaN \n", " min 299494.17 \n", " 25% 299494.17 \n", " 50% 299494.17 \n", " 75% 299494.17 \n", " max 299494.17 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 \n", "\n", "JobTitle ASSISTANT DEPUTY CHIEF II ASSISTANT MEDICAL EXAMINER \\\n", "TotalPay count 1.00 1.00 \n", " mean 316285.74 274550.25 \n", " std NaN NaN \n", " min 316285.74 274550.25 \n", " 25% 316285.74 274550.25 \n", " 50% 316285.74 274550.25 \n", " 75% 316285.74 274550.25 \n", " max 316285.74 274550.25 \n", "Year count 1.00 1.00 \n", " mean 2011.00 2011.00 \n", " std NaN NaN \n", " min 2011.00 2011.00 \n", " 25% 2011.00 2011.00 \n", " 50% 2011.00 2011.00 \n", " 75% 2011.00 2011.00 \n", " max 2011.00 2011.00 \n", "\n", "JobTitle BATTALION CHIEF, (FIRE DEPARTMENT) \\\n", "TotalPay count 4.000000 \n", " mean 295547.467500 \n", " std 17602.943941 \n", " min 276434.220000 \n", " 25% 283868.842500 \n", " 50% 294887.300000 \n", " 75% 306565.925000 \n", " max 315981.050000 \n", "Year count 4.000000 \n", " mean 2011.000000 \n", " std 0.000000 \n", " min 2011.000000 \n", " 25% 2011.000000 \n", " 50% 2011.000000 \n", " 75% 2011.000000 \n", " max 2011.000000 \n", "\n", "JobTitle CAPTAIN III (POLICE DEPARTMENT) \\\n", "TotalPay count 3.000000 \n", " mean 390599.370000 \n", " std 129813.897272 \n", " min 297608.920000 \n", " 25% 316444.415000 \n", " 50% 335279.910000 \n", " 75% 437094.595000 \n", " max 538909.280000 \n", "Year count 3.000000 \n", " mean 2011.000000 \n", " std 0.000000 \n", " min 2011.000000 \n", " 25% 2011.000000 \n", " 50% 2011.000000 \n", " 75% 2011.000000 \n", " max 2011.000000 \n", "\n", "JobTitle CAPTAIN, EMERGENCYCY MEDICAL SERVICES \\\n", "TotalPay count 1.00 \n", " mean 278569.21 \n", " std NaN \n", " min 278569.21 \n", " 25% 278569.21 \n", " 50% 278569.21 \n", " 75% 278569.21 \n", " max 278569.21 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 \n", "\n", "JobTitle CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) \\\n", "TotalPay count 1.00 \n", " mean 302377.73 \n", " std NaN \n", " min 302377.73 \n", " 25% 302377.73 \n", " 50% 302377.73 \n", " 75% 302377.73 \n", " max 302377.73 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 \n", "\n", "JobTitle COMMANDER III, (POLICE DEPARTMENT) DEPARTMENT HEAD V \\\n", "TotalPay count 1.00 2.000000 \n", " mean 286213.86 288696.025000 \n", " std NaN 5622.340368 \n", " min 286213.86 284720.430000 \n", " 25% 286213.86 286708.227500 \n", " 50% 286213.86 288696.025000 \n", " 75% 286213.86 290683.822500 \n", " max 286213.86 292671.620000 \n", "Year count 1.00 2.000000 \n", " mean 2011.00 2011.000000 \n", " std NaN 0.000000 \n", " min 2011.00 2011.000000 \n", " 25% 2011.00 2011.000000 \n", " 50% 2011.00 2011.000000 \n", " 75% 2011.00 2011.000000 \n", " max 2011.00 2011.000000 \n", "\n", "JobTitle DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) \\\n", "TotalPay count 1.00 \n", " mean 326373.19 \n", " std NaN \n", " min 326373.19 \n", " 25% 326373.19 \n", " 50% 326373.19 \n", " 75% 326373.19 \n", " max 326373.19 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 \n", "\n", "JobTitle DEPUTY DIRECTOR OF INVESTMENTS EXECUTIVE CONTRACT EMPLOYEE \\\n", "TotalPay count 1.00 1.00 \n", " mean 307899.46 294580.02 \n", " std NaN NaN \n", " min 307899.46 294580.02 \n", " 25% 307899.46 294580.02 \n", " 50% 307899.46 294580.02 \n", " 75% 307899.46 294580.02 \n", " max 307899.46 294580.02 \n", "Year count 1.00 1.00 \n", " mean 2011.00 2011.00 \n", " std NaN NaN \n", " min 2011.00 2011.00 \n", " 25% 2011.00 2011.00 \n", " 50% 2011.00 2011.00 \n", " 75% 2011.00 2011.00 \n", " max 2011.00 2011.00 \n", "\n", "JobTitle GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY \\\n", "TotalPay count 1.00 \n", " mean 567595.43 \n", " std NaN \n", " min 567595.43 \n", " 25% 567595.43 \n", " 50% 567595.43 \n", " 75% 567595.43 \n", " max 567595.43 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 \n", "\n", "JobTitle WIRE ROPE CABLE MAINTENANCE MECHANIC \n", "TotalPay count 1.00 \n", " mean 332343.61 \n", " std NaN \n", " min 332343.61 \n", " 25% 332343.61 \n", " 50% 332343.61 \n", " 75% 332343.61 \n", " max 332343.61 \n", "Year count 1.00 \n", " mean 2011.00 \n", " std NaN \n", " min 2011.00 \n", " 25% 2011.00 \n", " 50% 2011.00 \n", " 75% 2011.00 \n", " max 2011.00 " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Present statistics in a different way\n", "salaries_by_job_df.describe().transpose()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
EmployeeNameTotalPayYear
JobTitle
ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)111
ASSISTANT DEPUTY CHIEF II111
ASSISTANT MEDICAL EXAMINER111
BATTALION CHIEF, (FIRE DEPARTMENT)444
CAPTAIN III (POLICE DEPARTMENT)333
CAPTAIN, EMERGENCYCY MEDICAL SERVICES111
CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)111
COMMANDER III, (POLICE DEPARTMENT)111
DEPARTMENT HEAD V222
DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)111
DEPUTY DIRECTOR OF INVESTMENTS111
EXECUTIVE CONTRACT EMPLOYEE111
GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY111
WIRE ROPE CABLE MAINTENANCE MECHANIC111
\n", "
" ], "text/plain": [ " EmployeeName TotalPay Year\n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1 1 1\n", "ASSISTANT DEPUTY CHIEF II 1 1 1\n", "ASSISTANT MEDICAL EXAMINER 1 1 1\n", "BATTALION CHIEF, (FIRE DEPARTMENT) 4 4 4\n", "CAPTAIN III (POLICE DEPARTMENT) 3 3 3\n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 1 1 1\n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 1 1 1\n", "COMMANDER III, (POLICE DEPARTMENT) 1 1 1\n", "DEPARTMENT HEAD V 2 2 2\n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 1 1 1\n", "DEPUTY DIRECTOR OF INVESTMENTS 1 1 1\n", "EXECUTIVE CONTRACT EMPLOYEE 1 1 1\n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 1 1 1\n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 1 1 1" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Count number of rows in each group\n", "salaries_by_job_df.count()" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TotalPayYear
JobTitle
ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)299494.17002011
ASSISTANT DEPUTY CHIEF II316285.74002011
ASSISTANT MEDICAL EXAMINER274550.25002011
BATTALION CHIEF, (FIRE DEPARTMENT)295547.46752011
CAPTAIN III (POLICE DEPARTMENT)390599.37002011
CAPTAIN, EMERGENCYCY MEDICAL SERVICES278569.21002011
CHIEF OF DEPARTMENT, (FIRE DEPARTMENT)302377.73002011
COMMANDER III, (POLICE DEPARTMENT)286213.86002011
DEPARTMENT HEAD V288696.02502011
DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT)326373.19002011
DEPUTY DIRECTOR OF INVESTMENTS307899.46002011
EXECUTIVE CONTRACT EMPLOYEE294580.02002011
GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY567595.43002011
WIRE ROPE CABLE MAINTENANCE MECHANIC332343.61002011
\n", "
" ], "text/plain": [ " TotalPay Year\n", "JobTitle \n", "ASSISTANT CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 299494.1700 2011\n", "ASSISTANT DEPUTY CHIEF II 316285.7400 2011\n", "ASSISTANT MEDICAL EXAMINER 274550.2500 2011\n", "BATTALION CHIEF, (FIRE DEPARTMENT) 295547.4675 2011\n", "CAPTAIN III (POLICE DEPARTMENT) 390599.3700 2011\n", "CAPTAIN, EMERGENCYCY MEDICAL SERVICES 278569.2100 2011\n", "CHIEF OF DEPARTMENT, (FIRE DEPARTMENT) 302377.7300 2011\n", "COMMANDER III, (POLICE DEPARTMENT) 286213.8600 2011\n", "DEPARTMENT HEAD V 288696.0250 2011\n", "DEPUTY CHIEF OF DEPARTMENT,(FIRE DEPARTMENT) 326373.1900 2011\n", "DEPUTY DIRECTOR OF INVESTMENTS 307899.4600 2011\n", "EXECUTIVE CONTRACT EMPLOYEE 294580.0200 2011\n", "GENERAL MANAGER-METROPOLITAN TRANSIT AUTHORITY 567595.4300 2011\n", "WIRE ROPE CABLE MAINTENANCE MECHANIC 332343.6100 2011" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the mean of numeric columns\n", "salaries_by_job_df.mean()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "567595.43000000005" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the highest pay\n", "salaries_df['TotalPay'].max()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the position of the highest pay\n", "salaries_df['TotalPay'].argmax()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "EmployeeName GARY JIMENEZ\n", "JobTitle CAPTAIN III (POLICE DEPARTMENT)\n", "TotalPay 538909\n", "Year 2011\n", "Name: 2, dtype: object" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get the person with the highest pay\n", "salaries_df.iloc[salaries_df['TotalPay'].argmax()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining DataFrames" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']},\n", " index=[0, 1, 2, 3])\n", "df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],\n", " 'B': ['B4', 'B5', 'B6', 'B7'],\n", " 'C': ['C4', 'C5', 'C6', 'C7'],\n", " 'D': ['D4', 'D5', 'D6', 'D7']},\n", " index=[4, 5, 6, 7]) \n", "df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],\n", " 'B': ['B8', 'B9', 'B10', 'B11'],\n", " 'C': ['C8', 'C9', 'C10', 'C11'],\n", " 'D': ['D8', 'D9', 'D10', 'D11']},\n", " index=[8, 9, 10, 11])" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0A0B0C0D0
1A1B1C1D1
2A2B2C2D2
3A3B3C3D3
4A4B4C4D4
5A5B5C5D5
6A6B6C6D6
7A7B7C7D7
8A8B8C8D8
9A9B9C9D9
10A10B10C10D10
11A11B11C11D11
\n", "
" ], "text/plain": [ " A B C D\n", "0 A0 B0 C0 D0\n", "1 A1 B1 C1 D1\n", "2 A2 B2 C2 D2\n", "3 A3 B3 C3 D3\n", "4 A4 B4 C4 D4\n", "5 A5 B5 C5 D5\n", "6 A6 B6 C6 D6\n", "7 A7 B7 C7 D7\n", "8 A8 B8 C8 D8\n", "9 A9 B9 C9 D9\n", "10 A10 B10 C10 D10\n", "11 A11 B11 C11 D11" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Combine along the rows\n", "pd.concat([df1, df2, df3])" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDABCDABCD
0A0B0C0D0NaNNaNNaNNaNNaNNaNNaNNaN
1A1B1C1D1NaNNaNNaNNaNNaNNaNNaNNaN
2A2B2C2D2NaNNaNNaNNaNNaNNaNNaNNaN
3A3B3C3D3NaNNaNNaNNaNNaNNaNNaNNaN
4NaNNaNNaNNaNA4B4C4D4NaNNaNNaNNaN
5NaNNaNNaNNaNA5B5C5D5NaNNaNNaNNaN
6NaNNaNNaNNaNA6B6C6D6NaNNaNNaNNaN
7NaNNaNNaNNaNA7B7C7D7NaNNaNNaNNaN
8NaNNaNNaNNaNNaNNaNNaNNaNA8B8C8D8
9NaNNaNNaNNaNNaNNaNNaNNaNA9B9C9D9
10NaNNaNNaNNaNNaNNaNNaNNaNA10B10C10D10
11NaNNaNNaNNaNNaNNaNNaNNaNA11B11C11D11
\n", "
" ], "text/plain": [ " A B C D A B C D A B C D\n", "0 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN\n", "1 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN\n", "2 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN\n", "3 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN\n", "4 NaN NaN NaN NaN A4 B4 C4 D4 NaN NaN NaN NaN\n", "5 NaN NaN NaN NaN A5 B5 C5 D5 NaN NaN NaN NaN\n", "6 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN\n", "7 NaN NaN NaN NaN A7 B7 C7 D7 NaN NaN NaN NaN\n", "8 NaN NaN NaN NaN NaN NaN NaN NaN A8 B8 C8 D8\n", "9 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9\n", "10 NaN NaN NaN NaN NaN NaN NaN NaN A10 B10 C10 D10\n", "11 NaN NaN NaN NaN NaN NaN NaN NaN A11 B11 C11 D11" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Combine along the columns\n", "# Note that Pandas assigns cell values that does not align correct to NaN\n", "pd.concat([df1, df2, df3], axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `merge` function is useful if we want to combine DataFrames like we join tables using SQL." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": true }, "outputs": [], "source": [ "left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],\n", " 'key2': ['K0', 'K1', 'K0', 'K1'],\n", " 'A': ['A0', 'A1', 'A2', 'A3'],\n", " 'B': ['B0', 'B1', 'B2', 'B3']})\n", "right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],\n", " 'key2': ['K0', 'K0', 'K0', 'K0'],\n", " 'C': ['C0', 'C1', 'C2', 'C3'],\n", " 'D': ['D0', 'D1', 'D2', 'D3']})" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABkey1key2CD
0A0B0K0K0C0D0
1A2B2K1K0C1D1
2A2B2K1K0C2D2
\n", "
" ], "text/plain": [ " A B key1 key2 C D\n", "0 A0 B0 K0 K0 C0 D0\n", "1 A2 B2 K1 K0 C1 D1\n", "2 A2 B2 K1 K0 C2 D2" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='inner', on=['key1', 'key2'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `join` function is used to combine the columns of DataFrames that may have different indices. It works exactly like the `merge` function except the keys that we join on are on the indices instead of the columns." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],\n", " 'B': ['B0', 'B1', 'B2']},\n", " index=['K0', 'K1', 'K2']) \n", "right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],\n", " 'D': ['D0', 'D2', 'D3']},\n", " index=['K0', 'K2', 'K3'])" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
\n", "
" ], "text/plain": [ " A B C D\n", "K0 A0 B0 C0 D0\n", "K1 A1 B1 NaN NaN\n", "K2 A2 B2 C2 D2" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left.join(right)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
K0A0B0C0D0
K1A1B1NaNNaN
K2A2B2C2D2
K3NaNNaNC3D3
\n", "
" ], "text/plain": [ " A B C D\n", "K0 A0 B0 C0 D0\n", "K1 A1 B1 NaN NaN\n", "K2 A2 B2 C2 D2\n", "K3 NaN NaN C3 D3" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left.join(right, how='outer')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
01444abc
12555def
23666ghi
34444xyz
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "0 1 444 abc\n", "1 2 555 def\n", "2 3 666 ghi\n", "3 4 444 xyz" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'col1':[1,2,3,4],\n", " 'col2':[444,555,666,444],\n", " 'col3':['abc','def','ghi','xyz']})\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([444, 555, 666], dtype=int64)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the unique values in col2\n", "df['col2'].unique()" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the number of unique values in col2\n", "df['col2'].nunique()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "444 2\n", "555 1\n", "666 1\n", "Name: col2, dtype: int64" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the unique values in col2\n", "df['col2'].value_counts()" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "444 2\n", "Name: col2, dtype: int64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The value_counts() can be used to find top X row most common value\n", "df['col2'].value_counts().head(1)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 4\n", "2 9\n", "3 16\n", "Name: col1, dtype: int64" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Apply custom function to each element of a column\n", "df['col1'].apply(lambda element_value: element_value**2)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['col1', 'col2', 'col3'], dtype='object')" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the names of all the columns in the DataFrame\n", "df.columns" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
01444abc
34444xyz
12555def
23666ghi
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "0 1 444 abc\n", "3 4 444 xyz\n", "1 2 555 def\n", "2 3 666 ghi" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort data\n", "df.sort_values(by='col2')" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col1col2col3
0FalseFalseFalse
1FalseFalseFalse
2FalseFalseFalse
3FalseFalseFalse
\n", "
" ], "text/plain": [ " col1 col2 col3\n", "0 False False False\n", "1 False False False\n", "2 False False False\n", "3 False False False" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find null values\n", "df.isnull()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading data from HTML" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "data = pd.read_html('https://borsen.dk/kurser/danske_aktier/c20_cap.html', thousands='.', decimal=',')\n", "df = data[0]" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 20 entries, 0 to 19\n", "Data columns (total 8 columns):\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tAktie\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null object\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\t%\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\t+/-\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tKurs\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tÅTD%\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tBud\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tUdbud\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\tOmsætning\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t 20 non-null float64\n", "dtypes: float64(7), object(1)\n", "memory usage: 1.3+ KB\n" ] } ], "source": [ "# Show information about the data\n", "df.info()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tAktie\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t%\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t+/-\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tKurs\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tÅTD%\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tBud\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tUdbud\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t',\n", " '\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tOmsætning\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t\\t\\n\\t\\t\\t\\t\\t\\t\\t\\t\\t'],\n", " dtype='object')" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "df.columns = ['Akie', '%', '+/-', 'Kurs', 'ATD%', 'Bud', 'Udbud', 'Omsætning']" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 20 entries, 0 to 19\n", "Data columns (total 8 columns):\n", "Akie 20 non-null object\n", "% 20 non-null float64\n", "+/- 20 non-null float64\n", "Kurs 20 non-null float64\n", "ATD% 20 non-null float64\n", "Bud 20 non-null float64\n", "Udbud 20 non-null float64\n", "Omsætning 20 non-null float64\n", "dtypes: float64(7), object(1)\n", "memory usage: 1.3+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "82104014.530000001" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Omsætning'][0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }