{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CASE STUDY: AIRCRAFT DATA\n",
"\n",
"\n",
"Lets say we're interested in the characteristics of various aircraft, for the purpose of analyzing noise levels associated therewith. We may have several reasons for improving our fluency regarding what aircraft fly in and out of the nearest airport.\n",
"\n",
"[Citizens for Quieter Neighbordhood](https://citizensforquieterneighborhood.com/)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a typical pandas function call. We're using Python to read an Excel spreadsheet, but maybe we don't want all the columns or have specific conversions we would like to perform.\n",
"\n",
"```python\n",
"\n",
"pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)\n",
"```\n",
"\n",
"Resources:\n",
"\n",
"* [Reading Excel into pandas](https://www.marsja.se/pandas-excel-tutorial-how-to-read-and-write-excel-files/)\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"\n",
"URL = \"https://www.faa.gov/airports/engineering/aircraft_char_database/media/FAA-Aircraft-Char-Database-v2-201810.xlsx\"\n",
"\n",
"df = pd.read_excel(URL, sheet_name=\"Aircraft Database\", \n",
" usecols=['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Manufacturer | \n",
" Model | \n",
" Physical Class (Engine) | \n",
" # Engines | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Acro Sport | \n",
" Acro Sport | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" Acro Sport | \n",
" Acro Sport II | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" Acro Sport | \n",
" Cougar | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" Acro Sport | \n",
" Junior Ace | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" Acro Sport | \n",
" Super Ace | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Manufacturer Model Physical Class (Engine) # Engines\n",
"0 Acro Sport Acro Sport Piston 1\n",
"1 Acro Sport Acro Sport II Piston 1\n",
"2 Acro Sport Cougar Piston 1\n",
"3 Acro Sport Junior Ace Piston 1\n",
"4 Acro Sport Super Ace Piston 1"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Manufacturer | \n",
" Model | \n",
" Physical Class (Engine) | \n",
" # Engines | \n",
"
\n",
" \n",
" \n",
" \n",
" 2759 | \n",
" Zlin | \n",
" Zlin Z-50 | \n",
" tbd | \n",
" tbd | \n",
"
\n",
" \n",
" 2760 | \n",
" Zlin | \n",
" Zlin Z-526 Akrobat | \n",
" tbd | \n",
" tbd | \n",
"
\n",
" \n",
" 2761 | \n",
" Zlin | \n",
" Zlin Z-526 Trener Master | \n",
" tbd | \n",
" tbd | \n",
"
\n",
" \n",
" 2762 | \n",
" Zlin | \n",
" Zlin Z-626 | \n",
" tbd | \n",
" tbd | \n",
"
\n",
" \n",
" 2763 | \n",
" Zlin | \n",
" Zlin Z-726 Universal | \n",
" tbd | \n",
" tbd | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Manufacturer Model Physical Class (Engine) # Engines\n",
"2759 Zlin Zlin Z-50 tbd tbd\n",
"2760 Zlin Zlin Z-526 Akrobat tbd tbd\n",
"2761 Zlin Zlin Z-526 Trener Master tbd tbd\n",
"2762 Zlin Zlin Z-626 tbd tbd\n",
"2763 Zlin Zlin Z-726 Universal tbd tbd"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'], dtype='object')"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2764"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(df)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2764, 25)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"agg corr cumsum get_group mean pct_change sem transform\n",
"aggregate corrwith describe head median pipe shift tshift \n",
"all count diff hist min plot size var \n",
"any cov expanding idxmax ngroup prod skew \n",
"apply cumcount ffill idxmin nth quantile std \n",
"backfill cummax fillna last nunique rank sum \n",
"bfill cummin filter mad ohlc resample tail \n",
"boxplot cumprod first max pad rolling take \n",
"\n"
]
}
],
"source": [
"import IPython\n",
"\n",
"# Grouping by one factor\n",
"df_man = df.groupby('Manufacturer')\n",
"\n",
"# Getting all methods from the groupby object:\n",
"meth = [method_name for method_name in dir(df_man)\n",
" if callable(getattr(df_man, method_name)) & ~method_name.startswith('_')]\n",
"\n",
"# Printing the result\n",
"print(IPython.utils.text.columnize(meth))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False as break del finally if lambda pass while\n",
"None assert class elif for import nonlocal raise with \n",
"True async continue else from in not return yield\n",
"and await def except global is or try \n",
"\n"
]
}
],
"source": [
"from keyword import kwlist\n",
"print(IPython.utils.text.columnize(kwlist))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[('AESL', Int64Index([106, 107], dtype='int64')),\n",
" ('AIDC', Int64Index([121, 122, 123, 124], dtype='int64')),\n",
" ('AMX', Int64Index([216, 217], dtype='int64')),\n",
" ('AMX International', Int64Index([218], dtype='int64')),\n",
" ('ATR (Aérospatiale/Alenia)',\n",
" Int64Index([248, 249, 250, 251, 252, 253, 254, 255], dtype='int64')),\n",
" ('AVRO', Int64Index([300, 301, 302, 303, 304, 305, 306], dtype='int64')),\n",
" ('Acro Sport', Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')),\n",
" ('Adam Aircraft Industries', Int64Index([6, 7], dtype='int64')),\n",
" ('Aeritalia', Int64Index([8], dtype='int64')),\n",
" ('Aermacchi / Macchi',\n",
" Int64Index([9, 10, 11, 12, 13, 14, 15, 16], dtype='int64'))]"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"list(df_man.groups.items())[:10]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Model | \n",
" Physical Class (Engine) | \n",
" # Engines | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Acro Sport | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" Acro Sport II | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" Cougar | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" Junior Ace | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" Super Ace | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
" 5 | \n",
" Super Acro Sport | \n",
" Piston | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Model Physical Class (Engine) # Engines\n",
"0 Acro Sport Piston 1\n",
"1 Acro Sport II Piston 1\n",
"2 Cougar Piston 1\n",
"3 Junior Ace Piston 1\n",
"4 Super Ace Piston 1\n",
"5 Super Acro Sport Piston 1"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_man.get_group(\"Acro Sport\")"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Manufacturer\n",
"AESL 2\n",
"AIDC 4\n",
"AMX 2\n",
"AMX International 1\n",
"ATR (Aérospatiale/Alenia) 8\n",
" ..\n",
"Wing Aircraft 1\n",
"XTI Aircraft 1\n",
"Yakovlev / Jakovlev 24\n",
"Zivko Aeronautics Inc. 2\n",
"Zlin 42\n",
"Length: 424, dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_man.size()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Model | \n",
" Physical Class (Engine) | \n",
" # Engines | \n",
"
\n",
" \n",
" Manufacturer | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" AESL | \n",
" 2 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" AIDC | \n",
" 4 | \n",
" 4 | \n",
" 4 | \n",
"
\n",
" \n",
" AMX | \n",
" 2 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" AMX International | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" ATR (Aérospatiale/Alenia) | \n",
" 8 | \n",
" 8 | \n",
" 8 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" Wing Aircraft | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" XTI Aircraft | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" Yakovlev / Jakovlev | \n",
" 24 | \n",
" 24 | \n",
" 24 | \n",
"
\n",
" \n",
" Zivko Aeronautics Inc. | \n",
" 2 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" Zlin | \n",
" 42 | \n",
" 42 | \n",
" 42 | \n",
"
\n",
" \n",
"
\n",
"
424 rows × 3 columns
\n",
"
"
],
"text/plain": [
" Model Physical Class (Engine) # Engines\n",
"Manufacturer \n",
"AESL 2 2 2\n",
"AIDC 4 4 4\n",
"AMX 2 2 2\n",
"AMX International 1 1 1\n",
"ATR (Aérospatiale/Alenia) 8 8 8\n",
"... ... ... ...\n",
"Wing Aircraft 1 1 1\n",
"XTI Aircraft 1 1 1\n",
"Yakovlev / Jakovlev 24 24 24\n",
"Zivko Aeronautics Inc. 2 2 2\n",
"Zlin 42 42 42\n",
"\n",
"[424 rows x 3 columns]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_man.count() # how many not-missing values per column per group"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Model | \n",
" Physical Class (Engine) | \n",
" # Engines | \n",
"
\n",
" \n",
" Manufacturer | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" AESL | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" AIDC | \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" AMX | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" AMX International | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" ATR (Aérospatiale/Alenia) | \n",
" 8 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" Wing Aircraft | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" XTI Aircraft | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" Yakovlev / Jakovlev | \n",
" 21 | \n",
" 2 | \n",
" 2 | \n",
"
\n",
" \n",
" Zivko Aeronautics Inc. | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" Zlin | \n",
" 42 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
424 rows × 3 columns
\n",
"
"
],
"text/plain": [
" Model Physical Class (Engine) # Engines\n",
"Manufacturer \n",
"AESL 2 1 1\n",
"AIDC 4 1 1\n",
"AMX 2 1 1\n",
"AMX International 1 1 1\n",
"ATR (Aérospatiale/Alenia) 8 1 1\n",
"... ... ... ...\n",
"Wing Aircraft 1 1 1\n",
"XTI Aircraft 1 1 1\n",
"Yakovlev / Jakovlev 21 2 2\n",
"Zivko Aeronautics Inc. 2 1 1\n",
"Zlin 42 1 1\n",
"\n",
"[424 rows x 3 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_man.nunique() # how many unique values per column per group"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import YouTubeVideo\n",
"YouTubeVideo(\"MjHpMCIvwsY\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}