{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CASE STUDY: AIRCRAFT DATA\n", "\n", "\n", "Lets say we're interested in the characteristics of various aircraft, for the purpose of analyzing noise levels associated therewith. We may have several reasons for improving our fluency regarding what aircraft fly in and out of the nearest airport.\n", "\n", "[Citizens for Quieter Neighbordhood](https://citizensforquieterneighborhood.com/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a typical pandas function call. We're using Python to read an Excel spreadsheet, but maybe we don't want all the columns or have specific conversions we would like to perform.\n", "\n", "```python\n", "\n", "pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)\n", "```\n", "\n", "Resources:\n", "\n", "* [Reading Excel into pandas](https://www.marsja.se/pandas-excel-tutorial-how-to-read-and-write-excel-files/)\n", " " ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "\n", "URL = \"https://www.faa.gov/airports/engineering/aircraft_char_database/media/FAA-Aircraft-Char-Database-v2-201810.xlsx\"\n", "\n", "df = pd.read_excel(URL, sheet_name=\"Aircraft Database\", \n", " usecols=['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ManufacturerModelPhysical Class (Engine)# Engines
0Acro SportAcro SportPiston1
1Acro SportAcro Sport IIPiston1
2Acro SportCougarPiston1
3Acro SportJunior AcePiston1
4Acro SportSuper AcePiston1
\n", "
" ], "text/plain": [ " Manufacturer Model Physical Class (Engine) # Engines\n", "0 Acro Sport Acro Sport Piston 1\n", "1 Acro Sport Acro Sport II Piston 1\n", "2 Acro Sport Cougar Piston 1\n", "3 Acro Sport Junior Ace Piston 1\n", "4 Acro Sport Super Ace Piston 1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ManufacturerModelPhysical Class (Engine)# Engines
2759ZlinZlin Z-50tbdtbd
2760ZlinZlin Z-526 Akrobattbdtbd
2761ZlinZlin Z-526 Trener Mastertbdtbd
2762ZlinZlin Z-626tbdtbd
2763ZlinZlin Z-726 Universaltbdtbd
\n", "
" ], "text/plain": [ " Manufacturer Model Physical Class (Engine) # Engines\n", "2759 Zlin Zlin Z-50 tbd tbd\n", "2760 Zlin Zlin Z-526 Akrobat tbd tbd\n", "2761 Zlin Zlin Z-526 Trener Master tbd tbd\n", "2762 Zlin Zlin Z-626 tbd tbd\n", "2763 Zlin Zlin Z-726 Universal tbd tbd" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Manufacturer', 'Model', 'Physical Class (Engine)', '# Engines'], dtype='object')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2764" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2764, 25)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "agg corr cumsum get_group mean pct_change sem transform\n", "aggregate corrwith describe head median pipe shift tshift \n", "all count diff hist min plot size var \n", "any cov expanding idxmax ngroup prod skew \n", "apply cumcount ffill idxmin nth quantile std \n", "backfill cummax fillna last nunique rank sum \n", "bfill cummin filter mad ohlc resample tail \n", "boxplot cumprod first max pad rolling take \n", "\n" ] } ], "source": [ "import IPython\n", "\n", "# Grouping by one factor\n", "df_man = df.groupby('Manufacturer')\n", "\n", "# Getting all methods from the groupby object:\n", "meth = [method_name for method_name in dir(df_man)\n", " if callable(getattr(df_man, method_name)) & ~method_name.startswith('_')]\n", "\n", "# Printing the result\n", "print(IPython.utils.text.columnize(meth))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False as break del finally if lambda pass while\n", "None assert class elif for import nonlocal raise with \n", "True async continue else from in not return yield\n", "and await def except global is or try \n", "\n" ] } ], "source": [ "from keyword import kwlist\n", "print(IPython.utils.text.columnize(kwlist))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('AESL', Int64Index([106, 107], dtype='int64')),\n", " ('AIDC', Int64Index([121, 122, 123, 124], dtype='int64')),\n", " ('AMX', Int64Index([216, 217], dtype='int64')),\n", " ('AMX International', Int64Index([218], dtype='int64')),\n", " ('ATR (Aérospatiale/Alenia)',\n", " Int64Index([248, 249, 250, 251, 252, 253, 254, 255], dtype='int64')),\n", " ('AVRO', Int64Index([300, 301, 302, 303, 304, 305, 306], dtype='int64')),\n", " ('Acro Sport', Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')),\n", " ('Adam Aircraft Industries', Int64Index([6, 7], dtype='int64')),\n", " ('Aeritalia', Int64Index([8], dtype='int64')),\n", " ('Aermacchi / Macchi',\n", " Int64Index([9, 10, 11, 12, 13, 14, 15, 16], dtype='int64'))]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(df_man.groups.items())[:10]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelPhysical Class (Engine)# Engines
0Acro SportPiston1
1Acro Sport IIPiston1
2CougarPiston1
3Junior AcePiston1
4Super AcePiston1
5Super Acro SportPiston1
\n", "
" ], "text/plain": [ " Model Physical Class (Engine) # Engines\n", "0 Acro Sport Piston 1\n", "1 Acro Sport II Piston 1\n", "2 Cougar Piston 1\n", "3 Junior Ace Piston 1\n", "4 Super Ace Piston 1\n", "5 Super Acro Sport Piston 1" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_man.get_group(\"Acro Sport\")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Manufacturer\n", "AESL 2\n", "AIDC 4\n", "AMX 2\n", "AMX International 1\n", "ATR (Aérospatiale/Alenia) 8\n", " ..\n", "Wing Aircraft 1\n", "XTI Aircraft 1\n", "Yakovlev / Jakovlev 24\n", "Zivko Aeronautics Inc. 2\n", "Zlin 42\n", "Length: 424, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_man.size()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelPhysical Class (Engine)# Engines
Manufacturer
AESL222
AIDC444
AMX222
AMX International111
ATR (Aérospatiale/Alenia)888
............
Wing Aircraft111
XTI Aircraft111
Yakovlev / Jakovlev242424
Zivko Aeronautics Inc.222
Zlin424242
\n", "

424 rows × 3 columns

\n", "
" ], "text/plain": [ " Model Physical Class (Engine) # Engines\n", "Manufacturer \n", "AESL 2 2 2\n", "AIDC 4 4 4\n", "AMX 2 2 2\n", "AMX International 1 1 1\n", "ATR (Aérospatiale/Alenia) 8 8 8\n", "... ... ... ...\n", "Wing Aircraft 1 1 1\n", "XTI Aircraft 1 1 1\n", "Yakovlev / Jakovlev 24 24 24\n", "Zivko Aeronautics Inc. 2 2 2\n", "Zlin 42 42 42\n", "\n", "[424 rows x 3 columns]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_man.count() # how many not-missing values per column per group" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ModelPhysical Class (Engine)# Engines
Manufacturer
AESL211
AIDC411
AMX211
AMX International111
ATR (Aérospatiale/Alenia)811
............
Wing Aircraft111
XTI Aircraft111
Yakovlev / Jakovlev2122
Zivko Aeronautics Inc.211
Zlin4211
\n", "

424 rows × 3 columns

\n", "
" ], "text/plain": [ " Model Physical Class (Engine) # Engines\n", "Manufacturer \n", "AESL 2 1 1\n", "AIDC 4 1 1\n", "AMX 2 1 1\n", "AMX International 1 1 1\n", "ATR (Aérospatiale/Alenia) 8 1 1\n", "... ... ... ...\n", "Wing Aircraft 1 1 1\n", "XTI Aircraft 1 1 1\n", "Yakovlev / Jakovlev 21 2 2\n", "Zivko Aeronautics Inc. 2 1 1\n", "Zlin 42 1 1\n", "\n", "[424 rows x 3 columns]" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_man.nunique() # how many unique values per column per group" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import YouTubeVideo\n", "YouTubeVideo(\"MjHpMCIvwsY\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }