{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tidy FAA survey\n", "\n", "In response to a public records request made by The Times, the FAA provided a decade of its annual estimates about the activity of America's most common helicopter models. This data is not published online. The records provided to The Times were manually input into a spreadsheet.\n", "\n", "This notebook filters the released data to population counts and flight-hour estimates for models that have appeared in all ten years of survey data." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "%store -r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in the raw survey data." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "survey = pd.read_csv(os.path.join(input_dir, \"faa_general_aviation_survey.csv\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Identify all the models that have been surveyed by the FAA" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "model_counts = survey.groupby([\n", " 'latimes_make',\n", " 'latimes_model'\n", "]).size().rename(\"survey_count\").reset_index()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
latimes_makelatimes_modelsurvey_count
0AGUSTA10910
1AGUSTA1192
2AGUSTA1394
3AIRBUS1203
4AIRBUS13010
5AIRBUS13510
6AIRBUS35010
7BELL20610
8BELL2125
9BELL2221
10BELL40710
11BELL4124
12BELL4291
13BELL477
14BELLOH-5811
15BELLTH-131
16BELLUH-1H9
17ENSTROM2801
18ENSTROMF284
19HILLERUH-122
20HUGHES2693
21HUGHES36910
22HUGHESOH2
23HUGHESTH1
24HUGHESYO2
25MBB1056
26MBB1179
27MCDONNELL DOUGLAS36910
28MITSUBISHIMU2B1
29ROBINSONR2210
30ROBINSONR4410
31ROBINSONR661
32ROCKWELL6901
33ROTARYRAF20001
34SCHLEICHERASW271
35SCHWEIZER26910
36SIKORSKY612
37SIKORSKY7610
38SIKORSKY922
39SOCATATBM7001
40SWEARINGENSA2271
41TAYLORCRAFTBC121
42VANSRV101
43VANSRV41
44VANSRV61
45VANSRV71
46VANSRV81
47VANSRV91
48YAKOVLEVYAK521
49ZENAIRCH6011
50ZENAIRCH7011
\n", "
" ], "text/plain": [ " latimes_make latimes_model survey_count\n", "0 AGUSTA 109 10\n", "1 AGUSTA 119 2\n", "2 AGUSTA 139 4\n", "3 AIRBUS 120 3\n", "4 AIRBUS 130 10\n", "5 AIRBUS 135 10\n", "6 AIRBUS 350 10\n", "7 BELL 206 10\n", "8 BELL 212 5\n", "9 BELL 222 1\n", "10 BELL 407 10\n", "11 BELL 412 4\n", "12 BELL 429 1\n", "13 BELL 47 7\n", "14 BELL OH-58 11\n", "15 BELL TH-13 1\n", "16 BELL UH-1H 9\n", "17 ENSTROM 280 1\n", "18 ENSTROM F28 4\n", "19 HILLER UH-12 2\n", "20 HUGHES 269 3\n", "21 HUGHES 369 10\n", "22 HUGHES OH 2\n", "23 HUGHES TH 1\n", "24 HUGHES YO 2\n", "25 MBB 105 6\n", "26 MBB 117 9\n", "27 MCDONNELL DOUGLAS 369 10\n", "28 MITSUBISHI MU2B 1\n", "29 ROBINSON R22 10\n", "30 ROBINSON R44 10\n", "31 ROBINSON R66 1\n", "32 ROCKWELL 690 1\n", "33 ROTARY RAF2000 1\n", "34 SCHLEICHER ASW27 1\n", "35 SCHWEIZER 269 10\n", "36 SIKORSKY 61 2\n", "37 SIKORSKY 76 10\n", "38 SIKORSKY 92 2\n", "39 SOCATA TBM700 1\n", "40 SWEARINGEN SA227 1\n", "41 TAYLORCRAFT BC12 1\n", "42 VANS RV10 1\n", "43 VANS RV4 1\n", "44 VANS RV6 1\n", "45 VANS RV7 1\n", "46 VANS RV8 1\n", "47 VANS RV9 1\n", "48 YAKOVLEV YAK52 1\n", "49 ZENAIR CH601 1\n", "50 ZENAIR CH701 1" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter down to models that appear in all ten years of survey data" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "qualified_models = model_counts[model_counts.survey_count == 10][\n", " ['latimes_make', 'latimes_model']\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "List them" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
latimes_makelatimes_model
0AGUSTA109
4AIRBUS130
5AIRBUS135
6AIRBUS350
7BELL206
10BELL407
21HUGHES369
27MCDONNELL DOUGLAS369
29ROBINSONR22
30ROBINSONR44
35SCHWEIZER269
37SIKORSKY76
\n", "
" ], "text/plain": [ " latimes_make latimes_model\n", "0 AGUSTA 109\n", "4 AIRBUS 130\n", "5 AIRBUS 135\n", "6 AIRBUS 350\n", "7 BELL 206\n", "10 BELL 407\n", "21 HUGHES 369\n", "27 MCDONNELL DOUGLAS 369\n", "29 ROBINSON R22\n", "30 ROBINSON R44\n", "35 SCHWEIZER 269\n", "37 SIKORSKY 76" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qualified_models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter down the survey data to only include those models that have qualified for analysis." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "qualified_survey = survey.merge(\n", " qualified_models,\n", " on=[\"latimes_make\", \"latimes_model\"],\n", " how=\"inner\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Trim down the columns." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "trimmed_qualified_survey = qualified_survey[[\n", " 'year',\n", " 'latimes_make',\n", " 'latimes_model',\n", " 'population_count',\n", " 'active_count',\n", " 'total_hours'\n", "]]" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "cleaned_qualified_survey = trimmed_qualified_survey.sort_values([\n", " 'latimes_make',\n", " 'latimes_model',\n", " 'year'\n", "])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a combined make and model column." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "cleaned_qualified_survey['latimes_make_and_model'] = cleaned_qualified_survey.latimes_make + \" \" + cleaned_qualified_survey.latimes_model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Output the result." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "cleaned_qualified_survey.to_csv(os.path.join(output_dir, \"tidy-survey.csv\"), index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }