{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# \"Alternative-fuel\" vehicles in Delaware: Data cleaning\n", "Peter Attia, [petermattia.com](http://petermattia.com)\n", "\n", "Last updated November 8, 2017\n", "\n", "This notebook investigates alternative-fuel vehicle purchases in Delaware. Alternative-fuel vehicles include:\n", "- Battery electric vehicles (BEVs)\n", "- Plug-in hybrid electric vehicles (PHEVs)\n", "- Propane or natural gas vehicles\n", "\n", "Rebates data downloaded on October 22, 2017 (data last updated October 2, 2017) from [this link](https://data.delaware.gov/Energy-and-Environment/State-Rebates-for-Alternative-Fuel-Vehicles/8z8z-di7f)\n", "\n", "More information about the program:\n", "- http://dnrec.alpha.delaware.gov/energy-climate/clean-transportation/vehicle-rebates/\n", "- http://www.dnrec.delaware.gov/energy/Pages/The-Delaware-Clean-Vehicle-Rebate-Program.aspx\n", "- http://www.dnrec.delaware.gov/energy/Pages/Clean-Transportation-Incentives-Home.aspx\n", "- http://www.dnrec.delaware.gov/energy/Pages/Clean-Transportation-July2015-October2016.aspx\n", "\n", "This notebook contains the data cleaning process for this dataset. [Another notebook](http://nbviewer.jupyter.org/github/petermattia/Delaware-EVs/blob/master/Electric%20vehicles%20in%20Delaware%20-%20Data%20cleaning.ipynb) shows the data analysis.\n", "\n", "## Imports and defaults" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# IMPORTS\n", "import pandas as pd\n", "from collections import Counter\n", "\n", "# DEFAULTS\n", "pd.set_option(\"display.max_rows\",10)\n", "pd.set_option(\"display.max_columns\",20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the `rebates` dataset" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Award_NumberRebate_AmountCityStateCountyZipAgeGenderDate_of_PurchaseDealershipVehicle_TypeMakeModelYearLease?Gasoline_EmissionsAlt-Fuel_Emissions
0CVR071501$2200.00HockessinDENew Castle1970781.0M07/20/2015Sheridan FordPlug-in HybridFordFusion Energi2016No14815.06575.0
1CVR071502$2200.00WilmingtonDENew Castle1980947.0M07/20/2015Porter FordPlug-in HybridFordC-Max Energi2015Yes14815.06575.0
2CVR071503$2200.00WilmingtonDENew Castle1981047.0M07/22/2015Darcars of Lanham Severn. Inc.ElectricFordFocus Electric2015Yes14815.05539.0
3CVR071504$2200.00WilmingtonDENew Castle1980866.0M07/20/2015Sheridan FordPlug-in HybridFordC-Max Energi2015Yes14815.06575.0
4CVR071505$2200.00New CastleDENew Castle1972051.0M07/24/2015Sheridan FordPlug-in HybridFordFusion Energi2015Yes14815.06575.0
......................................................
510CVR0616147$1100.00DoverDEKent19901NaNNaN04/20/2016Coach and Equipment/ Coach Bus SalesPropaneFordE-450 Phoenix2016NoNaNNaN
511CVR0616148$1100.00DoverDEKent19901NaNNaN04/20/2016Coach and Equipment/ Coach Bus SalesPropaneFordE-450 Phoenix2016NoNaNNaN
512CVR0616149$1100.00DoverDEKent19901NaNNaN04/20/2016Coach and Equipment/ Coach Bus SalesPropaneFordE-450 Phoenix2016NoNaNNaN
513CVR0616150$1100.00DoverDEKent19901NaNNaN04/20/2016Coach and Equipment/ Coach Bus SalesPropaneFordE-450 Phoenix2016NoNaNNaN
514CVR0616151$1100.00DoverDEKent19901NaNNaN04/20/2016Coach and Equipment/ Coach Bus SalesPropaneFordE-450 Phoenix2016NoNaNNaN
\n", "

515 rows × 17 columns

\n", "
" ], "text/plain": [ " Award_Number Rebate_Amount City State County Zip Age \\\n", "0 CVR071501 $2200.00 Hockessin DE New Castle 19707 81.0 \n", "1 CVR071502 $2200.00 Wilmington DE New Castle 19809 47.0 \n", "2 CVR071503 $2200.00 Wilmington DE New Castle 19810 47.0 \n", "3 CVR071504 $2200.00 Wilmington DE New Castle 19808 66.0 \n", "4 CVR071505 $2200.00 New Castle DE New Castle 19720 51.0 \n", ".. ... ... ... ... ... ... ... \n", "510 CVR0616147 $1100.00 Dover DE Kent 19901 NaN \n", "511 CVR0616148 $1100.00 Dover DE Kent 19901 NaN \n", "512 CVR0616149 $1100.00 Dover DE Kent 19901 NaN \n", "513 CVR0616150 $1100.00 Dover DE Kent 19901 NaN \n", "514 CVR0616151 $1100.00 Dover DE Kent 19901 NaN \n", "\n", " Gender Date_of_Purchase Dealership \\\n", "0 M 07/20/2015 Sheridan Ford \n", "1 M 07/20/2015 Porter Ford \n", "2 M 07/22/2015 Darcars of Lanham Severn. Inc. \n", "3 M 07/20/2015 Sheridan Ford \n", "4 M 07/24/2015 Sheridan Ford \n", ".. ... ... ... \n", "510 NaN 04/20/2016 Coach and Equipment/ Coach Bus Sales \n", "511 NaN 04/20/2016 Coach and Equipment/ Coach Bus Sales \n", "512 NaN 04/20/2016 Coach and Equipment/ Coach Bus Sales \n", "513 NaN 04/20/2016 Coach and Equipment/ Coach Bus Sales \n", "514 NaN 04/20/2016 Coach and Equipment/ Coach Bus Sales \n", "\n", " Vehicle_Type Make Model Year Lease? Gasoline_Emissions \\\n", "0 Plug-in Hybrid Ford Fusion Energi 2016 No 14815.0 \n", "1 Plug-in Hybrid Ford C-Max Energi 2015 Yes 14815.0 \n", "2 Electric Ford Focus Electric 2015 Yes 14815.0 \n", "3 Plug-in Hybrid Ford C-Max Energi 2015 Yes 14815.0 \n", "4 Plug-in Hybrid Ford Fusion Energi 2015 Yes 14815.0 \n", ".. ... ... ... ... ... ... \n", "510 Propane Ford E-450 Phoenix 2016 No NaN \n", "511 Propane Ford E-450 Phoenix 2016 No NaN \n", "512 Propane Ford E-450 Phoenix 2016 No NaN \n", "513 Propane Ford E-450 Phoenix 2016 No NaN \n", "514 Propane Ford E-450 Phoenix 2016 No NaN \n", "\n", " Alt-Fuel_Emissions \n", "0 6575.0 \n", "1 6575.0 \n", "2 5539.0 \n", "3 6575.0 \n", "4 6575.0 \n", ".. ... \n", "510 NaN \n", "511 NaN \n", "512 NaN \n", "513 NaN \n", "514 NaN \n", "\n", "[515 rows x 17 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates = pd.read_csv('State_Rebates_for_Alternative-Fuel_Vehicles.csv')\n", "rebates.columns = [c.replace(' ', '_') for c in rebates.columns] # add underscores to column names\n", "rebates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data cleaning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### County" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### All alternative fuel vehicles" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'Kent': 76, 'Kent ': 1, 'New Castle': 330, 'Sussex': 108})" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"County\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remove whitespace in 'Kent' with str.strip()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'Kent': 77, 'New Castle': 330, 'Sussex': 108})" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates[\"County\"] = rebates[\"County\"].str.strip()\n", "Counter(rebates[\"County\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Makes" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'Audi': 5,\n", " 'BMW': 18,\n", " 'BMW ': 7,\n", " 'Cadillac': 1,\n", " 'Chevrolet': 89,\n", " 'FORD': 1,\n", " 'Ford': 268,\n", " 'Ford ': 2,\n", " 'Hyundai': 3,\n", " 'Kia': 1,\n", " 'Mercedes-Benz': 2,\n", " 'Nissan': 20,\n", " 'Porsche ': 2,\n", " 'Tesla': 69,\n", " 'Tesla ': 1,\n", " 'Toyota': 22,\n", " 'Volvo': 2,\n", " 'ford': 2})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"Make\"])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Ford', 273),\n", " ('Chevrolet', 89),\n", " ('Tesla', 70),\n", " ('BMW', 25),\n", " ('Toyota', 22),\n", " ('Nissan', 20),\n", " ('Audi', 5),\n", " ('Hyundai', 3),\n", " ('Mercedes-Benz', 2),\n", " ('Volvo', 2),\n", " ('Porsche', 2),\n", " ('Cadillac', 1),\n", " ('Kia', 1)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates[\"Make\"] = rebates[\"Make\"].str.title() # change FORD and ford to Ford\n", "rebates[\"Make\"] = rebates[\"Make\"].str.strip() # remove trailing whitespace\n", "rebates[\"Make\"] = rebates[\"Make\"].str.replace(\"Bmw\",\"BMW\") # to be anal\n", "Counter(rebates[\"Make\"]).most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Models" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'330C': 1,\n", " '330e': 3,\n", " '530E': 1,\n", " 'A3 Sportback e-tron': 5,\n", " 'B250e': 1,\n", " 'Bolt': 17,\n", " 'C-Max Energi': 11,\n", " 'C-Max Energi ': 106,\n", " 'Cayenne S Hybrid': 2,\n", " 'E-450 Phoenix': 5,\n", " 'ELR': 1,\n", " 'F150': 2,\n", " 'F450': 2,\n", " 'Focus Electric': 9,\n", " 'Fusion Energi': 137,\n", " 'GLE 550': 1,\n", " 'Leaf': 20,\n", " 'Model S': 49,\n", " 'Model X': 20,\n", " 'Prius Prime': 22,\n", " 'S90D': 1,\n", " 'Sonata': 3,\n", " 'Soul EV': 1,\n", " 'Volt': 68,\n", " 'Volt ': 5,\n", " 'X5 40E': 14,\n", " 'XC90': 2,\n", " 'i3': 5,\n", " 'i3 Rex': 1})" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"Model\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This one looks good without any processing. We'll strip whitespace just in case" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Fusion Energi', 137),\n", " ('C-Max Energi', 117),\n", " ('Volt', 73),\n", " ('Model S', 49),\n", " ('Prius Prime', 22),\n", " ('Leaf', 20),\n", " ('Model X', 20),\n", " ('Bolt', 17),\n", " ('X5 40E', 14),\n", " ('Focus Electric', 9),\n", " ('i3', 5),\n", " ('A3 Sportback e-tron', 5),\n", " ('E-450 Phoenix', 5),\n", " ('330e', 3),\n", " ('Sonata', 3),\n", " ('XC90', 2),\n", " ('F150', 2),\n", " ('Cayenne S Hybrid', 2),\n", " ('F450', 2),\n", " ('ELR', 1),\n", " ('B250e', 1),\n", " ('Soul EV', 1),\n", " ('S90D', 1),\n", " ('330C', 1),\n", " ('i3 Rex', 1),\n", " ('GLE 550', 1),\n", " ('530E', 1)]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates[\"Model\"] = rebates[\"Model\"].str.strip() # clean whitespace\n", "Counter(rebates[\"Model\"]).most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Vehicle type" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Plug-in Hybrid', 383),\n", " ('Electric', 123),\n", " ('Propane', 7),\n", " ('Propane (bi-fuel)', 2)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"Vehicle_Type\"]).most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### City" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'Bear': 14,\n", " 'Bellefonte': 1,\n", " 'Bethany Beach': 3,\n", " 'Bridgeville': 5,\n", " 'Camden': 3,\n", " 'Camden ': 2,\n", " 'Camden Wyoming': 4,\n", " 'Claymont': 2,\n", " 'Clayton': 6,\n", " 'Clayton ': 1,\n", " 'Dagsboro': 3,\n", " 'Delaware City': 1,\n", " 'Delmar': 1,\n", " 'Dover': 41,\n", " 'Elsmere': 1,\n", " 'Felton': 3,\n", " 'Frankford': 1,\n", " 'Georgetown': 2,\n", " 'Georgetown ': 1,\n", " 'Greenville': 1,\n", " 'Greenwood': 2,\n", " 'Hartly': 1,\n", " 'Hockessin': 14,\n", " 'Kenton': 1,\n", " 'Laurel': 1,\n", " 'Lewes': 24,\n", " 'Lincoln': 2,\n", " 'Magnolia': 3,\n", " 'Magnolia ': 1,\n", " 'Marshallton': 1,\n", " 'Middletown': 34,\n", " 'Middletown ': 5,\n", " 'Milford': 7,\n", " 'Millsboro': 29,\n", " 'Milton': 8,\n", " 'Montchanin': 1,\n", " 'New Castle': 13,\n", " 'New Castle ': 3,\n", " 'Newark': 65,\n", " 'Newark ': 9,\n", " 'Ocean View': 3,\n", " 'Port Penn': 1,\n", " 'Rehoboth Beach': 9,\n", " 'Rockland': 1,\n", " 'Seaford': 5,\n", " 'Seaford ': 2,\n", " 'Selbyville': 2,\n", " 'Selbyville ': 1,\n", " 'Smyrna': 14,\n", " 'Smyrna ': 1,\n", " 'Talleyville': 1,\n", " 'Townsend': 14,\n", " 'Wilmington': 127,\n", " 'Wilmington ': 14})" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"City\"])" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Wilmington', 141),\n", " ('Newark', 74),\n", " ('Dover', 41),\n", " ('Middletown', 39),\n", " ('Millsboro', 29),\n", " ('Lewes', 24),\n", " ('New Castle', 16),\n", " ('Smyrna', 15),\n", " ('Hockessin', 14),\n", " ('Townsend', 14),\n", " ('Bear', 14),\n", " ('Rehoboth Beach', 9),\n", " ('Milton', 8),\n", " ('Clayton', 7),\n", " ('Seaford', 7),\n", " ('Milford', 7),\n", " ('Camden', 5),\n", " ('Bridgeville', 5),\n", " ('Magnolia', 4),\n", " ('Camden Wyoming', 4),\n", " ('Selbyville', 3),\n", " ('Georgetown', 3),\n", " ('Ocean View', 3),\n", " ('Dagsboro', 3),\n", " ('Bethany Beach', 3),\n", " ('Felton', 3),\n", " ('Greenwood', 2),\n", " ('Claymont', 2),\n", " ('Lincoln', 2),\n", " ('Hartly', 1),\n", " ('Bellefonte', 1),\n", " ('Talleyville', 1),\n", " ('Rockland', 1),\n", " ('Frankford', 1),\n", " ('Marshallton', 1),\n", " ('Montchanin', 1),\n", " ('Delmar', 1),\n", " ('Elsmere', 1),\n", " ('Port Penn', 1),\n", " ('Kenton', 1),\n", " ('Greenville', 1),\n", " ('Delaware City', 1),\n", " ('Laurel', 1)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates[\"City\"] = rebates[\"City\"].str.strip() # clean whitespace\n", "Counter(rebates[\"City\"]).most_common()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gender" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'M': 335, 'F': 159, nan: 21})" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"Gender\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks good. I don't trust this gender breakdown, since it's just the purchaser (i.e. the beneficiary could be a couple or a spouse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Award type" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Counter({'$1000.00': 37,\n", " '$1100.00': 7,\n", " '$1350.00': 2,\n", " '$1500.00': 142,\n", " '$2000.00': 1,\n", " '$2200.00': 292,\n", " '$3500.00': 34})" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Counter(rebates[\"Rebate_Amount\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dates" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 True\n", "3 True\n", "4 True\n", " ... \n", "510 True\n", "511 True\n", "512 True\n", "513 True\n", "514 True\n", "Name: Date_of_Purchase, Length: 515, dtype: bool" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rebates[\"Date_of_Purchase\"] = pd.to_datetime(rebates[\"Date_of_Purchase\"])\n", "rebates[\"Date_of_Purchase\"] < '2016-10-31'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save cleaned file" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "rebates.to_csv('State_Rebates_for_Alternative-Fuel_Vehicles_cleaned.csv',index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }