{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Used car dataset from eBay Kleinanzeigen" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This project imports, cleans and explores a dataset of used car listings from eBay Kleinanzeigen, the classifieds section of the German eBay website. The results are visualized to make the data easy to understand and compare, and observations from the analysis are summarized at the end." ] }, { "cell_type": "code", "execution_count": 280, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Import necessary packages\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib\n", "\n", "# Read the input CSV data file into a dataframe\n", "input_file = 'autos.csv'\n", "autos = pd.read_csv(input_file, encoding='Latin-1')\n" ] }, { "cell_type": "code", "execution_count": 281, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Set up the notebook to display plots inline\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some options for viewing the dataframe and getting familiar with the dataset:\n", "- Use a helper function that reads a specific number of rows from the CSV file and returns them as a dataframe, which Jupyter renders as a table\n", "- Evaluate the variable `autos` from above in a cell. Jupyter renders the dataframe as a table, truncated to the first few and last few rows\n", "- Use `df.head()` or `df.tail()` to display the first or last few rows (5 by default) as a table" ] }, { "cell_type": "code", "execution_count": 282, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Helper: read only the first num_rows rows of csv_file into a dataframe\n", "def print_some_rows(csv_file, num_rows, enc='Latin-1'):\n", " return pd.read_csv(csv_file, nrows=num_rows, encoding=enc)" ] }, { "cell_type": "code", "execution_count": 283, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " dateCrawled name \\\n", "0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n", "1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n", "2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n", "3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n", "4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... \n", "5 2016-03-21 13:47:45 Chrysler_Grand_Voyager_2.8_CRD_Aut.Limited_Sto... \n", "6 2016-03-20 17:55:21 VW_Golf_III_GT_Special_Electronic_Green_Metall... \n", "7 2016-03-16 18:55:19 Golf_IV_1.9_TDI_90PS \n", "\n", " seller offerType price abtest vehicleType yearOfRegistration \\\n", "0 privat Angebot $5,000 control bus 2004 \n", "1 privat Angebot $8,500 control limousine 1997 \n", "2 privat Angebot $8,990 test limousine 2009 \n", "3 privat Angebot $4,350 control kleinwagen 2007 \n", "4 privat Angebot $1,350 test kombi 2003 \n", "5 privat Angebot $7,900 test bus 2006 \n", "6 privat Angebot $300 test limousine 1995 \n", "7 privat Angebot $1,990 control limousine 1998 \n", "\n", " gearbox powerPS model odometer monthOfRegistration fuelType \\\n", "0 manuell 158 andere 150,000km 3 lpg \n", "1 automatik 286 7er 150,000km 6 benzin \n", "2 manuell 102 golf 70,000km 7 benzin \n", "3 automatik 71 fortwo 70,000km 6 benzin \n", "4 manuell 0 focus 150,000km 7 benzin \n", "5 automatik 150 voyager 150,000km 4 diesel \n", "6 manuell 90 golf 150,000km 8 benzin \n", "7 manuell 90 golf 150,000km 12 diesel \n", "\n", " brand notRepairedDamage dateCreated nrOfPictures \\\n", "0 peugeot nein 2016-03-26 00:00:00 0 \n", "1 bmw nein 2016-04-04 00:00:00 0 \n", "2 volkswagen nein 2016-03-26 00:00:00 0 \n", "3 smart nein 2016-03-12 00:00:00 0 \n", "4 ford nein 2016-04-01 00:00:00 0 \n", "5 chrysler NaN 2016-03-21 00:00:00 0 \n", "6 volkswagen NaN 2016-03-20 00:00:00 0 \n", "7 volkswagen nein 2016-03-16 00:00:00 0 \n", "\n", " postalCode lastSeen \n", "0 79588 2016-04-06 06:45:54 \n", "1 71034 2016-04-06 14:45:08 
\n", "2 35394 2016-04-06 20:15:37 \n", "3 33729 2016-03-15 03:16:28 \n", "4 39218 2016-04-01 14:38:50 \n", "5 22962 2016-04-06 09:45:21 \n", "6 31535 2016-03-23 02:48:59 \n", "7 53474 2016-04-07 03:17:32 " ] }, "execution_count": 283, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print_some_rows(input_file, 8)" ] }, { "cell_type": "code", "execution_count": 284, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#autos" ] }, { "cell_type": "code", "execution_count": 285, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " dateCrawled name \\\n", "0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n", "1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n", "2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n", "3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n", "4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... \n", "\n", " seller offerType price abtest vehicleType yearOfRegistration \\\n", "0 privat Angebot $5,000 control bus 2004 \n", "1 privat Angebot $8,500 control limousine 1997 \n", "2 privat Angebot $8,990 test limousine 2009 \n", "3 privat Angebot $4,350 control kleinwagen 2007 \n", "4 privat Angebot $1,350 test kombi 2003 \n", "\n", " gearbox powerPS model odometer monthOfRegistration fuelType \\\n", "0 manuell 158 andere 150,000km 3 lpg \n", "1 automatik 286 7er 150,000km 6 benzin \n", "2 manuell 102 golf 70,000km 7 benzin \n", "3 automatik 71 fortwo 70,000km 6 benzin \n", "4 manuell 0 focus 150,000km 7 benzin \n", "\n", " brand notRepairedDamage dateCreated nrOfPictures \\\n", "0 peugeot nein 2016-03-26 00:00:00 0 \n", "1 bmw nein 2016-04-04 00:00:00 0 \n", "2 volkswagen nein 2016-03-26 00:00:00 0 \n", "3 smart nein 2016-03-12 00:00:00 0 \n", "4 ford nein 2016-04-01 00:00:00 0 \n", "\n", " postalCode lastSeen \n", "0 79588 2016-04-06 06:45:54 \n", "1 71034 2016-04-06 14:45:08 \n", "2 35394 2016-04-06 20:15:37 \n", "3 33729 2016-03-15 03:16:28 \n", "4 39218 2016-04-01 14:38:50 " ] }, "execution_count": 285, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.head()" ] }, { "cell_type": "code", "execution_count": 286, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 50000 entries, 0 to 49999\n", "Data columns (total 20 columns):\n", "dateCrawled 50000 non-null object\n", "name 50000 non-null object\n", "seller 50000 non-null object\n", "offerType 50000 non-null object\n", "price 50000 non-null 
object\n", "abtest 50000 non-null object\n", "vehicleType 44905 non-null object\n", "yearOfRegistration 50000 non-null int64\n", "gearbox 47320 non-null object\n", "powerPS 50000 non-null int64\n", "model 47242 non-null object\n", "odometer 50000 non-null object\n", "monthOfRegistration 50000 non-null int64\n", "fuelType 45518 non-null object\n", "brand 50000 non-null object\n", "notRepairedDamage 40171 non-null object\n", "dateCreated 50000 non-null object\n", "nrOfPictures 50000 non-null int64\n", "postalCode 50000 non-null int64\n", "lastSeen 50000 non-null object\n", "dtypes: int64(5), object(15)\n", "memory usage: 7.6+ MB\n" ] }, { "data": { "text/plain": [ "dateCrawled object\n", "name object\n", "seller object\n", "offerType object\n", "price object\n", "abtest object\n", "vehicleType object\n", "yearOfRegistration int64\n", "gearbox object\n", "powerPS int64\n", "model object\n", "odometer object\n", "monthOfRegistration int64\n", "fuelType object\n", "brand object\n", "notRepairedDamage object\n", "dateCreated object\n", "nrOfPictures int64\n", "postalCode int64\n", "lastSeen object\n", "dtype: object" ] }, "execution_count": 286, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Details about the dataframe - number of rows and columns, names of columns and types of data they contain\n", "autos.info()\n", "autos.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Observations on the input dataset\n", "There are 50,000 rows and 20 columns, most of which (15 of 20) hold strings (`object` dtype). \n", "\n", "There are 5 numeric (`int64`) columns.\n", "\n", "Five columns have missing values: `vehicleType`, `gearbox`, `model`, `fuelType` and `notRepairedDamage`.\n", "\n", "More observations on patterns and the need to clean: \n", "\n", "- Some column labels are hard to read or too long. The following edits will make them clearer\n", "\n", "Replace\n", " 1. `yearOfRegistration` with `registration_year`\n", " 2. `monthOfRegistration` with `registration_month`\n", " 3. 
`notRepairedDamage` with `unrepaired_damage`\n", " 4. `dateCreated` with `ad_created`\n", " \n", "- Some column labels are in camelCase. Converting them to Python's conventional `lower_case_with_underscores` (snake case) style will make them more consistent and easier to work with\n", "\n", "Define two functions to make these changes." ] }, { "cell_type": "code", "execution_count": 287, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',\n", " 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',\n", " 'odometer', 'monthOfRegistration', 'fuelType', 'brand',\n", " 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',\n", " 'lastSeen'],\n", " dtype='object')\n", "['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest', 'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model', 'odometer', 'monthOfRegistration', 'fuelType', 'brand', 'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode', 'lastSeen']\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [dateCrawled, name, seller, offerType, price, abtest, vehicleType, yearOfRegistration, gearbox, powerPS, model, odometer, monthOfRegistration, fuelType, brand, notRepairedDamage, dateCreated, nrOfPictures, postalCode, lastSeen]\n", "Index: []" ] }, "execution_count": 287, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get column names using .columns attribute\n", "print (autos.columns)\n", "print (list(autos.columns))\n", "# While this is useful for looping over in cleaning the column names, df.head() is also useful for a quick view of the names in a table format \n", "autos.head(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make column labels readable" ] }, { "cell_type": "code", "execution_count": 288, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Function that edits existing column names to be readable\n", "def edit_cols(col):\n", " col = col.replace('yearOfRegistration', 'registration_year')\n", " col = col.replace('monthOfRegistration', 'registration_month')\n", " col = col.replace('notRepairedDamage', 'unrepaired_damage')\n", " col = col.replace('dateCreated', 'ad_created')\n", " return col " ] }, { "cell_type": "code", "execution_count": 289, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',\n", " 'vehicleType', 'registration_year', 'gearbox', 'powerPS', 'model',\n", " 'odometer', 'registration_month', 'fuelType', 'brand',\n", " 'unrepaired_damage', 'ad_created', 'nrOfPictures', 'postalCode',\n", " 'lastSeen'],\n", " dtype='object')\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [dateCrawled, name, seller, offerType, price, abtest, vehicleType, registration_year, gearbox, powerPS, model, odometer, registration_month, fuelType, brand, unrepaired_damage, ad_created, nrOfPictures, postalCode, lastSeen]\n", "Index: []" ] }, "execution_count": 289, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Implement the above function on the columns\n", "ed_cols = []\n", "\n", "for col in list(autos.columns):\n", " ed_cols.append(edit_cols(col))\n", "# Assign modified column names list back to the .columns attribute\n", "autos.columns = ed_cols\n", "\n", "print (autos.columns)\n", "autos.head(0)\n" ] }, { "cell_type": "code", "execution_count": 290, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Function that converts column names to python snake case\n", "def to_snake(col):\n", " snake = col[0].lower()\n", " return (snake + ''.join( '_'+l.lower() if l.isupper() else l for l in col[1:]) )" ] }, { "cell_type": "code", "execution_count": 291, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Implement the above function on the columns\n", "snaked_cols = []\n", "for c in autos.columns:\n", " snaked_cols.append(to_snake(c))\n", "# Assign modified column names list back to the .columns attribute\n", "autos.columns = snaked_cols" ] }, { "cell_type": "code", "execution_count": 292, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [date_crawled, name, seller, offer_type, price, abtest, vehicle_type, registration_year, gearbox, power_p_s, model, odometer, registration_month, fuel_type, brand, unrepaired_damage, ad_created, nr_of_pictures, postal_code, last_seen]\n", "Index: []" ] }, "execution_count": 292, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Column names after the modifications\n", "autos.head(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data exploration and cleaning" ] }, { "cell_type": "code", "execution_count": 293, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date_crawled name seller offer_type price abtest \\\n", "count 50000 50000 50000 50000 50000 50000 \n", "unique 48213 38754 2 2 2357 2 \n", "top 2016-03-05 16:57:05 Ford_Fiesta privat Angebot $0 test \n", "freq 3 78 49999 49999 1421 25756 \n", "mean NaN NaN NaN NaN NaN NaN \n", "std NaN NaN NaN NaN NaN NaN \n", "min NaN NaN NaN NaN NaN NaN \n", "25% NaN NaN NaN NaN NaN NaN \n", "50% NaN NaN NaN NaN NaN NaN \n", "75% NaN NaN NaN NaN NaN NaN \n", "max NaN NaN NaN NaN NaN NaN \n", "\n", " vehicle_type registration_year gearbox power_p_s model \\\n", "count 44905 50000.000000 47320 50000.000000 47242 \n", "unique 8 NaN 2 NaN 245 \n", "top limousine NaN manuell NaN golf \n", "freq 12859 NaN 36993 NaN 4024 \n", "mean NaN 2005.073280 NaN 116.355920 NaN \n", "std NaN 105.712813 NaN 209.216627 NaN \n", "min NaN 1000.000000 NaN 0.000000 NaN \n", "25% NaN 1999.000000 NaN 70.000000 NaN \n", "50% NaN 2003.000000 NaN 105.000000 NaN \n", "75% NaN 2008.000000 NaN 150.000000 NaN \n", "max NaN 9999.000000 NaN 17700.000000 NaN \n", "\n", " odometer registration_month fuel_type brand unrepaired_damage \\\n", "count 50000 50000.000000 45518 50000 40171 \n", "unique 13 NaN 7 40 2 \n", "top 150,000km NaN benzin volkswagen nein \n", "freq 32424 NaN 30107 10687 35232 \n", "mean NaN 5.723360 NaN NaN NaN \n", "std NaN 3.711984 NaN NaN NaN \n", "min NaN 0.000000 NaN NaN NaN \n", "25% NaN 3.000000 NaN NaN NaN \n", "50% NaN 6.000000 NaN NaN NaN \n", "75% NaN 9.000000 NaN NaN NaN \n", "max NaN 12.000000 NaN NaN NaN \n", "\n", " ad_created nr_of_pictures postal_code last_seen \n", "count 50000 50000.0 50000.000000 50000 \n", "unique 76 NaN NaN 39481 \n", "top 2016-04-03 00:00:00 NaN NaN 2016-04-07 06:17:27 \n", "freq 1946 NaN NaN 8 \n", "mean NaN 0.0 50813.627300 NaN \n", "std NaN 0.0 25779.747957 NaN \n", "min NaN 0.0 1067.000000 NaN \n", "25% NaN 0.0 30451.000000 NaN \n", "50% NaN 0.0 49577.000000 NaN \n", "75% NaN 0.0 71540.000000 NaN \n", "max NaN 0.0 99998.000000 NaN " 
] }, "execution_count": 293, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Summary stats of all columns - numeric and non-numeric\n", "autos.describe(include='all')\n" ] }, { "cell_type": "code", "execution_count": 294, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " registration_year power_p_s registration_month nr_of_pictures \\\n", "count 50000.000000 50000.000000 50000.000000 50000.0 \n", "mean 2005.073280 116.355920 5.723360 0.0 \n", "std 105.712813 209.216627 3.711984 0.0 \n", "min 1000.000000 0.000000 0.000000 0.0 \n", "25% 1999.000000 70.000000 3.000000 0.0 \n", "50% 2003.000000 105.000000 6.000000 0.0 \n", "75% 2008.000000 150.000000 9.000000 0.0 \n", "max 9999.000000 17700.000000 12.000000 0.0 \n", "\n", " postal_code \n", "count 50000.000000 \n", "mean 50813.627300 \n", "std 25779.747957 \n", "min 1067.000000 \n", "25% 30451.000000 \n", "50% 49577.000000 \n", "75% 71540.000000 \n", "max 99998.000000 " ] }, "execution_count": 294, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Summary stats of only the numeric columns, to better judge data quality\n", "autos.describe()" ] }, { "cell_type": "code", "execution_count": 295, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date_crawled name seller offer_type price abtest \\\n", "count 50000 50000 50000 50000 50000 50000 \n", "unique 48213 38754 2 2 2357 2 \n", "top 2016-03-05 16:57:05 Ford_Fiesta privat Angebot $0 test \n", "freq 3 78 49999 49999 1421 25756 \n", "\n", " vehicle_type gearbox model odometer fuel_type brand \\\n", "count 44905 47320 47242 50000 45518 50000 \n", "unique 8 2 245 13 7 40 \n", "top limousine manuell golf 150,000km benzin volkswagen \n", "freq 12859 36993 4024 32424 30107 10687 \n", "\n", " unrepaired_damage ad_created last_seen \n", "count 40171 50000 50000 \n", "unique 2 76 39481 \n", "top nein 2016-04-03 00:00:00 2016-04-07 06:17:27 \n", "freq 35232 1946 8 " ] }, "execution_count": 295, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Summary stats of non-numeric columns\n", "autos.describe(include=['O'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Observations on numeric columns:\n", "- The earliest and latest `registration_year` values, 1000 and 9999, cannot be real registration years and need further investigation\n", "- `registration_month` is 0 for 5,075 listings, which is not a valid month and needs to be investigated\n", "- `power_p_s` (engine power in PS) is 0 for 5,500 listings. Some of these may be non-running cars sold for scrap, but 0 may also stand in for a missing value\n", "- `nr_of_pictures` is 0 for all 50,000 listings, so it carries no information and can be dropped\n", "\n", "## Observations on non-numeric columns:\n", "- Some columns have very few unique values (e.g. in `seller` and `offer_type`, 49,999 of the 50,000 values are identical), and their German values need translating to judge whether they are useful for analysis\n", "- `price` contains non-digit characters (the `$` sign and thousands-separator commas). 
Removing those characters will allow converting it to a numeric data type\n", "- `price` is `$0` for 1,421 listings (its most frequent value), which is suspicious and needs further study\n", "- `odometer` contains non-digit characters (the `km` unit and thousands-separator commas), which can be removed so that this column can be converted to a numeric data type as well \n", "\n", "Take a closer look at the value counts of some of the numeric columns\n" ] }, { "cell_type": "code", "execution_count": 296, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 5075\n", "3 5071\n", "6 4368\n", "5 4107\n", "4 4102\n", "7 3949\n", "10 3651\n", "12 3447\n", "9 3389\n", "11 3360\n", "1 3282\n", "8 3191\n", "2 3008\n", "Name: registration_month, dtype: int64" ] }, "execution_count": 296, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['registration_month'].value_counts()" ] }, { "cell_type": "code", "execution_count": 297, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 5500\n", "75 3171\n", "60 2195\n", "150 2046\n", "140 1884\n", "101 1756\n", "90 1746\n", "116 1646\n", "170 1492\n", "105 1410\n", "Name: power_p_s, dtype: int64" ] }, "execution_count": 297, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['power_p_s'].value_counts().head(10)" ] }, { "cell_type": "code", "execution_count": 298, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 50000\n", "Name: nr_of_pictures, dtype: int64" ] }, "execution_count": 298, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['nr_of_pictures'].value_counts()" ] }, { "cell_type": "code", "execution_count": 299, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Convert string columns to numeric data type by first stripping the currency sign, unit and thousands separators\n", "# regex=False treats each pattern as a literal string, so '$' is not read as the regex end-of-string anchor\n", "autos['price'] = autos['price'].str.replace('$', '', regex=False)\n", "autos['price'] = autos['price'].str.replace(',', '', regex=False)\n", "autos['odometer'] = autos['odometer'].str.replace('km', '', regex=False)\n", "autos['odometer'] = autos['odometer'].str.replace(',', '', regex=False)\n", "\n", "autos[ ['price', 
'odometer'] ] = autos[ ['price', 'odometer'] ].astype(int)" ] }, { "cell_type": "code", "execution_count": 300, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Rename odometer to odometer_km so that the column name records the unit (km)\n", "autos.rename({\"odometer\": \"odometer_km\"}, axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 301, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "date_crawled object\n", "name object\n", "seller object\n", "offer_type object\n", "price int64\n", "abtest object\n", "vehicle_type object\n", "registration_year int64\n", "gearbox object\n", "power_p_s int64\n", "model object\n", "odometer_km int64\n", "registration_month int64\n", "fuel_type object\n", "brand object\n", "unrepaired_damage object\n", "ad_created object\n", "nr_of_pictures int64\n", "postal_code int64\n", "last_seen object\n", "dtype: object" ] }, "execution_count": 301, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DataFrame.dtypes verifies the changed data types and column names\n", "autos.dtypes" ] }, { "cell_type": "code", "execution_count": 302, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "date_crawled 0\n", "name 0\n", "seller 0\n", "offer_type 0\n", "price 0\n", "abtest 0\n", "vehicle_type 5095\n", "registration_year 0\n", "gearbox 2680\n", "power_p_s 0\n", "model 2758\n", "odometer_km 0\n", "registration_month 0\n", "fuel_type 4482\n", "brand 0\n", "unrepaired_damage 9829\n", "ad_created 0\n", "nr_of_pictures 0\n", "postal_code 0\n", "last_seen 0\n", "dtype: int64" ] }, "execution_count": 302, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "As observed earlier, some columns have null values, but none is missing more than about 20% of its values (the worst is `unrepaired_damage`, with 9829 of 50000 missing)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis of 
numeric columns `price` and `odometer_km`" ] }, { "cell_type": "code", "execution_count": 303, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2357,)\n", "(13,)\n" ] } ], "source": [ "print (autos['price'].unique().shape)\n", "print (autos['odometer_km'].unique().shape)" ] }, { "cell_type": "code", "execution_count": 304, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([150000, 70000, 50000, 80000, 10000, 30000, 125000, 90000,\n", " 20000, 60000, 5000, 100000, 40000])" ] }, "execution_count": 304, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['odometer_km'].unique()" ] }, { "cell_type": "code", "execution_count": 352, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "150000 25936\n", "125000 4441\n", "100000 1848\n", "90000 1522\n", "80000 1279\n", "70000 1087\n", "60000 1043\n", "50000 932\n", "40000 754\n", "30000 707\n", "20000 673\n", "5000 469\n", "10000 202\n", "Name: odometer_km, dtype: int64" ] }, "execution_count": 352, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.odometer_km.value_counts()" ] }, { "cell_type": "code", "execution_count": 358, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 358, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAbgAAAD7CAYAAAAYcYHFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFdFJREFUeJzt3X9sVfUZx/FPYbIl2umIEw2UUCithd7+QkTi5mTywx9B\npYCOYKW6zOhIlsEcsiUizGnBBRy6LdkfYzjn9A83nYvDjSiNPzbcoHRLlLBEe1lhCI6WAq6Ulj77\nw3ADWqDnntP2Pue8XwmGXk5P371oH/t9aMkzMxMAADEzZLADAADoDww4AEAsMeAAALHEgAMAxBID\nDgAQSww4AEAsfWawA7zLy8sb7AQAcKm/v0qNz+AiYGY5/+Ohhx4a9Ia4dHpopJPOXP8xEBhwCZFO\npwc7oU88dHpolOiMGp3+MOAAALHEgEuIurq6wU7oEw+dHholOqNGpz95NlCHoTGVl5c3YOfJABAX\nA/Gxk8/gEqKhoWGwE/rEQ6eHRonOqNHpDwMOABBLHFGGxBElAATHESUAAFliwCWEl3N5D50eGiU6\no0anPww4AEAssYMLiR0cAATHDg4AgCwx4BLCy7m8h04PjRKdUaPTHwYcACCW2MGFxA4OAIJjBwcA\nQJYYcAnh5VzeQ6eHRonOqNHpDwMOABBL7OBCYgcHAMGxgwMAIEsMuITwci7vodNDo0Rn1Oj0hwEH\nAIgldnAhsYMDgODYwQEAkCUGXEJ4OZf30OmhUaIzanT6w4ADAMQSO7iQ2MEBQHDs4AAAyBIDLiG8\nnMt76PTQKNEZNTr9YcABAGKJHVxI7OAAIDh2cAAAZIkBlxBezuU9dHpolOiMGp3+MOAAALHEDi4k\ndnAAEBw7OAAAssSASwgv5/IeOj00SnRGjU5/GHAAgFhiBxcSOzgACI4dHAAAWWLAJYSXc3kPnR4a\nJTqjRqc/DDgAQCyxgwuJHRwABMcODgCALDHgEsLLubyHTg+NEp1Ro9MfBhwAIJbYwYXEDg4AgmMH\nBwBAlhhwCeHlXN5Dp4dGic6o0ekPAw4AEEvs4EJiBwcAwbGDAwAgSwy4hPByLu+h00OjRGfU6PSH\nAQcAiCV2cCGxgwOA4NjBAQCQJQZcQng5l/fQ6aFRojNqdPrDgAMAxBI7uJDYwQFAcOzgAADIEgMu\nIbycy3vo9NAo0Rk1Ov1hwAEAYokdXEjs4AAgOHZwAABkiQGXEF7O5T10emiU6Iwanf4w4AAAscQO\nLiR2cAAQHDs4AACyxIBLCC/n8h46PTRKdEaNTn8YcACAWGIHFxI7OAAIjh0cAABZYsAlhJdzeQ+d\nHholOqNGpz8MOABALLGDC4kdHAAExw4OAIAsMeASwsu5vIdOD40SnVGj0x8GHAAgltjBhcQODgCC\nYwcHAECWGHAJ4eVc3kOnh0aJzqjR6Q8DDgAQS+zgQmIHBwDBsYMDACBLDLiE8HIu76HTQ6NEZ9To\n9IcBBwCIJXZwIbGDA4Dg2MEBAJAlBlxCeDmX99DpoVGiM2p0+sOAAwDEEju4kNjBAUBw7OAAAMgS\nAy4hvJzLe+j00Cj56czPb1Benhz8SG7nx//whwEHYFAdPSqZ5f6PLVsGv2GwOr1iBxcSOzggnLw8\n3x9EE6EffpPYwQEAkCUGXEJ42cd46PTQKPnplBoGO6BPvDyfXjoHAgMOABBL7OBCYgcHhMMOzgF2\ncAAA5A4GXEJ4OZf30OmhUfLTyQ4uWl46BwIDDlnLc/rFnwCSgR1cSEnewSX5fUd02ME5wA4OAIDc\nwYBLCC/n8h46PTRKfjrZwUXLS+dAiGTArVq1SuvWrYviVmdUX18f2b0KCwvV2toa2f0AALknkh3c\nqlWrlJ+fr6VLl0bR1Kv8/HwdOXIk0Ov09PRoyJBPz/CxY8dq27ZtGj58eOiuvp4jBzlv7o+zaS/3\nRPKwg3Mgzju4devWKZVKqby8XE888YQk6ZFHHlFJSYmuueYa7
dq1K3NtU1OTpk6dqsrKSs2dO1ft\n7e2SpGnTpmnp0qWaPHmyJkyYoL///e+qqalRSUmJHnzwwczrP/PMM5oyZYqqq6t13333qaenR9/7\n3vfU0dGh6upq1dbW9nrdyScqPz9f999/v6qqqrR169Ze35+T13Z0dOjGG2/UL37xC+3evVulpaW6\n6667VFJSooULF2rz5s26+uqrVVJSom3btgV9bgEAg8nOYfv27VZeXm4dHR129OhRKysryzx27Ngx\nO3z4sBUVFdnatWvNzKy8vNzeeOMNMzNbsWKFLVmyxMzMrr32Wlu+fLmZma1fv94uu+wy279/v3V2\ndtqoUaOstbXVdu7cabNnz7bu7m4zM/vmN79pTz/9tJmZ5efnZ5rOdl1eXp49//zzZ32fCgsLLZ1O\n2/Tp0+3Xv/61mZml02k777zz7J133jEzs0mTJtndd99tZma///3v7dZbb+31Xn14CgNdF/TauN1z\ny5Ytkd8zah4azfx0SlsGO6FPvDyf/dLp5OPHJ53zM7g333xTc+bM0ec+9zmdf/75qqmp0csvv6w5\nc+bos5/9rPLz83XzzTdLkg4fPqz29nZ96UtfkiQtWrRIr7/+euZeJ69LpVJKpVK65JJLNGzYMI0b\nN04tLS169dVX1djYqMmTJ6uqqkqvvfaampubTw7izH3Odt3QoUNVU1NzrqGuW2+9VXfffbcWLlyY\nebywsFATJkyQJE2cOFHTp0/P9O7evfuM96urq9PKlSu1cuVK/fjHP84sefPy8jI/entZ+ngh3NDQ\ncNZrT10an7y+Ly8HefvZvpyX16C6ut5/feXKk3/54ql/AePZr582rSnQ9UHvH8X1q1c35VTPma7f\nuDG3es50vdT788nL2b3c1JSbz2dDQ4Pq6uoyHy8HxLkm4Pr16+2hhx7KvPzggw/aww8/fNpjS5cu\ntbVr11p7e7uNHj068/h7771nkyZNMrOPP4Pbvn27mZk1NDTY7NmzM9ed/LUnn3zSvv/97/faccEF\nF2R+frbrTv1M70zGjBljixcvtjvvvDPzWDqdtlQqlXm5rq7Ofvvb3/b6a6fqw1MY6Lqg18btnkge\n/jVywOnHj3N+BvflL39ZL774oo4dO6aPPvpIL774om666Sa98MIL6uzs1JEjR/SHP/xBkvT5z39e\nw4cP11tvvSVJevrpp/WVr3ylz8P2uuuu0/PPP68PP/xQktTW1qaWlhZJ0rBhw3TixIlzXmd9XFr+\n4Ac/0EUXXaTFixdnHjvb6/b1vgCA3HDOAVdVVaW6ujpNnjxZU6dO1Te+8Q1VVVXp9ttvV3l5uW66\n6SZdeeWVmes3btyo+++/X5WVlfrHP/6hFStWSDr7t3U6+WulpaX64Q9/qJkzZ6qiokIzZ87Uvn37\nJEn33HOPUqmUamtrVVpaqocffrjX6/ry7aNOXrN+/XodO3ZMy5cv/9TrfvI+fFuqgXHq8Uau8tAo\n+enk6+Ci5aVzIPCtukLy8kflGxoadO2110Z6z/543/ujM2oeGiU/nXl5DTK7drAzzsnL89kvnU6/\nTIABF5KXAdcfkvy+Izp8HZwDTgfcZ/r17oOspqZG6XRa0sc7tLy8PK1Zs0YzZswY3DAAQL+L9fei\n/N3vfqfGxkY1NjZqx44damxsTOxw83Iu76HTQ6Pkp5MdXLS8dA6EWA84AEBysYMLKcl7qCS/74gO\nOzgHnO7g+AwOWWO4AchlDLiE8HIu76HTQ6Pkp5MdXLS8dA4EBhwAIJbYwYXEHgoIhx2cA+zgAADI\nHQy4hPByLu+h00Oj5KeTHVy0vHQOBAYcACCW2MGFxA4OCIe/qCP3mXzu4GL9vSgB5D7+/9ADn79J\nHFEmhJdzeQ+dHholOqNGpz8MOABALLGDC4kdHAAEx9fBAQCQJQZcQng5l/fQ6aFRojNqdPrDgAMA\nxBI7uJDYwQFAcOzgAADIE
gMuIbycy3vo9NAo0Rk1Ov1hwAEAYokdXEjs4AAgOHZwAABkiQGXEF7O\n5T10emiU6Iwanf4w4AAAscQOLiR2cAAQHDs4AACyxIBLCC/n8h46PTRKdEaNTn8YcACAWGIHFxI7\nOAAIjh0cAABZYsAlhJdzeQ+dHholOqNGpz8MOABALLGDC4kdHAAExw4OAIAsMeASwsu5vIdOD40S\nnVGj0x8GHAAgltjBhcQODgCCYwcHAECWGHAJ4eVc3kOnh0aJzqjR6Q8DDgAQS+zgQmIHBwDBsYMD\nACBLDLiE8HIu76HTQ6NEZ9To9IcBBwCIJXZwIbGDA4Dg2MEBAJAlBlxCeDmX99DpoVGiM2p0+sOA\nAwDEEju4kNjBAUBw7OAAAMgSAy4hvJzLe+j00CjRGTU6/WHAAQBiiR1cSOzgACA4dnAAAGSJAZcQ\nXs7lPXR6aJTojBqd/jDgAACxxA4uJHZwABAcOzgAALLEgEsIL+fyHjo9NEp0Ro1OfxhwAIBYYgcX\nEjs4AAiOHRwAAFliwCWEl3N5D50eGiU6o0anPww4AEAssYMLiR0cAATHDg4AgCwx4BLCy7m8h04P\njRKdUaPTHwYcACCW2MGFxA4OAIJjBwcAQJYYcAnh5VzeQ6eHRonOqNHpDwMOABBL7OBCYgcHAMGx\ngwMAIEsMuITwci7vodNDo0Rn1Oj0hwEHAIgldnAhsYMDgODYwQEAkCUGXEJ4OZf30OmhUaIzanT6\nw4ADAMQSO7iQ2MEBQHDs4AAAyBIDLiG8nMt76PTQKNEZNTr9YcABAGKJHVxI7OAAIDh2cAAAZIkB\nlxBezuU9dHpolOiMGp3+MOAAALHEDi4kdnAAEBw7OAAAssSASwgv5/IeOj00SnRGjU5/GHAAgFhi\nBxcSOzgACI4dHAAAWWLAJYSXc3kPnR4aJTqjRqc/DDgAQCyxgwuJHRwABMcODgCALDHgEsLLubyH\nTg+NEp1Ro9MfBlxCNDU1DXZCn3jo9NAo0Rk1Ov1hwCXEoUOHBjuhTzx0emiU6Iwanf4w4AAAscSA\nS4h0Oj3YCX3iodNDo0Rn1Oj0hy8TCCkvL2+wEwDApf4eP5/p17snAP9/AAC5iSNKAEAsMeAAALHE\ngMvSK6+8ossvv1zFxcVas2ZNv7+9PXv26Ktf/aomTpyoVCqlJ554QpLU1tammTNnqqSkRLNmzVJ7\ne3vmderr6zV+/HiVlpbqz3/+c+bxxsZGlZeXq7i4WN/+9rczjx8/flxf+9rXNH78eE2dOlX//ve/\ns+7t6elRdXW1br755pztbG9v1/z581VaWqqJEyfq7bffzsnO+vp6TZw4UeXl5Vq4cKGOHz+eE51f\n//rXNWLECJWXl2ceG6iup556SsXFxSopKdGvfvWrwJ3Lli1TaWmpKisrNXfuXB0+fDgnO09au3at\nhgwZotbW1pztfPLJJ1VaWqpUKqXly5cPeqckyRDYiRMnbNy4cZZOp+348eNWUVFhO3fu7Ne3uW/f\nPtuxY4eZmR05csSKi4tt586dtmzZMluzZo2Zma1evdoeeOABMzN75513rLKy0rq6uqy5udnGjRtn\nPT09ZmZ25ZVX2t/+9jczM7vhhhvslVdeMTOzn/3sZ3bfffeZmdlzzz1nt99+e9a969ats4ULF9rs\n2bPNzHKyc9GiRbZhwwYzM+vq6rJDhw7lXGc6nbbCwkLr7Ow0M7PbbrvNNm7cmBOdb7zxhu3YscNS\nqVTmsYHoam1ttbFjx9qhQ4esra0t8/MgnZs3b7YTJ06YmdkDDzxgy5cvz8lOM7OWlhabNWuWjRkz\nxg4ePGhmZu+++25OdW7ZssVmzJhhXV1dZmb24YcfDnqnmRkDLgt//etf7frrr8+8XF9fb6tXrx7Q\nhltuucU2b95sJSUl9sEHH5jZx0OwpKSk16brr7/etm7davv27bPS0tLM488++6zde++9ZmY
2a9Ys\n27p1q5mZdXd328UXX5xVW0tLi02fPt22bNmSGXC51tne3m5jx4791OO51tna2molJSXW2tpqXV1d\nNnv27Jz6fU+n06d9oOvPri9+8YufusbM7N5777XnnnsuUOepXnjhBbvjjjtytnPevHn2z3/+87QB\nl2udt912m7366qufum6wOzmizMLevXtVUFCQeXnUqFHau3fvgL39dDqtpqYmXXXVVdq/f79GjBgh\nSbr00kt14MCBXhtHjhypvXv3au/evRo1alSv7ae+ztChQ3XRRReddiTSV0uWLNGPfvSj076EItc6\nm5ubdfHFF+uuu+5SdXW17rnnHv3vf//Luc4vfOEL+s53vqPRo0dr5MiRuvDCCzV9+vSc6zzpwIED\n/dZ14YUXqrW19Yz3ytaGDRt044035mTnSy+9pIKCAqVSqdMez7XOf/3rX3r99dd11VVXadq0adq+\nfXtOdDLgnDl69KjmzZun9evX64ILLvjU1+FF+XV5lsWXQLz88ssaMWKEKisrz/r6g93Z3d2txsZG\nLV68WI2NjTr//PO1evXqnHs+33//fT3++OPavXu3/vOf/+ijjz7SM888k3OdZ5KrXSc98sgjOu+8\n87RgwYLI7hlVZ0dHhx599FGtWrUqkvt9UpTPZ3d3t9ra2rR161Y99thjmj9/fmT3DtPJgMvCyJEj\nT1t87tmzRyNHjuz3t9vd3a158+aptrZWt9xyiyRpxIgR2r9/vyTpgw8+0CWXXJJpbGlp+VTjmR7/\n5OucOHFChw8f1vDhwwM1vvXWW3rppZc0duxYLViwQK+99ppqa2t16aWX5lTnqFGjVFBQoCuuuEKS\nNHfuXDU2Nubc87lt2zZdffXVGj58uIYOHao5c+boL3/5S851njQQXVH997dx40b98Y9/1G9+85vM\nY7nU+d577ymdTquiokKFhYXas2ePqqurdeDAgTPee7Cez4KCAtXU1EiSJk+erKFDh+rgwYOD33nW\nA0z0qru7O/OHTDo7O62iosLefffdfn+7tbW1tmTJktMeW7ZsWeaMu7elfmdnp73//vunLXenTJli\nb7/9tvX09NgNN9xgmzZtMjOzn/70p5nl7rPPPhvqD5mYmTU0NGR2cN/97ndzrvOaa66xXbt2mZnZ\nypUrbdmyZTn3fDY1NVlZWZl1dHRYT0+PLVq0yH7yk5/kTGdzc7OVlZVlXh6IrlP/sMHJn7e1tQXq\n3LRpk02YMMH++9//nnZdrnWeasyYMdba2pqTnT//+c9txYoVZma2a9cuGz16dE50MuCytGnTJisu\nLraioiKrr6/v97f35ptv2pAhQ6yiosIqKyutqqrKNm3aZAcPHrTrrrvOiouLbcaMGaf9hj/66KM2\nbtw4u/zyy+1Pf/pT5vFt27ZZWVmZFRUV2be+9a3M48eOHbP58+dbUVGRTZkyxZqbm0M1nzrgcrGz\nqanJrrjiCquoqLA5c+bYoUOHcrLzscceswkTJlgqlbI777zTjh8/nhOdCxYssMsuu8yGDRtmBQUF\ntmHDBmttbR2Qrl/+8pdWVFRk48ePt6eeeipwZ1FRkY0ePdqqqqqsqqoq8wE11zpPVVhYmPlDJrnW\n2dXVZXfccYeVlZXZpEmTrKGhYdA7zcz4XpQAgFhiBwcAiCUGHAAglhhwAIBYYsABAGKJAQcAiCUG\nHAAglhhwAIBYYsABAGLp/95yiIzNc+FaAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "autos.odometer_km.plot.box(vert=False,grid=True)" ] }, { "cell_type": "code", "execution_count": 359, "metadata": { "collapsed": false 
}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 359, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAaMAAAEACAYAAAAeHRm0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X1QVNf9P/D38qDfRKqI0V3dRdfoQhYFAQPqNE0oQVAy\n4kOQYAwPqbGJdsZgnGLS6SQw0wpJJ7YYteMkGLGTSHGmQTopiiXuJDVRqgtNRihohAgbFxJBwCjP\n5/eH9f5E1Cy6d88q79cMM7uHPXs+Z9F9c+49e9EIIQSIiIgk8pBdABEREcOIiIikYxgREZF0DCMi\nIpKOYURERNIxjIiISDpVw6i7uxvz5s1DWFgYZs2ahd/85jcAgLa2NsTGxiIwMBBxcXFob29X+uTk\n5MBkMsFsNqOsrExpt1qtCAkJQUBAADIyMpT2np4eJCcnw2QyYcGCBTh37pyaUyIiIhWoGkajR4/G\nkSNHUFlZiS+//BKffPIJjh49itzcXMTExKC2thbR0dHIyckBAFRXV6OoqAg1NTUoLS3F+vXrce1j\nUOvWrUN+fj7q6upQV1eHQ4cOAQDy8/Ph5+eH06dPIyMjA5mZmWpOiYiIVKD6YboHH3wQwNVV0sDA\nAMaPH48DBw4gLS0NAJCWlobi4mIAQElJCZKTk+Hl5QWj0QiTyYSKigrY7XZ0dnYiIiICAJCamqr0\nuf65EhMTUV5ervaUiIjIyVQPo4GBAYSFhUGn0yEqKgpBQUFobm6GVqsFAOh0OrS0tAAAbDYb/P39\nlb56vR42mw02mw0Gg0FpNxgMsNlsQ/p4enrC19cXra2tak+LiIicyEvtATw8PFBZWYmOjg7ExcXB\nYrFAo9EMesyN9+8Gr25ERHTvUT2Mrhk7dizi4+Nx4sQJaLVaZXVkt9sxadIkAFdXQo2NjUqfpqYm\n6PX6W7Zf32fKlCno7+9HR0cH/Pz8hozvzMAjIhpJXPFLvqqH6b7//ntlp9yVK1dw+PBhhIWFISEh\nAXv27AEAFBQUYOnSpQCAhIQEFBYWoqenB/X19Thz5gwiIyOh0+kwbtw4VFRUQAiBvXv3DupTUFAA\nANi/fz+io6NvWY8Qwu2/3njjDek1sE7WyDpZ57UvV1F1ZXT+/HmkpaVBCIGBgQGkpKTgySefRFhY\nGJKSkrB7925MmzYNRUVFAICgoCAkJSUhKCgI3t7e2Llzp7Ki2bFjB9LT09HV1YX4+HgsWrQIALBm\nzRqkpKTAZDJhwoQJKCwsVHNKRESkAlXDKDg4GFardUi7n58f/vnPf960z2uvvYbXXnttSPvcuXPx\n1VdfDWkfPXq0EmZERHRv4hUY3ExUVJTsEhzCOp3nXqgRYJ3Odq/U6Soa4cqDghJpNBqXHv8kIrof\nuOq9kysjIiKSjmFERETSMYyIiEg6hhEREUnHMCIiIukYRkREJB3DiIiIpGMYERGRdAwjIiKSjmFE\nRETSMYyIiEg6hhEREUnHMCIiIukYRkREJB3DiIiIpGMYERGRdAwjIiKSjmFERETSMYyIiEg6hhER\nEUnHMCIiIukYRkREJB3DiIiIpGMYERGRdAwjIiIn0+mM0Gg0Lv/S6Yyyp37HNEIIIbsIV9BoNBgh\nUyUiyTQaDQAZ7zfOf59z1XunqiujpqYmREdHY9asWQgODsY777wDAMjOzobBYEB4eDjCw8Nx8OBB\npU9OTg5MJhPMZjPKysqUdqvVipCQEAQEBCAjI0Np7+npQXJyMkwmExYsWIBz586pOSUiIlKDUNH5\n8+dFZWWlEEKIzs5OERAQIGpqakRWVpZ4++23hzy+urpahIaGit7eX
lFfXy9mzJghBgYGhBBCREZG\nioqKCiGEEIsXLxYHDx4UQgixc+dOsW7dOiGEEIWFheKZZ565aS0qT5WISAFAAELCl/Pf51z13qnq\nykin0yE0NBQA4OPjA7PZDJvNdi0Ehzz+wIEDSE5OhpeXF4xGI0wmEyoqKmC329HZ2YmIiAgAQGpq\nKoqLi5U+aWlpAIDExESUl5erOSUiIlKByzYwNDQ0oKqqCvPmzQMAbN++HaGhoXjhhRfQ3t4OALDZ\nbPD391f66PV62Gw22Gw2GAwGpd1gMCihdn0fT09P+Pr6orW11VXTIiIiJ3BJGF26dAmJiYnIy8uD\nj48P1q9fj7Nnz6Kqqgo6nQ6bNm1y2lg3W3EREZF781J7gL6+PiQmJiIlJQVLly4FAEycOFH5/tq1\na7FkyRIAV1dCjY2Nyveampqg1+tv2X59nylTpqC/vx8dHR3w8/O7aS1ZWVnK7aioKERFRTlrmkRE\n9wWLxQKLxeL6gdU+KZWSkiI2btw4qO38+fPK7a1bt4pVq1YJIYQ4deqUCA0NFd3d3eLs2bODNjDM\nmzdPHD9+XAwMDIjFixeL0tJSIYQQO3bsUDYw7Nu3jxsYiEg6cAPDsKm6Mjp69Cg++OADBAcHIyws\nDBqNBlu2bMGHH36IqqoqeHh4wGg0YteuXQCAoKAgJCUlISgoCN7e3ti5c+f/9usDO3bsQHp6Orq6\nuhAfH49FixYBANasWYOUlBSYTCZMmDABhYWFak6JiIhUwA+9EhE5GT/0Ony8HBAREUnHMCIiIukY\nRkREJB3DiIiIpGMYERGRdAwjIiKSjmFERETSMYyIiEg6hhEREUnHMCIiIukYRkREJB3DiIiIpGMY\nERGRdAwjIiKSjmFERETSMYyIiEg6hhEREUnHMCIiIukYRkREJB3DiIiIpGMYERGRdAwjIiKSjmFE\nRETSMYyIiEg6hhEREUnHMCIiIukYRkREJB3DiIiIpGMYERGRdKqGUVNTE6KjozFr1iwEBwdj27Zt\nAIC2tjbExsYiMDAQcXFxaG9vV/rk5OTAZDLBbDajrKxMabdarQgJCUFAQAAyMjKU9p6eHiQnJ8Nk\nMmHBggU4d+6cmlMiIiIVqBpGXl5e2Lp1K06dOoUvvvgCO3bswH//+1/k5uYiJiYGtbW1iI6ORk5O\nDgCguroaRUVFqKmpQWlpKdavXw8hBABg3bp1yM/PR11dHerq6nDo0CEAQH5+Pvz8/HD69GlkZGQg\nMzNTzSkREZEKVA0jnU6H0NBQAICPjw/MZjOamppw4MABpKWlAQDS0tJQXFwMACgpKUFycjK8vLxg\nNBphMplQUVEBu92Ozs5OREREAABSU1OVPtc/V2JiIsrLy9WcEhERqcBl54waGhpQVVWF+fPno7m5\nGVqtFsDVwGppaQEA2Gw2+Pv7K330ej1sNhtsNhsMBoPSbjAYYLPZhvTx9PSEr68vWltbXTUtIiJy\nAi9XDHLp0iUkJiYiLy8PPj4+0Gg0g75/4/27ce2w3s1kZWUpt6OiohAVFeW0cYmI7gcWiwUWi8Xl\n46oeRn19fUhMTERKSgqWLl0KANBqtcrqyG63Y9KkSQCuroQaGxuVvk1NTdDr9bdsv77PlClT0N/f\nj46ODvj5+d20luvDiIiIhrrxF/Xs7GyXjKv6Ybpf/OIXCAoKwssvv6y0JSQkYM+ePQCAgoICJaQS\nEhJQWFiInp4e1NfX48yZM4iMjIROp8O4ceNQUVEBIQT27t07qE9BQQEAYP/+/YiOjlZ7SkRE5GQa\ncbvjWnfp6NGjePzxxxEcHAyNRgONRoMtW7YgMjISSUlJaGxsxLRp01BUVARfX18AV7d25+fnw9vb\nG3l5eYiNjQUAnDx5Eunp6ejq6kJ8fDzy8vIAAN3d3UhJSUFlZSUmTJiAwsJCGI3GoRPVaG57CI+I\nyFmunnqQ8X7j/Pc5V713qhpG7
oRhRESuwjAaPl6BgYiIpGMYERGRdAwjIiKSjmFERETSMYyIiEg6\nhhEREUnHMCIiIukYRkREJJ1DYfTVV1+pXQcREY1gDl2B4Wc/+xm6u7uRnp6O1atXY9y4ca6ozal4\nBQYichVegWH4HFoZffbZZ/jggw/Q2NiIuXPn4tlnn8Xhw4fVro2IiEaIYV2brr+/H8XFxdiwYQPG\njh0LIQS2bNmCFStWqFmjU3BlRESuwpXRHYzjSBh9+eWXeP/99/Hxxx9j4cKFWLNmDcLDw/Htt99i\nwYIF+Oabb1Qv9G4xjIjIVRhGdzCOI2H0xBNP4IUXXkBiYiIeeOCBQd/7y1/+gpSUFNUKdBaGERG5\nCsPoDsZxJIwuXbqEBx54AJ6engCAgYEBdHV14cEHH1S9QGdhGBGRqzCMhs+hDQwxMTG4cuWKcv/y\n5cuIiYlRrSgiIhpZHAqjrq4u+Pj4KPd9fHxw+fJl1YoiIqKRxaEwGjNmDKxWq3L/5MmTQ84dERER\n3SkvRx70pz/9CStXrsSUKVMghIDdbsdf//pXtWsjIqIRwuHPGfX29qK2thYAEBgYCG9vb1ULczZu\nYCAiV+EGhjsYx9Ew+vzzz9HQ0IC+vj6lLTU1VbXCnI1hRESuwjAaPocO06WkpODrr79GaGiosr1b\no9HcU2FERETuy6GVkdlsRnV19f/S/t7ElRERuQpXRsPn0G662bNnw263q10LERGNUA4dpvv+++8R\nFBSEyMhIjB49WmkvKSlRrTAiIho5HAqjrKwslcsgIqKRzOHddN988w1Onz6NmJgYXL58Gf39/fjJ\nT36idn1Ow3NGROQqPGc0fA6dM3r33XeRmJiIF198EQBgs9mwbNkyVQsjIqKRw6Ew2rFjB44ePYqx\nY8cCAEwmE1paWn6035o1a6DVahESEqK0ZWdnw2AwIDw8HOHh4Th48KDyvZycHJhMJpjNZpSVlSnt\nVqsVISEhCAgIQEZGhtLe09OD5ORkmEwmLFiwAOfOnXNkOkRE5GYcCqPRo0dj1KhRyv2+vj6Htnk/\n//zzOHTo0JD2V155BVarFVarFYsWLQIA1NTUoKioCDU1NSgtLcX69euVpeG6deuQn5+Puro61NXV\nKc+Zn58PPz8/nD59GhkZGcjMzHRkOkRE5GYcCqMnnngCW7ZswZUrV3D48GGsXLkSS5Ys+dF+jz32\nGMaPHz+k/WbHHw8cOIDk5GR4eXnBaDTCZDKhoqICdrsdnZ2diIiIAHD1qg/FxcVKn7S0NABAYmIi\nysvLHZkOERG5GYfCKDc3FxMnTkRwcDB27dqF+Ph4/O53v7vjQbdv347Q0FC88MILaG9vB3D1PJS/\nv7/yGL1eD5vNBpvNBoPBoLQbDAbYbLYhfTw9PeHr64vW1tY7rouIiORwaGu3h4cH1q5di7Vr1971\ngOvXr8frr78OjUaD3/72t9i0aRPee++9u35e4OYrrutdv0U9KioKUVFRThmXiOh+YbFYYLFYXD6u\nQ2E0ffr0m54jOnv27LAHnDhxonJ77dq1yuE+vV6PxsZG5XtNTU3Q6/W3bL++z5QpU9Df34+Ojg74\n+fndcmx+XoqI6PZu/EU9OzvbJeM6FEYnTpxQbnd1dWH//v0OHw4TQgxasdjtduh0OgDA3/72N8ye\nPRsAkJCQgNWrV2Pjxo2w2Ww4c+YMIiMjodFoMG7cOFRUVCAiIgJ79+7Fhg0blD4FBQWYN28e9u/f\nj+joaMdmTUREbsXhD73eaO7cuTh58uRtH/Pss8/CYrHgwoUL0Gq1yM7OxpEjR1BVVQUPDw8YjUbs\n2rULWq0WwNWt3fn5+fD29kZeXh5iY2MBXP3Lsunp6ejq6kJ8fDzy8vIAAN3d3UhJSUFlZSUmTJiA\nwsJCGI3Gm0+UH3olIhfhh17vYBxHwuj6Pzk+MDCAEydO4M9//jP+85//qFqcMzGMiMhVGEbD59B
h\nuk2bNv3/Dv/bel1UVKRaUURENLLc8WG6ew1XRkTkKlwZDZ9DK6OtW7fe9vuvvPKKU4ohIqKRyeHd\ndP/+97+RkJAAAPj73/+OyMhImEwmVYsjIqKRwaHDdI8//jg+/vhj5U9GdHZ24qmnnsKnn36qeoHO\nwsN0ROQqPEw3fA5dDqi5uXnQhVJHjRqF5uZm1YoiIqKRxaHDdKmpqYiMjMTy5csBAMXFxcoFSomI\niO6Ww7vprFYrPvvsMwBXD9uFhYWpWpiz8TAdEbkKD9MNn0OH6QDg8uXLGDt2LF5++WUYDAbU19er\nWRcREY0gDq2MsrOzceLECdTW1qKurg7ffvstVq5ciaNHj7qiRqfgyoiIXIUro+FzaGX00UcfoaSk\nBGPGjAEATJkyBZ2dnaoWRkREI4dDYTRq1ChoNBrlz0j88MMPqhZFREQji0NhlJSUhBdffBEXL17E\nu+++i5iYGKf8oT0iIiJgGLvpDh8+jLKyMgghEBcXh4ULF6pdm1PxnBERuQrPGd3BOD8WRv39/YiJ\nicGRI0dUL0ZNDCMichWG0fD96GE6T09PeHh4oL29XfViiIhoZHLoCgw+Pj4IDg7GwoULlR11ALBt\n2zbVCiMiopHDoTBasWIFVqxYoXYtREQ0Qt32nNG5c+cwdepUV9ajGp4zIiJX4Tmj4bvtOaNly5Yp\nt59++mnViyEiopHptmF0fRqePXtW9WKIiGhkum0YXbviwo23iYiInOm254w8PT0xZswYCCFw5coV\nPPjggwCurpg0Gg06OjpcVujd4jkjInIVnjMavtvupuvv71e9ACIiIof/nhEREZFaGEZERCQdw4iI\niKRjGBERkXSqhtGaNWug1WoREhKitLW1tSE2NhaBgYGIi4sbdAHWnJwcmEwmmM1mlJWVKe1WqxUh\nISEICAhARkaG0t7T04Pk5GSYTCYsWLAA586dU3M6RESkElXD6Pnnn8ehQ4cGteXm5iImJga1tbWI\njo5GTk4OAKC6uhpFRUWoqalBaWkp1q9fr2wnXLduHfLz81FXV4e6ujrlOfPz8+Hn54fTp08jIyMD\nmZmZak6HiIhUomoYPfbYYxg/fvygtgMHDiAtLQ0AkJaWhuLiYgBASUkJkpOT4eXlBaPRCJPJhIqK\nCtjtdnR2diIiIgIAkJqaqvS5/rkSExNRXl6u5nSIiEglLj9n1NLSAq1WCwDQ6XRoaWkBANhsNvj7\n+yuP0+v1sNlssNlsMBgMSrvBYIDNZhvSx9PTE76+vmhtbXXVVIiIyEkc+hMSanLmZYZ+7FPCWVlZ\nyu2oqChERUU5bWwiovuBxWKBxWJx+bguDyOtVovm5mZotVrY7XZMmjQJwNWVUGNjo/K4pqYm6PX6\nW7Zf32fKlCno7+9HR0cH/Pz8bjn29WFERERD3fiLenZ2tkvGVf0wnRBi0IolISEBe/bsAQAUFBRg\n6dKlSnthYSF6enpQX1+PM2fOIDIyEjqdDuPGjUNFRQWEENi7d++gPgUFBQCA/fv3Izo6Wu3pEBGR\nCm57odS79eyzz8JiseDChQvQarXIzs7GsmXLsHLlSjQ2NmLatGkoKiqCr68vgKtbu/Pz8+Ht7Y28\nvDzExsYCAE6ePIn09HR0dXUhPj4eeXl5AIDu7m6kpKSgsrISEyZMQGFhIYxG480nygulEpGL8EKp\ndzCOmmHkThhGROQqDKPh4xUYiIhIOoYRERFJxzAiIiLpGEZERCQdw4iIiKRjGBERkXQMIyIiko5h\nRERE0jGMiIhIOoYRERFJxzAiIiLpGEZERCQdw4iIiKRjGBERkXQMIyIiko5hRERE0jGMiIhIOoYR\nERFJxzAiIiLpGEZERCQdw4iIiKRjGBERkXQMIyIiko5hRERE0jGMiIhIOoYRERFJxzAiIiLpGEZE\nRCSdtDAyGo2YM2cOwsLCEBkZCQBoa2tDbGwsAgMDERcXh/b
2duXxOTk5MJlMMJvNKCsrU9qtVitC\nQkIQEBCAjIwMl8+DiIjunrQw8vDwgMViQWVlJSoqKgAAubm5iImJQW1tLaKjo5GTkwMAqK6uRlFR\nEWpqalBaWor169dDCAEAWLduHfLz81FXV4e6ujocOnRI1pSIiOgOSQsjIQQGBgYGtR04cABpaWkA\ngLS0NBQXFwMASkpKkJycDC8vLxiNRphMJlRUVMBut6OzsxMREREAgNTUVKUPERHdO6SFkUajwcKF\nCxEREYH33nsPANDc3AytVgsA0Ol0aGlpAQDYbDb4+/srffV6PWw2G2w2GwwGg9JuMBhgs9lcOAsi\nInIGL1kDHz16FJMnT8Z3332nnCfSaDSDHnPjfSIiuj9JC6PJkycDACZOnIhly5ahoqICWq1WWR3Z\n7XZMmjQJwNWVUGNjo9K3qakJer3+lu23kpWVpdyOiopCVFSUcydFRHSPs1gssFgsLh9XI67tBHCh\ny5cvY2BgAD4+Pvjhhx8QGxuLN954A+Xl5fDz88PmzZvx5ptvoq2tDbm5uaiursbq1atx/Phx2Gw2\nLFy4EKdPn4ZGo8H8+fOxbds2RERE4KmnnsKGDRuwaNGioRPVaCBhqkQ0Al09qiPj/cb573Oueu+U\nsjJqbm7G8uXLodFo0NfXh9WrVyM2NhaPPvookpKSsHv3bkybNg1FRUUAgKCgICQlJSEoKAje3t7Y\nuXOncghvx44dSE9PR1dXF+Lj428aRERE5N6krIxk4MqIiFyFK6Phk3bOiIhITTqdEc3N38gugxzE\nlRER3ZfkrU4AgCuj4eK16YiISDqGERERSccwIiIi6RhGREQkHcOIiIikYxgREZF0DCMiIpKOYURE\nRNIxjIiISDqGERERSccwInIhnc4IjUYj5UunM8qePtEt8dp0RC4k+3ppI+n/gOzXmtemGx6ujIiI\nSDqGERERSccwIiIi6RhGREQkHcOIpJK1u4w7y4jcC3fTkVTydjzJ+fcgd4fX/wHodvmoWu002O0N\nLh+Xu+mc9Iwueu9kGJFUDCOXji5pbL7Wrhz3Xg0jHqYjIiLpGEZERCQdw8iN8GQ+EY1UPGfkRkba\n+RNg5M2Z5zFcOCpfa+c8o4veO71UH4HILY3+35sVEbkDhhGNUN2Q9ZsrEQ3FMLrBr371a5w6Vefy\ncT09Zb5JcZVARHLdF+eMDh48iIyMDAwMDGDNmjXYvHnzkMc4etxz1KgH0du7B8Bo5xd6G6NHb0N3\n9yeQ99v6yDu2PrLGlTk2zxm5ctx79ZwRxD2uv79fzJgxQzQ0NIienh4xZ84cUVNTM+Rxjk7V2/sB\nAfwgAOHSrzFj0gUAARxx+dhXxx1uH2fVeSdjO6NOtccdznxd9TO/2znfaZ1w9n/72zpy5IgQQubP\n2NGx1fi5O/+1dtXP757f2l1RUQGTyYRp06bB29sbycnJOHDggOyy7oJFdgEOssguwEEW2QU4wCK7\nAAdZZBfgEIvFIrsEB1lkF+BW7vkwstls8Pf3V+4bDAbYbDaJFRER0XBxA8MNvLy88cADT8PVL01P\nT5VLxyMicif3/AaGY8eOISsrCwcPHgQA5ObmQqPRDNnEwN1iRER3xhUxcc+HUX9/PwIDA1FeXo7J\nkycjMjIS+/btg9lsll0aERE56J4/TOfp6Ynt27cjNjZW2drNICIiurfc8ysjIiK6993zu+kccfDg\nQTzyyCMICAjAm2++qfp4TU1NiI6OxqxZsxAcHIxt27YBANra2hAbG4vAwEDExcWhvb1d6ZOTkwOT\nyQSz2YyysjKl3Wq1IiQkBAEBAcjIyFDae3p6kJycDJPJhAULFuDcuXN3VOvAwADCw8ORkJDgtjW2\nt7dj5cqVMJvNmDVrFo4fP+6Wdebk5GDWrFkICQnB6tWr0dPT4xZ1rlmzBlqtFiEhIUqbq+oqKChA\nQEAAAgMDsXfv3mHXmZm
ZCbPZjNDQUDz99NPo6Ohwyzqvefvtt+Hh4YHW1la3rfOdd96B2WxGcHAw\nXn31Vel1KlzyaSaJHP1QrDOdP39eVFZWCiGE6OzsFAEBAaKmpkZkZmaKN998UwghRG5urti8ebMQ\nQohTp06J0NBQ0dvbK+rr68WMGTPEwMCAEEKIyMhIUVFRIYQQYvHixeLgwYNCCCF27twp1q1bJ4QQ\norCwUDzzzDN3VOvWrVvF6tWrxZIlS4QQwi1rTEtLE7t37xZCCNHb2ysuXrzodnU2NDSI6dOni+7u\nbiGEEElJSWLPnj1uUednn30mKisrRXBwsNLmirpaW1vFww8/LC5evCja2tqU28Op8/Dhw6K/v18I\nIcTmzZvFq6++6pZ1CiFEY2OjiIuLE0ajUVy4cEEIIUR1dbVb1XnkyBGxcOFC0dvbK4QQ4rvvvpNe\n5zX3fRh98cUXYtGiRcr9nJwckZub69Iali5dKg4fPiwCAwOF3W4XQlwNrMDAwJvWtGjRInHs2DFx\n/vx5YTablfZ9+/aJl156SQghRFxcnDh27JgQQoi+vj7x0EMPDbuuxsZGERMTI44cOaKEkbvV2N7e\nLh5++OEh7e5WZ2trqwgMDBStra2it7dXLFmyxK1+5g0NDYPelNSsa+LEiUMeI4QQL730kigsLBxW\nndf76KOPxHPPPee2dSYmJoovv/xyUBi5W51JSUmivLx8yONk1ynEfXAFhh8j+0OxDQ0NqKqqwvz5\n89Hc3AytVgsA0Ol0aGlpuWmNer0eNpsNNpsNBoPhprVf38fT0xO+vr6DDg04YuPGjfjDH/4waNu7\nu9VYX1+Phx56CM8//zzCw8Pxy1/+EpcvX3a7OsePH49NmzZh6tSp0Ov1GDduHGJiYtyuzmtaWlpU\nq2vcuHFobW295XPdqd27dyM+Pt4t6ywpKYG/vz+Cg4MHtbtbnXV1dfj0008xf/58/PznP8fJkyfd\nps77PoxkunTpEhITE5GXlwcfH58hn3Vy5mefxDD3oXz88cfQarUIDQ29bV+ZNQJAX18frFYrfvWr\nX8FqtWLMmDHKZ8muJ7vOs2fP4o9//CO++eYbfPvtt/jhhx/wwQcfuF2dt+KudV3z+9//Ht7e3li1\napXTntNZdV65cgVbtmxBdna2U57vRs58Pfv6+tDW1oZjx47hrbfewsqVK5323Hdb530fRnq9ftCJ\ntaamJuj1etXH7evrQ2JiIlJSUrB06VIAgFarRXNzMwDAbrdj0qRJSo2NjY1DarxV+419+vv70dHR\nAT8/P4frO3r0KEpKSvDwww9j1apV+OSTT5CSkgKdTuc2NQJXfxPz9/fHo48+CgB4+umnYbVa3eq1\nBIATJ07gpz/9Kfz8/ODp6Ynly5fj888/d7s6r3FFXc76v7dnzx784x//wIcffqi0uVOdX3/9NRoa\nGjBnzhztOzbGAAACSElEQVRMnz4dTU1NCA8PR0tLyy2fW9br6e/vjxUrVgAAIiIi4OnpiQsXLrhH\nnT96IO8e19fXp2xg6O7uFnPmzBHV1dWqj5uSkiI2btw4qC0zM1M5Lnuzk8bd3d3i7Nmzg04ezps3\nTxw/flwMDAyIxYsXi9LSUiGEEDt27FBOHu7bt++ONwcIIYTFYlHOGf361792uxoff/xxUVtbK4QQ\nIisrS2RmZrrda1lVVSVmz54trly5IgYGBkRaWprYvn2729RZX18vZs+erdx3RV3Xn8i+drutrW1Y\ndZaWloqgoCDx/fffD3qcu9V5PaPRKFpbW92yzl27donXX39dCCFEbW2tmDp1qlvUKcQI2MAgxNV/\n0AEBAWLmzJkiJydH9fH+9a9/CQ8PDzFnzhwRGhoqwsLCRGlpqbhw4YJ48sknRUBAgFi4cOGgH9CW\nLVvEjBkzxCOPPCIOHTqktJ84cULMnj1bzJw5U2zYsEFp7+rqEitXrhQzZ84U8+bNE/X19
Xdc7/Vh\n5I41VlVViUcffVTMmTNHLF++XFy8eNEt63zrrbdEUFCQCA4OFqmpqaKnp8ct6ly1apWYPHmyGDVq\nlPD39xe7d+8Wra2tLqnr/fffFzNnzhQmk0kUFBQMu86ZM2eKqVOnirCwMBEWFqa8+blbndebPn26\nsoHB3ers7e0Vzz33nJg9e7aYO3eusFgs0uu8hh96JSIi6e77c0ZEROT+GEZERCQdw4iIiKRjGBER\nkXQMIyIiko5hRERE0jGMiIhIOoYRERFJ9/8A81ncwN/TBTUAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "autos.odometer_km.plot.hist()" ] }, { "cell_type": "code", "execution_count": 305, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 50000.000000\n", "mean 125732.700000\n", "std 40042.211706\n", "min 5000.000000\n", "25% 125000.000000\n", "50% 150000.000000\n", "75% 150000.000000\n", "max 150000.000000\n", "Name: odometer_km, dtype: float64" ] }, "execution_count": 305, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['odometer_km'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Listings range from low-mileage cars (5000 km) to high-mileage ones (150000 km)\n", "\n", "Most cars' mileage is on the higher side - over 120000KM. Visualizing it above indicates that the maximum, median value and 3rd quartile are all coinciding at 150K. It also shows that there is a wide range in the values of mileage (i.e more spread out data) in the under 100K area and about 50% of the listings are close together in the higher range. 
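To put a number on how concentrated the listings are at the top of the range, each odometer value's share of all listings can be computed with `value_counts(normalize=True)`. This is a minimal sketch rather than part of the original analysis: it assumes `autos` holds the integer `odometer_km` column created above, and the small frame below is a hypothetical stand-in so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned autos frame; only odometer_km is mimicked.
autos = pd.DataFrame({'odometer_km': [150000, 150000, 150000, 125000, 90000, 5000]})

# Share of listings at each odometer value, largest share first.
shares = autos['odometer_km'].value_counts(normalize=True)

# Fraction of listings sitting at the 150000 km maximum.
share_at_cap = shares.loc[150000]
print(round(share_at_cap, 2))  # 0.5 for this toy frame
```

On the real data, the `value_counts` output above puts 25936 of the 50000 listings at 150000 km, a share of about 52%.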
 \n", "Splitting `odometer_km` into bins later in the analysis will make it clearer which mileage ranges most listings fall into.\n" ] }, { "cell_type": "code", "execution_count": 306, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 5.000000e+04\n", "mean 9.840044e+03\n", "std 4.811044e+05\n", "min 0.000000e+00\n", "25% 1.100000e+03\n", "50% 2.950000e+03\n", "75% 7.200000e+03\n", "max 1.000000e+08\n", "Name: price, dtype: float64" ] }, "execution_count": 306, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['price'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As found earlier, the `price` column has some outliers: a minimum of 0 and a maximum of about 100M dollars. Let's study the price distribution further." ] }, { "cell_type": "code", "execution_count": 307, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " price\n", "0 1421\n", "500 781\n", "1500 734\n", "2500 643\n", "1000 639\n", "1200 639\n", "600 531\n", "800 498\n", "3500 498\n", "2000 460" ] }, "execution_count": 307, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(autos['price'].value_counts(dropna=False).head(10))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `value_counts` output above is sorted by frequency, not by price (the prices are in the index). Sorting the index in descending order instead shows what the price distribution looks like at the high end." ] }, { "cell_type": "code", "execution_count": 308, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " price\n", "99999999 1\n", "27322222 1\n", "12345678 3\n", "11111111 2\n", "10000000 1\n", "3890000 1\n", "1300000 1\n", "1234566 1\n", "999999 2\n", "999990 1\n", "350000 1\n", "345000 1\n", "299000 1\n", "295000 1\n", "265000 1\n", "259000 1\n", "250000 1\n", "220000 1\n", "198000 1\n", "197000 1" ] }, "execution_count": 308, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(autos['price'].value_counts().sort_index(ascending=False).head(20))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- There are no listings with 350000 < `price` < 999990\n", "\n", "- One used car has a price tag of nearly 100M; find out what it is (i.e. the row with the maximum value of `price`)" ] }, { "cell_type": "code", "execution_count": 309, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 39705\n", "date_crawled 2016-03-22 14:58:27\n", "name Tausch_gegen_gleichwertiges\n", "seller privat\n", "offer_type Angebot\n", "price 99999999\n", "abtest control\n", "vehicle_type limousine\n", "registration_year 1999\n", "gearbox automatik\n", "power_p_s 224\n", "model s_klasse\n", "odometer_km 150000\n", "registration_month 9\n", "fuel_type benzin\n", "brand mercedes_benz\n", "unrepaired_damage NaN\n", "ad_created 2016-03-22 00:00:00\n", "nr_of_pictures 0\n", "postal_code 73525\n", "last_seen 2016-04-06 05:15:30" ] }, "execution_count": 309, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(autos.loc[autos['price'].idxmax()])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The listing name is intriguing: translated to English, `Tausch_gegen_gleichwertiges` means *Exchange for something of equal value*.\n", "\n", "It is also unrealistic that a 1999 limousine with 150,000 km on the odometer would be worth 100M dollars.\n", "\n", "Let's see how many used cars are priced in the millions, and what kind of cars they are." ] }, { "cell_type": "code", "execution_count": 310, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " vehicle_type model odometer_km brand price\n", "2897 limousine escort 50000 ford 11111111\n", "7814 coupe NaN 50000 sonstige_autos 1300000\n", "11137 coupe NaN 100000 sonstige_autos 10000000\n", "22947 kombi NaN 150000 bmw 1234566\n", "24384 NaN NaN 150000 volkswagen 11111111\n", "27371 NaN punto 150000 fiat 12345678\n", "39377 NaN v40 150000 volvo 12345678\n", "39705 limousine s_klasse 150000 mercedes_benz 99999999\n", "42221 limousine c4 40000 citroen 27322222\n", "47598 limousine vectra 150000 opel 12345678\n", "47634 coupe NaN 5000 sonstige_autos 3890000" ] }, "execution_count": 310, "metadata": {}, "output_type": "execute_result" } ], "source": [ "millions = autos['price'] > 999999\n", "car_is = autos[millions]\n", "pd.DataFrame(car_is[ ['vehicle_type', 'model','odometer_km', 'brand', 'price' ] ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even the million-plus price range does not appear to contain any high-end brands such as Bugatti or Lamborghini, which could plausibly sell for that much in used condition.\n", "\n", "Seven of these 11 listings are missing the vehicle type, the model, or both, and a buyer would need that information before bidding.\n", "\n", "Therefore, it's reasonable to drop these and explore listings priced up to 1M. 
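The claim about missing details can be checked with a short count of million-plus listings that lack a vehicle type or a model. This is a sketch under the assumption that `autos` has the `price`, `vehicle_type` and `model` columns used above; the small frame is hypothetical, copying four rows from the million-plus table and adding one ordinary listing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: four million-plus rows from the table above plus one ordinary listing.
autos = pd.DataFrame({
    'price': [11111111, 1300000, 12345678, 99999999, 4500],
    'vehicle_type': ['limousine', 'coupe', np.nan, 'limousine', 'kombi'],
    'model': ['escort', np.nan, 'punto', 's_klasse', 'golf'],
})

# Keep only the listings priced above 999999 (one million or more).
million_plus = autos[autos['price'] > 999999]

# True where a listing lacks its vehicle type, its model, or both.
missing_info = million_plus[['vehicle_type', 'model']].isnull().any(axis=1)
print(int(missing_info.sum()), 'of', len(million_plus))  # 2 of 4
```

Running the last two statements on the real `autos` frame counts the rows of the million-plus table that have NaN in either column.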
" ] }, { "cell_type": "code", "execution_count": 311, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 45100.000000\n", "mean 6386.757738\n", "std 12306.631453\n", "min 500.000000\n", "25% 1500.000000\n", "50% 3500.000000\n", "75% 7900.000000\n", "max 999999.000000\n", "Name: price, dtype: float64" ] }, "execution_count": 311, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at the distribution when the price is between 500 and 1M.\n", "upto_1mil = autos['price'].between(500,999999)\n", "autos_1mil = autos[upto_1mil]\n", "autos_1mil['price'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking distribution again using an upper price limit of 350K, not many data points are lost compared to 1M, because there are no listings between 350K and 1M. \n", "\n", "Also, $6K is more realistic for average price of a used car. " ] }, { "cell_type": "code", "execution_count": 312, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "count 45097.000000\n", "mean 6320.659600\n", "std 9261.841444\n", "min 500.000000\n", "25% 1500.000000\n", "50% 3500.000000\n", "75% 7900.000000\n", "max 350000.000000\n", "Name: price, dtype: float64\n" ] } ], "source": [ "# Clean the data so that prices are between 500 and 350K. 
Look at the distribution \n", "upto_350k = autos['price'].between(500,350000)\n", "autos_350k = autos[upto_350k]\n", "print (autos_350k['price'].describe())\n", "autos = autos_350k" ] }, { "cell_type": "code", "execution_count": 313, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(45097, 20)" ] }, "execution_count": 313, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Rows and columns remaining after cleaning\n", "autos.shape" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Explore columns containing dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These columns currently hold string values:\n", "- `date_crawled`\n", "- `ad_created` \n", "- `last_seen`\n", "\n", "They are easier to understand and more useful for analysis once converted to a numeric type." ] }, { "cell_type": "code", "execution_count": 314, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " date_crawled ad_created last_seen\n", "0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54\n", "1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08\n", "2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37\n", "3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28\n", "4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50" ] }, "execution_count": 314, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['date_crawled','ad_created','last_seen']].head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "- Select the first 10 characters, which are enough to capture the date in yyyy-mm-dd format\n", "- Get relative frequencies (proportions) instead of counts of the unique values, including missing (null) values, sorted in ascending date order\n", "\n", "Because these columns hold non-numeric values, Series.describe() returns different statistics (count, unique, top and freq):" ] }, { "cell_type": "code", "execution_count": 315, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 45097\n", "unique 34\n", "top 2016-04-03\n", "freq 1751\n", "Name: date_crawled, dtype: object" ] }, "execution_count": 315, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['date_crawled'].str[:10].describe()" ] }, { "cell_type": "code", "execution_count": 316, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2016-03-05 0.025567\n", "2016-03-06 0.014125\n", "2016-03-07 0.036189\n", "2016-03-08 0.033173\n", "2016-03-09 0.032907\n", "2016-03-10 0.032707\n", "2016-03-11 0.033018\n", "2016-03-12 0.037320\n", "2016-03-13 0.015522\n", "2016-03-14 0.036300\n", "2016-03-15 0.034016\n", "2016-03-16 0.029359\n", "2016-03-17 0.031155\n", "2016-03-18 0.012883\n", "2016-03-19 0.034747\n", "2016-03-20 0.038073\n", "2016-03-21 0.037741\n", "2016-03-22 0.033018\n", "2016-03-23 0.032397\n", "2016-03-24 0.028982\n", "2016-03-25 0.031089\n", 
"2016-03-26 0.032641\n", "2016-03-27 0.031177\n", "2016-03-28 0.034836\n", "2016-03-29 0.033262\n", "2016-03-30 0.033328\n", "2016-03-31 0.031665\n", "2016-04-01 0.033905\n", "2016-04-02 0.035767\n", "2016-04-03 0.038827\n", "2016-04-04 0.036610\n", "2016-04-05 0.013172\n", "2016-04-06 0.003171\n", "2016-04-07 0.001353\n", "Name: date_crawled, dtype: float64" ] }, "execution_count": 316, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at the distribution of values in percentages\n", "autos['date_crawled'].str[:10].value_counts(normalize=True, dropna=False).sort_index()" ] }, { "cell_type": "code", "execution_count": 317, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 45097\n", "unique 76\n", "top 2016-04-03\n", "freq 1761\n", "Name: ad_created, dtype: object" ] }, "execution_count": 317, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['ad_created'].str[:10].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The earliest ad was created on 2015-06-11. 
Most ads were created on 2016-04-03." ] }, { "cell_type": "code", "execution_count": 318, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2015-06-11 0.000022\n", "2015-08-10 0.000022\n", "2015-09-09 0.000022\n", "2015-11-10 0.000022\n", "2015-12-05 0.000022\n", "2015-12-30 0.000022\n", "2016-01-03 0.000022\n", "2016-01-07 0.000022\n", "2016-01-10 0.000044\n", "2016-01-13 0.000022\n", "2016-01-14 0.000022\n", "2016-01-16 0.000022\n", "2016-01-22 0.000022\n", "2016-01-27 0.000067\n", "2016-01-29 0.000022\n", "2016-02-01 0.000022\n", "2016-02-02 0.000044\n", "2016-02-05 0.000044\n", "2016-02-07 0.000022\n", "2016-02-08 0.000022\n", "Name: ad_created, dtype: float64" ] }, "execution_count": 318, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at the distribution of values in percentages\n", "autos['ad_created'].str[:10].value_counts(normalize=True, dropna=False).sort_index().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`date_crawled` and `last_seen` cover the same date range, from 2016-03-05 to 2016-04-07." ] }, { "cell_type": "code", "execution_count": 319, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 45097\n", "unique 34\n", "top 2016-04-06\n", "freq 10161\n", "Name: last_seen, dtype: object" ] }, "execution_count": 319, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['last_seen'].str[:10].describe()" ] }, { "cell_type": "code", "execution_count": 320, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2016-03-05 0.001087\n", "2016-03-06 0.004169\n", "2016-03-07 0.005211\n", "2016-03-08 0.007007\n", "2016-03-09 0.009468\n", "2016-03-10 0.010289\n", "2016-03-11 0.012041\n", "2016-03-12 0.023904\n", "2016-03-13 0.008870\n", "2016-03-14 0.012285\n", "2016-03-15 0.015677\n", "2016-03-16 0.016165\n", "2016-03-17 0.027674\n", "2016-03-18 0.007406\n", "2016-03-19 0.015411\n", "2016-03-20 0.020423\n", "2016-03-21 0.020667\n", "2016-03-22 
0.021243\n", "2016-03-23 0.018405\n", "2016-03-24 0.019536\n", "2016-03-25 0.018582\n", "2016-03-26 0.016476\n", "2016-03-27 0.015456\n", "2016-03-28 0.020534\n", "2016-03-29 0.021354\n", "2016-03-30 0.024148\n", "2016-03-31 0.023438\n", "2016-04-01 0.022862\n", "2016-04-02 0.024880\n", "2016-04-03 0.024946\n", "2016-04-04 0.024303\n", "2016-04-05 0.126616\n", "2016-04-06 0.225314\n", "2016-04-07 0.134155\n", "Name: last_seen, dtype: float64" ] }, "execution_count": 320, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Look at the distribution of values in percentages\n", "autos['last_seen'].str[:10].value_counts(normalize=True, dropna=False).sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis of numeric columns `registration_year` and `registration_month`" ] }, { "cell_type": "code", "execution_count": 321, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 45097.000000\n", "mean 2005.064173\n", "std 89.652017\n", "min 1000.000000\n", "25% 2000.000000\n", "50% 2004.000000\n", "75% 2008.000000\n", "max 9999.000000\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 321, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['registration_year'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Registration years such as 1000 or 9999 are clearly invalid and should be excluded from the analysis.\n", "\n", "The listings were crawled in 2016, so a valid registration year cannot be later than 2016. \n", "\n", "For the lower limit on the registration year, I use publicly available knowledge about when the first cars came out. \n", "\n", "Cars became widely available in the early 20th century, although the first ones appeared in the late 1800s: 
\n", "- 1886 was the birth year of the modern car, when German inventor Karl Benz patented his Benz Patent-Motorwagen.\n", "- 1896 was when the first successful American gasoline automobile (designed by bicycle mechanics J. Frank and Charles Duryea of Springfield, Massachusetts) was first sold. \n", "\n", "Hence, I select 1920 as the lower limit." ] }, { "cell_type": "code", "execution_count": 322, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Flag listings registered between 1000 and 1920 to see what the lower cutoff would remove\n", "year_bool = autos['registration_year'].between(1000, 1920)\n" ] }, { "cell_type": "code", "execution_count": 323, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " registration_year brand price\n", "22316 1000 volkswagen 1500\n", "22659 1910 opel 500\n", "28693 1910 renault 599\n", "49283 1001 citroen 7750" ] }, "execution_count": 323, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.loc[year_bool, ['registration_year', 'brand', 'price']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With 1920 as the lower cutoff, little valid data is lost: only the 4 listings above fall below it, and two of those have impossible years (1000 and 1001)." ] }, { "cell_type": "code", "execution_count": 324, "metadata": { "collapsed": true }, "outputs": [], "source": [ "include_reg_year = autos[autos['registration_year'].between(1920,2016)]" ] }, { "cell_type": "code", "execution_count": 325, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(43321, 20)" ] }, "execution_count": 325, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year.shape" ] }, { "cell_type": "code", "execution_count": 326, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 43321.000000\n", "mean 2003.231574\n", "std 7.046978\n", "min 1927.000000\n", "25% 1999.000000\n", "50% 2004.000000\n", "75% 2008.000000\n", "max 2016.000000\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 326, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year['registration_year'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the proportions (instead of counts) of the unique registration-year values, each year before 1987 accounts for under 0.2% of listings, so it is worth checking whether those years can be dropped." 
] }, { "cell_type": "code", "execution_count": 327, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1927 0.000023\n", "1929 0.000023\n", "1931 0.000023\n", "1934 0.000046\n", "1937 0.000092\n", "1938 0.000023\n", "1939 0.000023\n", "1941 0.000046\n", "1943 0.000023\n", "1948 0.000023\n", "1950 0.000023\n", "1951 0.000046\n", "1952 0.000023\n", "1953 0.000023\n", "1954 0.000046\n", "1955 0.000046\n", "1956 0.000092\n", "1957 0.000046\n", "1958 0.000092\n", "1959 0.000139\n", "1960 0.000439\n", "1961 0.000139\n", "1962 0.000092\n", "1963 0.000185\n", "1964 0.000254\n", "1965 0.000392\n", "1966 0.000485\n", "1967 0.000600\n", "1968 0.000600\n", "1969 0.000439\n", " ... \n", "1987 0.001570\n", "1988 0.002978\n", "1989 0.003578\n", "1990 0.006325\n", "1991 0.006763\n", "1992 0.007040\n", "1993 0.007848\n", "1994 0.011542\n", "1995 0.019852\n", "1996 0.024746\n", "1997 0.034902\n", "1998 0.046767\n", "1999 0.059371\n", "2000 0.062672\n", "2001 0.058170\n", "2002 0.055839\n", "2003 0.061333\n", "2004 0.061933\n", "2005 0.066111\n", "2006 0.061541\n", "2007 0.052400\n", "2008 0.050922\n", "2009 0.047967\n", "2010 0.036610\n", "2011 0.037326\n", "2012 0.030170\n", "2013 0.018398\n", "2014 0.015050\n", "2015 0.008402\n", "2016 0.021929\n", "Name: registration_year, Length: 77, dtype: float64" ] }, "execution_count": 327, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year['registration_year'].value_counts(normalize=True, dropna=False).sort_index()" ] }, { "cell_type": "code", "execution_count": 328, "metadata": { "collapsed": true }, "outputs": [], "source": [ "include_reg_year2 = include_reg_year[include_reg_year['registration_year'].between(1969,2016)]\n" ] }, { "cell_type": "code", "execution_count": 329, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(43143, 20)" ] }, "execution_count": 329, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year2.shape" 
] }, { "cell_type": "code", "execution_count": 330, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 43143.000000\n", "mean 2003.404492\n", "std 6.502568\n", "min 1969.000000\n", "25% 1999.000000\n", "50% 2004.000000\n", "75% 2008.000000\n", "max 2016.000000\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 330, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year2['registration_year'].describe()" ] }, { "cell_type": "code", "execution_count": 331, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2016 0.022020\n", "2015 0.008437\n", "2014 0.015113\n", "2013 0.018473\n", "2012 0.030295\n", "2011 0.037480\n", "2010 0.036761\n", "2009 0.048165\n", "2008 0.051132\n", "2007 0.052616\n", "2006 0.061794\n", "2005 0.066384\n", "2004 0.062189\n", "2003 0.061586\n", "2002 0.056069\n", "2001 0.058410\n", "2000 0.062930\n", "1999 0.059616\n", "1998 0.046960\n", "1997 0.035046\n", "1996 0.024848\n", "1995 0.019934\n", "1994 0.011589\n", "1993 0.007881\n", "1992 0.007070\n", "1991 0.006791\n", "1990 0.006351\n", "1989 0.003593\n", "1988 0.002990\n", "1987 0.001576\n", "1986 0.001507\n", "1985 0.001970\n", "1984 0.001136\n", "1983 0.001159\n", "1982 0.000950\n", "1981 0.000626\n", "1980 0.001762\n", "1979 0.000788\n", "1978 0.000974\n", "1977 0.000510\n", "1976 0.000487\n", "1975 0.000417\n", "1974 0.000556\n", "1973 0.000533\n", "1972 0.000719\n", "1971 0.000579\n", "1970 0.000788\n", "1969 0.000440\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 331, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_reg_year2['registration_year'].value_counts(normalize=True, dropna=False).sort_index(ascending=False)" ] }, { "cell_type": "code", "execution_count": 332, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.9478478548084279" ] }, "execution_count": 332, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "# Share of listings registered in the 23 most recent years (1994-2016)\n", "include_reg_year2['registration_year'].value_counts(normalize=True, dropna=False).sort_index(ascending=False).head(23).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 94.8% of listings have a registration year between 1994 and 2016, so it's reasonable to keep only that range for analysis." ] }, { "cell_type": "code", "execution_count": 333, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(40893, 20)" ] }, "execution_count": 333, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_1994_2016 = include_reg_year2[include_reg_year2['registration_year'].between(1994,2016)]\n", "include_1994_2016.shape" ] }, { "cell_type": "code", "execution_count": 334, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "count 40893.000000\n", "mean 2004.287996\n", "std 5.248485\n", "min 1994.000000\n", "25% 2000.000000\n", "50% 2004.000000\n", "75% 2008.000000\n", "max 2016.000000\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 334, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include_1994_2016['registration_year'].describe()" ] }, { "cell_type": "code", "execution_count": 335, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1994 0.012227\n", "1995 0.021030\n", "1996 0.026215\n", "1997 0.036975\n", "1998 0.049544\n", "1999 0.062896\n", "2000 0.066393\n", "2001 0.061624\n", "2002 0.059154\n", "2003 0.064974\n", "2004 0.065610\n", "2005 0.070036\n", "2006 0.065195\n", "2007 0.055511\n", "2008 0.053946\n", "2009 0.050816\n", "2010 0.038784\n", "2011 0.039542\n", "2012 0.031961\n", "2013 0.019490\n", "2014 0.015944\n", "2015 0.008901\n", "2016 0.023231\n", "Name: registration_year, dtype: float64" ] }, "execution_count": 335, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"include_1994_2016['registration_year'].value_counts(normalize=True, dropna=False).sort_index()" ] }, { "cell_type": "code", "execution_count": 336, "metadata": { "collapsed": false }, "outputs": [], "source": [ "autos = include_1994_2016" ] }, { "cell_type": "code", "execution_count": 337, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(40893, 20)\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " date_crawled name \\\n", "0 2016-03-26 17:47:46 Peugeot_807_160_NAVTECH_ON_BOARD \n", "1 2016-04-04 13:38:56 BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik \n", "2 2016-03-26 18:57:24 Volkswagen_Golf_1.6_United \n", "3 2016-03-12 16:58:10 Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan... \n", "4 2016-04-01 14:38:50 Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg... \n", "\n", " seller offer_type price abtest vehicle_type registration_year \\\n", "0 privat Angebot 5000 control bus 2004 \n", "1 privat Angebot 8500 control limousine 1997 \n", "2 privat Angebot 8990 test limousine 2009 \n", "3 privat Angebot 4350 control kleinwagen 2007 \n", "4 privat Angebot 1350 test kombi 2003 \n", "\n", " gearbox power_p_s model odometer_km registration_month fuel_type \\\n", "0 manuell 158 andere 150000 3 lpg \n", "1 automatik 286 7er 150000 6 benzin \n", "2 manuell 102 golf 70000 7 benzin \n", "3 automatik 71 fortwo 70000 6 benzin \n", "4 manuell 0 focus 150000 7 benzin \n", "\n", " brand unrepaired_damage ad_created nr_of_pictures \\\n", "0 peugeot nein 2016-03-26 00:00:00 0 \n", "1 bmw nein 2016-04-04 00:00:00 0 \n", "2 volkswagen nein 2016-03-26 00:00:00 0 \n", "3 smart nein 2016-03-12 00:00:00 0 \n", "4 ford nein 2016-04-01 00:00:00 0 \n", "\n", " postal_code last_seen \n", "0 79588 2016-04-06 06:45:54 \n", "1 71034 2016-04-06 14:45:08 \n", "2 35394 2016-04-06 20:15:37 \n", "3 33729 2016-03-15 03:16:28 \n", "4 39218 2016-04-01 14:38:50 " ] }, "execution_count": 337, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Taking stock of cleaned data\n", "print (autos.shape)\n", "autos.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Exploring and analyzing the `brand` column \n", "Understanding the mean prices across brands" ] }, { "cell_type": "code", "execution_count": 338, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "39\n", "['peugeot' 'bmw' 
'volkswagen' 'smart' 'ford' 'chrysler' 'renault' 'audi'\n", " 'mazda' 'porsche' 'mini' 'mercedes_benz' 'seat' 'toyota' 'dacia' 'nissan'\n", " 'opel' 'saab' 'volvo' 'jaguar' 'fiat' 'skoda' 'subaru' 'sonstige_autos'\n", " 'kia' 'citroen' 'mitsubishi' 'chevrolet' 'hyundai' 'honda' 'daewoo'\n", " 'suzuki' 'land_rover' 'jeep' 'alfa_romeo' 'rover' 'daihatsu' 'lancia'\n", " 'lada']\n" ] } ], "source": [ "# List and number of all the unique brands\n", "brand_list = autos[\"brand\"].unique()\n", "print (len(brand_list))\n", "print (brand_list)" ] }, { "cell_type": "code", "execution_count": 339, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " brand\n", "volkswagen 8509\n", "bmw 4744\n", "opel 4208\n", "mercedes_benz 3996\n", "audi 3746\n", "ford 2713\n", "renault 1881\n", "peugeot 1258\n", "fiat 976\n", "seat 768\n", "skoda 739\n", "smart 654\n", "mazda 630\n", "nissan 627\n", "citroen 595\n", "toyota 556\n", "hyundai 440\n", "mini 402\n", "volvo 361\n", "mitsubishi 324\n", "kia 318\n", "honda 307\n", "alfa_romeo 264\n", "sonstige_autos 264\n", "suzuki 243\n", "porsche 228\n", "chevrolet 217\n", "chrysler 146\n", "dacia 123\n", "jeep 97\n", "land_rover 93\n", "daihatsu 92\n", "subaru 78\n", "saab 66\n", "jaguar 62\n", "daewoo 58\n", "rover 53\n", "lancia 37\n", "lada 20" ] }, "execution_count": 339, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts = pd.DataFrame(autos['brand'].value_counts())\n", "counts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I am selecting brands that have more than 100 listings (29 brands). Including **Porsche** in the mix makes the comparison of mean prices more interesting." ] }, { "cell_type": "code", "execution_count": 344, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford', 'renault',\n", " 'peugeot', 'fiat', 'seat', 'skoda', 'smart', 'mazda', 'nissan',\n", " 'citroen', 'toyota', 'hyundai', 'mini', 'volvo', 'mitsubishi', 'kia',\n", " 'honda', 'alfa_romeo', 'sonstige_autos', 'suzuki', 'porsche',\n", " 'chevrolet', 'chrysler', 'dacia'],\n", " dtype='object')\n" ] } ], "source": [ "#counts[counts.brand > 100] # Note: counts.brand is same as counts['brand']\n", "brands_gt_100 = counts[counts.brand > 100].index\n", "print (brands_gt_100)" ] }, { "cell_type": "code", "execution_count": 345, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "29\n", "{'mini': 10742.965174129353, 'fiat': 3159.2715163934427, 'hyundai': 5686.940909090909, 'chevrolet': 6307.451612903225, 
'ford': 3970.638039071139, 'porsche': 49661.149122807015, 'renault': 2762.946836788942, 'sonstige_autos': 14265.818181818182, 'bmw': 8787.180227655987, 'mitsubishi': 3929.061728395062, 'chrysler': 3632.0, 'mercedes_benz': 8956.977477477478, 'volkswagen': 5941.851686449641, 'smart': 3614.0428134556573, 'dacia': 5915.528455284553, 'mazda': 4459.720634920635, 'audi': 9946.883609183129, 'opel': 3395.6494771863117, 'citroen': 3890.435294117647, 'kia': 6196.229559748428, 'nissan': 5248.81658692185, 'toyota': 5328.836330935252, 'suzuki': 4559.061728395061, 'alfa_romeo': 3643.2083333333335, 'honda': 4513.166123778502, 'peugeot': 3374.4252782193958, 'seat': 4845.5546875, 'volvo': 5318.81163434903, 'skoda': 6577.7997293640055}\n" ] } ], "source": [ "# Aggregate data by `brand` column\n", "# Assign brands and their mean prices as key-value pairs to a dictionary\n", "brand_price = {}\n", "\n", "for b in brands_gt_100:\n", " # Select only rows that correspond to a specific brand\n", " brand_rows = autos[autos[\"brand\"] == b]\n", " # Calculate the mean price for those rows\n", " mean_price = brand_rows[\"price\"].mean()\n", " brand_price[b] = mean_price\n", " \n", "print (len(brand_price))\n", "print (brand_price)\n" ] }, { "cell_type": "code", "execution_count": 346, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " brand mean_price\n", "5 porsche 49661.149123\n", "7 sonstige_autos 14265.818182\n", "0 mini 10742.965174\n", "16 audi 9946.883609\n", "11 mercedes_benz 8956.977477\n", "8 bmw 8787.180228\n", "28 skoda 6577.799729\n", "3 chevrolet 6307.451613\n", "19 kia 6196.229560\n", "12 volkswagen 5941.851686\n", "14 dacia 5915.528455\n", "2 hyundai 5686.940909\n", "21 toyota 5328.836331\n", "27 volvo 5318.811634\n", "20 nissan 5248.816587\n", "26 seat 4845.554688\n", "22 suzuki 4559.061728\n", "24 honda 4513.166124\n", "15 mazda 4459.720635\n", "4 ford 3970.638039\n", "9 mitsubishi 3929.061728\n", "18 citroen 3890.435294\n", "23 alfa_romeo 3643.208333\n", "10 chrysler 3632.000000\n", "13 smart 3614.042813\n", "17 opel 3395.649477\n", "25 peugeot 3374.425278\n", "1 fiat 3159.271516\n", "6 renault 2762.946837" ] }, "execution_count": 346, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Construct a dataframe from this dictionary\n", "brand_price_df = pd.DataFrame(list(brand_price.items()),columns = ['brand','mean_price'])\n", "print (type(brand_price_df))\n", "# Sort the mean_price column from highest to lowest, to find its corresponding brand\n", "brand_price_df = brand_price_df.sort_values('mean_price', ascending=False)\n", "brand_price_df" ] }, { "cell_type": "code", "execution_count": 347, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean_price
count 29.000000
mean 7194.221476
std 8577.023981
min 2762.946837
25% 3890.435294
50% 5248.816587
75% 6307.451613
max 49661.149123
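In this `describe()` summary, `mean` (7194) is the unweighted average of the 29 per-brand means. A brand with a handful of listings therefore counts as much as Volkswagen with thousands, so this figure is not the average price over all listings. Here is a toy illustration with hypothetical prices:

```python
import pandas as pd

# Hypothetical listings: several cheap cars of one brand, one expensive car of another
autos = pd.DataFrame({
    'brand': ['volkswagen'] * 4 + ['porsche'],
    'price': [5000, 5000, 6000, 6000, 50000],
})

brand_means = autos.groupby('brand')['price'].mean()
mean_of_brand_means = brand_means.mean()    # every brand weighted equally
mean_over_listings = autos['price'].mean()  # every listing weighted equally

print(mean_of_brand_means, mean_over_listings)
```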
\n", "
" ], "text/plain": [ " mean_price\n", "count 29.000000\n", "mean 7194.221476\n", "std 8577.023981\n", "min 2762.946837\n", "25% 3890.435294\n", "50% 5248.816587\n", "75% 6307.451613\n", "max 49661.149123" ] }, "execution_count": 347, "metadata": {}, "output_type": "execute_result" } ], "source": [ "brand_price_df.describe()" ] }, { "cell_type": "code", "execution_count": 348, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
brand
volkswagen 8509
bmw 4744
opel 4208
mercedes_benz 3996
audi 3746
ford 2713
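This section doesn't show how `counts` was built. Assuming it holds the number of listings per brand, `value_counts()` gives the same ranking, already sorted from most to least frequent. Here is a sketch with hypothetical rows:

```python
import pandas as pd

# Hypothetical listings; only the brand column matters here
autos = pd.DataFrame({'brand': ['volkswagen'] * 3 + ['bmw'] * 2 + ['opel']})

# Listings per brand, sorted in descending order of frequency
counts = autos['brand'].value_counts()

# head(6) returns at most 6 brands (only 3 exist in this toy data)
top = counts.head(6)
print(top)
```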
\n", "
" ], "text/plain": [ " brand\n", "volkswagen 8509\n", "bmw 4744\n", "opel 4208\n", "mercedes_benz 3996\n", "audi 3746\n", "ford 2713" ] }, "execution_count": 348, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Top 6 brands by listings\n", "counts.head(6)" ] }, { "cell_type": "code", "execution_count": 349, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
brand mean_price
5 porsche 49661.149123
7 sonstige_autos 14265.818182
0 mini 10742.965174
16 audi 9946.883609
11 mercedes_benz 8956.977477
8 bmw 8787.180228
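`head(6)` returns the six most expensive brands only because `brand_price_df` was sorted by `mean_price` earlier. `DataFrame.nlargest` does not depend on that prior sort. This sketch uses a few values rounded from the table above and takes the top 3 instead of 6 to keep it short:

```python
import pandas as pd

# A few brand means, rounded from the notebook output, deliberately unsorted
brand_price_df = pd.DataFrame({
    'brand': ['fiat', 'porsche', 'audi', 'mini', 'opel'],
    'mean_price': [3159.27, 49661.15, 9946.88, 10742.97, 3395.65],
})

# Rows with the largest mean_price values, in descending order
top = brand_price_df.nlargest(3, 'mean_price')
print(top)
```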
\n", "
" ], "text/plain": [ " brand mean_price\n", "5 porsche 49661.149123\n", "7 sonstige_autos 14265.818182\n", "0 mini 10742.965174\n", "16 audi 9946.883609\n", "11 mercedes_benz 8956.977477\n", "8 bmw 8787.180228" ] }, "execution_count": 349, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Brands by mean price\n", "brand_price_df.head(6)" ] }, { "cell_type": "code", "execution_count": 350, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 350, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAbwAAAEKCAYAAABpI+C3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8TPf+P/DXhFAiltJMLCGxJLJMJhNiaRYJUvtachuq\nxFaK6u23reVxW1J9WEpLuLTUfgUt4RalqJrbiC1potZSzSYkSJMgicj2/v1B5pdUQpDMhPN6Ph7z\neMyZM/M5rzmSvJwz55xRiYiAiIjoBWdm6gBERETGwMIjIiJFYOEREZEisPCIiEgRWHhERKQILDwi\nIlIEFh4RGVy8eBHm5uYVNt7KlSvh7+9fYeM9b8unqoWFR4pmaWmJunXrom7duqhWrRpq165teGzL\nli0Vvrx9+/ZBp9OhTp06sLW1xe7duyt8Gc9KpVJV6fGet+VT1VHd1AGITOnOnTuG+y1btsSaNWvg\n5+dXKcs6deoURo8ejc2bN8PX1xdpaWnIzMyslGWZQkFBAapVq/bUry+6BgYLiioLt/CIHhAR/P3C\nQzk5OZg0aRKaNGmC5s2b46OPPkJBQQEAYP/+/WjTpg1mz56Nhg0bolWrVti+fXuZ48+ZMwfvvvsu\n/Pz8oFKp0LBhQ7Ro0aLU5z5u7PT0dAwbNgxWVlZo1aoVFi5caJi3cuVKdOvWDRMmTEC9evXg4uKC\n8PBww/zGjRvj6NGjhukZM2Zg/PjxpeZYtWoVHB0dUbduXdjb22PdunUPZfzss89gbW2Nd955p9Qx\nCgoKyszSuXNnzJo1C506dYKFhQWSk5PLtcx58+bBysoKNjY22Lx5s2H+zZs30bt3b9SrVw+enp5I\nSEgwzCssLMSkSZNgZWWF+vXrQ6fT4Y8//ig1M72YWHhEj/DJJ5/g7NmzOHfuHH799Vfo9Xp8/vnn\nhvnx8fHIz8/H9evXsWrVKowcObLEH9nijh8/jry8PLi4uKBp06YICgrC7du3y1z2o8Z+++23UVBQ\ngISEBBw4cABfffVViV2wv/zyC3Q6HdLS0jBt2jQMHDjwqbYmmzRpgv379+P27dv4+uuvMWnSJFy4\ncKFExoKCAiQlJWHp0qWljvG4LKGhodi0aRPu3LkDtVr92GUmJCRApVIhOTkZy5Ytw4QJE5CdnQ0A\nGDduHBo1aoSbN29ixYoVWLt2reF1e/bswalTpxAXF4eMjAxs3rwZDRo0eOJ1Qs8xISIREbG1tZVD\nhw6VeKxp06ai1+sN099//704OjqKiMiPP/4otWrVknv37hnm9+/fXxYtWvTQ2AUFBWJmZib29vYS\nHx8vd+7ckX79+snYsWNLzfKose/duyfVqlWT+Ph4w7yQkBDp1auXiIh8/fXX0rJlyxLjubq6yvbt\n
20VExNraWiIiIgzzpk+fLuPGjRMRkd9//13Mzc3LXEc9e/aUVatWGTLWqVNH8vPzy3z+47J06tRJ\n5s2bV+brS1tm/fr1S8yvW7eu/Pbbb5KTkyNmZmaSmJhomPf++++Lv7+/iIjs3btXXFxc5OTJk1JY\nWPjIZdKLiVt4RI+QkpKC5s2bG6ZbtGiBq1evGqZfeeUV1KhRo8T8a9euPTSOmZkZatSogXHjxqFF\nixaoU6cOpk+fjr1795a57LLGTklJgYjAxsamzFzNmjUrMVZZuR5n165d6NixIxo2bIgGDRrg8OHD\nSE1NNcy3trZ+7Od2j8tS/H2UZ5mvvPJKiefXrl0bmZmZSElJeWh5xXcZ9+rVC2PGjMHbb7+Nxo0b\nY/LkyYYtQ1IGFh7RIzRu3LjELsqEhAQ0bdrUMJ2amorc3FzDdGJiIpo0aVLqWBqN5omWXdbY1tbW\nUKlUSExMLDGveK6kpKQSYxWfb2FhUeIPfVFR/F12djYCAgIwa9YspKamIj09HX5+fiU+5yzPASal\nZSm+joqPUZ5llqVovVy5cqXEsop77733EB0djdOnT+PUqVMICQl57Lj04mDhET3CG2+8geDgYKSl\npeHGjRuYO3cuRowYYZifm5uLOXPmIC8vDz///DN++uknvP7666WOFRQUhNWrV+PKlSvIysrCwoUL\n0a9fvzKXXdrYQ4YMQY0aNTB48GDMnDkT2dnZ+PPPPxESElIi15UrV7Bq1SoUFBRg06ZNSEpKMpyP\n5ubmhi1btqCgoADHjx/H999/X2K5ReVy9+5d5OfnG7aodu3aBb1e/8TrMDEx8aEsr732WqnPfZZl\n1qxZE/369cOsWbOQk5OD06dPIzQ01DD/xIkT+PXXX1FQUIBatWqhRo0aMDPjn0Al4WkJRA+UtrXy\n6aef4oMPPoCzszOqVauGwMBAfPjhh4b5dnZ2qF69OqytrVGvXj2sX78etra2pY4/ceJEJCUlQafT\noXr16ujbty8WLVpUZp7Sxi7aRbdy5Uq88847ht2jEydORGBgoOG1Pj4+iImJwcsvv4xmzZph586d\nsLS0BADMnTsXw4cPR4MGDdCtWzcEBgYiLy/vofXQsGFDLFq0CH379kV+fj4GDRqEPn36lH+FPtCl\nS5cys/x9nT/NMouP8fXXX2PUqFGwtraGs7MzRo8ejcjISABARkYGPvjgAyQkJKBWrVro27cvpk6d\n+sTvh55fKinPvgIiesj+/fsxZcoUXLp0qUqNvXLlSoSFheHAgQMVnovoecbteSIiUgQWHhERKQJ3\naRIRkSJwC4+IiBSBR2kaAS+GS0T0dCpyJyS38IxEHlyYuCrdZs2aZfIMzMRMSszFTOW7VTQWHhER\nKQILj4iIFIGFp2C+vr6mjvAQZiofZiq/qpiLmUyDpyUYgUqlqpT90UREL7KK/tvJLTwiIlIEFh4R\nESkCC4+IiBSBJ54bCU8+r1rU6hZISYk3dQwiMiIetGIE98uOq7lq4YFERFUdD1ohIiJ6Ciw8IiJS\nBBYeEREpAgvvgaVLl8LJyQkjRox4qtf7+fkhOjq6glMREVFF4VGaD3z11Vc4dOgQmjRp8tjnFhQU\noFq1akZIRUREFYWFB2DixImIjY1Fr169MHLkSISHhyM2NhYWFhZYtWoVXFxcEBwcjD///BOxsbFo\n0aIF1qxZg1GjRuHMmTNwcHBATk6Oqd8GERE9AgsP97fu9u/fj8OHD2P27Nlwd3fHzp07cfjwYYwY\nMQIxMTEAgAsXLiAiIgI1atTA4sWLUadOHZw7dw5nzpyBu7u7id8FERE9CguvGBHBkSNHsGPHDgD3\nP5dLS0tDZmYmAKB///6oUaMGAOCXX37B1KlTAQAajQZarfYxo88udt/3wY2IiIro9Xro9fpKG5+F\nV8zjroZiYWFR5rzHnxw5+8kDEREpiK+vb4mvKQoODq7Q8XmU5g
NFheXj44NNmzYBuP+/jUaNGqFO\nnToPPd/HxwehoaEAgLNnz+L06dPGC0tERE+MW3gPFG3dzZo1C6NHj4ZWq4WFhQU2btxY6vMnTpyI\noKAgODs7w9HREe3btzdmXCIiekK8lqYR8FqaVRGvpUlU1fFamkRERE+BhUdERIrAwiMiIkVg4RER\nkSKw8IiISBF4WoLRPPqkdjIutbqFqSMQkZGx8IyEh8ATEZkWd2kSEZEisPCIiEgRWHhERKQILDwi\nIlIEFh4RESkCC4+IiBSBhUdERIrAwiMiIkVg4RERkSKw8IiISBFYeEREpAgsPCIiUgQWHhERKQIL\nj4iIFIGFR0REisDCIyIiReAXwBqJSsVvPKfyUatbICUl3tQxiF44KuFXcVe6+2XH1UzlpQJ/LYnu\n/+2syN8F7tIkIiJFYOEREZEisPCIiEgRWHjPyNLSEgCQnJyMgIAAE6chIqKy8KCVZ1S3bl3cvn37\nkc/hQSv0ZHjQChHAg1YqxaBBg+Dh4QGNRoPVq1cD+P9bbgAQFhaGoKAgAEB8fDxeffVVaLVafPzx\nx4bnJCQkQKPRGDc4ERGVGwsPwLp16xAZGYnIyEiEhIQgLS3tofPmiqanTp2KSZMm4bfffkPjxo1L\nfQ4REVU9PPEcwJIlS/Df//4XAJCUlIQ//vijzOdGRERgx44dAIARI0Zg+vTp5VzK7GL3fR/ciIio\niF6vh16vr7TxFV94//vf//Dzzz/jxIkTqFmzJvz8/JCTk1Niay0nJ8dwX6VSGeY92b7l2RWUmIjo\nxeTr6wtfX1/DdHBwcIWOr/hdmrdu3UKDBg1Qs2ZN/P777zh+/DgAQK1W4+LFiygsLMTOnTsNz/f0\n9MSWLVsAAKGhoSXG4oEGRERVl+ILr2fPnsjLy4OzszNmzpyJzp07Q6VSYf78+ejTpw+8vLzQpEkT\nw/OXLFmC5cuXQ6vVIjk5ucRY/AyPiKjq4mkJRsDTEujJ8LQEIoCnJRARET0VFh4RESkCC4+IiBSB\nhUdERIrAwiMiIkVQ/InnxsNTFqh81OoWpo5A9EJi4RkJDzMnIjIt7tIkIiJFYOEREZEisPCIiEgR\nWHhERKQILDwiIlIEFh4RESkCC4+IiBSBhUdERIrAwiMiIkVg4RERkSKw8IiISBFYeEREpAgsPCIi\nUgQWHhERKQILj4iIFIGFR0REisAvgDUSlYrfeE4VR61ugZSUeFPHIHquqIRfxV3p7pcdVzNVJBX4\nq0svOpWqYn/OuUuTiIgUgYVHRESKwMIjIiJFYOEREZEiPFeFZ2lpWWFj2dnZIS0trcLGIyKiqs1o\nhVdQUPDMY1Tkof08TYCISFkeW3gJCQlwdHREUFAQHBwcMHz4cBw8eBCenp5wcHBAVFQUsrOzMWbM\nGHTq1Ant2rXD7t27AQAbNmzAgAED0K1bN3Tv3h0AsGDBAri6ukKn02HmzJkAgNjYWPTq1QseHh7o\n0qULLl26BACIj4/Hq6++Cq1Wi48//rhErkWLFqFDhw5wc3NDcHAwACA7Oxt9+/aFTqeDq6srtm3b\nVub7EhFDlk6dOiE2NhYAkJqaiiFDhqBjx47o2LEjjh07BgAIDg7GmDFj4Ofnh9atW+Pf//43AGDl\nypXQ6XRwd3dHy5Yt0a1bt/KvfSIiMh55jPj4eDE3N5dz586JiEi7du1k9OjRIiKya9cuGThwoMyc\nOVNCQ0NFRCQjI0Ps7e0lOztb1q9fLzY2NpKRkSEiIvv27RNPT0/JyckREZH09HQREenWrZtcvnxZ\nREROnDghXbt2FRGR/v37y6ZNm0REZPny5WJpaSkiIgcOHJDx48eLiEhhYaH07dtXwsPDJSwszPC4\niMjt27fLfF+2trYyb948ERHZuHGj9O3bV0REhg0bJhERESIikpiYKI6OjiIiMnv2bPH09JS8vDxJ\nTU2Vhg0bSn5+vmG8vLw88f
HxkR9++OGhZQEQQHjjrQJveNyvLtFzr6J/zst1pRU7Ozs4OTkBAJyd\nnQ1bay4uLoiPj0dSUhJ2796NhQsXAgByc3ORmJgIAPD390e9evUAAD/99BOCgoJQs2ZNAED9+vWR\nlZWFo0ePYujQobj//oC8vDwAQEREBHbs2AEAGDFiBKZPnw4AOHDgAA4ePAh3d3eICLKysvDHH3/A\ny8sLH3zwAWbMmIE+ffrAy8vrke/rjTfeAAAEBgbi/fffN2S8cOGCIUtmZiays7MBAH369EH16tXR\nsGFDqNVqXL9+HU2aNAEAvPvuu+jatSt69+5dxtJmF7vv++BGRERF9Ho99Hp9pY1frsIrKigAMDMz\nM0ybmZkhPz8f1atXR1hYGNq0aVPidcePH4eFhcUjxy4sLESDBg0QHR390DyVSmX4rK2ogIruz5gx\nA+PGjXvoNdHR0di7dy/+9a9/oXv37vjXv/5V5rKLf45XdL+wsBAnTpyAubn5Q8//+3rIz88HAKxf\nvx5XrlzBihUrHvFOZz9iHhER+fr6wtfX1zBd9HFVRSnXQSvFy6Y0PXr0wNKlSw3Tp06dKvV5/v7+\nWLduHe7evQsASE9Ph6WlJezs7LB9+3bD806fPg0A8PT0xJYtWwAAoaGhJZa3du1aZGVlAQCuXbuG\nmzdvIjk5GbVq1cKwYcPw4YcfllqixX377bcAgK1bt6Jz586GsUNCQgzP+e233x45xq+//oovvvgC\nmzZteuTziIjItMpVeKVtCRWf/vjjj5GXlwdXV1e4uLjgk08+KXWcHj16oH///mjfvj3c3d3xxRdf\nAAA2bdqENWvWwM3NDS4uLti1axcAYMmSJVi+fDm0Wi2Sk5MN4/j7+2PYsGHo3LkzXF1dMXToUGRm\nZuLMmTPo0KEDdDodPv3008du3aWnp0Or1WLZsmVYvHgxACAkJARRUVHQarVwcXHBypUrH7lOli9f\njvT0dPj5+cHd3R3jx49/3OokIiIT4MWjjYAXj6aKx4tH04uPF48mIiJ6Ci/89+ENHjwY8fHxAO5/\nFqlSqbBgwQL4+/ubNhgRERkVd2kaAXdpUsXjLk168XGXJhER0VN44XdpVh28didVHLW6hakjED13\nWHhGwt1PRESmxV2aRESkCCw8IiJSBBYeEREpAguPiIgUgYVHRESKwMIjIiJFYOEREZEisPCIiEgR\nWHhERKQILDwiIlIEFh4RESkCC4+IiBSBhUdERIrAwiMiIkVg4RERkSKw8IiISBFYeEREpAj8xnMj\nUalUpo5ACqJWt0BKSrypYxBVKSoREVOHeNHdLzuuZjImFfirTc87lapif465S5OIiBSBhUdERIrA\nwqtAQUFB2LFjh6ljEBFRKVh4RESkCCy8B7788ktoNBq4uroiJCQECQkJcHR0xJtvvgknJycEBAQg\nJycHABAdHQ1fX194eHigV69euH79uonTExHR47DwcL/ANmzYgMjISBw7dgyrV69Geno6Ll68iMmT\nJ+P8+fOwtLTEihUrkJ+fjylTpiAsLAyRkZEICgrCzJkzTf0WiIjoMXgeHoAjR45g0KBBeOmllwAA\ngwcPRnh4OJo3b45OnToBAN58800sW7YMPXr0wNmzZ+Hv7w8RQWFhIZo0aWLK+EREVA4svFIUnffx\n95PFi84JcXFxQURExBOOOrvYfd8HNyIiKqLX66HX6ytvAUISHR0tWq1W7t69K5mZmaLRaOTUqVOi\nUqnk+PHjIiIyduxYWbx4seTm5kqbNm3k2LFjIiKSl5cn586dExGRUaNGSVhY2EPjAxBAeOPNiDcY\n7xeIqJJU9M8xP8MDoNPpMGrUKHh4eKBz584YN24c6tevDwcHByxfvhxOTk7IyMjAhAkTYG5uju3b\nt2PatGlwc3ODTqfDsWPHAPDyYUREVRkvLVaGhIQE9O3bF2fOnHnmsXhpMTI+XlqMnn+8tJgR
cYuN\niOjFwS08I+AWHhkft/Do+cctPCIioqfAwiMiIkVg4RERkSLwxHOj4QEwZDxqdQtTRyCqclh4RsID\nCIiITIu7NImISBFYeEREpAgsPCIiUgQWHhERKQILj4iIFIGFR0REisDCIyIiRWDhERGRIrDwiIhI\nEVh4RESkCCw8IiJSBBYeEREpAguPiIgUgYVHRESKwMIjIiJFYOEREZEisPCIiEgR+I3nRqJSqUwd\ngeipqdUtkJISb+oYRM9EJSJi6hAvuvtlx9VMzzMV+KeCjE2lqtifO+7SJCIiRWDhERGRIrDwiIhI\nERRbeAkJCdBoNKaOQURERqLYwgN45CQRkZIouvDy8vLw5ptvwsnJCQEBAbh79y7s7Owwc+ZM6HQ6\neHh4IDo6Gj169ECbNm2watUqAMDkyZOxZ88eAMCgQYMwduxYAMC6devw8ccfm+z9EBFR2RRdeBcv\nXsTkyZNx/vx51K1bFytWrIBKpYKtrS1iYmLg7e2NoKAg7Ny5E8eOHcMnn3wCAPD29kZ4eDgA4Nq1\nazh//jwAIDw8HD4+PiZ7P0REVDZFn3jevHlzdOrUCQAwfPhwLF26FADQr18/AIBGo0FWVhZq166N\n2rVr46WXXsLt27fh7e2NJUuW4MKFC3ByckJGRgZSUlJw7NgxLFu2rIylzS523/fBjYiIiuj1euj1\n+kobX9GF9/fP8Iqma9asCQAwMzMz3C+an5+fjyZNmiAjIwP79+9Hly5dkJaWhu+++w6WlpawsLAo\nY2mzK+MtEBG9MHx9feHr62uYDg4OrtDxFb1LMyEhASdOnAAAbN68Gd7e3uV+badOnbB48WL4+PjA\ny8sLixYteqLXExGRcSm68Nq2bYvly5fDyckJt27dwoQJEx75/OJbhN7e3igoKEDLli3h7u6O9PR0\nfn5HRFSF8VqaRsBradLzj9fSJOPjtTSJiIieAguPiIgUgYVHRESKwMIjIiJFUPR5eMbF63bS80ut\nbmHqCETPjIVnJDzCjYjItLhLk4iIFIGFR0REisDCIyIiRWDhERGRIrDwiIhIEVh4RESkCCw8IiJS\nBBYeEREpAguPiIgUgYVHRESKwMIjIiJFYOEREZEisPCIiEgRWHhERKQILDwiIlIEFh4RESkCC4+I\niBSB33huJCqVytQRiIiMQq1ugZSUeFPHeIhKRMTUIV5098uOq5mIlEKFiqgWlapixinCXZpERKQI\nLDwiIlIEFh4RESlClS28hIQEaDQaAMCGDRswZcoUEyciIqLnWZUtPKDkkY08ypGIiJ6FUQtvxowZ\nWLFihWE6ODgYixYtwkcffQSNRgOtVovvvvvukWP88MMP8PT0RFpaGrZt2waNRgOdTgdfX18AQN++\nfXH27FkAgLu7Oz777DMAwKxZs7BmzRpkZWWhe/fuaN++PbRaLXbt2mUYe86cOWjbti18fHwwbNgw\nfPnllwCA2NhY9OrVCx4eHujSpQsuXboEAAgKCsLUqVPh6emJ1q1bY8eOHRW2roiIqIKJEcXExEiX\nLl0M005OTrJx40Z57bXXRETk+vXr0rx5c0lJSZH4+HjRaDQiIrJ+/XqZMmWK7Ny5U3x8fOTWrVsi\nIqLRaOTatWsiIobHFixYICtWrJBbt26Jh4eH9OzZU0RE/Pz85NKlS1JQUCB37twREZHU1FRp3bq1\niIicPHlSdDqd5Obmyp07d6RNmzbyxRdfiIhIt27d5PLlyyIicuLECenatauIiIwaNUoCAgJEROT8\n+fOGsf4OgADCG2+88aaQGyqgMSpunCJGPfHczc0NN2/eREpKCm7cuIGXX34Zp06dQmBgIADAysoK\nvr6+iIyMNHx+V+TQoUOIiorCgQMHUKdOHQCAl5cXRo4ciYCAAAwePNjw2NKlS2Fra4s+ffrgp59+\nwt27dxEXF4c2bdogPz8fM2bMwC+//AIzMzNcu3YNN27c
wNGjRzFgwACYm5vD3Nwc/fr1AwBkZWXh\n6NGjGDp0KO6vfyAvL8+Qa+DAgQAAR0dH3Lhx4xHvfnax+74PbkREVESv10Ov11fa+Ea/0srQoUOx\nbds2pKSk4B//+Afi4uJKzC8qlb9r1aoV4uLicPHiRbRr1w4AsGLFCkRGRmLPnj1o164doqOj4eHh\ngaioKLRq1Qr+/v7466+/8M0336B9+/YAgNDQUKSmpiImJgZmZmaws7NDTk5OmXkLCwvRoEEDREdH\nlzq/Zs2aj81+3+xHzCMiIl9fX8PHU8D9j70qktEPWgkICMDWrVsRFhaGoUOHwsvLC99++y0KCwtx\n8+ZNhIeHo0OHDg+9ztbWFmFhYXjrrbdw/vx5APc/W/Pw8EBwcDCsrKxw5coVmJubw8bGBtu2bUPn\nzp3h5eWFRYsWwcfHBwBw69YtWFlZwczMDIcPH0ZiYiIAwNPTE7t378a9e/eQmZmJPXv2AAAsLS1h\nZ2eH7du3G7KcPn261Pf26MIjIiJTMnrhOTk54c6dO2jWrBnUajUGDRoEV1dXaLVadO/eHQsXLoSV\nlVWpr7W3t0doaCgCAgIQFxeHDz/8EK6urnB1dcWrr74KV1dXAIC3tzesrKxQs2ZNeHt74+rVq/D2\n9gYADB8+HJGRkdBqtdi0aRPatm0LAGjfvj369+8PrVaLPn36wNXVFfXq1QMAbNq0CWvWrIGbmxtc\nXFwMB7r8/chRHklKRFR18VqaxWRlZcHCwgJ3796Fj48PvvnmG7i5uT3zuLyWJhEpS9W8lia/LaGY\n8ePH4/z587h37x5GjRpVIWVHRERVA7fwjIBbeESkLFVzC69KX2mFiIioorDwiIhIEVh4RESkCDxo\nxWh4ygIRKYNa3cLUEUrFwjMSHhtERGRa3KVJRESKwMIjIiJFYOEREZEisPAUrDK/huNpMVP5MFP5\nVcVczGQaLDwFq4o/4MxUPsxUflUxFzOZBguPiIgUgYVHRESKwItHGwG/J4+I6Onw64GeM/w/BRGR\n6XGXJhERKQILj4iIFIGFV8l+/PFHtG3bFvb29liwYEGlLmvMmDFQq9VwdXU1PJaeno7XXnsNDg4O\n6NGjB27dumWYN2/ePLRp0waOjo44cOCA4fHo6Gi4urrC3t4e77333jNlSkpKQteuXeHs7AyNRoOl\nS5eaPNe9e/fQsWNH6HQ6ODs7Y+bMmSbPVKSwsBDu7u7o379/lchka2sLrVYLnU6HDh06VIlMAHDr\n1i0MHToUjo6OcHZ2xokTJ0ya69KlS9DpdHB3d4dOp0O9evWwdOlSk6+refPmwdnZGa6urhg+fDhy\nc3NNnikkJAQajcY0fw+EKk1BQYG0atVK4uPjJTc3V7RarVy4cKHSlhceHi4xMTGi0WgMj3300Uey\nYMECERGZP3++TJs2TUREzp07J25ubpKXlydxcXHSqlUrKSwsFBGRDh06yMmTJ0VEpFevXvLjjz8+\ndabk5GSJiYkREZE7d+6Ivb29XLhwweS5srKyREQkPz9fOnbsKEeOHDF5JhGRL7/8UoYPHy79+vUT\nEdP/+9nZ2UlaWlqJx0ydSURk5MiRsnbtWhERycvLk4yMjCqRS+T+733jxo0lMTHRpJni4+PFzs5O\n7t27JyIiAQEBsn79epNmOnv2rGg0GsnJyZH8/Hzx9/eXy5cvGy0TC68SHTt2THr27GmYnjdvnsyf\nP79SlxkfH1+i8BwcHCQlJUVE7pePg4NDqVl69uwpx48fl+TkZHF0dDQ8vmXLFpkwYUKF5RswYIAc\nPHiwyuTKysoSDw8POXfunMkzXblyRbp37y6HDx82FJ6pM9na2kpqamqJx0yd6datW9KyZcuHHjd1\nriL79+/fWSDDAAAEoklEQVQXLy8vk2dKS0sTBwcHSUtLk7y8POnXr5/Jf/e2bdsmY8eONUzPmTNH\nPv/8c2nbtq1RMnGX
ZiW6evUqbGxsDNPNmjXD1atXjZrhxo0bUKvVAABra2vcuHGj1GxNmzbF1atX\ncfXqVTRr1qxSMsfHx+PUqVPo1KkTrl+/btJchYWF0Ol0sLa2hq+vL5ycnEye6Z///CcWLlxY4jQW\nU2dSqVTw9/eHh4cHVq9eXSUyxcXFoVGjRggKCoK7uzvGjx+P7Oxsk+cq8u2332LYsGEATLuuGjRo\ngP/7v/9D8+bN0bRpU9SrVw/du3c3aSYXFxeEh4cjPT0d2dnZ2Lt3L65cuWK0TCw8hTHVOYGZmZkY\nMmQIQkJCUKdOnYdyGDuXmZkZYmJikJSUhPDwcOj1epNm+uGHH6BWq+Hm5vbI01iMvZ4iIiIQHR2N\nvXv3Yvny5QgPDzf5v11+fj6io6MxadIkREdHw8LCAvPnzzd5LgDIy8vDrl27MHTo0FIzGDNTbGws\nFi9ejISEBFy7dg1ZWVkIDQ01aaa2bdti2rRp8Pf3R+/evaHT6VCtWrWHnldZmVh4lahp06ZITEw0\nTCclJaFp06ZGzaBWq3H9+nUAQEpKCqysrAzZrly58lC2sh5/Fvn5+RgyZAhGjBiBAQMGVJlcAFC3\nbl307t0bUVFRJs0UERGBXbt2oWXLlggMDMTPP/+MESNGwNra2qTrqXHjxgCAV155BQMHDsTJkydN\n/m/XrFkz2NjYoH379gCA119/HdHR0SbPBQD79u1Du3bt0KhRIwCm/TmPioqCp6cnXn75ZVSrVg2D\nBg3C0aNHTb6egoKCEBUVBb1ej/r168PBwcFomVh4lcjDwwOXL19GQkICcnNzsXXrVsPRd5VF7n8u\na5ju378/1q9fDwDYsGGDoXD69++PrVu3Ijc3F3Fxcbh8+TI6dOgAa2tr1KtXDydPnoSIYOPGjYbX\nPK3Ro0fDyckJU6dOrRK5UlNTDUeB3b17FwcPHoROpzNpprlz5yIxMRGxsbHYunUrunbtiv/85z/o\n16+fyTJlZ2cjMzMTAJCVlYUDBw5Ao9GY/GdKrVbDxsYGly5dAgAcOnQIzs7OJs8FAFu2bEFgYKBh\n2pSZHBwccPz4ceTk5EBEcOjQITg5OZl8Pd28eRMAkJiYiJ07d2LYsGHGy/RUnzxSue3bt0/s7e2l\ndevWMm/evEpdVmBgoDRu3Fhq1KghNjY2snbtWklLS5Nu3bqJvb29+Pv7S3p6uuH5c+fOlVatWknb\ntm1l//79hsejoqLExcVFWrduLe++++4zZTpy5IiYmZmJVqsVNzc30el0sm/fPvnrr79Mluv06dOi\n0+nEzc1NXF1dZeHChSIiJs1UnF6vNxy0YspMsbGxhn83FxcXw89vVVhPp06dkvbt24tWq5VBgwZJ\nRkaGyXNlZWVJo0aN5Pbt24bHTJ3p888/FycnJ9FoNPLWW29Jbm6uyTN5e3uLs7OzuLm5yeHDh0XE\neOuJ19IkIiJF4C5NIiJSBBYeEREpAguPiIgUgYVHRESKwMIjIiJFYOEREZEisPCIiEgRWHhERKQI\n/w/6rBaHgYUHFwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "counts.head(6).plot.barh(title='Top 6 popular brands', legend=False)" ] }, { "cell_type": "code", "execution_count": 351, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 351, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAc4AAAEKCAYAAACbn7USAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xtczvf/P/DHu9DPociplCgJlbq6IucooW0OMzNyaDQ2\nx+njs2F82EdzGF/2ncOHzXczfBznuK0dnDUkx3LYmDG6khX6lKxIp+fvDx9votIbV1e5Hvfdrtuu\n9+F6vZ/vV+nR+/RKEREBERERlYiFqQsgIiIqTxicREREGjA4iYiINGBwEhERacDgJCIi0oDBSURE\npAGDk8jEduzYATc3t3K3/QEDBmD27NlGqKj82rNnD/R6vanLICNjcJLZsLa2ho2NDWxsbGBpaYkq\nVaqo89avX//ct/fTTz9Br9ejWrVqcHZ2RmRkZJHrKory3Levham3/6IICgpCXFycqcsgI6tg6gKI\nSstff/2lvm/UqBGWL1+OwMBAo2zr5MmTeOutt7Bu3ToEBAQgNTUVGRkZz6Xt/Px8WFjwd96yJi8v\nD5aWlqYug0oB//WRWRIRPDpoVlZWFsaMGQMHBwc0aNAAEydORF5eHoAHpzOnT5+OWrVqwdXVFZs3\nby6y/RkzZmDcuHEIDAyEoiioVasWGjZsWGw9RbU9YMAAhIeHIzg4GNbW1jh8+DC++eYb+Pj4oHr1\n6nB2dsbHH3+srn/+/HlUrFgRK1euhJOTE+zs7DB//nx1+e3btzFo0CDY2tpCp9M9doQ0Y8YMODg4\noHr16vD09ER0dHSRdScnJ6Nz586wsbFB165dkZSUBAAYPnw4pk6dWmDd4OBgLFu27LE27t69CwsL\nCyxbtgyurq6oUaMGZs6cid9//x2tW7eGra0tQkNDkZ+fr35m27Zt0Ol0sLW1RadOnXDu3LkC9Tdq\n1Ag2Njbw9vbGjz/+qC5btmwZunTpgvDwcNja2sLNzQ179uwpcv/q1auHefPmwd3dHbVr18aIESOQ\nk5MD4MH3xMyZM2Fvb4/Ro0c/dtrbYDCgd+/eqFOnDurWrYsJEyYUqKVZs2aoXbs2evbsiT///LPI\nOqiMESIz5OzsLHv27Ckwb8KECdKxY0dJTU2V69evi5+fn8yePVtERLZv3y4VKlSQf/zjH5KTkyO7\nd++WKlWqSHx8fKHtOzg4SEREhHh6eoqDg4MMHTpU0tPTC133SW2HhIRIrVq15NixYyIicvfuXdm7\nd6+cPXtWRETi4uKkVq1asmPHDhER+e2330RRFBk7dqzcvXtXjh07JpUqVZLLly+LiEh4eLh06dJF\nbt26JfHx8dK0aVNxc3MTEZFTp05Jo0aN5MaNGyIicvny5SL3MSQkRGxtbeXIkSNy9+5dGTlypHTp\n0kVERPbv3y+NGjVS1/3zzz+latWqkpaW9lg7WVlZoiiK9OvXT27fvi0nT56UihUrSrdu3eTKlSuS\nlpYmbm5usnHjRhERiYmJEQcHB4mLi5P8/Hz54osvpEmTJpKXlyciIhs3bpRr166JiMiaNWvE2tpa\n/vOf/4iIyOeffy6VKlWSNWvWSH5+vnz66afi7Oxc6P6JiNjb24uvr68kJydLSkqK+Pn5yaxZswp8\n3aZPny45OTmSlZUl27dvV/syJydH3N3dZcqUKXLnzh3JysqSmJgYERHZsGGDeHh4yMWLFyU3N1em\nTZsmgYGBRdZBZQuDk8xSYcHp6OgoUVFR6vS3334r7u7uInLvh2TlypXl7t276vJevXrJ/PnzH2s7\nLy9PLCwspEmTJhIfHy9//fWX9OzZU4YPH15oLU9qOyQkREaMGFHs/owcOVKmTJkiIveC08LCQlJT\nU9Xl3t7e8u2334rIvVDfv3+/umzRokXqD/tff/1VHBwcZN++fZKbm1vsNkNCQiQsLEydTk1NFUVR\nJCUlRUREXF1d5eDBgyIiMn/+fHn99dcLbed+cMbFxanzPD09Z
dGiRer0mDFjZPLkySIiEhYWpv5C\nc1/Dhg3l6NGjhbbfrFkz2blzp4jcC04vL68CNVtYWBT5S429vb38+9//Vqe3bt0qzZs3F5F7X7dq\n1aoV6KeHg3Pv3r1Sv379QtsNDAyUdevWqdPZ2dlSsWJFuX79eqHrU9nCU7VE/5WcnIwGDRqo0w0b\nNsTVq1fV6Tp16qBSpUoFlhd2es3CwgKVKlXC22+/jYYNG6JatWr44IMPCpwyfNST2nZyciqwfnR0\nNAICAlC3bl3UqFEDq1atQkpKirrc0tIStra26nSVKlWQkZEBEUFycjLq169fYFv3eXh4YM6cOfjH\nP/4BOzs7hIaG4vr160XW/XBdtra2qFatmlr34MGDsWbNGgDAmjVrEBoaWmQ7AFC3bl31feXKlWFn\nZ1dg+v41YoPBgNmzZ6NmzZqoWbMmbG1tkZKSon6tli9fDp1Opy77448/CvSNvb19gX4RkWKvPz/a\nVw9/Xezt7Yu8rpmYmAgXF5dClxkMBowcOVLdh7p166JSpUpITEwssg4qOxicRP9Vr149GAwGddpg\nMMDR0VGdTklJQXZ2tjqdkJAABweHQtvy8vLStO0ntf3oXa/9+/fHgAEDcPXqVdy8eRNDhgx57Jpt\nYRRFgZ2dHa5cuaLOe3ifASA0NBTR0dG4dOkS7ty5g2nTphXZ3sPtpKamIjMzE/Xq1QMAvPnmm9i8\neTNiY2ORmJiI7t27P7G+knBycsJHH32E1NRUpKamIi0tDRkZGejduzcuXLiAcePG4csvv1SXubq6\nlqhvivJoXxX3dXm0zvj4+EKXNWjQACtXrnxsH/goS/nA4CT6r5CQEERERCA1NRXXr1/H7NmzCxwl\nZWdnY8aMGcjJycHevXuxe/duvP7664W2FRYWhi+//BJXrlxBZmYm5s2bh549exa57cLa7tu3b5Hr\nZ2ZmombNmqhYsSIOHTqETZs2FVheXFD069cPs2bNwq1bt2AwGPDZZ5+py86dO4f9+/cjOzsbVlZW\nqFy5crF38H777bc4duwY7t69i6lTpyIwMBC1a9cGcO/O5WbNmiEsLAz9+/dHhQrP5yb+d955B4sX\nL8aJEycAABkZGYiMjERWVhYyMjJgaWmJ2rVrIzc3F59//jkuXrz4TNtbtGgRkpOTkZKSgrlz5yIk\nJKREn+vQoQOsra0xbdo03LlzB1lZWYiJiQEAjBgxAjNmzMDvv/8OAEhLS8PWrVufqU4qPQxOMkuF\nHSl89NFH8PDwgKenJ3x9feHv71/gLkgXFxdUqFAB9vb2GD58OFauXAlnZ+dC2x81ahRef/116PV6\nuLq6olatWgXubH1UYW3fP4VaWK2ff/453nvvPVSvXh3z589Hv379it2/h6dnzpyJWrVqoUGDBujZ\nsyeGDBmiLrtz5w7ee+891KlTB46OjsjMzMSMGTMKrVlRFAwePBiTJk1CnTp1cP78eaxatarAOkOG\nDMEvv/yCN998s8h9f1K9j2rXrh0WLVqEESNGwNbWFs2aNcP69euhKAr0ej1GjhyJFi1awNHREQaD\nAX5+fpq2/aiQkBAEBgaiadOm0Ol0Bb4nilOhQgX8+OOPOHnyJOrXr4+GDRvim2++Udt899130adP\nH9SoUQO+vr7YvXt3idol01PkWc5hEJmJHTt24N1331WPEKhkdu/ejdGjR5fbfqtXrx62bNmCdu3a\nmboUKkN4xElERpGdnY2FCxdixIgRpi6F6LlicBLRc3fq1CnUrFkTmZmZGD16tKnLeWocipAKw1O1\nREREGvCIk4iISAMO8l7G8VQREdHTMdYJVR5xlgPy3wHJzf31z3/+0+Q1lJUX+4J9wb4o/mVMDE4i\nIiINGJxEREQaMDip3AgICDB1CWUG++IB9sUD7IvSwcdRyjhFUYx+vp6I6EVjzJ+dPOIkIiLSgMFJ\nRESkAYOTiIhIAw6AUA6Yy
yAITk52SEhINnUZRETF4s1BZZyiKNi3z9RVlI7AQOON9EFE5oU3BxER\nEZURDE4iIiINGJxEREQaMDifgsFggJeXl6nLICIiE2BwPiVzudOViIgKYnA+pZycHAwePBgeHh7o\n168f7ty5AxcXF0yZMgV6vR5+fn6IjY1FcHAw3Nzc8H//938AgLFjx+L7778HALz22msYPnw4AGDF\nihWYNm2ayfaHiIhKhsH5lM6fP4+xY8fi7NmzsLGxwdKlS6EoCpydnREXFwd/f3+EhYVh27ZtiImJ\nwYcffggA8Pf3x4EDBwAAf/75J86ePQsAOHDgADp27Giy/SEiopLhAAhPqUGDBmjTpg0AYNCgQVi0\naBEAoGfPngAALy8vZGZmokqVKqhSpQr+3//7f7h16xb8/f2xYMECnDt3Dh4eHrh58yaSk5MRExOD\nxYsXF7qtlSsfvPfxufciIqIHoqKiEBUVVSrbYnA+pUevcd6ftrKyAgBYWFio7+8vz83NhYODA27e\nvIkdO3agU6dOSE1NxcaNG2FtbY2qVasWuq2hQ42zD0REL4qAgIACf1YtIiLCaNviqdqnZDAYcOTI\nEQDAunXr4O/vX+LPtmnTBp9++ik6duyIDh06YP78+Zo+T0REpsPgfErNmjXDkiVL4OHhgfT0dIwc\nObLY9R8+QvX390deXh4aNWoEX19fpKWl8fomEVE5wbFqyziOVUtEpB3HqiUiIiojGJxEREQaMDiJ\niIg0YHASERFpwOAkIiLSgHfVlnHmNJi8k5MdEhKSTV0GEb0AjHlXLUcOKgf4uw0RUdnBU7VEREQa\nMDiJiIg0YHASERFpwOAkIiLSgMFJRESkAYOTiIhIAwYnERGRBgxOIiIiDRicREREGjA4iYiINGBw\nEhERacDgJCIi0oDBSUREpAGDk4iISAMGJxERkQYMTiIiIg34h6zLAUVRTF1CqXGyc0JCcoKpyyAi\nKpIiImLqIqhoiqJgH/aZuoxSE4hA8FuSiJ6VoihG+1nCU7VEREQaMDiJiIg0YHASERFpYDbBaW1t\n/dzacnFxQWpq6nNrj4iIyo9yEZx5eXnP3MbzvDPVnO5yJSKigowanAaDAe7u7ggLC0PTpk0xaNAg\n7Nq1C+3bt0fTpk1x/Phx3L59G8OGDUObNm3QokULREZGAgBWrVqFV199FUFBQejSpQsAYO7cufD2\n9oZer8eUKVMAAJcuXcLLL78MPz8/dOrUCb///jsAID4+Hu3atYNOp8O0adMK1DV//ny0atUKPj4+\niIiIAADcvn0bPXr0gF6vh7e3NzZt2lTkfomIWkubNm1w6dIlAEBKSgr69u2L1q1bo3Xr1oiJiQEA\nREREYNiwYQgMDETjxo3xr3/9CwCwbNky6PV6+Pr6olGjRggKCnpeXU9EREZi9Oc4//jjD2zZsgUe\nHh5o2bIlNmzYgOjoaERGRmLWrFnw8PBAUFAQli9fjvT0dLRq1UoNyri4OJw5cwbVq1fH9u3bERkZ\niWPHjsHKygo3b94EALzzzjtYtmwZXF1dcfToUYwaNQp79uxBeHg4xowZg0GDBmHp0qVqPbt27cKF\nCxdw9OhRiAh69eqFgwcP4vr163B0dMT3338PAPjrr7+K3S9bW1ucPn0aq1evRnh4OCIjIxEeHo6/\n//3vaNeuHa5cuYLg4GCcPXsWAHD+/HlERUUhPT0dTZs2xahRozBixAiMGDECubm5CAoKwnvvvWeM\nLwERET1HRg9OFxcXeHh4AAA8PT3VUGzevDni4+ORmJiIyMhIzJs3DwCQnZ2NhIR7D8B37doV1atX\nBwDs3r0bYWFhsLKyAgDUqFEDmZmZOHToEN544w31eZ2cnBwAQHR0NLZu3QoACA0NxQcffAAA2Llz\nJ3bt2gVfX1+ICDIzM3HhwgV06NAB77//PiZPnozu3bujQ4cOxe5XSEgIAGDAgAH4+9//rtZ
47tw5\ntZaMjAzcvn0bANC9e3dUqFABtWrVgp2dHa5duwYHBwcAwLhx49C5c2e88sorhW5rJVaq733++x8R\nET0QFRWFqKioUtmW0YPzftABgIWFhTptYWGB3NxcVKhQAVu2bIGbm1uBzx0+fBhVq1Yttu38/HzY\n2toiNjb2sWWKoqjXIh9+CFZEMHnyZLz99tuPfSY2NhY//vgjpk6dii5dumDq1KlFbvvh65z33+fn\n5+PIkSOoWLHiY+s/2g+5ubkAgJUrV+LKlSsFjoofNRRDi1xGRERAQEAAAgIC1On7l+GMweg3Bz1p\n5Ibg4GAsWrRInT558mSh63Xt2hUrVqzAnTt3AABpaWmwtraGi4sLNm/erK53+vRpAED79u2xfv16\nAMDatWsLbO+rr75CZmYmAODPP//EjRs3kJSUhMqVK2PgwIGYMGFCoWH8sK+//hoAsGHDBrRt21Zt\ne+HCheo6p06dKraNEydO4JNPPsGaNWuKXY+IiMoOowdnYUdmD09PmzYNOTk58Pb2RvPmzfHhhx8W\n2k5wcDB69eqFli1bwtfXF5988gkAYM2aNVi+fDl8fHzQvHlzfPfddwCABQsWYMmSJdDpdEhKSlLb\n6dq1KwYOHIi2bdvC29sbb7zxBjIyMnDmzBm0atUKer0eH3300ROPNtPS0qDT6bB48WJ8+umnAICF\nCxfi+PHj0Ol0aN68OZYtW1ZsnyxZsgRpaWkIDAyEr68v3nnnnSd1JxERmRjHqi3jOFYtEZF2HKuW\niIiojOCfFStGnz59EB8fD+DetVpFUTB37lx07drVtIUREZHJMDiLcf9xFiIiovt4qpaIiEgDBicR\nEZEGvKu2jDO3AeWd7JyQkJxg6jKIqJwz5l21vMZZDvB3GyKisoOnaomIiDRgcBIREWnA4CQiItKA\nwUlERKQBg5OIiEgDBicREZEGDE4iIiINGJxEREQaMDiJiIg0YHASERFpwOAkIiLSgMFJRESkAYOT\niIhIAwYnERGRBgxOIiIiDRicREREGvAPWZcDiqKYuoQywc7JCckJCaYug4jMnCIiYuoiqGiKogD7\n9pm6jLIhMBD8diWiklAUxWg/L3iqloiISAMGJxERkQYMTiIiIg0YnCZkbW0NAEhKSkK/fv1MXA0R\nEZUEg9OE7t8tW69ePWzcuNHE1RARUUkwOJ/Ra6+9Bj8/P3h5eeHLL78E8OBIEgC2bNmCsLAwAEB8\nfDzatWsHnU6HadOmqesYDAZ4eXmVbuFERPRUGJzPaMWKFTh27BiOHTuGhQsXIjU19bHnLu9Ph4eH\nY8yYMTh16hTq1atX6DpERFS2MTif0YIFC+Dj44M2bdogMTERFy5cKHLd6OhohISEAABCQ0NLq0Qi\nInqOOHLQM/j555+xd+9eHDlyBFZWVggMDERWVlaBo8esrCz1vaIo6jJND+auXPngvY/PvRcREami\noqIQFRVVKtticD6D9PR02NrawsrKCr/99hsOHz4MALCzs8P58+fh5uaGbdu2wcbGBgDQvn17rF+/\nHoMGDcLatWsLtFVskA4daqxdICJ6IQQEBCAgIECdjoiIMNq2eKr2Gbz00kvIycmBp6cnpkyZgrZt\n20JRFMyZMwfdu3dHhw4d4ODgoK6/YMECLFmyBDqdDklJSQXa4jVOIqLygWPVlnEcq/YhHKuWiEqI\nY9USERGVEQxOIiIiDYq9OWjr1q3FfrhPnz7PtRgiIqKyrtjgjIyMBABcv34dhw4dQufOnQEA+/bt\nQ7t27RicRERkdooNzhUrVgAAunXrhrNnz6qj3SQlJWEoH5EgIiIzVKLnOK9cuVJgiDg7OzskJCQY\nrSh6RGCgqSsoE+ycnExdAhFRyYIzKCgIwcHBGDBgAADg66+/RpcuXYxaGD3ARzCIiMqOEj/HuXXr\nVhw4cAAA0LFjR7z22mtGLYzuMeazSERELypj/uzkAAh
lHIOTiEg7kw+AsHXrVri5uaF69eqwsbGB\ntbW1Ov4qERGROSnREWfjxo0RGRkJd3f30qiJHsIjTiIi7Ux+xGlnZ8fQJCIiQgnvqm3ZsiX69++P\n3r17w8rKSp3PARCIiMjclCg4b926hSpVqmDnzp3qPEVRGJxERGR2eFdtGcdrnERE2hnzZ2eJjjiz\nsrKwfPly/Prrr8jKylLnf/XVV0YpioiIqKwq0c1BoaGhSE5Oxo4dO9CpUyckJibC2tra2LURERGV\nOSU6VavX6xEXFwdvb2+cPn0aOTk58Pf3x+HDh0ujRrPGU7VERNqZ/HGUihUrAgBq1KiBX375Benp\n6bh+/bpRCiIiIirLSnSN85133kFaWhpmzpyJXr16ISMjAzNmzDB2bURERGXOE4MzPz8fNjY2sLW1\nRceOHXHp0qXSqIuIiKhMKtE1zpYtW+L48eOlUQ89gtc4iYi0M/lfR/nggw9Qu3Zt9O/fH1WrVlXn\n16xZ0yhF0QMMTiIi7UwenC4uLlAU5bH5PG1rfAxOIiLtTB6cd+7cwdKlS3Hw4EEoigJ/f3+MHDkS\nlStXNkpR9EBhv7DQAw3t7BCfnGzqMoiojDF5cPbr1w82NjYYNGgQAGDdunVIT0/Hxo0bjVIUPaAo\nCni8WTQF4BE5ET3G5MHp4eGBs2fPPnEePX8MzuIxOImoMCYfAMHX17fAKEFHjhxBy5YtjVIQERFR\nWVbsc5xeXl5QFAU5OTlo164dGjRoAEVRYDAY0KxZs9KqsVyLjIzEuXPnMHHixCLXSUpKQnh4OE99\nExGVA8WeqjUYDMV+uGHDhs+9ICqIp2qLx1O1RFQYk1/jpMIZDAa89NJLaNOmDQ4dOoSWLVti6NCh\nmD59OlJSUrBmzRqcPXsWx48fx+LFixEWFgYbGxscP34c165dw//8z/+gT58+MBgM6NGjB86cOfPY\nNhicxWNwElFhTH6Nk4r2xx9/YMKECTh//jzOnz+PDRs2IDo6GvPmzcPs2bOhKEqBR0qSk5MRHR2N\nyMhITJo0SZ3Px06IiMoHBuczcnFxgYeHBwDA09MTXbp0AXDv+nB8fPxj6/fu3RsA4O7uzr8wQ0RU\nDpXor6NQ0aysrNT3FhYW6rSFhQVyc3OLXb+kpxGmP/Q+4L8vIiJ6ICoqClFRUaWyLQbnM3qWc+gP\nf7a4dqY/9RaIiMxDQEAAAgIC1OmIiAijbYunap/Rw9cmH71OqWWa1ziJiMoH3lVbxvGu2uLxrloi\nKgzvqiUiIiojGJxEREQaMDiJiIg0YHASERFpwOAkIiLSgM9xlgN8UKVoDe3sTF0CEZkZBmc5wMct\niIjKDp6qJSIi0oDBSUREpAGDk4iISAMGJxERkQYMTiIiIg0YnERERBowOImIiDRgcBIREWnA4CQi\nItKAwUlERKQBg5OIiEgDBicREZEGDE4iIiINGJxEREQaMDiJiIg0YHASERFpwOAkIiLSoIKpC6An\nUxTF1CWYLTtHOyQnJpu6DCIqQxQREVMXQUVTFAWYbuoqzNh0gP9EiMofRVGM9m+Xp2qJiIg0YHAS\nERFpwOAkIiLSoNwHp8FgwPr169XpEydO4G9/+5sJKwJ+/vlnxMTEmLQGIiIyjnIfnJcvX8a6devU\n6RYtWmDBggUmrAiIiorCoUOHTFoDEREZR6kF5+3bt9GjRw/o9Xp4e3tj06ZN2Lt3L3x9faHT6TB8\n+HDk5OQAAFxcXDB9+nS0aNECOp0Ov//+OwBg//790Ov18PX1RYsWLZCZmYnJkyfj4MGD8PX1xcKF\nC/Hzzz+jZ8+eAICUlBR069YNXl5eePvtt+Hs7IzU1FQAwNq1a9G6dWv4+vpi1KhRxd59NXr0aLRq\n1QpeXl6IiIhQ57u4uKjtnThxAoGBgTAYDPj888+xYMEC+Pr6Ijo6GgaDAUFBQfDx8UHXrl2RmJgI\nANi0aRO8vLyg1+s
REBDw3PuciIiev1ILzu3bt8PR0RFxcXE4ffo0goODMXToUGzatAmnTp1CTk4O\nPvvsM3X9unXr4sSJExg5ciTmz58PAJg/fz6WLl2K2NhYHDhwAJUrV8acOXPg7++P2NhYhIeHA3jw\n3GNERASCgoJw5swZ9O3bF1euXAEA/Pbbb/j6669x6NAhxMbGwsLCAmvXri2y9tmzZ+Po0aM4deoU\noqKi8MsvvxTYzn2KoqBhw4YYOXIkxo8fj9jYWLRv3x7vvvsuwsLCcPLkSQwcOBDvvvsuAGDGjBnY\nuXMn4uLi8N133z2nniYiImMqtQEQvLy88P7772Py5Mno3r07bGxs0KhRI7i6ugIAhgwZgqVLl2Lc\nuHEAgNdeew3AvVOv27ZtAwC0b98e48ePx6BBg9CnTx84OjoWu82DBw/im2++AQAEBwfD1tYWALBn\nzx7ExsbCz88PIoKsrCzY2dkV2c6GDRvwxRdfIDc3F8nJyTh79iyaN29e4meEYmJi1H0IDQ3FpEmT\n1P0ZMmQI+vXrhz59+hTdwL6H3jsDcCnRZomIzEZUVBSioqJKZVulFpxubm6IjY3Fjz/+iGnTpiEw\nMLDY9a2srAAAlpaWyM3NBQBMmjQJPXr0wA8//ID27dtj586dmmq4H3QigiFDhmDWrFlP/Ex8fDw+\n+eQTnDhxAjY2NggLC0NWVhYAoEKFCsjPzwcAdV5hihr557PPPsOxY8fw/fffo0WLFoiNjVXDvYDi\nu4qIyOwFBAQUuOT18GW1563UTtUmJSWhcuXKGDhwIN5//33ExMQgPj4ely5dAgCsXr36idf5Ll26\nBE9PT0ycOBF+fn747bffYG1tjVu3bhW6fvv27fH1118DAHbu3ImbN28CAIKCgrB582bcuHEDAJCW\nloaEhIRC27h16xaqVasGa2trXLt2DT/99JO6zMXFBSdOnAAAbNmyRZ3/aE3t2rVT7/xds2YN/P39\n1f3x8/NDREQE6tatq55KJiKisqvUjjjPnDmDCRMmwMLCApUqVcJnn32G9PR09O3bF3l5efDz88OI\nESMAFH2EtmDBAuzbtw+Wlpbw9PTEyy+/DEVRYGlpCb1ej6FDh8LHx0dd/5///CcGDhyINWvWoG3b\ntrC3t4e1tTVq1qyJmTNnolu3bsjPz0elSpWwZMkSNGjQ4LFtent7w8fHB+7u7nByckKHDh3UZR9+\n+CGGDRuG6tWrFwj9nj17om/fvvjuu++wePFiLF68GEOHDsX8+fNRp04drFixAgAwYcIEXLhwAQDQ\npUsXeHvlnCC5AAAKPklEQVR7P3M/ExGRcb3QY9VmZ2fD0tISlpaWOHz4MEaPHo3Y2FhTl6UJx6o1\nsekcq5aoPDLmWLUv9F9HSUhIQL9+/ZCfnw8rKyt88cUXpi6JiIjKuRc6OBs3bqzpCLNNmzbIzs4G\ncO8oQ1EUrF69Gp6ensYqkYiIypkXOji1Onz4sKlLICKiMq7cD7lHRERUml7om4NeBEXdYUylw87R\nDsmJyaYug4g04s1BZo6/2xARlR08VUtERKQBg5OIiEgDBicREZEGDE4iIiINGJxEREQaMDiJiIg0\nYHASERFpwOAkIiLSgMFJRESkAYOTiIhIAwYnERGRBgxOIiIiDRicREREGjA4iYiINGBwEhERacDg\nJCIi0oDBSUREpEEFUxdAT6YoiqlLICIqF+zsGiI5Od6o21BERIy6BXom90KTXyIiopJRICJQlHv/\nNwaeqiUiItKAwUlERKQBg5OIiEgDBmcpsLa2NnUJRET0nDA4n0J+fr6m9XlXLBHRi8Nsg9NgMMDd\n3R2DBw+Gh4cH+vXrh6ysLOzZswe+vr7Q6XQYPnw4cnJyAAAuLi744IMP0LJlS2zevBmLFy+Gp6cn\nfHx8MHDgQABAZmYm3nrrLXh7e8PHxwfbtm0DAIgIpk6dCh8fH7Rr1w43btwAAKSkp
KBv375o3bo1\nWrdujUOHDpmmM4iIqOTETMXHx4uiKBITEyMiIsOGDZOZM2eKk5OTXLx4UURE3nzzTVm4cKGIiDg7\nO8u8efPUzzs4OEh2draIiKSnp4uIyKRJk2T8+PHqOjdv3hQREUVR5IcffhARkYkTJ8qsWbNERGTg\nwIESHR0tIiIJCQni7u7+WJ0ABBC++OKLL75K9IL6s9NYzPaIEwAaNGiANm3aAAAGDRqEPXv2oFGj\nRnB1dQUADBkyBPv371fX79+/v/pep9Nh4MCBWLt2LSwtLQEAu3fvxpgxY9R1qlevDgCwsrLCK6+8\nAgBo0aIF4uPj1fXHjh0LvV6PXr16ISMjA7dv3y6k0ukPvaKefceJiF44Ubj3MxKYPn26Ubdk1sH5\nqBo1ahS7vGrVqur7H374AWPHjkVsbCz8/PyQl5dX5OcqVqyovre0tERubi4AQERw5MgRxMXFIS4u\nDgkJCahSpUohLUx/6BVQwr0hIjInAWBwloKEhAQcOXIEALBu3Tr4+fkhPj4ely5dAgCsXr0aAQEB\nj31ORJCQkIBOnTphzpw5uHXrFjIzM9G1a1f861//Ute7efOmun5hunXrhoULF6rTp06del67RkRE\nRmLWwdm0aVMsWbIEHh4euHnzJsaPH48VK1agb9++0Ol0sLS0xIgRIwAUvDM2Ly8PgwcPhk6nQ4sW\nLRAeHg4bGxtMnToVaWlp8PLygl6vR1RU1GOffdjChQtx/Phx6HQ6NG/eHMuWLTP6PhMR0bMx27Fq\nDQYDevTogTNnzpi6lGJxrFoiIi04Vq1R8flKIiLSymyPOMsLHnESEWnBI04iIqIyhcFJRESkAYOT\niIhIgwqmLoBKgjcxERGVhJ1dQ6Nvg8FZDvD+LSKisoOnaomIiDRgcBIREWnA4CQiItKAwUnlxv2x\nf4l98TD2xQPsi9LB4KRygz8UHmBfPMC+eIB9UToYnERERBowOImIiDTgIO9lHP+CCxHR0zFWvHEA\nhDKOv9cQEZUtPFVLRESkAYOTiIhIAwZnGbZ9+3Y0a9YMTZo0wdy5c01dznMxbNgw2NnZwdvbW52X\nlpaGbt26oWnTpggODkZ6erq67OOPP4abmxvc3d2xc+dOdX5sbCy8vb3RpEkT/O1vf1PnZ2dnIyQk\nBG5ubmjbti0SEhJKZ8eeQmJiIjp37gxPT094eXlh0aJFAMyzP+7evYvWrVtDr9fD09MTU6ZMAWCe\nfXFffn4+fH190atXLwDm2xfOzs7Q6XTQ6/Vo1aoVgDLQF0JlUl5enri6ukp8fLxkZ2eLTqeTc+fO\nmbqsZ3bgwAGJi4sTLy8vdd7EiRNl7ty5IiIyZ84cmTRpkoiI/Prrr+Lj4yM5OTly+fJlcXV1lfz8\nfBERadWqlRw9elRERF5++WXZvn27iIgsXbpURo0aJSIiGzZskP79+5favmmVlJQkcXFxIiLy119/\nSZMmTeTcuXNm2x+ZmZkiIpKbmyutW7eWgwcPmm1fiIj87//+rwwaNEh69uwpIub778TFxUVSU1ML\nzDN1XzA4y6iYmBh56aWX1OmPP/5Y5syZY8KKnp/4+PgCwdm0aVNJTk4WkXth0rRpUxF5fJ9feukl\nOXz4sCQlJYm7u7s6f/369TJy5EgREQkODpbDhw+LyL0fwLVr1zb6/jwvr776quzatcvs+yMzM1P8\n/Pzk119/Ndu+uHLlinTp0kX27dunBqe59oWzs7OkpKQUmGfqvuCp2jLq6tWrcHJyUqfr16+Pq1ev\nmrAi47l+/Trs7OwAAPb29rh+/TqAx/vA0dERV69exdWrV1G/fn11/sN98/BnLC0tUaNGDaSmppbW\nrjy1+Ph4nDx5Em3atMG1a9fMsj/y8/Oh1+thb2+PgIAAeHh4mG1fjB8/HvPmzSvwOJq59oWiKOja\ntSv8/Pzw5ZdfAjB9X/BxFCpznuezq1IOHufJy
MhA3759sXDhQlSrVu2x/TeX/rCwsEBcXBxu3bqF\n4OBgREVFmWVf/PDDD7Czs4OPj0+xQ+iZQ18AQHR0NOrVq4cbN26o1zVN/X3BI84yytHRscBF6sTE\nRDg6OpqwIuOxs7PDtWvXAADJycmoW7cugHt9cOXKFXW9+31Q1PxHP5OXl4dbt26hZs2apbUrmuXm\n5qJv374IDQ3Fq6++CsC8+wMAbGxs8Morr+D48eNm2RfR0dH47rvv0KhRIwwYMAB79+5FaGgo7O3t\nza4vAKBevXoAgDp16qB37944evSoyb8vGJxllJ+fHy5evAiDwYDs7Gxs2LBBvbuuvJN719bV6V69\nemHlypUAgFWrVqkB0qtXL2zYsAHZ2dm4fPkyLl68iFatWsHe3h7Vq1fH0aNHISL497//XeAzq1at\nAgBs2rQJnTt3Lt2d0+itt96Ch4cHwsPD1Xnm2B8pKSnqnZF37tzBrl27oNfrzbIvZs+ejYSEBFy6\ndAkbNmxA586dsXr1avTs2dPs+uL27dvIyMgAAGRmZmLnzp3w8vIy/feF9ku1VFp++uknadKkiTRu\n3Fg+/vhjU5fzXAwYMEDq1asnlSpVEicnJ/nqq68kNTVVgoKCpEmTJtK1a1dJS0tT1589e7a4urpK\ns2bNZMeOHer848ePS/PmzaVx48Yybtw4dX5WVpa88cYb0rhxY2ndurVcvny5NHdPk4MHD4qFhYXo\ndDrx8fERvV4vP/30k/znP/8xu/44ffq06PV68fHxEW9vb5k3b56IiFn2xcOioqLUm4PMsS8uXbqk\n/vto3ry5+nPQ1H3BsWqJiIg04KlaIiIiDRicREREGjA4iYiINGBwEhERacDgJCIi0oDBSUREpAGD\nk4iISAMGJxERkQb/H+EAp/uH8lXOAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "brand_price_df.head(6).plot.barh(x='brand',y='mean_price',title='Top 6 brands by mean price', legend=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Top brands **Audi, Mercedes Benz** and **BMW**, all German, make the list of used cars with a high mean price\n", "- **Ford** and **Opel** are top brands but much cheaper and don't make the list of top brands by price\n", "- **Porsche**, also a German car, is the priciest used car at an average of 49661. It's understandable because it is world's #1 luxury brand clearly reflected in its price tag\n", "- **Sonstige autos** is a distant second priciest at 14265, followed by \n", "- **Mini**, possibly, because it's owned by BMW\n", "- **Renault** is the least expensive used car at 2762. \n", "- The average used car price is 7194.\n", "- **Volkswagen** is the top most brand in terms of listings, however, it's not among the priciest. 
Its mean price (about 5942) is below the 7194 average of the per-brand means.\n", "\n", "One possible explanation for the large number of **Volkswagen** listings is that their lower average price makes them easier to sell. \n", "\n", "It would also be worth checking the average mileage of these cars. It might help distinguish between two explanations: owners offloading a car they don't like, or a model popular enough that sellers expect an easy sale despite high mileage. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mileage analysis of the top 6 brands \n", "Using the mean mileage and mean price of each of the top brands, check whether there is any visible link between the two." ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n", "{'bmw': 132718.17, 'opel': 128688.21, 'mercedes_benz': 130481.73, 'volkswagen': 128295.33, 'ford': 124699.59, 'audi': 128183.4}\n" ] } ], "source": [ "# Aggregate the `odometer_km` column by brand\n", "# Assign brands and their mean mileages as key-value pairs to a dictionary\n", "brand_miles = {}\n", "\n", "for b in list(counts.head(6).index):\n", " # Select only rows that correspond to a specific brand\n", " brand_rows = autos[autos[\"brand\"] == b]\n", " # Calculate the mean mileage for those rows\n", " mean_mileage = round(brand_rows[\"odometer_km\"].mean(),2)\n", " brand_miles[b] = mean_mileage\n", " \n", "print (len(brand_miles))\n", "print (brand_miles)" ] }, { "cell_type": "code", "execution_count": 251, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "audi 128183.40\n", "bmw 132718.17\n", "ford 124699.59\n", "mercedes_benz 130481.73\n", "opel 128688.21\n", "volkswagen 128295.33\n", "dtype: float64\n" ] } ], "source": [ "# Convert `brand_miles` dictionary to a series object; don't sort values\n", "brand_miles_series = pd.Series(brand_miles)\n", "print (brand_miles_series)" ] }, { "cell_type": "code", 
"execution_count": 252, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>avg_miles</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>audi</th>\n", "      <td>128183.40</td>\n", "    </tr>\n", "    <tr>\n", "      <th>bmw</th>\n", "      <td>132718.17</td>\n", "    </tr>\n", "    <tr>\n", "      <th>ford</th>\n", "      <td>124699.59</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>mercedes_benz</th>\n", "      <td>130481.73</td>\n", "    </tr>\n", "    <tr>\n", "      <th>opel</th>\n", "      <td>128688.21</td>\n", "    </tr>\n", "    <tr>\n", "      <th>volkswagen</th>\n", "      <td>128295.33</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ],
"text/plain": [ " avg_miles\n", "audi 128183.40\n", "bmw 132718.17\n", "ford 124699.59\n", "mercedes_benz 130481.73\n", "opel 128688.21\n", "volkswagen 128295.33" ] }, "execution_count": 252, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a dataframe from the series brand_miles_series \n", "brand_miles_df = pd.DataFrame(brand_miles_series,columns = ['avg_miles'])\n", "brand_miles_df" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n", "{'bmw': 8787.18, 'opel': 3395.65, 'mercedes_benz': 8956.98, 'volkswagen': 5941.85, 'ford': 3970.64, 'audi': 9946.88}\n" ] } ], "source": [ "# Similarly, calculate the average price for the top 6 brands, using aggregation\n", "brand_avg_pr = {}\n", "for b in list(counts.head(6).index):\n", "    # Select only rows that correspond to a specific brand\n", "    brand_rows = autos[autos[\"brand\"] == b]\n", "    # Calculate the mean price for those rows\n", "    avg_pr = round(brand_rows[\"price\"].mean(),2)\n", "    # Assign the mean price to the dictionary brand_avg_pr, using the brand name as the key\n", "    brand_avg_pr[b] = avg_pr\n", "\n", "print (len(brand_avg_pr))\n", "print (brand_avg_pr)" ] }, { "cell_type": "code", "execution_count": 254, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.series.Series'>\n" ] }, { "data": { "text/plain": [ "audi 9946.88\n", "bmw 8787.18\n", "ford 3970.64\n", "mercedes_benz 8956.98\n", "opel 3395.65\n", "volkswagen 5941.85\n", "dtype: float64" ] }, "execution_count": 254, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert `brand_avg_pr` dictionary to a Series object\n", "brand_avgpr_series = pd.Series(brand_avg_pr)\n", "print (type(brand_avgpr_series))\n", "brand_avgpr_series" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>avg_miles</th>\n", "      <th>mean_price</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>audi</th>\n", "      <td>128183.40</td>\n", "      <td>9946.88</td>\n", "    </tr>\n", "    <tr>\n", "      <th>bmw</th>\n", "      <td>132718.17</td>\n", "      <td>8787.18</td>\n", "    </tr>\n", "    <tr>\n", "      <th>ford</th>\n", "      <td>124699.59</td>\n", "      <td>3970.64</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>mercedes_benz</th>\n", "      <td>130481.73</td>\n", "      <td>8956.98</td>\n", "    </tr>\n", "    <tr>\n", "      <th>opel</th>\n", "      <td>128688.21</td>\n", "      <td>3395.65</td>\n", "    </tr>\n", "    <tr>\n", "      <th>volkswagen</th>\n", "      <td>128295.33</td>\n", "      <td>5941.85</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>" ],
"text/plain": [ " avg_miles mean_price\n", "audi 128183.40 9946.88\n", "bmw 132718.17 8787.18\n", "ford 124699.59 3970.64\n", "mercedes_benz 130481.73 8956.98\n", "opel 128688.21 3395.65\n", "volkswagen 128295.33 5941.85" ] }, "execution_count": 255, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add the Series object as a new column named `mean_price` to the dataframe `brand_miles_df`\n", "brand_miles_df['mean_price'] = brand_avgpr_series\n", "brand_miles_df" ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
avg_milesmean_price
bmw132718.178787.18
mercedes_benz130481.738956.98
opel128688.213395.65
volkswagen128295.335941.85
audi128183.409946.88
ford124699.593970.64
\n", "
" ], "text/plain": [ " avg_miles mean_price\n", "bmw 132718.17 8787.18\n", "mercedes_benz 130481.73 8956.98\n", "opel 128688.21 3395.65\n", "volkswagen 128295.33 5941.85\n", "audi 128183.40 9946.88\n", "ford 124699.59 3970.64" ] }, "execution_count": 256, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort mileage values for seeing any connection to mean price values\n", "brand_miles_df.sort_values('avg_miles', ascending=False)" ] }, { "cell_type": "code", "execution_count": 257, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 257, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAaQAAAEaCAYAAABejCMwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtYVOWiP/DvACOPptyUmwMyKgMCgqCiaVaICMpJ8JII\nmoLhOXvjKS/1S7PLDqsjpJ1dWtI+JQbsEqTaKu0dSWn4pG4dDU23qGhchDEublAQufP+/iBXoKKA\nDDPA9/M8PM/MO+vyvu8M6zvvWmvWkgkhBIiIiHTMQNcVICIiAhhIRESkJxhIRESkFxhIRESkFxhI\nRESkFxhIRESkFxhIpLf27NmDESNGwMTEBD///PNDLevw4cNwcXGRno8cORIHDx582CpSJ+3atQuz\nZs3q0XUePXoUTk5OMDExQVpaWo+umzpHxt8h9Q0jR45EfHw8fH19dV0ViYGBAS5fvoxRo0Z1aX5H\nR0e8//77eOqpp7q5ZvrZX6Qdfn5+mDt3Lp577jldV4UegCMk0hqZTPZQ8xcUFMDV1bWbakO61tTU\npJP19sTnSFdt62sYSH1QYmIipk2bhhdeeAHm5uZQqVQ4evQoEhISMGLECNjY2CApKUmafvny5YiK\nioK/vz9MTEwwffp0XLlyRXp9zZo1GDFiBExNTeHt7Y3Dhw9LrzU3N2PTpk1wdHSEiYkJvL29UVRU\nhCeffBJCCHh4eMDExARffPHFXfUUQuDtt9+GUqmEjY0NIiIiUFVVhfr6egwZMgTNzc3w8PCASqW6\nZzsNDAzw0UcfQaVSwdTUFH/605+Qm5uLqVOnwtzcHGFhYWhsbAQAHDp0CPb29vdcjhACsbGxcHR0\nhKWlJUJDQ1FRUSG9HhISAltbW5ibm8PHxwfZ2dnSa+Xl5ZgzZw5MTU0xefJkvP7663j88cel1y9c\nuAB/f38MHToULi4u9+yH23799VcEBwdj6NChcHJywo4dO6TXNm7ciEWLFiE8PBwmJiZwd3dHVlZW\nu8tq7z379ddfMWjQIFy/fl2a9tSpU7C0tERTUxOam5vx4osvwtLSEqNHj8b27dthYGCA5ubmdt+D\nDz74AKNHj4aVlRXWrVsnvdb6czhs2DBs3LgRiYmJbfrn3LlzUv/Y2toiNja23fekdZ3v9Mknn0Cl\nUmHYsGGYO3cuiouLAbSMsvPy8vDUU0/BxMQEDQ0Nd81bVFSEBQsWwMrKCpaWlli1ahUAIDc3FzNm\nzMCwYcNgZWWFZ555BpWVldJ8I0eOxObNmzFu3DgMHjy43T6iThDUJyiVSnHg
wAEhhBAJCQlCLpeL\nxMRE0dzcLF577TVhZ2cnnnvuOVFfXy8yMjLEkCFDRHV1tRBCiIiICGFiYiIOHz4s6uvrxerVq8W0\nadOkZX/++eeioqJCNDU1iT//+c/CxsZG1NXVCSGE2Lx5s/Dw8BCXLl0SQghx5swZUV5eLoQQQiaT\nidzc3HbrHB8fL1QqlcjPzxfV1dVi/vz5YunSpdLrD5pfJpOJuXPnips3b4rs7GxhbGwsfH19RX5+\nvqisrBSurq4iKSlJCCFEZmamsLe3v2d/vf/++2LKlCni6tWror6+Xvzxj38UYWFh0rSffvqpqK6u\nFvX19WLt2rXC09NTem3RokUiLCxM1NbWiuzsbGFvby8ef/xxIYQQ1dXVwt7eXnofTp8+LSwtLcX5\n8+fv2Z7HH39ceo9uT/vDDz8IIYSIjo4WAwcOFN9++61obm4WGzZsEI8++mi7fXO/92zGjBlix44d\n0rQvvfSSiIqKEkII8dFHHwk3Nzdx9epVcf36deHn5ycMDAxEU1NTu++Br6+vuH79uigsLBROTk4i\nPj5eCNHyOTQyMhLbt28XTU1Nora2ViQkJEj9U1VVJWxtbcV7770n6urqxM2bN4Vare7Qe9LagQMH\nxLBhw8Tp06dFfX29eP7558UTTzwhva5UKsXBgwfvOW9TU5MYN26cePHFF0VNTY2oq6sTR44cEUII\ncfnyZfH999+LhoYGce3aNfHkk0+KtWvXtlmul5eX0Gg0ora2tt33gjqOgdRH3BlITk5O0mtnz54V\nBgYGoqysTCobOnSo+Pnnn4UQLYHU+p/95s2bwtDQUBQVFd1zXebm5uLMmTNCCCGcnZ3F119/fc/p\nZDKZ+OWXX9qt84wZM8RHH30kPb948aKQy+XSxu9B88tkMvHPf/5Tej5hwgSxefNm6fmLL74obUDu\nF0guLi5tNlhXr15tU4/WKioqhEwmE5WVlaKpqUnI5XIpjIUQ4rXXXpM2uLt3726zYRRCiD/84Q/i\nzTffvGu5hYWFwsjISPqSIIQQGzZsEMuXLxdCtATSzJkzpdeys7PFoEGD2u2bO7V+z3bs2CF8fX2l\n1+zt7cXhw4eFEEL4+vqKjz/+WHrt+++/f2AgZWRkSM/j4uKEn5+fEKLlc+jg4NBm+taBlJycLMaP\nH3/P5XbmPYmMjBTr16+Xnt+8eVPI5XJRUFAghGj7Xt/pn//8p7Cysmq3fa3t3bu3TX2VSqVISEh4\n4HzUcdxl10dZW1tLjwcOHAgAGDZsWJuymzdvSs9b78565JFHYGFhgatXrwIA3n33Xbi6usLc3Bzm\n5uaorKzEtWvXAACFhYVdPmnh6tWrcHBwkJ47ODigsbERJSUlHV6GlZVVmzbd2e7WbWxPQUEB5s2b\nBwsLC1hYWMDV1RVyuRwlJSVobm7Gyy+/DEdHR5iZmWHkyJGQyWS4du0aysrK0NTUBDs7O2lZrfux\noKAAx44dk5Zrbm6OXbt2SbuT7uwLCwsLDBo0qE1/aDQa6bmNjY30eNCgQaitrW13N9H93rMFCxbg\n2LFjKCkpwaFDh2BoaIjHHntMqkfrNrS3m7O11u13cHCQPjcPmr+wsBCjR4++52v3e0/udOfn6JFH\nHsHQoUPb9N396uDg4AADg7s3haWlpQgLC4OdnR3MzMzwzDPPSH14W+u208NjIBGAln/M227evIny\n8nIMHz4chw8fxpYtW/Dll1+ioqICFRUVMDExgfjt5Ex7e3v88ssvXVrn8OHDUVBQID0vKCiAXC5v\nEyo9YcSIEUhPT0d5eTnKy8tRUVGB6upq2NraYteuXfj6669x8OBBXL9+Hfn5+RAtexZgaWkJIyMj\nFBUVSctq3Y/29vbw8fFps9zKykps3779rjoMHz4c5eXlqK6ulsquXLkChULR6fY86D0zMzODv78/\nUlJSkJycjNDQUGleW1vbNu1pfSyxPa3b
fOXKFQwfPlx6fr8TW+732bnfe3KnOz9H1dXV+Pe//92h\nsLC3t8eVK1fuGeyvvPIKDAwMcO7cOVy/fh2fffaZ1IcdaR91HgOpn7jzH+lO33zzDY4ePYr6+nq8\n/vrrmDJlChQKBaqqqiCXyzF06FDU19fjzTffRFVVlTTfihUr8Prrr+Py5csAgLNnz0onBNjY2CA3\nN7fddYaFheG9995Dfn4+bt68iVdffRWhoaH3/LaqTX/4wx/wyiuvSBvfsrIy6fcqVVVVMDY2hrm5\nOaqrq7FhwwZpI2RgYID58+cjOjoaNTU1uHDhQpuTRZ566ink5OTgs88+Q2NjIxoaGnDy5ElcuHDh\nrjrY2dlh6tSp2LBhA+rq6nDmzBnEx8dj6dKl7da7vff0Qe8Z0NL3SUlJ+Oqrr7B48WKpPCQkBFu3\nbsXVq1dx/fp1bN68+YH9t2XLFly/fh2FhYXYunVrm4C7n6eeegrFxcXYtm0b6uvrcfPmTajVagD3\nf0/uFBYWhk8//RRnzpxBXV0dXnnlFTz66KMdGt1NmjQJtra2ePnll3Hr1i3U1dXh6NGjAFr6cfDg\nwRgyZAg0Gg22bNnSoXZR1zGQ+ogHfVO78/U7ny9evBjR0dEYOnQoTp06hc8++wwAEBAQgICAADg5\nOWHkyJEYNGhQm3/0F154ASEhIfD394epqSlWrFiBmpoaAMAbb7yBZcuWwcLCAl9++eVddXr22Wex\ndOlSPPHEExg9ejQGDRqEbdu2dVubOjrv6tWrERwcLLVh6tSp0oZx2bJlGDFiBBQKBcaOHYupU6e2\nWc4HH3yA69evw9bWFuHh4Vi8eDGMjY0BAIMHD0ZGRgZSUlIwfPhwDB8+HC+//DLq6+vvWafk5GTk\n5eVh+PDhWLBgAd566y1Mnz69w+2/7UHvGQAEBQXh0qVLsLW1hbu7u1T+n//5n/D394eHhwcmTJiA\n//iP/4CRkZH0JSEqKgorV65ss6zg4GBMmDAB48ePx5w5c/Dss8+2W+fWBg8ejO+++w5paWmwsbGB\nk5MTMjMzAdz/PbnTjBkz8NZbb2H+/PlQKBTIy8tDSkrKA/sJaPlS8fXXX+PSpUsYMWIE7O3tkZqa\nCqDl8/vTTz/BzMwMc+bMwYIFC9rMy9GRFmj7INWzzz4rrKyshLu7u1T2xRdfCDc3N2FgYCB++umn\nNtNv2rRJODo6ijFjxoj9+/dL5T/99JNwd3cXKpVKrF69Wiqvq6sTixYtEo6OjuLRRx+VDmRSx0VE\nRIjXX39d19XoE9avXy8iIiJ0XY1uk56eLpRKZbuvP+jEE6LO0PoIafny5di/f3+bMnd3d+zZswdP\nPvlkm/Lz588jNTUV58+fR3p6OlauXCntloiKikJ8fDxycnKQk5MjLTM+Ph4WFha4dOkS1qxZ0+Z3\nEETadvHiRZw9exYAoFarER8fj/nz5+u4Vl1XW1uL9PR0NDU1QaPRYOPGjb26PdS7aD2Qpk2bBnNz\n8zZlzs7OUKlUd+0D37dvH0JDQ2FkZASlUgmVSgW1Wo3i4mJUVVXB29sbQMtulL1790rzhIeHAwCe\nfvppHDhwQNtN6nO466HrqqqqMH/+fAwePBhhYWF46aWXMGfOHF1Xq8uEEHjjjTdgYWGBCRMmwM3N\nDRs3bmx3en52qDsZ6boCrWk0GkyZMkV6rlAooNFoYGRk1OaMGTs7O+mUTo1GI+0fNzQ0hJmZGcrL\ny2FhYdGzle/Fdu7cqesq9FoTJ07EpUuXdF2NbjNw4MB2j9XcCy+ZQ92pz53UcOeoi4iIege9GiEp\nFIo2v2koKiqCQqFot7z1PMOHD0dTUxMqKyvbHR1x9wIRUdf0xJf9Hhkhid9+SNjea7cFBQUhJSUF\n9fX1yMvLw+XLlzFp0iTY2NjA1NQUarUaQggkJSUhODhYmicxMREA8MUXXzzwdgK369Lf/9544w2d\n10Ff
/tgX7Av2xf3/eorWR0iLFy9GZmYm/v3vf2PEiBHYuHEjzM3N8fzzz+PatWt46qmn4OnpifT0\ndLi6uiIkJES6TEhcXJw0qtm+fTsiIiJQW1uLwMBA6SZfkZGRWLp0KVQqFYYOHdrm9wdERNR7aD2Q\ndu3adc/yuXPn3rN8w4YN2LBhw13lEyZMkE6vbc3Y2Fj6IRsREfVefe6kBuoYHx8fXVdBb7Avfse+\n+B37ouf1q1uYy2SyHt0fSkTUF/TUtpMjJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi\n0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsMJCIi0gsM\nJCIi0gsMJCIi0gtaD6TIyEhYW1vDw8NDKquoqIC/vz+cnZ0REBCAGzduSK/FxMRApVLBxcUFGRkZ\nUnlWVhY8PDzg5OSENWvWSOX19fUIDQ2FSqXClClTcOXKFW03iYiItEDrgbR8+XLs37+/TVlsbCz8\n/Pxw8eJF+Pr6IiYmBgCQnZ2N1NRUnD9/Hunp6Vi5cqV029yoqCjEx8cjJycHOTk50jLj4+NhYWGB\nS5cuYc2aNVi3bp22m0RERFqg9UCaNm0azM3N25Tt27cP4eHhAIDw8HDs3bsXAJCWlobQ0FAYGRlB\nqVRCpVJBrVajuLgYVVVV8Pb2BgAsW7ZMmqf1sp5++mkcOHBA200iIiIt0MkxpNLSUlhbWwMAbGxs\nUFpaCgDQaDSwt7eXplMoFNBoNNBoNLCzs5PK7ezsoNFo7prH0NAQZmZmKC8v76mmUB9VVlaGEydO\noKysTNdVIeo39OKkBplM1m3Lur2Lj6irkpN3w8FhDGbO/CMcHMYgOXm3rqtE1C8Y6WKl1tbWKCkp\ngbW1NYqLi2FlZQWgZURUWFgoTVdUVASFQtFueet5hg8fjqamJlRWVsLCwqLddUdHR0uPfXx84OPj\n072No16trKwMkZErUVPzA2pqPACcQWTkdPj5+cLS0lLX1SPqEZmZmcjMzOz5FYsekJeXJ8aOHSs9\nX7dunYiNjRVCCBEbGyvWr18vhBDi3LlzwtPTU9TV1Ync3FwxevRo0dzcLIQQYvLkyeL48eOiublZ\nzJ49W6SnpwshhNi+fbuIiooSQgiRnJwsFi1a1G49eqi51Iup1WphajpeAEL6MzHxEmq1WtdVI9KZ\nntp2an0tYWFhwtbWVgwYMEDY29uLnTt3ivLycjFjxgzh5OQkZs6cKSoqKqTpN23aJEaPHi3GjBkj\n9u/fL5WfPHlSjB07Vjg6OopVq1ZJ5bW1tWLhwoXC0dFRTJ48WeTl5bVbFwYSPUhpaakYONBCAD//\nFkg/i4EDLURpaamuq0akMz217ZT9trJ+QSaT8RgTPVBy8m5ERq6EXO6AhoYCxMfHISxska6rRaQz\nPbXtZCAR3UNZWRny8/OhVCp57Ij6PQaSFjCQiIg6r6e2nXpx2jcREREDiYiI9AIDiYiI9AIDiYiI\n9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AIDiYiI9AID\niYiI9AIDiYiI9AIDiYiI9AIDiYiI9IJOA2nr1q1wd3eHu7s7tm3bBgCoqKiAv78/nJ2dERAQgBs3\nbkjTx8TEQKVSwcXFBRkZGVJ5VlYWPDw84OTkhDVr1vR4O4iI6OHpLJDOnTuH+Ph4nDx5EqdPn8bf\n//53/PLLL4iNjYWfnx8uXrwIX19fxMTEAACys7ORmpqK8+fPIz09HStXrpRuqRsVFYX4+Hjk5OQg\nJycH+/fv11WziIioi3QWSOfPn8fkyZNhbGwMQ0NDPPHEE/jb3/6GtLQ0hIeHAwDCw8Oxd+9eAEBa\nWhpCQ0NhZGQEpVIJlUoFtVqN4uJiVFVVwdvbGwCwbNkyaR4iIuo9dB
ZIY8eOxY8//oiKigrcunUL\n33zzDQoLC1FSUgJra2sAgI2NDUpLSwEAGo0G9vb20vwKhQIajQYajQZ2dnZSuZ2dHTQaTc82hohQ\nVlaGEydOoKysTNdVoV7KSFcrHjNmDNavX4+ZM2di8ODB8PLygqGh4V3TyWSybl1vdHS09NjHxwc+\nPj7dunyi/ig5eTciI1diwAAl6uvzER8fh7CwRbquFnVRZmYmMjMze3y9OgskAFi+fDmWL18OAHj1\n1Vdhb28Pa2traZRUXFwMKysrAC0josLCQmneoqIiKBSKdsvb0zqQiOjhlZWVITJyJWpqfkBNjQeA\nM4iMnA4/P19YWlrqunrUBXd+Wd+4cWOPrFenZ9ndHtpfuXIFe/bsweLFixEUFISEhAQAQGJiIoKD\ngwEAQUFBSElJQX19PfLy8nD58mVMmjQJNjY2MDU1hVqthhACSUlJ0jxEpH35+fkYMEAJwOO3Eg/I\n5Q7Iz8/XXaWoV9LpCGnBggUoLy+HXC5HXFwcTExMsH79eoSEhGDnzp1wcHBAamoqAMDV1RUhISFw\ndXWVpr+9O2/79u2IiIhAbW0tAgMDMWvWLF02i6hfUSpbdtMBZ9ASSmfQ0FAApVKp03rps7KyMuTn\n50OpVHIU2YpM3D53uh+QyWToR80l6jG3jyHJ5Q5oaCjgMaT76I3H23pq28lAIqJuwW/9D1ZWVgYH\nhzGoqfkBt0eTAwdOR0HBBb3us57adup0lx0R9R2WlpZ6vVHVB7ePt7Wc/AG0Pt7GvuO17IiIekzb\n420Aj7e1xUAiIuohlpaWiI+Pw8CB02FiMh4DB05HfHwcR0e/4TEkIqIe1tuOt/GkBi1gIBERdV5P\nbTu5y46IiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQC\nA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPSCTgMpJiYGbm5u8PDwwJIlS1BfX4+Kigr4+/vD2dkZAQEB\nuHHjRpvpVSoVXFxckJGRIZVnZWXBw8MDTk5OWLNmjS6aQkRED0lngVRQUIBPPvkEp06dwpkzZ9DY\n2Ijk5GTExsbCz88PFy9ehK+vL2JiYgAA2dnZSE1Nxfnz55Geno6VK1dKl0OPiopCfHw8cnJykJOT\ng/379+uqWURE1EU6CyQTExMMGDAA1dXVaGxsRE1NDRQKBfbt24fw8HAAQHh4OPbu3QsASEtLQ2ho\nKIyMjKBUKqFSqaBWq1FcXIyqqip4e3sDAJYtWybNQ0REvYfOAsnc3BwvvvgiRowYAYVCAVNTU/j5\n+aGkpATW1tYAABsbG5SWlgIANBoN7O3tpfkVCgU0Gg00Gg3s7Oykcjs7O2g0mp5tDBERPTQjXa04\nNzcX7733HgoKCmBqaoqFCxfi888/h0wmazPdnc8fVnR0tPTYx8cHPj4+3bp8IqLeLjMzE5mZmT2+\nXp0F0smTJ/HYY4/BwsICADBv3jwcPXoU1tbW0iipuLgYVlZWAFpGRIWFhdL8RUVFUCgU7Za3p3Ug\nERHR3e78sr5x48YeWW+Hd9kJIfDZZ5/hzTffBABcuXIFarW6yyt2dnbGsWPHUFtbCyEEDhw4AFdX\nVwQFBSEhIQEAkJiYiODgYABAUFAQUlJSUF9fj7y8PFy+fBmTJk2CjY0NTE1NoVarIYRAUlKSNA8R\nEfUeMnH7VLUHiIqKgoGBAQ4ePIjz589Lp2efOHGiyyvfsmULEhISYGhoCC8vL+zYsQNVVVUICQlB\nYWEhHBwckJqaCjMzMwAtp33Hx8dDLpdj69at8Pf3BwD89NNPiIiIQG1tLQIDA7F169Z7N1YmQweb\nS0REv+mpbWeHA2n8+PHIysqCl5cXTp06BQAYN24cfv75Z61WsDsxkIiIOq+ntp0d3mUnl8vR1NQk\nnWRQVlYGAwNe6IGIiLpHhxNl1a
pVmDdvHkpLS/Hqq69i2rRpeOWVV7RZNyIi6kc6vMsOAC5cuIAD\nBw5ACIEZM2bAxcVFm3XrdtxlR0TUeXp3DOnYsWNwc3PDkCFDAACVlZU4f/48Jk+erNUKdicGEhFR\n5+ldIHl5eSErK0s6htTc3IyJEyciKytLqxXsTgwkIqLO07uTGoQQba6aYGBggMbGRq1UioiI+p8O\nB9KoUaOwbds2NDQ0oKGhAVu3bsWoUaO0WTciIupHOhxIf/nLX3D06FEoFArY2dnh+PHj+Pjjj7VZ\nNyIi6kc6dZZdb8djSEREnddT284HXlx18+bNWLduHZ5//vl7Xnl727ZtWqkYERH1Lw8MpNu/NZo4\ncaLWK0NERP3XAwNpzpw5aGpqwtmzZ/Huu+/2RJ2IiKgf6tBJDYaGhjhy5Ii260JERP1Yh2/Q5+np\niaCgICxcuBCPPPKIVD5//nytVIyIiPqXDgdSbW0thg4dioMHD0plMpmMgURERN2Cp30TEdF96d2l\ng3JzczFnzhxYWlrCysoKwcHByMvL02bdiIioH+lwIC1evBghISH49ddfcfXqVSxcuBChoaHarBsR\nEfUjHd5l5+HhgTNnzrQp4y3MiYj6Pr3bZTd79mzExsYiPz8fBQUF2Lx5MwIDA1FeXo7y8vJOrzgn\nJwdeXl4YP348vLy8YGpqim3btqGiogL+/v5wdnZGQEAAbty4Ic0TExMDlUoFFxcXZGRkSOVZWVnw\n8PCAk5MT1qxZ0+m6EBGR7nV4hDRy5Mj2FyKTITc3t8uVaG5uli7Y+uGHH2Lo0KFYt24d3nnnHVRU\nVCA2NhbZ2dlYsmQJTpw4gaKiIvj5+eHSpUuQyWSYPHkyPvzwQ3h7eyMwMBCrV69GQEDAPevJERIR\nUefozbXsbnvQCQzfffcdZs6c2aVKfP/99xg9ejTs7e2xb98+HDp0CAAQHh4OHx8fxMbGIi0tDaGh\noTAyMoJSqYRKpYJarYaDgwOqqqrg7e0NAFi2bBn27t17z0AiIiL91eFddg+yfv36Ls+7e/duLF68\nGABQUlICa2trAICNjQ1KS0sBABqNBvb29tI8CoUCGo0GGo0GdnZ2UrmdnR00Gk2X60JERLrR4RHS\ng3R1ONfQ0IC0tDS88847AHDXFcXvdYXxhxEdHS099vHxgY+PT7cun4iot8vMzERmZmaPr7fbAqmr\nwZGeno4JEyZg2LBhAABra2tplFRcXAwrKysALSOiwsJCab6ioiIoFIp2y9vTOpCIiOhud35Z37hx\nY4+st9t22XVVcnIywsLCpOdBQUFISEgAACQmJiI4OFgqT0lJQX19PfLy8nD58mVMmjQJNjY2MDU1\nhVqthhACSUlJ0jxERNR7dNulg+bPn4+//e1vnZrn1q1bcHBwQG5uLoYMGQIAKC8vR0hICAoLC+Hg\n4IDU1FSYmZkBaDntOz4+HnK5HFu3boW/vz8A4KeffkJERARqa2sRGBiIrVu33nN9PMuOiKjzemrb\n2alAOnr0KPLz89HY2CiVLVu2TCsV0wYGEhFR5+ndad9Lly7FL7/8Ak9PTxgaGgJoqWRvCiQiItJf\nHR4hubi4IDs7u9vPeutJHCEREXWe3l06aOzYsSguLtZmXYiIqB/r8C67a9euwdXVFZMmTYKxsbFU\nnpaWppWKERFR/9LhQOLvd4iISJt4x1giIrovvTuGdOzYMXh7e2Pw4MEYMGAADA0NYWJios26ERFR\nP9LhQHruueeQnJwMlUqFmpoa7NixA//93/+tzboREVE/0qlLBzk6OqKpqQmGhoZYvnw5vv32W23V\ni4iI+pkOn9QwaNAg1NfXw9PTE+vWrYOtrS2am5u1WTciIupHOjxC+utf/4rm5mZ8+OGHeOSRR1BY\nWIivvvpKm3UjIqJ+pFNn2dXU1ODKlStwdnbWZp20hmfZERF1nt6dZff111/D09MTs2bNAgCcPn0a
\nQUFBWquIeuLNAAAWv0lEQVQYERH1Lx0OpOjoaKjVaulWEJ6ensjLy9NaxYiIqH/pcCDJ5XKYmpq2\nKevNF1olIiL90uFAcnNzw65du9DU1IRLly7h+eefx9SpU7VZNyIi6kc6HEgffPABzp07B2NjYyxe\nvBimpqbt3pmViIioszocSNnZ2cjOzkZjYyNqa2uxb98+eHt7a7NuRETUj3T4tG9nZ2e8++67GDt2\nLAwMfs8xBwcHrVWuu/G0byKiztO7076HDRuGOXPmYOTIkXBwcJD+HsaNGzewcOFCuLi4wM3NDceP\nH0dFRQX8/f3h7OyMgIAA3LhxQ5o+JiYGKpUKLi4uyMjIkMqzsrLg4eEBJycnrFmz5qHqREREutHh\nEdJ3332HlJQU+Pn5tblB3/z587u88oiICDz55JNYvnw5GhsbUV1djU2bNmHo0KFYt24d3nnnHVRU\nVCA2NhbZ2dlYsmQJTpw4gaKiIvj5+eHSpUuQyWSYPHkyPvzwQ3h7eyMwMBCrV69GQEDA3Y3lCImI\nqNN6atvZ4UBasmQJLl68CDc3N2mXnUwmw86dO7u04srKSnh5eeGXX35pUz5mzBgcOnQI1tbWKC4u\nho+PDy5cuIDY2FjIZDKsX78eADB79mxER0fDwcEBvr6+yM7OBgCkpKTg0KFD+Oijj+5uLAOJiKjT\nemrb2eGLq548eRIXL17sthXn5eVh2LBhWL58OX7++WdMnDgR77//PkpKSmBtbQ0AsLGxQWlpKQBA\no9FgypQp0vwKhQIajQZGRkaws7OTyu3s7KDRaLqtnkQAUFZWhvz8fCiVSlhaWuq6OkR9UocDaerU\nqcjOzoarq2u3rLixsRFZWVnYvn07Jk6ciLVr10qjoNa6+8e3rW/F7uPjAx8fn25dPvU9ycm7ERm5\nEgMGKFFfn4/4+DiEhS3SdbWItCYzMxOZmZk9vt4OB9KxY8fg6emJkSNHwtjYGEIIyGQynDlzpksr\ntrOzg729PSZOnAgAWLBgAWJjY2FtbS2NkoqLi2FlZQWgZURUWFgozV9UVASFQtFueXtaBxLRg5SV\nlSEyciVqan5ATY0HgDOIjJwOPz9fjpSoz7rzy/rGjRt7ZL0dDqTuvhmftbU17O3tkZOTAycnJxw4\ncABubm5wc3NDQkIC1q9fj8TERAQHBwMAgoKCsGTJEqxduxYajQaXL1/GpEmTIJPJYGpqCrVaDW9v\nbyQlJWHVqlXdWlfqv/Lz8zFggPK3MAIAD8jlDsjPz2cgEXWzDgeSNn5vtG3bNixZsgQNDQ0YNWoU\nPv30UzQ1NSEkJAQ7d+6Eg4MDUlNTAQCurq4ICQmBq6sr5HI54uLipN1527dvR0REBGpraxEYGChd\nkZzoYSmVLbvpgDMAWkZIDQ0FUCqVOq0XUV/Uqfsh9XY8y4664vYxJLncAQ0NBTyGRP2O3p323Rcw\nkKireJYd9WcMJC1gIBERdZ7eXTqIiIhImxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhI\nRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESkFxhIRESk\nFxhIRESkF3QaSEqlEuPGjYOXlxcmTZoEAKioqIC/vz+cnZ0REBCAGzduSNPHxMRApVLBxcUFGRkZ\nUnlWVhY8PDzg5OSENWvW9Hg7iIjo4ek0kAwMDJCZmYlTp05BrVYDAGJjY+Hn54eLFy/C19cXMTEx\nAIDs7Gykpqbi/PnzSE9Px8qVK6Vb6kZFRSE+Ph45OTnIycnB/v37ddYmIiLqGp0GkhACzc3Nbcr2\n7duH8PBwAEB4eDj27t0LAEhLS0NoaCiMjIygVCqhUqmgVqtRXFyMqqoqeHt7AwCWLVsmzUNERL2H\nTgNJJpNh5syZ8Pb2xo4dOwAAJSUlsLa2BgDY2NigtLQUAKDR
aGBvby/Nq1AooNFooNFoYGdnJ5Xb\n2dlBo9H0YCuIiKg7GOly5UeOHIGtrS3Kysqk40YymazNNHc+f1jR0dHSYx8fH/j4+HTr8omIervM\nzExkZmb2+Hp1Gki2trYAAEtLS8ydOxdqtRrW1tbSKKm4uBhWVlYAWkZEhYWF0rxFRUVQKBTtlren\ndSAREdHd7vyyvnHjxh5Zr8522d26dQs3b94EAFRXVyMjIwPu7u4ICgpCQkICACAxMRHBwcEAgKCg\nIKSkpKC+vh55eXm4fPkyJk2aBBsbG5iamkKtVkMIgaSkJGkeIiLqPXQ2QiopKcG8efMgk8nQ2NiI\nJUuWwN/fHxMnTkRISAh27twJBwcHpKamAgBcXV0REhICV1dXyOVyxMXFSbvztm/fjoiICNTW1iIw\nMBCzZs3SVbOIiKiLZOL2udP9gEwmQz9qLhFRt+ipbSev1EBERHqBgURERHqBgURERHqBgURERHqB\ngURERHqBgURERHqBgURERHqBgURERHqBgURERHqBgURERHqBgURERHqBgURERHqBgURERHqBgURE\nRHqBgURERHqBgURERHqBgURERHqBgURERHpB54HU3NyM8ePHIygoCABQUVEBf39/ODs7IyAgADdu\n3JCmjYmJgUqlgouLCzIyMqTyrKwseHh4wMnJCWvWrOnxNhAR0cPTeSBt3boVrq6u0vPY2Fj4+fnh\n4sWL8PX1RUxMDAAgOzsbqampOH/+PNLT07Fy5UrpHu9RUVGIj49HTk4OcnJysH//fp20hYiIuk6n\ngVRUVIRvvvkGK1askMr27duH8PBwAEB4eDj27t0LAEhLS0NoaCiMjIygVCqhUqmgVqtRXFyMqqoq\neHt7AwCWLVsmzUNERL2HTgNp7dq12LJlC2QymVRWUlICa2trAICNjQ1KS0sBABqNBvb29tJ0CoUC\nGo0GGo0GdnZ2UrmdnR00Gk0PtaBFWVkZTpw4gbKysh5dLxFRX6KzQPrHP/4Ba2treHp6Srve7qV1\nWOmj5OTdcHAYg5kz/wgHhzFITt6t6yoREfVKRrpa8ZEjR5CWloZvvvkGNTU1qKqqwtKlS2FjYyON\nkoqLi2FlZQWgZURUWFgozV9UVASFQtFueXuio6Olxz4+PvDx8elyG8rKyhAZuRI1NT+gpsYDwBlE\nRk6Hn58vLC0tu7xcIiJdyszMRGZmZs+vWOiBzMxMMWfOHCGEEC+99JKIjY0VQggRGxsr1q9fL4QQ\n4ty5c8LT01PU1dWJ3NxcMXr0aNHc3CyEEGLy5Mni+PHjorm5WcyePVukp6ffcz3d3Vy1Wi1MTccL\nQEh/JiZeQq1Wd+t6iIh0qaeiQmcjpPa8/PLLCAkJwc6dO+Hg4IDU1FQAgKurK0JCQuDq6gq5XI64\nuDhpd9727dsRERGB2tpaBAYGYtasWT1SV6VSifr6fABnALSMkBoaCqBUKntk/UREfYnst/TrF2Qy\n2X2PV3VFcvJuREauhFzugIaGAsTHxyEsbFG3roOISJe0se2853oYSA+vrKwM+fn5UCqVPHZERH0O\nA0kLeqpTiYj6kp7adur8Sg1EREQAA4mIiPQEA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQCA4mIiPQC\nA4mIiPQCA4lIS3ifLKLOYSARaQHvk0XUebx0EFE3Kysrg4PDGNTU/IDbV4EfOHA6Cgou8FqH1Cvx\n0kFEvVR+fj4GDFCiJYwAwANyuQPy8/N1VymiXoCBRNTN2t4nC+B9sog6hoFE1M0sLS0RHx+HgQOn\nw8RkPAYOnI74+DjuriN6AB5DItIS3ieL+greD0kLGEhERJ3HkxqIiKhf0Vkg1dXVYfLkyfDy8oKb\nmxteeeUVAEBFRQX8/f3h7OyMgIAA3LhxQ5onJiYGKpUKLi4uyMjIkMqzsrLg4eEBJycnrFmzpsfb\nQkRED09ngWRsbIwffvgB
p06dwpkzZ3Dw4EEcOXIEsbGx8PPzw8WLF+Hr64uYmBgAQHZ2NlJTU3H+\n/Hmkp6dj5cqV0hAyKioK8fHxyMnJQU5ODvbv36+rZvUamZmZuq6C3mBf/I598Tv2Rc/T6S67QYMG\nAWgZLTU3N8Pc3Bz79u1DeHg4ACA8PBx79+4FAKSlpSE0NBRGRkZQKpVQqVRQq9UoLi5GVVUVvL29\nAQDLli2T5qH28Z/td+yL37Evfse+6Hk6DaTm5mZ4eXnBxsYGPj4+cHV1RUlJCaytrQEANjY2KC0t\nBQBoNBrY29tL8yoUCmg0Gmg0GtjZ2UnldnZ20Gg0PdsQIiJ6aEa6XLmBgQFOnTqFyspKBAQEIDMz\nEzKZrM00dz4nIqK+SaeBdJuJiQkCAwNx8uRJWFtbS6Ok4uJiWFlZAWgZERUWFkrzFBUVQaFQtFve\nHgbc7zZu3KjrKugN9sXv2Be/Y1/0LJ3tsrt27Zp0Bl1NTQ2+++47eHl5ISgoCAkJCQCAxMREBAcH\nAwCCgoKQkpKC+vp65OXl4fLly5g0aRJsbGxgamoKtVoNIQSSkpKkee4khOAf//jHP/514a8n6GyE\n9OuvvyI8PBxCCDQ3N2Pp0qWYMWMGvLy8EBISgp07d8LBwQGpqakAAFdXV4SEhMDV1RVyuRxxcXHS\naGf79u2IiIhAbW0tAgMDMWvWLF01i4iIuqhfXamBiIj0V6+4UkNkZCSsra3h4eEhla1btw4uLi7w\n9PTEggULUFlZ2WaeK1euYMiQIfjzn/8slbX3A9r6+nqEhoZCpVJhypQpuHLlivRaYmIinJyc4Ozs\njKSkJC22smM62xdnzpzB1KlTMXbsWIwbNw719fUA+l9f1NXVYfHixfDw8ICbmxtiY2OlefpqX/zp\nT3/CuHHj4OnpCT8/PxQVFUmvdfZH5n21L77//ntMnDgR48aNg7e3N3744Qdpnv7WF7fpdNspeoEf\nf/xRnDp1Sri7u0tl3333nWhqahJCCLF+/Xrx8ssvt5nn6aefFiEhIeJ///d/pbJJkyYJtVothBBi\n9uzZ4ttvvxVCCBEXFyeioqKEEEKkpKSIRYsWCSGEKC8vF6NGjRLXr18XFRUV0mNd6khfrF+/Xggh\nRGNjo/Dw8BBnz54VQrS0p7m5WQjRf/ri9uciISFBhIWFCSGEuHXrllAqlaKgoEAI0Xf7oqqqSnq8\nbds2sWLFCiGEEOfOnROenp6ioaFB5OXlidGjR/f5z0V7fXH69Gnx66+/CiGE+Ne//iUUCoU0XX/p\ni8jIyDbz6HLb2StGSNOmTYO5uXmbMj8/PxgYtFT/0UcfbZPy+/btw6hRo+Dm5iaV3e8HtK1/jPv0\n00/j4MGDAID9+/fD398fpqamMDMzg7+/P7799lvtNbQDOtIXt3+HlZGRgXHjxmHs2LEAAHNzc8hk\nsn7VF7c/FzY2NqiurkZTUxNu3boFY2NjmJiY9Om+GDx4sPS4uroaQ4cOBdC1H5n31b4YN24cbGxs\nAABubm6ora1FQ0NDv+qLYcOGSc91ve3sFYH0IDt37sTs2bMBtHTw5s2b8cYbb7Q5M+R+P6Bt/aNb\nQ0NDmJqaory8vN0f4+qznTt3IjAwEACQk5MDAJg1axYmTpyILVu2AOhffXH7cxEQEAATExPY2tpC\nqVTi//2//wczM7M+3xevvfYaRowYgYSEBGzYsAFA135k3lf7orUvv/wS48ePh1wu75d9oQ/bzl4f\nSP/zP/8DuVyOxYsXAwCio6Oxdu1a6bJEXSF66Xket/siLCwMANDY2IgjR44gOTkZP/74I/bs2dNm\nH3lH9Pa+uP25+Pzzz1FTU4Pi4mLk5ubi3Xff7fQtxXtjX7z99tu4cuUKli9f3q0XHu5rfXHu3Dls\n2LABH3/8caeX21f6Qh+2nb06kBISEvDNN99g165dUtnx48exbt06jBo1Cu+//z42bdqEuL
i4+/6A\ntvVrTU1NqKyshIWFBRQKRZuDdA/60a0u3asv7Ozs8MQTT8Dc3BwDBw5EYGAgsrKy+mVfHDlyBPPm\nzYOBgQEsLS3x2GOP4eTJk32+L25bvHgxTp48CaBrPzLvq30BtNR5/vz5+Otf/yrdZr4/9oVebDs7\neYxMZ/Ly8sTYsWOl5+np6cLV1VVcu3at3Xmio6PbHJibPHmyOH78uGhubhazZ88W6enpQgghtm/f\nLh2YS05OvueBuduPKyoqtNG8TuloX1RUVIgJEyaImpoa0dDQIPz8/KQ297e+2Lp1q1i+fLkQQoib\nN28KV1dX8a9//UsI0Xf74tKlS9Ljbdu2iWeeeUYI8ftJDXV1dSI3N7fNSQ39rS8qKirEuHHjxJ49\ne+5aRn/ri9Z0te3sFYEUFhYmbG1txYABA4S9vb3YuXOncHR0FCNGjBBeXl7Cy8tL6pTW7uzUkydP\nirFjxwpHR0exatUqqby2tlYsXLhQODo6ismTJ4u8vDzptU8//VQ4OjoKlUolEhMTtdrOjuhsX3z+\n+efCzc1NuLu7tzkTsb/1RW1trViyZIkYO3ascHNz6xefiwULFoixY8cKT09PMX/+fFFSUiJNv2nT\nJjF69GgxZswYsX//fqm8v/XF22+/LQYPHiy8vLyEp6en8PLyEmVlZUKI/tcXrelq28kfxhIRkV7o\n1ceQiIio72AgERGRXmAgERGRXmAgERGRXmAgERGRXmAgERGRXmAgERGRXmAgEfUS06ZNAwAUFBTA\n3d1dx7Uh6n4MJKJe4vDhw9JjmUymw5oQaQcDiaiD5s2bB29vb7i7u+OTTz7B//3f/2HdunXS64mJ\niVi1ahUA4K233sKYMWPwxBNPYPHixW3uvnmn6dOn44UXXoC3tzdcXV1x4sQJzJ8/H87Oznj99del\n6YYMGXLXvM3NzVi3bh0mT54MT09PfPLJJwBa7mHz5JNPYvz48fDw8MCRI0e6qxuItMZI1xUg6i0+\n/fRTmJmZoba2Ft7e3jh48CCmTp2KzZs3AwB2796N1157DSdPnsSePXtw9uxZ1NXVYfz48Zg4ceJ9\nl21sbIwTJ05g27ZtCA4OxunTp2FmZobRo0fjhRdekG6ueKf4+HiYmZnh+PHjqK+vx2OPPQZ/f398\n9dVXmDVrFjZs2AAhBG7duqWVPiHqTgwkog56//33pTtlFhUVITc3F6NHj4ZarYajoyMuXryIqVOn\nYuvWrQgODoZcLodcLsecOXMeuOygoCAAgLu7O9zd3WFlZQUAGDVqFAoLC++66+dtGRkZOHv2LL74\n4gsAQGVlJS5dugRvb288++yzaGhoQHBwMMaNG9cdXUCkVQwkog44dOgQDh48iOPHj8PY2BjTp09H\nXV0dQkNDsXv3bowZMwbz5s3r8vKNjY0BAAYGBtLj288bGxvbnU8IgQ8++AAzZ86867Uff/wR//jH\nPxAREYEXX3wRzzzzTJfrR9QTeAyJqANu3LgBc3NzGBsb48KFCzh27BgAYO7cudi3bx9SUlIQGhoK\nAHjsscfw9ddfo66uDjdv3sTf//73bqnDvS7MHxAQgLi4OCm0Ll26hFu3buHKlSuwsrJCZGQkVqxY\ngaysrG6pA5E2cYRE1AGzZs3CX/7yF7i5ucHZ2RlTpkwBAJiZmcHFxQUXLlyQjhNNnDgRQUFBGDdu\nHKytreHh4QFTU9N2l32/M+Zav3av6VasWIH8/HyMHz8eQghYWVlh7969yMzMxJYtWyCXyzFkyBAk\nJSV1telEPYb3QyLSgurqajzyyCOoqanBE088gU8++QSenp66rhaRXuMIiUgL/uu//gvZ2dmoq6tD\nREQEw4ioAzhCIuohzz33HI4cOQKZTAYhBGQyGVavXo3w8HBdV41ILzCQiIhIL/AsOyIi0gsMJCIi\n0gsMJCIi0gsMJCIi0gsMJCIi0gv/H/+V1oqvkrYTAA
AAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "brand_miles_df.plot.scatter(x='avg_miles',y='mean_price',title='Impact of mileage on avg.price of car')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the high-end brands **Audi, BMW and Mercedes** (at the top of the scatter plot), the average price decreases as the average mileage increases. Within this small set of brands, mileage does appear to be a factor in price, even among high-end brands.\n", "\n", "**Ford** and **Opel** (at the bottom of the plot) are much less expensive used cars than the premium brands. Here too, the brand with the higher mean mileage (**Opel**) has the lower mean price, consistent with the trend seen among the expensive brands.\n", "\n", "**Volkswagen**, with average mileage similar to **Opel**'s and higher than **Ford**'s, still costs more than both, possibly because it is a popular brand. It sits between the two groups: not in the league of the expensive brands, but not comparable to the inexpensive ones either." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Columns worth dropping\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some text columns have only one or very few unique values, so they carry little information:\n", "- column `nr_of_pictures`, as identified earlier, contains only `0` values\n", "- columns `seller` and `offer_type` seem to have very few unique values\n", "\n", "Let's check." ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "seller object\n", "offer_type object\n", "dtype: object\n" ] }, { "data": { "text/plain": [ "(40893, 20)" ] }, "execution_count": 207, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print (autos[['seller','offer_type',]].dtypes)\n", "autos.shape" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "1\n" ]
} ], "source": [ "num_uniq_seller = len(autos['seller'].unique())\n", "num_uniq_offer = len(autos['offer_type'].unique())\n", "print (num_uniq_seller)\n", "print (num_uniq_offer)" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "privat 40893\n", "Name: seller, dtype: int64\n", "Angebot 40893\n", "Name: offer_type, dtype: int64\n" ] } ], "source": [ "print (autos['seller'].value_counts())\n", "print (autos['offer_type'].value_counts())" ] }, { "cell_type": "code", "execution_count": 210, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Drop the 3 columns that hold a single value for every listing\n", "autos = autos.drop(['seller','offer_type','nr_of_pictures'],axis=1)" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(40893, 17)" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Further data cleaning\n", "- **German to English translation of categorical data**\n", "\n", "These 3 columns contain categorical values in German that need to be translated into English:\n", "- `gearbox`\n", "- `unrepaired_damage`\n", "- `fuel_type`" ] }, { "cell_type": "code", "execution_count": 212, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "manuell 30283\n", "automatik 9184\n", "Name: gearbox, dtype: int64" ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['gearbox'].value_counts()" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "nein 31417\n", "ja 3336\n", "Name: unrepaired_damage, dtype: int64" ] }, "execution_count": 213, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['unrepaired_damage'].value_counts()" ] }, { "cell_type": "code",
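"execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Sketch (an alternative, not the approach used in this notebook): translate whole values\n", "# with one dictionary via Series.replace. Unlike str.replace, which rewrites any matching\n", "# substring, Series.replace only changes cells whose entire value is a dictionary key.\n", "# The dictionary name `de_to_en` and the sample Series are hypothetical.\n", "import pandas as pd\n", "\n", "de_to_en = {\n", "    'manuell': 'manual', 'automatik': 'automatic',\n", "    'nein': 'no', 'ja': 'yes',\n", "    'benzin': 'gasoline', 'elektro': 'electric', 'andere': 'other',\n", "}\n", "\n", "# Made-up values; unmapped values such as 'diesel' and missing values stay unchanged\n", "sample = pd.Series(['benzin', 'diesel', 'elektro', None, 'andere'])\n", "print(sample.replace(de_to_en).tolist())\n", "# Applied to this notebook it could look like:\n", "# for col in ['gearbox', 'unrepaired_damage', 'fuel_type']:\n", "#     autos[col] = autos[col].replace(de_to_en)" ] }, { "cell_type": "code",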
"execution_count": 214, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "benzin 24215\n", "diesel 13685\n", "lpg 606\n", "cng 64\n", "hybrid 36\n", "elektro 18\n", "andere 8\n", "Name: fuel_type, dtype: int64" ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['fuel_type'].value_counts()" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Translate the German category values to English\n", "autos['gearbox'] = autos['gearbox'].str.replace('manuell','manual')\n", "autos['gearbox'] = autos['gearbox'].str.replace('automatik','automatic')\n", "autos['unrepaired_damage'] = autos['unrepaired_damage'].str.replace('nein','no')\n", "autos['unrepaired_damage'] = autos['unrepaired_damage'].str.replace('ja','yes')\n", "autos['fuel_type'] = autos['fuel_type'].str.replace('benzin', 'gasoline')\n", "autos['fuel_type'] = autos['fuel_type'].str.replace('elektro', 'electric')\n", "autos['fuel_type'] = autos['fuel_type'].str.replace('andere', 'other')" ] }, { "cell_type": "code", "execution_count": 216, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "manual 30283\n", "automatic 9184\n", "Name: gearbox, dtype: int64\n", "\n", "\n", "no 31417\n", "yes 3336\n", "Name: unrepaired_damage, dtype: int64\n", "\n", "\n", "gasoline 24215\n", "diesel 13685\n", "lpg 606\n", "cng 64\n", "hybrid 36\n", "electric 18\n", "other 8\n", "Name: fuel_type, dtype: int64\n" ] } ], "source": [ "print(autos['gearbox'].value_counts())\n", "print ('\\n')\n", "print(autos['unrepaired_damage'].value_counts())\n", "print ('\\n')\n", "print(autos['fuel_type'].value_counts())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Investigating keywords in the `name` column that may be worth extracting into new columns**" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namebrandmodel
49939Audi_TT_Roadster_2.0_TFSIauditt
49940Omas_Lieblingbmw3er
49941Maserati_Ghibli_Diesel_Automatiksonstige_autosNaN
49942Audi_A3_2.0_TDI_Sportback_DPF_Ambitionaudia3
49944SAAB_9_3__2_0t_SE__Automatik__Tempomat__Regens...saabandere
49945omega_2_2_facelift_mit_neuem_tuevopelomega
49947Audi_A5_2.7_TDI_+S_Line_Vollausstattung/ABT/B&...audia5
49948Hyundai_ix35_2.0_CRDi_4WD_Automatikhyundaii_reihe
49950Volvo_V70_2_II__2002__AHK__Klima__VolledervolvoNaN
49951Opel_Corsa_D1.0_weiss_/Scheckheftgepflegt/_Unf...opelcorsa
49952VW_POLO__9N3_Silber_1_2_Ltr._5_tuerer__96_tsd....volkswagenpolo
49954328_Cabrio_SchalterbmwNaN
49955Golf_1.9_TDI_DPF_4mot_GT_Sportvolkswagengolf
49956Toyota_Yaris_1.3_VVT_i_Executive_MODEL_2007toyotayaris
49957Bmw_mit_LPG__super_zustand_2_Jahre_Tuev_klimabmw3er
49958Golf_3_2.8_vr6_highlinevolkswagengolf
49959Mercedes_Benz_C_180_T_Kompressor_Classicmercedes_benzc_klasse
49961Golf_2.0_TDI_Sportline_Sonderproduktionvolkswagengolf
49962Mitsubishi_Space_Star_1__3_L__Bj_2004_Standhei...mitsubishiandere
49963Mercedes_Benz_B_200_CDI_Special_Editionmercedes_benzb_klasse
49964Audi_2_7_TDI_AVANTaudia4
49965Opel_Astra_1.6_Lenkradheizung~Sitzheizungopelastra
49966Citroën_C1_1.0_**Euro4**TÜV_OKT_2017**Scheiten...citroenc1
49967VW_Passat_2_0_TDI_comfortlinevolkswagenpassat
49969Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZnissanx_trail
49970c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp...citroenc4
49971W.Lupo_1.0volkswagenlupo
49972Mercedes_Benz_Vito_115_CDI_Extralang_Aut.mercedes_benzvito
49973Mercedes_Benz_SLK_200_Kompressormercedes_benzslk
49975Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comforthondajazz
49977Mercedes_Benz_C200_Cdi_W203mercedes_benzc_klasse
49978Mercedes_Benz_E_200_Classicmercedes_benze_klasse
49979Volkswagen_Polo_1.6_TDI_Stylevolkswagenpolo
49981Opel_Astra_Kombi_mit_Anhaengerkupplungopelastra
49982Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkmskodafabia
49983Ford_focus_99fordfocus
49985Verkaufe_meinen_vw_vento!volkswagenNaN
49986Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst...chrysler300c
49987Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__...audia3
49988BMW_330_Cibmw3er
49990Mercedes_Benz_A_200__BlueEFFICIENCY__Urbanmercedes_benza_klasse
49991Kleinwagenrenaulttwingo
49992Fiat_Grande_Punto_1.4_T_Jet_16V_Sportfiatandere
49993Audi_A3__1_8l__Silber;_schoenes_FahrzeugaudiNaN
49994Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc...audia6
49995Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenonaudiq5
49996Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+...opelastra
49997Fiat_500_C_1.2_Dualogic_Loungefiat500
49998Audi_A3_2.0_TDI_Sportback_Ambitionaudia3
49999Opel_Vectra_1.6_16Vopelvectra
\n", "
" ], "text/plain": [ " name brand \\\n", "49939 Audi_TT_Roadster_2.0_TFSI audi \n", "49940 Omas_Liebling bmw \n", "49941 Maserati_Ghibli_Diesel_Automatik sonstige_autos \n", "49942 Audi_A3_2.0_TDI_Sportback_DPF_Ambition audi \n", "49944 SAAB_9_3__2_0t_SE__Automatik__Tempomat__Regens... saab \n", "49945 omega_2_2_facelift_mit_neuem_tuev opel \n", "49947 Audi_A5_2.7_TDI_+S_Line_Vollausstattung/ABT/B&... audi \n", "49948 Hyundai_ix35_2.0_CRDi_4WD_Automatik hyundai \n", "49950 Volvo_V70_2_II__2002__AHK__Klima__Volleder volvo \n", "49951 Opel_Corsa_D1.0_weiss_/Scheckheftgepflegt/_Unf... opel \n", "49952 VW_POLO__9N3_Silber_1_2_Ltr._5_tuerer__96_tsd.... volkswagen \n", "49954 328_Cabrio_Schalter bmw \n", "49955 Golf_1.9_TDI_DPF_4mot_GT_Sport volkswagen \n", "49956 Toyota_Yaris_1.3_VVT_i_Executive_MODEL_2007 toyota \n", "49957 Bmw_mit_LPG__super_zustand_2_Jahre_Tuev_klima bmw \n", "49958 Golf_3_2.8_vr6_highline volkswagen \n", "49959 Mercedes_Benz_C_180_T_Kompressor_Classic mercedes_benz \n", "49961 Golf_2.0_TDI_Sportline_Sonderproduktion volkswagen \n", "49962 Mitsubishi_Space_Star_1__3_L__Bj_2004_Standhei... mitsubishi \n", "49963 Mercedes_Benz_B_200_CDI_Special_Edition mercedes_benz \n", "49964 Audi_2_7_TDI_AVANT audi \n", "49965 Opel_Astra_1.6_Lenkradheizung~Sitzheizung opel \n", "49966 Citroën_C1_1.0_**Euro4**TÜV_OKT_2017**Scheiten... citroen \n", "49967 VW_Passat_2_0_TDI_comfortline volkswagen \n", "49969 Nissan_X_Trail_2.2_dCi_4x4_Sport_m.AHZ nissan \n", "49970 c4_Grand_Picasso_mit_Automatik_Leder_Navi_Temp... citroen \n", "49971 W.Lupo_1.0 volkswagen \n", "49972 Mercedes_Benz_Vito_115_CDI_Extralang_Aut. 
mercedes_benz \n", "49973 Mercedes_Benz_SLK_200_Kompressor mercedes_benz \n", "49975 Honda_Jazz_1.3_DSi_i_VTEC_IMA_CVT_Comfort honda \n", "49977 Mercedes_Benz_C200_Cdi_W203 mercedes_benz \n", "49978 Mercedes_Benz_E_200_Classic mercedes_benz \n", "49979 Volkswagen_Polo_1.6_TDI_Style volkswagen \n", "49981 Opel_Astra_Kombi_mit_Anhaengerkupplung opel \n", "49982 Skoda_Fabia_4_Tuerer_Bj:2004__85.000Tkm skoda \n", "49983 Ford_focus_99 ford \n", "49985 Verkaufe_meinen_vw_vento! volkswagen \n", "49986 Chrysler_300C_3.0_CRD_DPF_Automatik_Voll_Ausst... chrysler \n", "49987 Audi_A3_Limousine_2.0_TDI_DPF_Ambition__NAVI__... audi \n", "49988 BMW_330_Ci bmw \n", "49990 Mercedes_Benz_A_200__BlueEFFICIENCY__Urban mercedes_benz \n", "49991 Kleinwagen renault \n", "49992 Fiat_Grande_Punto_1.4_T_Jet_16V_Sport fiat \n", "49993 Audi_A3__1_8l__Silber;_schoenes_Fahrzeug audi \n", "49994 Audi_A6__S6__Avant_4.2_quattro_eventuell_Tausc... audi \n", "49995 Audi_Q5_3.0_TDI_qu._S_tr.__Navi__Panorama__Xenon audi \n", "49996 Opel_Astra_F_Cabrio_Bertone_Edition___TÜV_neu+... 
opel \n", "49997 Fiat_500_C_1.2_Dualogic_Lounge fiat \n", "49998 Audi_A3_2.0_TDI_Sportback_Ambition audi \n", "49999 Opel_Vectra_1.6_16V opel \n", "\n", " model \n", "49939 tt \n", "49940 3er \n", "49941 NaN \n", "49942 a3 \n", "49944 andere \n", "49945 omega \n", "49947 a5 \n", "49948 i_reihe \n", "49950 NaN \n", "49951 corsa \n", "49952 polo \n", "49954 NaN \n", "49955 golf \n", "49956 yaris \n", "49957 3er \n", "49958 golf \n", "49959 c_klasse \n", "49961 golf \n", "49962 andere \n", "49963 b_klasse \n", "49964 a4 \n", "49965 astra \n", "49966 c1 \n", "49967 passat \n", "49969 x_trail \n", "49970 c4 \n", "49971 lupo \n", "49972 vito \n", "49973 slk \n", "49975 jazz \n", "49977 c_klasse \n", "49978 e_klasse \n", "49979 polo \n", "49981 astra \n", "49982 fabia \n", "49983 focus \n", "49985 NaN \n", "49986 300c \n", "49987 a3 \n", "49988 3er \n", "49990 a_klasse \n", "49991 twingo \n", "49992 andere \n", "49993 NaN \n", "49994 a6 \n", "49995 q5 \n", "49996 astra \n", "49997 500 \n", "49998 a3 \n", "49999 vectra " ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[ ['name', 'brand','model']].tail(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- In some names, the first two words separated by `_` indicate the `brand` and `model` of the car, respectively. However, those columns already exist, so it's not worth extracting them as new columns\n", "- Some of the names are too obscure to be worth extracting into new columns (for example, `Omas_Liebling`, German for *Grandma's darling*, or `W.Lupo_1.0`)\n", "- Some names contain information about the fuel type or gearbox, which would be redundant even if extracted, as those columns already exist\n", "- In some cases, the name is a single word or a random number. The `name` column holds too much free text, and it is too inconsistent across rows, to extract into new columns" ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
0
0017668128439
1Anfàngerauto
2TOOPPPWAGENDIESEL
3Schnaeppchen......
4Schnaeppchen
5STTOOOPPP!!!!
6Privatanbieter
7*RENAULT*MEGANE*1.5dCi*GRANDTOUR*EMOTION*1HAND...
80178/8055184
9Schlachtfest
100178/8055184
11Finanzierunguebernahme
12Volswagen
13Reddbeat1976
14~.A.U.D.I.~A6~3.0.TDI~QUATTRO~BOSE~
15wolsfagen
16Ausschlachten
17Gebrauchtwagen
1874828364829187482929
19Fahrberrit
20Unfallfahrzeug
21Mazda.121
22Unfallauto
23Polo......
24Autoverkauf
25Smart
26Notverkauf
27Motorschaden
28Smart.klima
29Hobbyaufgabe
......
71Klima/5Tuerer/Alufelgen
72015780886026
73Gelaendewagen
74Youngtimer
75Notverkauf
76Compfortline
770040728824593
78TOOPWAGENNNDIESEL
79WaldemarFunk
80Ggggggnffkfklff
81BMW116\"8\"fach\"bereift\"Start\"stop\"Tausch\"\"RESER...
82Zuverkaufen
83Autobeschreibung
84BMW116\"8\"fach\"bereift\"Start\"stop\"Tausch\"moegli...
85Gebrauchtfahrzeug
86Diesel.....
8701788177890
88TOPPP/ZUSTAND//DIESEL
89Anfaengerauto
90Schlachtfest
91Anfaengerauto
92Anfaengerauto
93Gelegenheit
94Fiat.Punto.1.2
95Motorschaden
96Autoverkauf
97Vvvvvvvvvv
98Beschaeftigt
99Unfallauto!
100Kleinwagen
\n", "

101 rows × 1 columns

\n", "
" ], "text/plain": [ " 0\n", "0 017668128439\n", "1 Anfàngerauto\n", "2 TOOPPPWAGENDIESEL\n", "3 Schnaeppchen......\n", "4 Schnaeppchen\n", "5 STTOOOPPP!!!!\n", "6 Privatanbieter\n", "7 *RENAULT*MEGANE*1.5dCi*GRANDTOUR*EMOTION*1HAND...\n", "8 0178/8055184\n", "9 Schlachtfest\n", "10 0178/8055184\n", "11 Finanzierunguebernahme\n", "12 Volswagen\n", "13 Reddbeat1976\n", "14 ~.A.U.D.I.~A6~3.0.TDI~QUATTRO~BOSE~\n", "15 wolsfagen\n", "16 Ausschlachten\n", "17 Gebrauchtwagen\n", "18 74828364829187482929\n", "19 Fahrberrit\n", "20 Unfallfahrzeug\n", "21 Mazda.121\n", "22 Unfallauto\n", "23 Polo......\n", "24 Autoverkauf\n", "25 Smart\n", "26 Notverkauf\n", "27 Motorschaden\n", "28 Smart.klima\n", "29 Hobbyaufgabe\n", ".. ...\n", "71 Klima/5Tuerer/Alufelgen\n", "72 015780886026\n", "73 Gelaendewagen\n", "74 Youngtimer\n", "75 Notverkauf\n", "76 Compfortline\n", "77 0040728824593\n", "78 TOOPWAGENNNDIESEL\n", "79 WaldemarFunk\n", "80 Ggggggnffkfklff\n", "81 BMW116\"8\"fach\"bereift\"Start\"stop\"Tausch\"\"RESER...\n", "82 Zuverkaufen\n", "83 Autobeschreibung\n", "84 BMW116\"8\"fach\"bereift\"Start\"stop\"Tausch\"moegli...\n", "85 Gebrauchtfahrzeug\n", "86 Diesel.....\n", "87 01788177890\n", "88 TOPPP/ZUSTAND//DIESEL\n", "89 Anfaengerauto\n", "90 Schlachtfest\n", "91 Anfaengerauto\n", "92 Anfaengerauto\n", "93 Gelegenheit\n", "94 Fiat.Punto.1.2\n", "95 Motorschaden\n", "96 Autoverkauf\n", "97 Vvvvvvvvvv\n", "98 Beschaeftigt\n", "99 Unfallauto!\n", "100 Kleinwagen\n", "\n", "[101 rows x 1 columns]" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "single_names =[]\n", "for row in list(autos['name']):\n", " if '_' not in row:\n", " single_names.append(row)\n", "pd.DataFrame(single_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 101 names without an underscore. Most of them either don't convey anything meaningful (strings of digits, or strings like `Ggggggnffkfklff`) or are German words that would need translating into English. 
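The loop above can also be written as a single vectorized filter. This is a minimal sketch, not the original cell: it runs on a small hypothetical `autos` sample (names copied from the listings shown above) rather than the full CSV, and it relies on the `regex` and `na` arguments of pandas `Series.str.contains`.

```python
import pandas as pd

# Hypothetical sample standing in for the full `autos` dataframe
autos = pd.DataFrame({
    'name': ['Audi_TT_Roadster_2.0_TFSI', 'Schnaeppchen', 'Omas_Liebling', 'Kleinwagen'],
})

# True where the name contains a literal underscore; na=False treats missing names as having none
has_underscore = autos['name'].str.contains('_', regex=False, na=False)

# Keep only the names without any underscore (the single-word names)
single_names = autos.loc[~has_underscore, 'name']
print(single_names.tolist())  # ['Schnaeppchen', 'Kleinwagen']
```

On the full dataframe, the same no-underscore test should return the 101 names listed above. Unlike `'_' not in row`, which raises a `TypeError` on a missing (float NaN) name, `na=False` makes the filter tolerate missing values.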
Overall, there are no consistent keywords in the `name` column that we could extract as-is into separate columns to add value to the analysis. In fact, the column is a good candidate to drop from the dataframe." ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "collapsed": false }, "outputs": [], "source": [ "autos.drop(['name'], axis=1, inplace=True)" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(40893, 16)" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Conversion of all dates to uniform numeric data**" ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "date_crawled object\n", "ad_created object\n", "last_seen object\n", "dtype: object" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['date_crawled', 'ad_created', 'last_seen']].dtypes" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
date_crawledad_createdlast_seen
02016-03-26 17:47:462016-03-26 00:00:002016-04-06 06:45:54
12016-04-04 13:38:562016-04-04 00:00:002016-04-06 14:45:08
22016-03-26 18:57:242016-03-26 00:00:002016-04-06 20:15:37
32016-03-12 16:58:102016-03-12 00:00:002016-03-15 03:16:28
42016-04-01 14:38:502016-04-01 00:00:002016-04-01 14:38:50
\n", "
" ], "text/plain": [ " date_crawled ad_created last_seen\n", "0 2016-03-26 17:47:46 2016-03-26 00:00:00 2016-04-06 06:45:54\n", "1 2016-04-04 13:38:56 2016-04-04 00:00:00 2016-04-06 14:45:08\n", "2 2016-03-26 18:57:24 2016-03-26 00:00:00 2016-04-06 20:15:37\n", "3 2016-03-12 16:58:10 2016-03-12 00:00:00 2016-03-15 03:16:28\n", "4 2016-04-01 14:38:50 2016-04-01 00:00:00 2016-04-01 14:38:50" ] }, "execution_count": 223, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['date_crawled', 'ad_created', 'last_seen']].head()" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Keep only the date part (YYYY-MM-DD) of each timestamp\n", "autos['date_crawled'] = autos['date_crawled'].str[:10]\n", "autos['ad_created'] = autos['ad_created'].str[:10]\n", "autos['last_seen'] = autos['last_seen'].str[:10]\n" ] }, { "cell_type": "code", "execution_count": 225, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Remove the dashes so each date becomes a YYYYMMDD string\n", "autos['date_crawled'] = autos['date_crawled'].str.replace('-','')\n", "autos['ad_created'] = autos['ad_created'].str.replace('-','')\n", "autos['last_seen'] = autos['last_seen'].str.replace('-','')" ] }, { "cell_type": "code", "execution_count": 226, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Convert the YYYYMMDD strings to integers, which sort and compare in date order\n", "autos[['date_crawled', 'ad_created', 'last_seen']] = autos[['date_crawled', 'ad_created', 'last_seen']].astype(int)" ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "date_crawled int64\n", "ad_created int64\n", "last_seen int64\n", "dtype: object" ] }, "execution_count": 227, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['date_crawled', 'ad_created', 'last_seen']].dtypes" ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
date_crawledad_createdlast_seen
0201603262016032620160406
1201604042016040420160406
2201603262016032620160406
3201603122016031220160315
4201604012016040120160401
\n", "
" ], "text/plain": [ " date_crawled ad_created last_seen\n", "0 20160326 20160326 20160406\n", "1 20160404 20160404 20160406\n", "2 20160326 20160326 20160406\n", "3 20160312 20160312 20160315\n", "4 20160401 20160401 20160401" ] }, "execution_count": 228, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['date_crawled', 'ad_created', 'last_seen']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Some more questions for further analysis\n", "\n", "**What are some of the most common brand/model combinations?**" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
brandmodel
0peugeotandere
1bmw7er
2volkswagengolf
3smartfortwo
4fordfocus
\n", "
" ], "text/plain": [ " brand model\n", "0 peugeot andere\n", "1 bmw 7er\n", "2 volkswagen golf\n", "3 smart fortwo\n", "4 ford focus" ] }, "execution_count": 229, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos[['brand','model']].head()\n" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Concatenate `brand`, `model` strings with a `_`\n", "autos['brand_model'] = autos['brand']+'_'+autos['model']" ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "collapsed": false }, "outputs": [], "source": [ "brand_model_df = pd.DataFrame(autos['brand_model'].value_counts())" ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
brand_model
volkswagen_golf3134
bmw_3er2373
volkswagen_polo1291
volkswagen_passat1276
opel_corsa1254
opel_astra1210
audi_a41199
mercedes_benz_c_klasse1113
bmw_5er1074
mercedes_benz_e_klasse856
\n", "
" ], "text/plain": [ " brand_model\n", "volkswagen_golf 3134\n", "bmw_3er 2373\n", "volkswagen_polo 1291\n", "volkswagen_passat 1276\n", "opel_corsa 1254\n", "opel_astra 1210\n", "audi_a4 1199\n", "mercedes_benz_c_klasse 1113\n", "bmw_5er 1074\n", "mercedes_benz_e_klasse 856" ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The 10 most common brand/model combinations among the listings\n", "brand_model_df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **Volkswagen** Golf is by far the most commonly listed car, and Volkswagen models take three of the top five spots (Golf, Polo and Passat)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How much cheaper are cars with damage than their non-damaged counterparts?**" ] }, { "cell_type": "code", "execution_count": 233, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
date_crawledpriceabtestvehicle_typeregistration_yeargearboxpower_p_smodelodometer_kmregistration_monthfuel_typebrandunrepaired_damagead_createdpostal_codelast_seenbrand_model
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [date_crawled, price, abtest, vehicle_type, registration_year, gearbox, power_p_s, model, odometer_km, registration_month, fuel_type, brand, unrepaired_damage, ad_created, postal_code, last_seen, brand_model]\n", "Index: []" ] }, "execution_count": 233, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.head(0)" ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "no 31417\n", "yes 3336\n", "Name: unrepaired_damage, dtype: int64" ] }, "execution_count": 234, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos.unrepaired_damage.value_counts()" ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "6140" ] }, "execution_count": 235, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['unrepaired_damage'].isnull().sum()" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3336, 17)\n", "(31417, 17)\n" ] } ], "source": [ "cars_with_damage = autos[autos['unrepaired_damage'] == 'yes']\n", "cars_no_damage = autos[autos['unrepaired_damage'] == 'no']\n", "\n", "print (cars_with_damage.shape)\n", "print (cars_no_damage.shape)\n" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2803.3744004796163\n", "7336.2024700003185\n" ] } ], "source": [ "# Calculate the average price for just this set of rows \n", "damage_mean = cars_with_damage['price'].mean()\n", "no_damage_mean = cars_no_damage['price'].mean()\n", "print (damage_mean)\n", "print (no_damage_mean)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Do average prices follow any patterns based on the mileage?** \n" ] }, { "cell_type": 
"code", "execution_count": 238, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 145000\n", "dtype: int64\n" ] }, { "data": { "text/html": [ "
\n", "
0
0150000
170000
250000
310000
430000
590000
6125000
720000
860000
95000
1040000
1180000
12100000
\n", "
" ], "text/plain": [ " 0\n", "0 150000\n", "1 70000\n", "2 50000\n", "3 10000\n", "4 30000\n", "5 90000\n", "6 125000\n", "7 20000\n", "8 60000\n", "9 5000\n", "10 40000\n", "11 80000\n", "12 100000" ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mileages = pd.DataFrame(autos['odometer_km'].unique())\n", "print (mileages.max() - mileages.min())\n", "mileages" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Split `odometer_km` into five bins of 30000 km each, using the bin edges below. The left edge of each bin is exclusive and the right edge is inclusive, so every mileage value listed above (5000 to 150000) falls into exactly one bin:\n", "\n", "[1000, 31000, 61000, 91000, 121000, 151000] " ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create a new column `odometer_km_bins` by passing the `odometer_km` column as `x` and the list of bin edges as `bins`\n", "# Each `odometer_km` value is assigned to the bin (interval) it falls into\n", "autos['odometer_km_bins'] = pd.cut(x=autos['odometer_km'], bins=[1000, 31000, 61000, 91000, 121000, 151000])" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
priceodometer_kmodometer_km_bins
05000150000(121000, 151000]
18500150000(121000, 151000]
2899070000(61000, 91000]
3435070000(61000, 91000]
41350150000(121000, 151000]
\n", "
" ], "text/plain": [ " price odometer_km odometer_km_bins\n", "0 5000 150000 (121000, 151000]\n", "1 8500 150000 (121000, 151000]\n", "2 8990 70000 (61000, 91000]\n", "3 4350 70000 (61000, 91000]\n", "4 1350 150000 (121000, 151000]" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Print some rows with the new column to check the bins\n", "autos[['price', 'odometer_km', 'odometer_km_bins']].head()" ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(121000, 151000] 30377\n", "(61000, 91000] 3888\n", "(31000, 61000] 2729\n", "(1000, 31000] 2051\n", "(91000, 121000] 1848\n", "Name: odometer_km_bins, dtype: int64" ] }, "execution_count": 241, "metadata": {}, "output_type": "execute_result" } ], "source": [ "autos['odometer_km_bins'].value_counts()" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "CategoricalIndex([(121000, 151000], (61000, 91000], (31000, 61000],\n", " (1000, 31000], (91000, 121000]],\n", " categories=[(1000, 31000], (31000, 61000], (61000, 91000], (91000, 121000], (121000, 151000]], ordered=True, dtype='category')" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Unique mileage bins to loop over, using index labels \n", "autos['odometer_km_bins'].value_counts().index" ] }, { "cell_type": "code", "execution_count": 243, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{Interval(31000, 61000, closed='right'): 14044.41920117259, Interval(1000, 31000, closed='right'): 17594.77815699659, Interval(61000, 91000, closed='right'): 9906.484567901234, Interval(91000, 121000, closed='right'): 8237.034632034633, Interval(121000, 151000, closed='right'): 4443.746584587023}\n" ] } ], "source": [ "# Produce a dictionary of the average price for each mileage bin\n", "avg_price_by_odo_bin = {}\n", "\n", "for b in autos['odometer_km_bins'].value_counts().index:\n", " # Select the rows that fall into this mileage bin\n", " b_rows = autos[autos['odometer_km_bins'] == b]\n", " # Calculate the average price for just those rows\n", " avg_price = b_rows[\"price\"].mean()\n", " # Store the average price in the dictionary, keyed by the mileage bin\n", " avg_price_by_odo_bin[b] = avg_price\n", " \n", "print (avg_price_by_odo_bin)" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Create a dataframe from the dictionary\n", "odo_price_df = pd.DataFrame(list(avg_price_by_odo_bin.items()),columns = ['odometer_km_groups','avg_price']) " ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "
odometer_km_groupsavg_price
4(121000, 151000]4443.746585
3(91000, 121000]8237.034632
2(61000, 91000]9906.484568
0(31000, 61000]14044.419201
1(1000, 31000]17594.778157
\n", "
" ], "text/plain": [ " odometer_km_groups avg_price\n", "4 (121000, 151000] 4443.746585\n", "3 (91000, 121000] 8237.034632\n", "2 (61000, 91000] 9906.484568\n", "0 (31000, 61000] 14044.419201\n", "1 (1000, 31000] 17594.778157" ] }, "execution_count": 245, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort by mileage bin in descending order\n", "odo_price_df.sort_values('odometer_km_groups', ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "autos.odometer_km.plot.hist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary of results from the used car listings\n", "\n", "- These are the top 6 brands by number of listings: \n", "\n", " Volkswagen \n", " BMW \n", " Opel \n", " Mercedes-Benz \n", " Audi \n", " Ford \n", " \n", " \n", "- Some listings for coupes and limousines are priced at millions of dollars. This suggests that not every listing is realistic, and buyers could be in for a surprise or a long bidding process\n", "\n", "\n", "- The top brands **Audi, Mercedes Benz** and **BMW**, all German, are also among the highest-priced cars on average\n", "\n", "- **Ford** and **Opel** are also top brands by listings, but they are much cheaper and don't make the list of top brands by price\n", "\n", "- **Porsche**, another German brand, is the priciest used car at an average of 49661. That is understandable for a luxury brand. Porsche also has over 200 listings on this site\n", "\n", "- **Sonstige autos** (German for *other cars*) is a distant second at an average of 14265, followed by **Mini**, possibly because Mini is owned by BMW\n", "- **Renault** is the least expensive brand, at an average of 2762 \n", "- The average used car price is 7194 \n", "\n", "\n", "- **Volkswagen** is by far the most popular brand, and three of its models (**Golf, Polo and Passat**) are among the four most commonly listed cars. However, Volkswagen is not among the priciest brands: its average price falls below that of the average used car\n", "\n", " \n", "- Almost 75% of the listings are high-mileage cars, with mileages in the 121000-151000 km range\n", "\n", "\n", "- It's also worth noting that damaged cars do not dominate the listings: about 8% of the listings have unrepaired damage and about 77% have none, while the remaining 15% don't specify\n", "\n", "\n", "- On average, cars with unrepaired damage cost over 4500 less than their undamaged counterparts. Knowing the average repair costs for a brand would help a buyer decide whether it's worth paying more for an undamaged car, or whether the price difference covers the repairs for that brand, making the damaged car the cheaper option\n", "\n", "\n", "- Visualizing the data for `Impact of mileage on avg.price of car` (scatter plot) corroborated what splitting mileage into bins showed: the average price rises steadily as the mileage bins get lower:\n", "\n", " For the high-end brands Audi, BMW and Mercedes, the average price clearly decreased as mileage increased. Mileage is indeed a factor in price, even among high-end brands \n", "\n", " Ford and Opel are much less expensive used cars than these high-end brands. Still, as mileage went up, their prices went down, consistent with the trend seen for the expensive brands.\n", "\n", " Volkswagen, with average mileage similar to Opel's and higher than Ford's, still costs more than both. That could be because it's simply a popular brand, sitting between the expensive brands and the inexpensive ones\n", " \n", " We may also speculate that there are so many Volkswagen listings because they sell easily at their lower average price" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 2 }