{ "cells": [ { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "%matplotlib inline\n", "\n", "from datetime import datetime, date\n", "plt.style.use('ggplot')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# Loading the Customer Demographics Data from the excel file\n", "\n", "cust_demo = pd.read_excel('Raw_data.xlsx' , sheet_name='CustomerDemographic')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>customer_id</th><th>first_name</th><th>last_name</th><th>gender</th><th>past_3_years_bike_related_purchases</th><th>DOB</th><th>job_title</th><th>job_industry_category</th><th>wealth_segment</th><th>deceased_indicator</th><th>default</th><th>owns_car</th><th>tenure</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>1</td><td>Laraine</td><td>Medendorp</td><td>F</td><td>93</td><td>1953-10-12</td><td>Executive Secretary</td><td>Health</td><td>Mass Customer</td><td>N</td><td>\"'</td><td>Yes</td><td>11.0</td></tr>\n",
"<tr><th>1</th><td>2</td><td>Eli</td><td>Bockman</td><td>Male</td><td>81</td><td>1980-12-16</td><td>Administrative Officer</td><td>Financial Services</td><td>Mass Customer</td><td>N</td><td>&lt;script&gt;alert('hi')&lt;/script&gt;</td><td>Yes</td><td>16.0</td></tr>\n",
"<tr><th>2</th><td>3</td><td>Arlin</td><td>Dearle</td><td>Male</td><td>61</td><td>1954-01-20</td><td>Recruiting Manager</td><td>Property</td><td>Mass Customer</td><td>N</td><td>2018-02-01 00:00:00</td><td>Yes</td><td>15.0</td></tr>\n",
"<tr><th>3</th><td>4</td><td>Talbot</td><td>NaN</td><td>Male</td><td>33</td><td>1961-10-03</td><td>NaN</td><td>IT</td><td>Mass Customer</td><td>N</td><td>() { _; } &gt;_[$($())] { touch /tmp/blns.shellsh...</td><td>No</td><td>7.0</td></tr>\n",
"<tr><th>4</th><td>5</td><td>Sheila-kathryn</td><td>Calton</td><td>Female</td><td>56</td><td>1977-05-13</td><td>Senior Editor</td><td>NaN</td><td>Affluent Customer</td><td>N</td><td>NIL</td><td>Yes</td><td>8.0</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "0 1 Laraine Medendorp F \n", "1 2 Eli Bockman Male \n", "2 3 Arlin Dearle Male \n", "3 4 Talbot NaN Male \n", "4 5 Sheila-kathryn Calton Female \n", "\n", " past_3_years_bike_related_purchases DOB job_title \\\n", "0 93 1953-10-12 Executive Secretary \n", "1 81 1980-12-16 Administrative Officer \n", "2 61 1954-01-20 Recruiting Manager \n", "3 33 1961-10-03 NaN \n", "4 56 1977-05-13 Senior Editor \n", "\n", " job_industry_category wealth_segment deceased_indicator \\\n", "0 Health Mass Customer N \n", "1 Financial Services Mass Customer N \n", "2 Property Mass Customer N \n", "3 IT Mass Customer N \n", "4 NaN Affluent Customer N \n", "\n", " default owns_car tenure \n", "0 \"' Yes 11.0 \n", "1 Yes 16.0 \n", "2 2018-02-01 00:00:00 Yes 15.0 \n", "3 () { _; } >_[$($())] { touch /tmp/blns.shellsh... No 7.0 \n", "4 NIL Yes 8.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Checking first 5 records from Customer Demographics Data\n", "\n", "cust_demo.head(5)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 4000 entries, 0 to 3999\n", "Data columns (total 13 columns):\n", "customer_id 4000 non-null int64\n", "first_name 4000 non-null object\n", "last_name 3875 non-null object\n", "gender 4000 non-null object\n", "past_3_years_bike_related_purchases 4000 non-null int64\n", "DOB 3913 non-null datetime64[ns]\n", "job_title 3494 non-null object\n", "job_industry_category 3344 non-null object\n", "wealth_segment 4000 non-null object\n", "deceased_indicator 4000 non-null object\n", "default 3698 non-null object\n", "owns_car 4000 non-null object\n", "tenure 3913 non-null float64\n", "dtypes: datetime64[ns](1), float64(1), int64(2), object(9)\n", "memory usage: 406.3+ KB\n" ] } ], "source": [ "# Information of columns and data-types of Customer 
Demographics Data.\n", "\n", "cust_demo.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data types of the columns look fine. However, default is an irrelevant column (its sampled values above look like junk strings) and should be dropped from the dataset. Let's check the data quality and apply data cleaning wherever applicable before performing any analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Total Records" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total records (rows) in the dataset : 4000\n", "Total columns (features) in the dataset : 13\n" ] } ], "source": [ "print(\"Total records (rows) in the dataset : {}\".format(cust_demo.shape[0]))\n", "print(\"Total columns (features) in the dataset : {}\".format(cust_demo.shape[1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numeric Columns and Non-Numeric Columns" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The numeric columns are : ['customer_id' 'past_3_years_bike_related_purchases' 'tenure']\n", "The non-numeric columns are : ['first_name' 'last_name' 'gender' 'DOB' 'job_title'\n", " 'job_industry_category' 'wealth_segment' 'deceased_indicator' 'default'\n", " 'owns_car']\n" ] } ], "source": [ "# select numeric columns\n", "df_numeric = cust_demo.select_dtypes(include=[np.number])\n", "numeric_cols = df_numeric.columns.values\n", "print(\"The numeric columns are : {}\".format(numeric_cols))\n", "\n", "\n", "# select non-numeric columns\n", "df_non_numeric = cust_demo.select_dtypes(exclude=[np.number])\n", "non_numeric_cols = df_non_numeric.columns.values\n", "print(\"The non-numeric columns are : {}\".format(non_numeric_cols))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Dropping Irrelevant Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "default is an irrelevant column, so it is dropped." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# Dropping the default column\n", "\n", "cust_demo.drop(columns=['default'], inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Missing Values Check" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This step checks for missing values in the dataset. If a feature has missing values, it is either dropped (when a large share of its data is missing) or imputed with an appropriate value, depending on the situation." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "customer_id 0\n", "first_name 0\n", "last_name 125\n", "gender 0\n", "past_3_years_bike_related_purchases 0\n", "DOB 87\n", "job_title 506\n", "job_industry_category 656\n", "wealth_segment 0\n", "deceased_indicator 0\n", "owns_car 0\n", "tenure 87\n", "dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Total number of missing values\n", "\n", "cust_demo.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "customer_id 0.000\n", "first_name 0.000\n", "last_name 3.125\n", "gender 0.000\n", "past_3_years_bike_related_purchases 0.000\n", "DOB 2.175\n", "job_title 12.650\n", "job_industry_category 16.400\n", "wealth_segment 0.000\n", "deceased_indicator 0.000\n", "owns_car 0.000\n", "tenure 2.175\n", "dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Percentage of missing values\n", "\n", "cust_demo.isnull().mean()*100" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Here it is observed that columns like gender, DOB, job_title, job_industry_category and tenure have missing values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Last Name" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "first_name 0\n", "customer_id 0\n", "dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Checking for the presence of first name and customer id in records where last name is missing.\n", "\n", "cust_demo[cust_demo['last_name'].isnull()][['first_name', 'customer_id']].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since All customers have a customer_id and First name, all the customers are identifiable. Hence it is okay for to not have a last name. Filling null last names with \"None\"." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "3 4 Talbot NaN Male \n", "66 67 Vernon NaN Male \n", "105 106 Glyn NaN Male \n", "138 139 Gar NaN Male \n", "196 197 Avis NaN Female \n", "210 211 Beitris NaN Female \n", "249 250 Kristofer NaN Male \n", "250 251 Mala NaN Female \n", "256 257 Marissa NaN Female \n", "274 275 Dud NaN Male \n", "355 356 Nichole NaN Female \n", "459 460 Illa NaN Female \n", "474 475 Vernor NaN Male \n", "493 494 Gaby NaN Male \n", "513 514 Trent NaN Male \n", "525 526 Ardelle NaN U \n", "656 657 Hoyt NaN Male \n", "659 660 Stormi NaN Female \n", "675 676 Curtis NaN Male \n", "683 684 Malvin NaN Male \n", "689 690 Lindsey NaN Male \n", "702 703 Ethelda NaN Female \n", "743 744 Heinrik NaN Male \n", "779 780 Kim NaN Female \n", "789 790 Yvonne NaN Female \n", "856 857 Theo NaN Female \n", "859 860 Ida NaN Female \n", "915 916 Joycelin NaN Female \n", "926 927 Jarret NaN Male \n", "937 938 Corabelle NaN Female \n", "... ... ... ... ... \n", "3179 3180 Gage NaN Male \n", "3187 3188 Boyd NaN Male \n", "3199 3200 Marna NaN Female \n", "3258 3259 Rabi NaN Male \n", "3318 3319 Erda NaN Female \n", "3320 3321 Ives NaN Male \n", "3323 3324 Sholom NaN Male \n", "3324 3325 Sylas NaN Male \n", "3346 3347 Nichols NaN Male \n", "3363 3364 Trueman NaN Male \n", "3384 3385 Ronda NaN Female \n", "3396 3397 Melisande NaN Female \n", "3400 3401 Cristie NaN Female \n", "3442 3443 Fran NaN Male \n", "3444 3445 Craggy NaN Male \n", "3446 3447 Linell NaN Female \n", "3479 3480 Jarib NaN Male \n", "3554 3555 Latashia NaN Female \n", "3596 3597 Giorgi NaN Male \n", "3623 3624 Lenka NaN Female \n", "3634 3635 Elset NaN Female \n", "3650 3651 Baxie NaN Male \n", "3717 3718 Damiano NaN U \n", "3755 3756 Barry NaN Male \n", "3816 3817 Tuckie NaN Male \n", "3884 3885 Asher NaN Male \n", "3915 3916 Myrtia NaN Female \n", "3926 3927 Conway NaN Male \n", "3961 3962 Benoit NaN Male \n", "3998 3999 Patrizius NaN Male \n", "\n", " 
past_3_years_bike_related_purchases DOB \\\n", "3 33 1961-10-03 \n", "66 67 1960-06-14 \n", "105 54 1966-07-03 \n", "138 1 1964-07-28 \n", "196 32 1977-01-27 \n", "210 6 1974-03-04 \n", "249 53 1988-04-15 \n", "250 88 1977-12-24 \n", "256 70 1966-02-08 \n", "274 7 1955-07-27 \n", "355 10 1975-03-30 \n", "459 0 1986-01-23 \n", "474 0 1996-11-14 \n", "493 33 1975-06-02 \n", "513 9 1996-06-20 \n", "525 9 NaT \n", "656 66 1993-02-18 \n", "659 82 1995-07-29 \n", "675 51 1968-05-19 \n", "683 88 1987-07-03 \n", "689 95 1987-03-27 \n", "702 66 1966-10-31 \n", "743 54 1977-08-30 \n", "779 24 1973-10-12 \n", "789 22 1968-03-24 \n", "856 15 1964-08-14 \n", "859 80 1980-08-12 \n", "915 18 1991-06-18 \n", "926 25 1966-02-19 \n", "937 18 1996-04-06 \n", "... ... ... \n", "3179 96 1974-06-14 \n", "3187 94 1999-07-07 \n", "3199 51 1995-11-03 \n", "3258 74 1953-11-04 \n", "3318 67 1966-04-04 \n", "3320 38 1980-05-10 \n", "3323 32 1973-07-11 \n", "3324 80 1996-10-08 \n", "3346 99 1985-11-08 \n", "3363 77 1993-08-19 \n", "3384 23 1975-02-10 \n", "3396 70 1985-08-19 \n", "3400 92 1993-07-28 \n", "3442 11 1995-04-12 \n", "3444 62 1966-06-23 \n", "3446 43 1977-11-23 \n", "3479 30 1959-06-24 \n", "3554 96 1976-02-26 \n", "3596 71 1954-06-16 \n", "3623 54 1984-10-16 \n", "3634 51 1977-07-06 \n", "3650 91 1999-11-15 \n", "3717 22 NaT \n", "3755 22 1977-07-08 \n", "3816 65 1957-05-02 \n", "3884 55 1978-06-17 \n", "3915 31 1958-10-17 \n", "3926 29 1978-01-07 \n", "3961 17 1977-10-06 \n", "3998 11 1973-10-24 \n", "\n", " job_title job_industry_category \\\n", "3 NaN IT \n", "66 Web Developer II Retail \n", "105 Software Test Engineer III Health \n", "138 Operator Telecommunications \n", "196 NaN NaN \n", "210 VP Marketing Manufacturing \n", "249 Legal Assistant Health \n", "250 VP Sales Financial Services \n", "256 Sales Associate Manufacturing \n", "274 VP Sales Health \n", "355 Librarian Entertainment \n", "459 Electrical Engineer Manufacturing \n", "474 Nuclear Power Engineer Manufacturing 
\n", "493 Design Engineer Manufacturing \n", "513 Associate Professor Financial Services \n", "525 Social Worker Health \n", "656 Safety Technician II Manufacturing \n", "659 Geological Engineer Manufacturing \n", "675 Senior Editor NaN \n", "683 Desktop Support Technician Financial Services \n", "689 Assistant Professor NaN \n", "702 NaN Property \n", "743 Graphic Designer Manufacturing \n", "779 Professor Financial Services \n", "789 Senior Editor NaN \n", "856 General Manager NaN \n", "859 NaN NaN \n", "915 Recruiter NaN \n", "926 Cost Accountant Financial Services \n", "937 Technical Writer Retail \n", "... ... ... \n", "3179 Business Systems Development Analyst IT \n", "3187 Actuary Financial Services \n", "3199 Environmental Tech Manufacturing \n", "3258 Quality Control Specialist NaN \n", "3318 NaN Financial Services \n", "3320 Software Test Engineer I NaN \n", "3323 Research Nurse Health \n", "3324 Database Administrator IV Manufacturing \n", "3346 Computer Systems Analyst II Entertainment \n", "3363 Engineer IV Manufacturing \n", "3384 Systems Administrator III Argiculture \n", "3396 Product Engineer IT \n", "3400 Tax Accountant Telecommunications \n", "3442 Technical Writer NaN \n", "3444 Database Administrator I Financial Services \n", "3446 NaN Financial Services \n", "3479 NaN NaN \n", "3554 Programmer Analyst II Manufacturing \n", "3596 Analog Circuit Design manager Property \n", "3623 Cost Accountant Financial Services \n", "3634 VP Marketing Retail \n", "3650 Human Resources Assistant I Manufacturing \n", "3717 Geologist IV IT \n", "3755 NaN NaN \n", "3816 VP Product Management Manufacturing \n", "3884 Actuary Financial Services \n", "3915 NaN Retail \n", "3926 Electrical Engineer Manufacturing \n", "3961 Project Manager Argiculture \n", "3998 NaN Manufacturing \n", "\n", " wealth_segment deceased_indicator owns_car tenure \n", "3 Mass Customer N No 7.0 \n", "66 Mass Customer N No 18.0 \n", "105 High Net Worth N Yes 18.0 \n", "138 Affluent Customer 
N No 4.0 \n", "196 High Net Worth N No 5.0 \n", "210 Mass Customer N Yes 5.0 \n", "249 Mass Customer N Yes 13.0 \n", "250 Affluent Customer N Yes 10.0 \n", "256 Affluent Customer N Yes 19.0 \n", "274 High Net Worth N No 13.0 \n", "355 High Net Worth N No 5.0 \n", "459 Affluent Customer N Yes 16.0 \n", "474 Affluent Customer N No 1.0 \n", "493 Mass Customer N No 9.0 \n", "513 Mass Customer N Yes 4.0 \n", "525 Mass Customer N Yes NaN \n", "656 Affluent Customer N No 10.0 \n", "659 High Net Worth N No 6.0 \n", "675 High Net Worth N Yes 14.0 \n", "683 Mass Customer N No 14.0 \n", "689 Affluent Customer N Yes 17.0 \n", "702 Mass Customer N No 15.0 \n", "743 Affluent Customer N Yes 14.0 \n", "779 Mass Customer N No 20.0 \n", "789 Affluent Customer N No 15.0 \n", "856 High Net Worth N No 4.0 \n", "859 High Net Worth N Yes 7.0 \n", "915 Affluent Customer N No 8.0 \n", "926 Mass Customer N Yes 18.0 \n", "937 Mass Customer N No 7.0 \n", "... ... ... ... ... \n", "3179 Mass Customer N Yes 19.0 \n", "3187 Mass Customer N No 1.0 \n", "3199 Mass Customer N No 1.0 \n", "3258 High Net Worth N No 10.0 \n", "3318 Affluent Customer N Yes 19.0 \n", "3320 High Net Worth N Yes 14.0 \n", "3323 Mass Customer N Yes 10.0 \n", "3324 High Net Worth N No 1.0 \n", "3346 High Net Worth N Yes 18.0 \n", "3363 Mass Customer N Yes 3.0 \n", "3384 Mass Customer N No 9.0 \n", "3396 Mass Customer N No 11.0 \n", "3400 Mass Customer N Yes 4.0 \n", "3442 Mass Customer N Yes 5.0 \n", "3444 Affluent Customer N Yes 11.0 \n", "3446 High Net Worth N No 17.0 \n", "3479 Mass Customer N No 20.0 \n", "3554 Mass Customer N No 21.0 \n", "3596 Affluent Customer N Yes 16.0 \n", "3623 Mass Customer N Yes 7.0 \n", "3634 High Net Worth N No 9.0 \n", "3650 Mass Customer N No 2.0 \n", "3717 Mass Customer N Yes NaN \n", "3755 Affluent Customer N No 10.0 \n", "3816 High Net Worth N No 13.0 \n", "3884 Mass Customer N Yes 8.0 \n", "3915 Affluent Customer N Yes 17.0 \n", "3926 Mass Customer N Yes 7.0 \n", "3961 High Net Worth N 
Yes 14.0 \n", "3998 Affluent Customer N Yes 10.0 \n", "\n", "[125 rows x 12 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fetching records where last name is missing.\n", "\n", "cust_demo[cust_demo['last_name'].isnull()]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "# Assign the result back instead of calling fillna(inplace=True) on a column selection\n", "cust_demo['last_name'] = cust_demo['last_name'].fillna('None')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['last_name'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last_name column now has no missing values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Date of Birth" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", "
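|  | customer_id | first_name | last_name | gender | past_3_years_bike_related_purchases | DOB | job_title | job_industry_category | wealth_segment | deceased_indicator | owns_car | tenure |
| 143 | 144 | Jory | Barrabeale | U | 71 | NaT | Environmental Tech | IT | Mass Customer | N | No | NaN |
| 167 | 168 | Reggie | Broggetti | U | 8 | NaT | General Manager | IT | Affluent Customer | N | Yes | NaN |
| 266 | 267 | Edgar | Buckler | U | 53 | NaT | NaN | IT | High Net Worth | N | No | NaN |
| 289 | 290 | Giorgio | Kevane | U | 42 | NaT | Senior Sales Associate | IT | Mass Customer | N | No | NaN |
| 450 | 451 | Marlow | Flowerdew | U | 37 | NaT | Quality Control Specialist | IT | High Net Worth | N | No | NaN |
| 452 | 453 | Cornelius | Yarmouth | U | 81 | NaT | Assistant Professor | IT | High Net Worth | N | No | NaN |
| 453 | 454 | Eugenie | Domenc | U | 58 | NaT | Research Nurse | Health | Affluent Customer | N | Yes | NaN |
| 479 | 480 | Darelle | Ive | U | 67 | NaT | Registered Nurse | Health | Mass Customer | N | Yes | NaN |
| 512 | 513 | Kienan | Soar | U | 30 | NaT | Tax Accountant | IT | Mass Customer | N | No | NaN |
| 525 | 526 | Ardelle | None | U | 9 | NaT | Social Worker | Health | Mass Customer | N | Yes | NaN |
| 547 | 548 | Georgie | Cudbertson | U | 84 | NaT | NaN | IT | High Net Worth | N | Yes | NaN |
| 581 | 582 | Rhoda | McKeown | U | 21 | NaT | Staff Scientist | IT | Affluent Customer | N | No | NaN |
| 598 | 599 | Ernestus | Cruden | U | 48 | NaT | Senior Financial Analyst | Financial Services | Mass Customer | N | Yes | NaN |
| 679 | 680 | Gay | Pickersgill | U | 22 | NaT | NaN | IT | High Net Worth | N | Yes | NaN |
| 684 | 685 | Booth | Birkin | U | 28 | NaT | Senior Developer | IT | Mass Customer | N | No | NaN |
| 798 | 799 | Harland | Spilisy | U | 39 | NaT | Programmer I | IT | Mass Customer | N | Yes | NaN |
| 838 | 839 | Charis | Greaves | U | 14 | NaT | Structural Analysis Engineer | IT | Mass Customer | N | Yes | NaN |
| 882 | 883 | Lolita | Bennie | U | 73 | NaT | Recruiter | IT | Mass Customer | N | Yes | NaN |
| 891 | 892 | Conroy | Healy | U | 22 | NaT | Office Assistant II | IT | Mass Customer | N | Yes | NaN |
| 949 | 950 | Bret | Ivakhnov | U | 24 | NaT | Recruiter | IT | High Net Worth | N | Yes | NaN |
| 974 | 975 | Goldarina | Rzehorz | U | 26 | NaT | Automation Specialist IV | IT | Mass Customer | N | No | NaN |
| 982 | 983 | Shaylyn | Riggs | U | 49 | NaT | NaN | IT | Affluent Customer | N | No | NaN |
| 995 | 996 | Aura | Bemlott | U | 67 | NaT | Assistant Manager | IT | Mass Customer | N | Yes | NaN |
| 1037 | 1038 | Fraser | Acome | U | 57 | NaT | Engineer I | Manufacturing | Mass Customer | N | Yes | NaN |
| 1043 | 1044 | Frederico | Whilder | U | 4 | NaT | Food Chemist | Health | High Net Worth | N | No | NaN |
| 1081 | 1082 | Guinevere | Kelby | U | 90 | NaT | Financial Analyst | Financial Services | Mass Customer | N | Yes | NaN |
| 1173 | 1174 | Shellysheldon | Gooderridge | U | 9 | NaT | Executive Secretary | IT | Mass Customer | N | No | NaN |
| 1209 | 1210 | Shandie | Sprigg | U | 81 | NaT | Programmer II | IT | Mass Customer | N | No | NaN |
| 1243 | 1244 | Glenn | Tinham | U | 80 | NaT | Financial Analyst | Financial Services | Mass Customer | N | Yes | NaN |
| 1350 | 1351 | Lorettalorna | None | U | 32 | NaT | Office Assistant IV | IT | High Net Worth | N | No | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2695 | 2696 | Isabelle | Bursnoll | U | 42 | NaT | Social Worker | Health | Mass Customer | N | Yes | NaN |
| 2696 | 2697 | Klarika | Yerby | U | 70 | NaT | Legal Assistant | IT | High Net Worth | N | No | NaN |
| 2853 | 2854 | Vikky | Dyde | U | 49 | NaT | Project Manager | IT | High Net Worth | N | Yes | NaN |
| 2919 | 2920 | Casar | Ritchley | U | 0 | NaT | Business Systems Development Analyst | IT | Mass Customer | N | Yes | NaN |
| 2962 | 2963 | Christin | Fricke | U | 17 | NaT | Safety Technician II | IT | Affluent Customer | N | Yes | NaN |
| 2998 | 2999 | Rinaldo | Diggin | U | 28 | NaT | Business Systems Development Analyst | IT | Affluent Customer | N | Yes | NaN |
| 3011 | 3012 | Devland | Probart | U | 81 | NaT | Technical Writer | IT | Mass Customer | N | Yes | NaN |
| 3085 | 3086 | Pieter | Gadesby | U | 18 | NaT | Biostatistician I | IT | High Net Worth | N | No | NaN |
| 3150 | 3151 | Thorn | Choffin | U | 20 | NaT | Senior Developer | IT | Affluent Customer | N | Yes | NaN |
| 3221 | 3222 | Caralie | Sellors | U | 40 | NaT | Senior Editor | IT | Affluent Customer | N | No | NaN |
| 3222 | 3223 | Tiffi | Wortt | U | 44 | NaT | Database Administrator III | IT | Mass Customer | N | Yes | NaN |
| 3254 | 3255 | Sutherlan | Truin | U | 47 | NaT | Engineer IV | IT | High Net Worth | N | No | NaN |
| 3287 | 3288 | Fair | Dewen | U | 47 | NaT | Engineer III | IT | High Net Worth | N | No | NaN |
| 3297 | 3298 | Christine | Baignard | U | 1 | NaT | VP Quality Control | IT | Affluent Customer | N | Yes | NaN |
| 3311 | 3312 | Franky | Nanninini | U | 49 | NaT | Administrative Officer | IT | High Net Worth | N | No | NaN |
| 3321 | 3322 | Hew | Sworder | U | 24 | NaT | Financial Analyst | Financial Services | Affluent Customer | N | Yes | NaN |
| 3342 | 3343 | Cristabel | Bim | U | 3 | NaT | Recruiter | IT | Mass Customer | N | Yes | NaN |
| 3364 | 3365 | Karlens | Chaffyn | U | 29 | NaT | Engineer III | IT | Mass Customer | N | No | NaN |
| 3472 | 3473 | Sanderson | Alloway | U | 34 | NaT | Analog Circuit Design manager | IT | Mass Customer | N | No | NaN |
| 3509 | 3510 | Jemima | Izaac | U | 48 | NaT | Safety Technician II | IT | Affluent Customer | N | Yes | NaN |
| 3512 | 3513 | Enriqueta | Waterhowse | U | 80 | NaT | Internal Auditor | IT | Affluent Customer | N | Yes | NaN |
| 3564 | 3565 | Charyl | Pottiphar | U | 14 | NaT | Structural Engineer | IT | High Net Worth | N | Yes | NaN |
| 3653 | 3654 | Kenyon | Paddefield | U | 78 | NaT | Electrical Engineer | Manufacturing | Mass Customer | N | No | NaN |
| 3717 | 3718 | Damiano | None | U | 22 | NaT | Geologist IV | IT | Mass Customer | N | Yes | NaN |
| 3726 | 3727 | Eba | Youle | U | 65 | NaT | Assistant Professor | IT | Mass Customer | N | No | NaN |
| 3778 | 3779 | Ulick | Daspar | U | 68 | NaT | NaN | IT | Affluent Customer | N | No | NaN |
| 3882 | 3883 | Nissa | Conrad | U | 35 | NaT | Legal Assistant | IT | Mass Customer | N | No | NaN |
| 3930 | 3931 | Kylie | Epine | U | 19 | NaT | NaN | IT | High Net Worth | N | Yes | NaN |
| 3934 | 3935 | Teodor | Alfonsini | U | 72 | NaT | NaN | IT | High Net Worth | N | Yes | NaN |
| 3997 | 3998 | Sarene | Woolley | U | 60 | NaT | Assistant Manager | IT | High Net Worth | N | No | NaN |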
\n", "

87 rows × 12 columns

\n", "
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "143 144 Jory Barrabeale U \n", "167 168 Reggie Broggetti U \n", "266 267 Edgar Buckler U \n", "289 290 Giorgio Kevane U \n", "450 451 Marlow Flowerdew U \n", "452 453 Cornelius Yarmouth U \n", "453 454 Eugenie Domenc U \n", "479 480 Darelle Ive U \n", "512 513 Kienan Soar U \n", "525 526 Ardelle None U \n", "547 548 Georgie Cudbertson U \n", "581 582 Rhoda McKeown U \n", "598 599 Ernestus Cruden U \n", "679 680 Gay Pickersgill U \n", "684 685 Booth Birkin U \n", "798 799 Harland Spilisy U \n", "838 839 Charis Greaves U \n", "882 883 Lolita Bennie U \n", "891 892 Conroy Healy U \n", "949 950 Bret Ivakhnov U \n", "974 975 Goldarina Rzehorz U \n", "982 983 Shaylyn Riggs U \n", "995 996 Aura Bemlott U \n", "1037 1038 Fraser Acome U \n", "1043 1044 Frederico Whilder U \n", "1081 1082 Guinevere Kelby U \n", "1173 1174 Shellysheldon Gooderridge U \n", "1209 1210 Shandie Sprigg U \n", "1243 1244 Glenn Tinham U \n", "1350 1351 Lorettalorna None U \n", "... ... ... ... ... 
\n", "2695 2696 Isabelle Bursnoll U \n", "2696 2697 Klarika Yerby U \n", "2853 2854 Vikky Dyde U \n", "2919 2920 Casar Ritchley U \n", "2962 2963 Christin Fricke U \n", "2998 2999 Rinaldo Diggin U \n", "3011 3012 Devland Probart U \n", "3085 3086 Pieter Gadesby U \n", "3150 3151 Thorn Choffin U \n", "3221 3222 Caralie Sellors U \n", "3222 3223 Tiffi Wortt U \n", "3254 3255 Sutherlan Truin U \n", "3287 3288 Fair Dewen U \n", "3297 3298 Christine Baignard U \n", "3311 3312 Franky Nanninini U \n", "3321 3322 Hew Sworder U \n", "3342 3343 Cristabel Bim U \n", "3364 3365 Karlens Chaffyn U \n", "3472 3473 Sanderson Alloway U \n", "3509 3510 Jemima Izaac U \n", "3512 3513 Enriqueta Waterhowse U \n", "3564 3565 Charyl Pottiphar U \n", "3653 3654 Kenyon Paddefield U \n", "3717 3718 Damiano None U \n", "3726 3727 Eba Youle U \n", "3778 3779 Ulick Daspar U \n", "3882 3883 Nissa Conrad U \n", "3930 3931 Kylie Epine U \n", "3934 3935 Teodor Alfonsini U \n", "3997 3998 Sarene Woolley U \n", "\n", " past_3_years_bike_related_purchases DOB \\\n", "143 71 NaT \n", "167 8 NaT \n", "266 53 NaT \n", "289 42 NaT \n", "450 37 NaT \n", "452 81 NaT \n", "453 58 NaT \n", "479 67 NaT \n", "512 30 NaT \n", "525 9 NaT \n", "547 84 NaT \n", "581 21 NaT \n", "598 48 NaT \n", "679 22 NaT \n", "684 28 NaT \n", "798 39 NaT \n", "838 14 NaT \n", "882 73 NaT \n", "891 22 NaT \n", "949 24 NaT \n", "974 26 NaT \n", "982 49 NaT \n", "995 67 NaT \n", "1037 57 NaT \n", "1043 4 NaT \n", "1081 90 NaT \n", "1173 9 NaT \n", "1209 81 NaT \n", "1243 80 NaT \n", "1350 32 NaT \n", "... ... .. 
\n", "2695 42 NaT \n", "2696 70 NaT \n", "2853 49 NaT \n", "2919 0 NaT \n", "2962 17 NaT \n", "2998 28 NaT \n", "3011 81 NaT \n", "3085 18 NaT \n", "3150 20 NaT \n", "3221 40 NaT \n", "3222 44 NaT \n", "3254 47 NaT \n", "3287 47 NaT \n", "3297 1 NaT \n", "3311 49 NaT \n", "3321 24 NaT \n", "3342 3 NaT \n", "3364 29 NaT \n", "3472 34 NaT \n", "3509 48 NaT \n", "3512 80 NaT \n", "3564 14 NaT \n", "3653 78 NaT \n", "3717 22 NaT \n", "3726 65 NaT \n", "3778 68 NaT \n", "3882 35 NaT \n", "3930 19 NaT \n", "3934 72 NaT \n", "3997 60 NaT \n", "\n", " job_title job_industry_category \\\n", "143 Environmental Tech IT \n", "167 General Manager IT \n", "266 NaN IT \n", "289 Senior Sales Associate IT \n", "450 Quality Control Specialist IT \n", "452 Assistant Professor IT \n", "453 Research Nurse Health \n", "479 Registered Nurse Health \n", "512 Tax Accountant IT \n", "525 Social Worker Health \n", "547 NaN IT \n", "581 Staff Scientist IT \n", "598 Senior Financial Analyst Financial Services \n", "679 NaN IT \n", "684 Senior Developer IT \n", "798 Programmer I IT \n", "838 Structural Analysis Engineer IT \n", "882 Recruiter IT \n", "891 Office Assistant II IT \n", "949 Recruiter IT \n", "974 Automation Specialist IV IT \n", "982 NaN IT \n", "995 Assistant Manager IT \n", "1037 Engineer I Manufacturing \n", "1043 Food Chemist Health \n", "1081 Financial Analyst Financial Services \n", "1173 Executive Secretary IT \n", "1209 Programmer II IT \n", "1243 Financial Analyst Financial Services \n", "1350 Office Assistant IV IT \n", "... ... ... 
\n", "2695 Social Worker Health \n", "2696 Legal Assistant IT \n", "2853 Project Manager IT \n", "2919 Business Systems Development Analyst IT \n", "2962 Safety Technician II IT \n", "2998 Business Systems Development Analyst IT \n", "3011 Technical Writer IT \n", "3085 Biostatistician I IT \n", "3150 Senior Developer IT \n", "3221 Senior Editor IT \n", "3222 Database Administrator III IT \n", "3254 Engineer IV IT \n", "3287 Engineer III IT \n", "3297 VP Quality Control IT \n", "3311 Administrative Officer IT \n", "3321 Financial Analyst Financial Services \n", "3342 Recruiter IT \n", "3364 Engineer III IT \n", "3472 Analog Circuit Design manager IT \n", "3509 Safety Technician II IT \n", "3512 Internal Auditor IT \n", "3564 Structural Engineer IT \n", "3653 Electrical Engineer Manufacturing \n", "3717 Geologist IV IT \n", "3726 Assistant Professor IT \n", "3778 NaN IT \n", "3882 Legal Assistant IT \n", "3930 NaN IT \n", "3934 NaN IT \n", "3997 Assistant Manager IT \n", "\n", " wealth_segment deceased_indicator owns_car tenure \n", "143 Mass Customer N No NaN \n", "167 Affluent Customer N Yes NaN \n", "266 High Net Worth N No NaN \n", "289 Mass Customer N No NaN \n", "450 High Net Worth N No NaN \n", "452 High Net Worth N No NaN \n", "453 Affluent Customer N Yes NaN \n", "479 Mass Customer N Yes NaN \n", "512 Mass Customer N No NaN \n", "525 Mass Customer N Yes NaN \n", "547 High Net Worth N Yes NaN \n", "581 Affluent Customer N No NaN \n", "598 Mass Customer N Yes NaN \n", "679 High Net Worth N Yes NaN \n", "684 Mass Customer N No NaN \n", "798 Mass Customer N Yes NaN \n", "838 Mass Customer N Yes NaN \n", "882 Mass Customer N Yes NaN \n", "891 Mass Customer N Yes NaN \n", "949 High Net Worth N Yes NaN \n", "974 Mass Customer N No NaN \n", "982 Affluent Customer N No NaN \n", "995 Mass Customer N Yes NaN \n", "1037 Mass Customer N Yes NaN \n", "1043 High Net Worth N No NaN \n", "1081 Mass Customer N Yes NaN \n", "1173 Mass Customer N No NaN \n", "1209 Mass 
Customer N No NaN \n", "1243 Mass Customer N Yes NaN \n", "1350 High Net Worth N No NaN \n", "... ... ... ... ... \n", "2695 Mass Customer N Yes NaN \n", "2696 High Net Worth N No NaN \n", "2853 High Net Worth N Yes NaN \n", "2919 Mass Customer N Yes NaN \n", "2962 Affluent Customer N Yes NaN \n", "2998 Affluent Customer N Yes NaN \n", "3011 Mass Customer N Yes NaN \n", "3085 High Net Worth N No NaN \n", "3150 Affluent Customer N Yes NaN \n", "3221 Affluent Customer N No NaN \n", "3222 Mass Customer N Yes NaN \n", "3254 High Net Worth N No NaN \n", "3287 High Net Worth N No NaN \n", "3297 Affluent Customer N Yes NaN \n", "3311 High Net Worth N No NaN \n", "3321 Affluent Customer N Yes NaN \n", "3342 Mass Customer N Yes NaN \n", "3364 Mass Customer N No NaN \n", "3472 Mass Customer N No NaN \n", "3509 Affluent Customer N Yes NaN \n", "3512 Affluent Customer N Yes NaN \n", "3564 High Net Worth N Yes NaN \n", "3653 Mass Customer N No NaN \n", "3717 Mass Customer N Yes NaN \n", "3726 Mass Customer N No NaN \n", "3778 Affluent Customer N No NaN \n", "3882 Mass Customer N No NaN \n", "3930 High Net Worth N Yes NaN \n", "3934 High Net Worth N Yes NaN \n", "3997 High Net Worth N No NaN \n", "\n", "[87 rows x 12 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo[cust_demo['DOB'].isnull()]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "round(cust_demo['DOB'].isnull().mean()*100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "About 2% of the records have a null date of birth. Since this is below 5%, we can remove those records." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64Index([ 143, 167, 266, 289, 450, 452, 453, 479, 512, 525, 547,\n", " 581, 598, 679, 684, 798, 838, 882, 891, 949, 974, 982,\n", " 995, 1037, 1043, 1081, 1173, 1209, 1243, 1350, 1476, 1508, 1582,\n", " 1627, 1682, 1739, 1772, 1779, 1805, 1917, 1937, 1989, 1999, 2020,\n", " 2068, 2164, 2204, 2251, 2294, 2334, 2340, 2413, 2425, 2468, 2539,\n", " 2641, 2646, 2695, 2696, 2853, 2919, 2962, 2998, 3011, 3085, 3150,\n", " 3221, 3222, 3254, 3287, 3297, 3311, 3321, 3342, 3364, 3472, 3509,\n", " 3512, 3564, 3653, 3717, 3726, 3778, 3882, 3930, 3934, 3997],\n", " dtype='int64')" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dob_index_drop = cust_demo[cust_demo['DOB'].isnull()].index\n", "dob_index_drop" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "cust_demo.drop(index=dob_index_drop, inplace=True)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['DOB'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DOB column now has no missing values." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating an Age column to check for further discrepancies in the data" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "# Function to calculate the age as of today based on the DOB of the customer.\n", "\n", "def age(born):\n", "    today = date.today()\n", "\n", "    # Subtract 1 if this year's birthday has not happened yet.\n", "    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))\n", "\n", "cust_demo['Age'] = cust_demo['DOB'].apply(age)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[Figure: histogram of the Age column, 50 bins]
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Viz to find out the Age Distribution\n", "plt.figure(figsize=(20,8))\n", "sns.distplot(cust_demo['Age'], kde=False, bins=50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Statistics of the Age column" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 3913.000000\n", "mean 43.346026\n", "std 12.803129\n", "min 19.000000\n", "25% 34.000000\n", "50% 43.000000\n", "75% 53.000000\n", "max 177.000000\n", "Name: Age, dtype: float64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['Age'].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we find there is only 1 customer with an age of 177. Clearly this is an outlier since the 75th percentile of Age is 53." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "33 34 Jephthah Bachmann U \n", "\n", " past_3_years_bike_related_purchases DOB job_title \\\n", "33 59 1843-12-21 Legal Assistant \n", "\n", " job_industry_category wealth_segment deceased_indicator owns_car \\\n", "33 IT Affluent Customer N No \n", "\n", " tenure Age \n", "33 20.0 177 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo[cust_demo['Age'] > 100]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only one customer has an age above 100. The recorded DOB of 1843-12-21 gives an age of 177, which is most likely a data-entry error, so we remove this record." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "# Index labels of the records with an implausible age\n", "age_index_drop = cust_demo[cust_demo['Age']>100].index\n", "\n", "cust_demo.drop(index=age_index_drop, inplace=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Tenure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tenure was null wherever Date of Birth was null, so dropping the rows with a null DOB also removed every null tenure." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['tenure'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are no missing values in the tenure column." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.4 Job Title" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "3 4 Talbot None Male \n", "5 6 Curr Duckhouse Male \n", "6 7 Fina Merali Female \n", "10 11 Uriah Bisatt Male \n", "21 22 Deeanne Durtnell Female \n", "22 23 Olav Polak Male \n", "29 30 Darrick Helleckas Male \n", "45 46 Kaila Allin Female \n", "51 52 Curran Bentson Male \n", "59 60 Nadiya Champerlen Female \n", "61 62 Sorcha Roggers Female \n", "73 74 Pansy Kiddie Female \n", "80 81 Bee Blazewicz Female \n", "107 108 Kayle Mingaud Female \n", "109 110 Sascha St. Quintin Male \n", "160 161 Tadd Bloss Male \n", "166 167 Nathalie Tideswell Female \n", "177 178 Matthieu Bertelmot Male \n", "184 185 Crosby Walcot Male \n", "196 197 Avis None Female \n", "206 207 Adena Whyman Female \n", "216 217 Jeralee Quartly Female \n", "228 229 Vaughn Lambis Male \n", "243 244 Germayne Sperry Male \n", "261 262 Cordie Petrelli Male \n", "275 276 Goldi Dwine Female \n", "287 288 Ebenezer Seedman Male \n", "295 296 Marshal Rathbone Male \n", "301 302 Laurice Colgrave Female \n", "318 319 Madelle Matteris Female \n", "... ... ... ... ... 
\n", "3797 3798 Yorker Dennison Male \n", "3803 3804 Andria Keays Female \n", "3805 3806 Ado Gailor Male \n", "3810 3811 Etta Leele Female \n", "3821 3822 Conny Speechley Male \n", "3823 3824 Giffard Stollman Male \n", "3825 3826 Marlow Balffye Male \n", "3826 3827 Cherida Whyffen Female \n", "3839 3840 Marc Torrans Male \n", "3843 3844 Clotilda Oret Female \n", "3864 3865 Urbanus Fuxman Male \n", "3880 3881 Olivie Nazair Female \n", "3892 3893 Hadria Moles Female \n", "3908 3909 Micheil McGeorge Male \n", "3915 3916 Myrtia None Female \n", "3927 3928 Kristin Way Female \n", "3928 3929 Jacqui Fortnam Female \n", "3929 3930 Blancha Baldi Female \n", "3932 3933 Chiarra Cops Female \n", "3938 3939 Georges Dumbelton Male \n", "3944 3945 Lazarus Donaghy Male \n", "3945 3946 Wylie FitzGilbert Male \n", "3951 3952 Di Borsnall Female \n", "3958 3959 Dannie Sowray Male \n", "3959 3960 Hobart Burgan Male \n", "3967 3968 Alexandra Kroch Female \n", "3971 3972 Maribelle Schaffel Female \n", "3978 3979 Kleon Adam Male \n", "3986 3987 Beckie Wakeham Female \n", "3998 3999 Patrizius None Male \n", "\n", " past_3_years_bike_related_purchases DOB job_title \\\n", "3 33 1961-10-03 NaN \n", "5 35 1966-09-16 NaN \n", "6 6 1976-02-23 NaN \n", "10 99 1954-04-30 NaN \n", "21 79 1962-12-10 NaN \n", "22 43 1995-02-10 NaN \n", "29 18 1961-10-18 NaN \n", "45 98 1972-02-26 NaN \n", "51 57 1988-06-22 NaN \n", "59 18 1970-02-04 NaN \n", "61 38 1979-07-06 NaN \n", "73 94 1969-06-19 NaN \n", "80 58 1986-09-04 NaN \n", "107 4 1994-03-14 NaN \n", "109 23 2000-07-31 NaN \n", "160 49 1976-01-21 NaN \n", "166 95 1969-10-27 NaN \n", "177 2 1967-04-03 NaN \n", "184 80 1979-12-13 NaN \n", "196 32 1977-01-27 NaN \n", "206 9 1994-08-10 NaN \n", "216 63 1979-12-09 NaN \n", "228 30 1966-03-06 NaN \n", "243 57 1974-11-25 NaN \n", "261 97 1977-12-23 NaN \n", "275 47 1990-03-25 NaN \n", "287 71 1985-09-08 NaN \n", "295 34 1972-06-19 NaN \n", "301 32 1977-03-27 NaN \n", "318 32 1971-10-11 NaN \n", "... ... ... 
... \n", "3797 13 1968-02-22 NaN \n", "3803 23 1986-08-21 NaN \n", "3805 1 1954-02-08 NaN \n", "3810 60 1997-03-19 NaN \n", "3821 37 1959-03-09 NaN \n", "3823 33 1994-11-21 NaN \n", "3825 33 1978-09-25 NaN \n", "3826 10 1976-09-05 NaN \n", "3839 27 1962-09-30 NaN \n", "3843 87 1987-12-06 NaN \n", "3864 49 1978-03-15 NaN \n", "3880 50 1971-01-12 NaN \n", "3892 7 1996-11-18 NaN \n", "3908 1 1987-10-04 NaN \n", "3915 31 1958-10-17 NaN \n", "3927 71 1982-04-16 NaN \n", "3928 50 1989-10-18 NaN \n", "3929 43 1988-01-06 NaN \n", "3932 65 1983-07-05 NaN \n", "3938 67 1981-06-25 NaN \n", "3944 77 1994-10-21 NaN \n", "3945 85 1960-06-23 NaN \n", "3951 96 1968-05-09 NaN \n", "3958 76 1992-12-07 NaN \n", "3959 6 2000-03-16 NaN \n", "3967 99 1977-12-22 NaN \n", "3971 6 1979-03-28 NaN \n", "3978 67 1974-07-13 NaN \n", "3986 18 1964-05-29 NaN \n", "3998 11 1973-10-24 NaN \n", "\n", " job_industry_category wealth_segment deceased_indicator owns_car \\\n", "3 IT Mass Customer N No \n", "5 Retail High Net Worth N Yes \n", "6 Financial Services Affluent Customer N Yes \n", "10 Property Mass Customer N No \n", "21 IT Mass Customer N No \n", "22 NaN High Net Worth N Yes \n", "29 IT Affluent Customer N Yes \n", "45 NaN Affluent Customer N Yes \n", "51 Financial Services Mass Customer N Yes \n", "59 Manufacturing Mass Customer N No \n", "61 IT Mass Customer N Yes \n", "73 NaN Mass Customer N Yes \n", "80 Health High Net Worth N No \n", "107 NaN High Net Worth N No \n", "109 Financial Services Affluent Customer N No \n", "160 NaN Mass Customer N No \n", "166 Health High Net Worth N Yes \n", "177 NaN Affluent Customer N No \n", "184 Property Mass Customer N Yes \n", "196 NaN High Net Worth N No \n", "206 NaN Mass Customer N No \n", "216 Manufacturing High Net Worth N No \n", "228 Property High Net Worth N No \n", "243 Retail Affluent Customer N No \n", "261 Health High Net Worth N Yes \n", "275 Financial Services Mass Customer N No \n", "287 Manufacturing High Net Worth N No \n", "295 
Health High Net Worth N Yes \n", "301 Health Mass Customer N No \n", "318 Retail Mass Customer N Yes \n", "... ... ... ... ... \n", "3797 Manufacturing Mass Customer N Yes \n", "3803 Manufacturing Mass Customer N Yes \n", "3805 Property Mass Customer N No \n", "3810 Financial Services High Net Worth N No \n", "3821 Manufacturing High Net Worth N Yes \n", "3823 Property Mass Customer N No \n", "3825 Health Mass Customer N No \n", "3826 Retail Affluent Customer N No \n", "3839 NaN High Net Worth N No \n", "3843 Manufacturing Affluent Customer N No \n", "3864 Manufacturing Mass Customer N Yes \n", "3880 Financial Services Affluent Customer N No \n", "3892 NaN High Net Worth N Yes \n", "3908 Manufacturing High Net Worth N Yes \n", "3915 Retail Affluent Customer N Yes \n", "3927 Property Affluent Customer N Yes \n", "3928 NaN Affluent Customer N Yes \n", "3929 Financial Services High Net Worth N No \n", "3932 NaN High Net Worth N Yes \n", "3938 Manufacturing Affluent Customer N No \n", "3944 Retail High Net Worth N No \n", "3945 Retail High Net Worth N Yes \n", "3951 Manufacturing Affluent Customer N No \n", "3958 NaN Mass Customer N No \n", "3959 Property Mass Customer N No \n", "3967 Property High Net Worth N No \n", "3971 Retail Mass Customer N No \n", "3978 Financial Services Mass Customer N Yes \n", "3986 Argiculture Mass Customer N No \n", "3998 Manufacturing Affluent Customer N Yes \n", "\n", " tenure Age \n", "3 7.0 59 \n", "5 13.0 54 \n", "6 11.0 45 \n", "10 9.0 67 \n", "21 11.0 58 \n", "22 1.0 26 \n", "29 6.0 59 \n", "45 15.0 49 \n", "51 13.0 32 \n", "59 10.0 51 \n", "61 22.0 41 \n", "73 6.0 51 \n", "80 13.0 34 \n", "107 3.0 27 \n", "109 1.0 20 \n", "160 16.0 45 \n", "166 17.0 51 \n", "177 8.0 54 \n", "184 13.0 41 \n", "196 5.0 44 \n", "206 7.0 26 \n", "216 16.0 41 \n", "228 19.0 55 \n", "243 8.0 46 \n", "261 10.0 43 \n", "275 22.0 31 \n", "287 9.0 35 \n", "295 17.0 48 \n", "301 13.0 44 \n", "318 14.0 49 \n", "... ... ... 
\n", "3797 17.0 53 \n", "3803 4.0 34 \n", "3805 7.0 67 \n", "3810 4.0 24 \n", "3821 18.0 62 \n", "3823 3.0 26 \n", "3825 7.0 42 \n", "3826 8.0 44 \n", "3839 5.0 58 \n", "3843 15.0 33 \n", "3864 11.0 43 \n", "3880 18.0 50 \n", "3892 4.0 24 \n", "3908 18.0 33 \n", "3915 17.0 62 \n", "3927 6.0 39 \n", "3928 10.0 31 \n", "3929 22.0 33 \n", "3932 10.0 37 \n", "3938 15.0 39 \n", "3944 7.0 26 \n", "3945 10.0 60 \n", "3951 10.0 53 \n", "3958 3.0 28 \n", "3959 1.0 21 \n", "3967 22.0 43 \n", "3971 8.0 42 \n", "3978 18.0 46 \n", "3986 7.0 56 \n", "3998 10.0 47 \n", "\n", "[497 rows x 13 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fetching records where Job Title is missing.\n", "\n", "cust_demo[cust_demo['job_title'].isnull()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Job title is missing for 497 records, about 13% of the rows. Rather than dropping them, we replace the null values with the label 'Missing'." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "# Assign the result back instead of calling fillna in place on a column selection\n", "cust_demo['job_title'] = cust_demo['job_title'].fillna('Missing')" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['job_title'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are now no missing values in the job_title column." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.5 Job Industry Category" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", "656 rows × 13 columns\n", "
" ], "text/plain": [ " customer_id first_name last_name gender \\\n", "4 5 Sheila-kathryn Calton Female \n", "7 8 Rod Inder Male \n", "15 16 Harlin Parr Male \n", "16 17 Heath Faraday Male \n", "17 18 Marjie Neasham Female \n", "22 23 Olav Polak Male \n", "32 33 Ernst Hacon Male \n", "35 36 Lurette Stonnell Female \n", "45 46 Kaila Allin Female \n", "47 48 Rebbecca Casone Female \n", "48 49 Nolly Ownsworth Male \n", "56 57 Abba Masedon M \n", "58 59 Niki Heathcote Male \n", "67 68 Dahlia Eddoes Female \n", "68 69 Heidi Milner Female \n", "72 73 Minette Worters Female \n", "73 74 Pansy Kiddie Female \n", "83 84 Rich Mathiasen Male \n", "84 85 Kane Tixall Male \n", "107 108 Kayle Mingaud Female \n", "108 109 Cody Blabey Male \n", "110 111 Cele Evason Female \n", "112 113 Gage Nickless Male \n", "117 118 Prentice Pearmain Male \n", "118 119 Willey Chastanet Male \n", "147 148 Jaquith Maffey Female \n", "153 154 Faydra Dulieu Female \n", "157 158 Hamlin Odams Male \n", "160 161 Tadd Bloss Male \n", "177 178 Matthieu Bertelmot Male \n", "... ... ... ... ... 
\n", "3851 3852 Zerk Merrien Male \n", "3852 3853 Kerri Marrington Female \n", "3854 3855 Brnaby Doughtery Male \n", "3859 3860 Sheila-kathryn Conklin Female \n", "3863 3864 Ilyssa Piaggia Female \n", "3870 3871 Magda Shugg Female \n", "3876 3877 Georgine Poutress Female \n", "3877 3878 Waldon Digges Male \n", "3878 3879 Vin Attack Male \n", "3886 3887 Dulcie Nealon Female \n", "3891 3892 Roma Finlater Male \n", "3892 3893 Hadria Moles Female \n", "3895 3896 Perla Blakiston Female \n", "3902 3903 Dayna Cawthera Female \n", "3906 3907 Adriana Heam Female \n", "3910 3911 Valeda Ezele Female \n", "3917 3918 Rosalia Skedge Female \n", "3924 3925 Cally Chaim Female \n", "3928 3929 Jacqui Fortnam Female \n", "3932 3933 Chiarra Cops Female \n", "3946 3947 Tanitansy McTrustam Female \n", "3950 3951 Ephrem Hollerin Male \n", "3956 3957 Bernice Scotchforth Female \n", "3958 3959 Dannie Sowray Male \n", "3962 3963 Ardelle Dasent Female \n", "3965 3966 Astrix Sigward Female \n", "3973 3974 Misha Ranklin Female \n", "3975 3976 Gretel Chrystal Female \n", "3982 3983 Jarred Lyste Male \n", "3999 4000 Kippy Oldland Male \n", "\n", " past_3_years_bike_related_purchases DOB \\\n", "4 56 1977-05-13 \n", "7 31 1962-03-30 \n", "15 38 1977-02-27 \n", "16 57 1962-03-19 \n", "17 79 1967-07-06 \n", "22 43 1995-02-10 \n", "32 44 1957-06-25 \n", "35 33 1977-11-09 \n", "45 98 1972-02-26 \n", "47 46 1975-08-15 \n", "48 63 1994-01-26 \n", "56 87 1988-06-13 \n", "58 60 2000-02-08 \n", "67 37 1974-04-21 \n", "68 16 1969-06-22 \n", "72 16 1960-05-27 \n", "73 94 1969-06-19 \n", "83 78 1958-02-07 \n", "84 1 1958-05-21 \n", "107 4 1994-03-14 \n", "108 16 1978-12-11 \n", "110 65 1993-08-29 \n", "112 67 1956-05-06 \n", "117 43 1959-11-12 \n", "118 9 1981-12-04 \n", "147 69 1981-05-08 \n", "153 90 1958-02-13 \n", "157 99 1984-09-03 \n", "160 49 1976-01-21 \n", "177 2 1967-04-03 \n", "... ... ... 
\n", "3851 44 1982-02-04 \n", "3852 91 1975-06-26 \n", "3854 89 1965-02-26 \n", "3859 14 1986-04-05 \n", "3863 23 1963-08-27 \n", "3870 80 1983-11-13 \n", "3876 55 1971-01-28 \n", "3877 99 1978-02-24 \n", "3878 74 1979-08-28 \n", "3886 66 1964-07-16 \n", "3891 19 1978-01-29 \n", "3892 7 1996-11-18 \n", "3895 3 1979-10-15 \n", "3902 69 1981-02-13 \n", "3906 8 1996-01-11 \n", "3910 81 1954-05-25 \n", "3917 52 1977-07-05 \n", "3924 81 1978-11-25 \n", "3928 50 1989-10-18 \n", "3932 65 1983-07-05 \n", "3946 26 1970-05-12 \n", "3950 39 1975-02-10 \n", "3956 4 1978-07-20 \n", "3958 76 1992-12-07 \n", "3962 10 1954-08-22 \n", "3965 53 1968-09-15 \n", "3973 82 1961-02-11 \n", "3975 0 1957-11-20 \n", "3982 19 1965-04-21 \n", "3999 76 1991-11-05 \n", "\n", " job_title job_industry_category \\\n", "4 Senior Editor NaN \n", "7 Media Manager I NaN \n", "15 Media Manager IV NaN \n", "16 Sales Associate NaN \n", "17 Professor NaN \n", "22 Missing NaN \n", "32 Product Engineer NaN \n", "35 VP Quality Control NaN \n", "45 Missing NaN \n", "47 Biostatistician II NaN \n", "48 VP Quality Control NaN \n", "56 Chief Design Engineer NaN \n", "58 Physical Therapy Assistant NaN \n", "67 Information Systems Manager NaN \n", "68 Web Developer II NaN \n", "72 Teacher NaN \n", "73 Missing NaN \n", "83 Accountant III NaN \n", "84 Analyst Programmer NaN \n", "107 Missing NaN \n", "108 Marketing Assistant NaN \n", "110 Analyst Programmer NaN \n", "112 Staff Scientist NaN \n", "117 Budget/Accounting Analyst IV NaN \n", "118 Associate Professor NaN \n", "147 Programmer Analyst III NaN \n", "153 Junior Executive NaN \n", "157 Internal Auditor NaN \n", "160 Missing NaN \n", "177 Missing NaN \n", "... ... ... 
\n", "3851 Help Desk Operator NaN \n", "3852 Accounting Assistant IV NaN \n", "3854 General Manager NaN \n", "3859 Mechanical Systems Engineer NaN \n", "3863 Help Desk Technician NaN \n", "3870 Recruiting Manager NaN \n", "3876 Account Coordinator NaN \n", "3877 Programmer III NaN \n", "3878 Payment Adjustment Coordinator NaN \n", "3886 Computer Systems Analyst IV NaN \n", "3891 Staff Scientist NaN \n", "3892 Missing NaN \n", "3895 Tax Accountant NaN \n", "3902 Research Assistant III NaN \n", "3906 Technical Writer NaN \n", "3910 Recruiting Manager NaN \n", "3917 Junior Executive NaN \n", "3924 Statistician I NaN \n", "3928 Missing NaN \n", "3932 Missing NaN \n", "3946 GIS Technical Architect NaN \n", "3950 Quality Control Specialist NaN \n", "3956 Business Systems Development Analyst NaN \n", "3958 Missing NaN \n", "3962 Software Test Engineer II NaN \n", "3965 Geologist I NaN \n", "3973 Technical Writer NaN \n", "3975 Internal Auditor NaN \n", "3982 Graphic Designer NaN \n", "3999 Software Engineer IV NaN \n", "\n", " wealth_segment deceased_indicator owns_car tenure Age \n", "4 Affluent Customer N Yes 8.0 44 \n", "7 Mass Customer N No 7.0 59 \n", "15 Mass Customer N Yes 18.0 44 \n", "16 Affluent Customer N Yes 15.0 59 \n", "17 Affluent Customer N No 11.0 53 \n", "22 High Net Worth N Yes 1.0 26 \n", "32 Affluent Customer N Yes 11.0 63 \n", "35 Affluent Customer N No 22.0 43 \n", "45 Affluent Customer N Yes 15.0 49 \n", "47 Mass Customer N Yes 8.0 45 \n", "48 Affluent Customer N No 1.0 27 \n", "56 Mass Customer N Yes 13.0 32 \n", "58 High Net Worth N No 3.0 21 \n", "67 Affluent Customer N No 9.0 47 \n", "68 Mass Customer N No 6.0 51 \n", "72 Affluent Customer N Yes 5.0 60 \n", "73 Mass Customer N Yes 6.0 51 \n", "83 Mass Customer N Yes 14.0 63 \n", "84 Mass Customer N No 8.0 62 \n", "107 High Net Worth N No 3.0 27 \n", "108 Affluent Customer N Yes 4.0 42 \n", "110 Mass Customer N No 2.0 27 \n", "112 Mass Customer N No 20.0 65 \n", "117 High Net Worth N No 19.0 61 
\n", "118 High Net Worth N Yes 9.0 39 \n", "147 Mass Customer N Yes 5.0 40 \n", "153 Mass Customer N No 11.0 63 \n", "157 Affluent Customer N No 5.0 36 \n", "160 Mass Customer N No 16.0 45 \n", "177 Affluent Customer N No 8.0 54 \n", "... ... ... ... ... ... \n", "3851 Mass Customer N No 4.0 39 \n", "3852 Mass Customer N Yes 19.0 45 \n", "3854 Mass Customer N No 16.0 56 \n", "3859 Affluent Customer N Yes 13.0 35 \n", "3863 Mass Customer N Yes 10.0 57 \n", "3870 Mass Customer N No 4.0 37 \n", "3876 High Net Worth N No 11.0 50 \n", "3877 Mass Customer N No 9.0 43 \n", "3878 High Net Worth N No 19.0 41 \n", "3886 Affluent Customer N No 7.0 56 \n", "3891 Mass Customer N Yes 15.0 43 \n", "3892 High Net Worth N Yes 4.0 24 \n", "3895 Mass Customer N Yes 13.0 41 \n", "3902 Mass Customer N Yes 17.0 40 \n", "3906 High Net Worth N Yes 5.0 25 \n", "3910 Mass Customer N No 5.0 66 \n", "3917 High Net Worth N No 18.0 43 \n", "3924 High Net Worth N No 7.0 42 \n", "3928 Affluent Customer N Yes 10.0 31 \n", "3932 High Net Worth N Yes 10.0 37 \n", "3946 Mass Customer N No 12.0 51 \n", "3950 Affluent Customer N Yes 9.0 46 \n", "3956 High Net Worth N Yes 14.0 42 \n", "3958 Mass Customer N No 3.0 28 \n", "3962 Mass Customer N No 13.0 66 \n", "3965 Mass Customer N Yes 11.0 52 \n", "3973 Affluent Customer N Yes 9.0 60 \n", "3975 Affluent Customer N Yes 13.0 63 \n", "3982 Mass Customer N Yes 9.0 56 \n", "3999 Affluent Customer N No 11.0 29 \n", "\n", "[656 rows x 13 columns]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo[cust_demo['job_industry_category'].isnull()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since Percentage of missing Job Industry Category is 16. 
We will replace the null values with 'Missing'." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Assign the result back instead of calling fillna(inplace=True) on a column selection\n", "cust_demo['job_industry_category'] = cust_demo['job_industry_category'].fillna('Missing')" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['job_industry_category'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, there are no missing values in the dataset." ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "customer_id 0\n", "first_name 0\n", "last_name 0\n", "gender 0\n", "past_3_years_bike_related_purchases 0\n", "DOB 0\n", "job_title 0\n", "job_industry_category 0\n", "wealth_segment 0\n", "deceased_indicator 0\n", "owns_car 0\n", "tenure 0\n", "Age 0\n", "dtype: int64" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total records after removing Missing Values: 3912\n" ] } ], "source": [ "print(\"Total records after removing Missing Values: {}\".format(cust_demo.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Inconsistency Check in Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will check whether inconsistent or misspelled values are present in the categorical columns.
\n", "The columns to be checked are 'gender', 'wealth_segment', 'deceased_indicator' and 'owns_car'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Gender" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Female 2037\n", "Male 1872\n", "F 1\n", "M 1\n", "Femal 1\n", "Name: gender, dtype: int64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['gender'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gender column contains inconsistent values caused by abbreviations and typos. The value M will be replaced with Male, and both F and Femal will be replaced with Female." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "def replace_gender_names(gender):\n", "    # Standardise gender labels to 'Male' and 'Female'\n", "    if gender == 'M':\n", "        return 'Male'\n", "    elif gender in ('F', 'Femal'):\n", "        return 'Female'\n", "    else:\n", "        return gender\n", "\n", "cust_demo['gender'] = cust_demo['gender'].apply(replace_gender_names)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Female 2039\n", "Male 1873\n", "Name: gender, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['gender'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The inconsistent values, spelling mistakes and typos in the gender column have been corrected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Wealth Segment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no inconsistent data in the wealth_segment column." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Mass Customer 1954\n", "High Net Worth 996\n", "Affluent Customer 962\n", "Name: wealth_segment, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['wealth_segment'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Deceased Indicator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no inconsistent data in the deceased_indicator column." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "N 3910\n", "Y 2\n", "Name: deceased_indicator, dtype: int64" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['deceased_indicator'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4 Owns a Car" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no inconsistent data in the owns_car column." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Yes 1974\n", "No 1938\n", "Name: owns_car, dtype: int64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cust_demo['owns_car'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Duplication Checks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to ensure that there are no duplicate records in the dataset, since duplicates degrade data quality and can lead to errors in the analysis. If there are duplicate rows, we need to drop them.
To check for duplicate records, we first drop the primary key column and then apply the drop_duplicates() function provided by pandas." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of records after removing customer_id (pk), duplicates : 3912\n", "Number of records in original dataset : 3912\n" ] } ], "source": [ "cust_demo_dedupped = cust_demo.drop('customer_id', axis=1).drop_duplicates()\n", "\n", "print(\"Number of records after removing customer_id (pk), duplicates : {}\".format(cust_demo_dedupped.shape[0]))\n", "print(\"Number of records in original dataset : {}\".format(cust_demo.shape[0]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since both numbers are the same, there are no duplicate records in the dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Exporting the Cleaned Customer Demographic Data Set to CSV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Customer Demographics dataset is now clean, so we can export it to a CSV file and continue our analysis of customer segments by joining it to other tables." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "cust_demo.to_csv('CustomerDemographic_Cleaned.csv', index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }