{ "cells": [ { "cell_type": "markdown", "id": "7e5f2abc", "metadata": {}, "source": [ "# Customer segmentation: clustering - assignment 3" ] }, { "cell_type": "markdown", "id": "1bd223de", "metadata": {}, "source": [ "In this project, we perform unsupervised clustering on customer records from a grocery firm's database. Customer segmentation is the practice of separating customers into groups so that the customers within each cluster share similar characteristics. We divide customers into segments to optimize the significance of each customer to the business and to tailor products to the distinct needs and behaviours of different customers. Segmentation also helps the business address the concerns of each type of customer." ] }, { "cell_type": "markdown", "id": "ae3f38f1", "metadata": {}, "source": [ "## Importing libraries" ] }, { "cell_type": "code", "execution_count": 19, "id": "351f137d", "metadata": { "papermill": { "duration": 1.53841, "end_time": "2021-10-08T04:14:40.759736", "exception": false, "start_time": "2021-10-08T04:14:39.221326", "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from matplotlib import colors\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import seaborn as sns\n", "from sklearn.preprocessing import LabelEncoder\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.decomposition import PCA\n", "from yellowbrick.cluster import KElbowVisualizer\n", "from sklearn.cluster import KMeans\n", "from sklearn.cluster import AgglomerativeClustering\n", "import warnings\n", "import sys\n", "# Suppress warnings unless Python was started with -W options\n", "if not sys.warnoptions:\n", "    warnings.simplefilter(\"ignore\")\n", "# Seed NumPy's global random generator for reproducibility\n", "np.random.seed(42)" ] }, { "cell_type": "markdown", "id": "b72af4fa", "metadata": {}, "source": [ "## Loading data" ] }, { "cell_type": "code", "execution_count": 20, "id": "7a6e8b5a", "metadata": { 
"papermill": { "duration": 0.087752, "end_time": "2021-10-08T04:14:40.933340", "exception": false, "start_time": "2021-10-08T04:14:40.845588", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of datapoints: 2240\n" ] }, { "data": { "text/html": [ "
|  | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 08-03-2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 3 | 6182 | 1984 | Graduation | Together | 26646.0 | 1 | 0 | 10-02-2014 | 26 | 11 | ... | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 4 | 5324 | 1981 | PhD | Married | 58293.0 | 1 | 0 | 19-01-2014 | 94 | 173 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
5 rows × 29 columns
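The summary statistics that follow include engineered columns (Customer_For, Age, Children, Family_Size, Is_Parent) whose construction code is not part of this output, and the row count falls from 2240 to 2216, consistent with dropping incomplete rows. Below is a minimal sketch of how such features could be derived, using the five rows shown above. Several choices are assumptions rather than the author's confirmed code: the reference year 2021 for Age, the mapping of Marital_Status to a Living_With flag, and counting Customer_For in days (the summary's values, around 1e16, look like nanosecond timedeltas). Dates such as 21-08-2013 can only be day-first, so the sketch parses them with an explicit format.

```python
import pandas as pd

# The five rows from the head() output above (subset of columns).
data = pd.DataFrame({
    'Year_Birth': [1957, 1954, 1965, 1984, 1981],
    'Marital_Status': ['Single', 'Single', 'Together', 'Together', 'Married'],
    'Income': [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
    'Kidhome': [0, 1, 0, 1, 1],
    'Teenhome': [0, 1, 0, 0, 0],
    'Dt_Customer': ['04-09-2012', '08-03-2014', '21-08-2013', '10-02-2014', '19-01-2014'],
})

# Assumed cleaning step: drop rows with a missing Income.
data = data.dropna(subset=['Income'])

# Enrollment dates are day-first (e.g. 21-08-2013), so parse with an explicit format.
data['Dt_Customer'] = pd.to_datetime(data['Dt_Customer'], format='%d-%m-%Y')

# Days since enrollment, measured from the most recent enrollment date.
data['Customer_For'] = (data['Dt_Customer'].max() - data['Dt_Customer']).dt.days

# Hypothetical reference year for computing age.
REFERENCE_YEAR = 2021
data['Age'] = REFERENCE_YEAR - data['Year_Birth']

# Household features: total children and a parent flag.
data['Children'] = data['Kidhome'] + data['Teenhome']
data['Is_Parent'] = (data['Children'] > 0).astype(int)

# Assumed mapping of marital status to living arrangement.
data['Living_With'] = data['Marital_Status'].replace({
    'Married': 'Partner', 'Together': 'Partner',
    'Single': 'Alone', 'Divorced': 'Alone', 'Widow': 'Alone',
})
# One adult when living alone, two with a partner, plus the children.
data['Family_Size'] = data['Living_With'].map({'Alone': 1, 'Partner': 2}) + data['Children']

print(data[['Customer_For', 'Age', 'Children', 'Is_Parent', 'Family_Size']])
```

The resulting Family_Size pattern (1, 3, 2, 3, 3) lines up with the repeated standardized Family_Size values in the scaled table for these five customers, which is why the one-adult/two-adult rule seems a reasonable guess.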
|  | Income | Kidhome | Teenhome | Recency | Wines | Fruits | Meat | Fish | Sweets | Gold | ... | AcceptedCmp1 | AcceptedCmp2 | Complain | Response | Customer_For | Age | Spent | Children | Family_Size | Is_Parent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | ... | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2.216000e+03 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 | 2216.000000 |
| mean | 52247.251354 | 0.441787 | 0.505415 | 49.012635 | 305.091606 | 26.356047 | 166.995939 | 37.637635 | 27.028881 | 43.965253 | ... | 0.064079 | 0.013538 | 0.009477 | 0.150271 | 4.423735e+16 | 52.179603 | 607.075361 | 0.947202 | 2.592509 | 0.714350 |
| std | 25173.076661 | 0.536896 | 0.544181 | 28.948352 | 337.327920 | 39.793917 | 224.283273 | 54.752082 | 41.072046 | 51.815414 | ... | 0.244950 | 0.115588 | 0.096907 | 0.357417 | 2.008532e+16 | 11.985554 | 602.900476 | 0.749062 | 0.905722 | 0.451825 |
| min | 1730.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 25.000000 | 5.000000 | 0.000000 | 1.000000 | 0.000000 |
| 25% | 35303.000000 | 0.000000 | 0.000000 | 24.000000 | 24.000000 | 2.000000 | 16.000000 | 3.000000 | 1.000000 | 9.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.937600e+16 | 44.000000 | 69.000000 | 0.000000 | 2.000000 | 0.000000 |
| 50% | 51381.500000 | 0.000000 | 0.000000 | 49.000000 | 174.500000 | 8.000000 | 68.000000 | 12.000000 | 8.000000 | 24.500000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.432320e+16 | 51.000000 | 396.500000 | 1.000000 | 3.000000 | 1.000000 |
| 75% | 68522.000000 | 1.000000 | 1.000000 | 74.000000 | 505.000000 | 33.000000 | 232.250000 | 50.000000 | 33.000000 | 56.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.927040e+16 | 62.000000 | 1048.000000 | 1.000000 | 3.000000 | 1.000000 |
| max | 666666.000000 | 2.000000 | 2.000000 | 99.000000 | 1493.000000 | 199.000000 | 1725.000000 | 259.000000 | 262.000000 | 321.000000 | ... | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 9.184320e+16 | 128.000000 | 2525.000000 | 3.000000 | 5.000000 | 1.000000 |
8 rows × 28 columns
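This summary exposes extreme values: a maximum Age of 128 and a maximum Income of 666666, far above the 75th percentiles (62 and 68522). The PCA summary further down reports 2212 rows instead of 2216, which suggests a handful of such rows were removed before modelling. A minimal sketch of a threshold filter follows; the cut-offs Age < 90 and Income < 600000 are hypothetical values chosen for illustration, not thresholds confirmed by this output.

```python
import pandas as pd

# Hypothetical frame: the describe() maxima (Age 128, Income 666666)
# placed in separate made-up rows next to two typical rows.
data = pd.DataFrame({
    'Age': [64, 52, 128, 45],
    'Income': [58138.0, 51381.5, 52000.0, 666666.0],
})

# Hypothetical cut-offs; inspect the distributions before choosing them.
AGE_MAX = 90
INCOME_MAX = 600_000

before = len(data)
# Keep only rows below both thresholds.
data = data[(data['Age'] < AGE_MAX) & (data['Income'] < INCOME_MAX)]
print(f'Removed {before - len(data)} outlier rows; {len(data)} remain')
```

A fixed threshold is the simplest option; an IQR- or z-score-based rule would be an alternative if the cut-offs should adapt to the data.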
|  | Education | Income | Kidhome | Teenhome | Recency | Wines | Fruits | Meat | Fish | Sweets | ... | NumCatalogPurchases | NumStorePurchases | NumWebVisitsMonth | Customer_For | Age | Spent | Living_With | Children | Family_Size | Is_Parent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.893586 | 0.287105 | -0.822754 | -0.929699 | 0.310353 | 0.977660 | 1.552041 | 1.690293 | 2.453472 | 1.483713 | ... | 2.503607 | -0.555814 | 0.692181 | 1.973583 | 1.018352 | 1.676245 | -1.349603 | -1.264598 | -1.758359 | -1.581139 |
| 1 | -0.893586 | -0.260882 | 1.040021 | 0.908097 | -0.380813 | -0.872618 | -0.637461 | -0.718230 | -0.651004 | -0.634019 | ... | -0.571340 | -1.171160 | -0.132545 | -1.665144 | 1.274785 | -0.963297 | -1.349603 | 1.404572 | 0.449070 | 0.632456 |
| 2 | -0.893586 | 0.913196 | -0.822754 | -0.929699 | -0.795514 | 0.357935 | 0.570540 | -0.178542 | 1.339513 | -0.147184 | ... | -0.229679 | 1.290224 | -0.544908 | -0.172664 | 0.334530 | 0.280110 | 0.740959 | -1.264598 | -0.654644 | -1.581139 |
| 3 | -0.893586 | -1.176114 | 1.040021 | -0.929699 | -0.795514 | -0.872618 | -0.561961 | -0.655787 | -0.504911 | -0.585335 | ... | -0.913000 | -0.555814 | 0.279818 | -1.923210 | -1.289547 | -0.920135 | 0.740959 | 0.069987 | 0.449070 | 0.632456 |
| 4 | 0.571657 | 0.294307 | 1.040021 | -0.929699 | 1.554453 | -0.392257 | 0.419540 | -0.218684 | 0.152508 | -0.001133 | ... | 0.111982 | 0.059532 | -0.132545 | -0.822130 | -1.033114 | -0.307562 | 0.740959 | 0.069987 | 0.449070 | 0.632456 |
5 rows × 23 columns
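In the scaled frame above, categorical columns such as Education and Living_With appear as standardized numbers, which suggests they were integer-encoded before scaling; LabelEncoder and StandardScaler are both imported at the top of the notebook. Below is a minimal sketch of that two-step preprocessing on a toy frame. The column values and the list of categorical columns are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy frame with one categorical and two numeric columns.
ds = pd.DataFrame({
    'Living_With': ['Alone', 'Alone', 'Partner', 'Partner', 'Partner'],
    'Income': [58138.0, 46344.0, 71613.0, 26646.0, 58293.0],
    'Children': [0, 2, 0, 1, 1],
})

# Hypothetical list of categorical columns to encode.
categorical_cols = ['Living_With']

# LabelEncoder assigns integer codes in sorted (alphabetical) order of the labels.
for col in categorical_cols:
    ds[col] = LabelEncoder().fit_transform(ds[col])

# StandardScaler centres each column and divides by its population std (ddof=0).
scaler = StandardScaler()
scaled_ds = pd.DataFrame(scaler.fit_transform(ds), columns=ds.columns)
print(scaled_ds.round(6))
```

Because LabelEncoder orders labels alphabetically, Alone becomes 0 and Partner becomes 1; after scaling, Alone maps to the negative value, matching the sign pattern of Living_With in the table above.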
|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| col1 | 2212.0 | 1.477621e-16 | 2.878377 | -5.969394 | -2.538494 | -0.780421 | 2.383290 | 7.444305 |
| col2 | 2212.0 | 1.927331e-17 | 1.706839 | -4.312196 | -1.328316 | -0.158123 | 1.242289 | 6.142721 |
| col3 | 2212.0 | 1.284887e-17 | 1.221956 | -3.530416 | -0.829067 | -0.022692 | 0.799895 | 6.611222 |
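The summary above describes three components (col1 to col3) with means of essentially zero and standard deviations decreasing from about 2.88 to 1.22. This is the pattern expected from PCA with n_components=3 on standardized data. Below is a minimal sketch of that reduction followed by clustering. Several choices are assumptions: the input is synthetic standardized data, the elbow curve is approximated with plain KMeans inertia instead of yellowbrick's KElbowVisualizer, and k = 4 is a hypothetical value because the chosen number of clusters is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(42)
# Synthetic stand-in for the scaled customer frame: 200 rows, 23 features.
X = StandardScaler().fit_transform(rng.normal(size=(200, 23)))

# Project onto three principal components, named like col1..col3 above.
pca = PCA(n_components=3, random_state=42)
PCA_ds = pd.DataFrame(pca.fit_transform(X), columns=['col1', 'col2', 'col3'])
print(PCA_ds.describe().T)

# Elbow method: KMeans inertia for k = 1..10; look for the bend in the curve.
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(PCA_ds).inertia_
    for k in range(1, 11)
]
print(inertias)

# Hypothetical k = 4; AgglomerativeClustering defaults to Ward linkage.
N_CLUSTERS = 4
labels = AgglomerativeClustering(n_clusters=N_CLUSTERS).fit_predict(PCA_ds)
PCA_ds['Clusters'] = labels
print(PCA_ds['Clusters'].value_counts())
```

PCA returns components ordered by explained variance, which is why the standard deviations of col1 to col3 shrink from left to right; clustering in this three-dimensional space also makes the result easy to inspect with the Axes3D import at the top of the notebook.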