{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Mean Shift Clustering\n", "\n", "Mean shift clustering aims to discover “blobs” in a smooth density of samples. It is a centroid-based algorithm, which works by updating candidates for centroids to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates to form the final set of centroids.\n", "\n", "### Working\n", "\n", "* Define a window (bandwidth of the kernel) and place the window on a data point\n", "* Calculate the mean for all the points in the window\n", "* Move the center of the window to the location of the mean\n", "* Repeat steps 2 and 3 until there is convergence" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | CustomerID | \n", "Genre | \n", "Age | \n", "Annual Income (k$) | \n", "Spending Score (1-100) | \n", "
|---|---|---|---|---|---|
| 127 | \n", "128 | \n", "Male | \n", "40 | \n", "71 | \n", "95 | \n", "
| 9 | \n", "10 | \n", "Female | \n", "30 | \n", "19 | \n", "72 | \n", "