{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "s1", "content", "l1" ] }, "source": [ "# Spectral Clustering\n", "\n", "## Spectral Clustering\n", "\n", "Spectral clustering works by transforming the data into a lower-dimensional subspace before clustering. This is particularly useful when the data is high dimensional, because it saves us from running PCA or another dimensionality reduction step ourselves prior to clustering. The data is represented as a graph whose nodes are the data points, and an affinity matrix is computed that records how similar each pair of points is. For the affinity function, we can use the RBF kernel or a nearest-neighbors graph.\n", "\n", "Let us consider the intertwined circles with noise:\n", "\n", "\n", "\n", "
\n", "## Exercise:\n", "\n", " - Apply spectral clustering to the dataset with the number of clusters set to 2.\n", " - Assign the cluster labels to the column 'spectral' of the dataframe noisy_circles." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true, "tags": [ "s1", "ce", "l1" ] }, "outputs": [], "source": [ "from sklearn.cluster import SpectralClustering\n", "from sklearn import datasets\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "N_Samples = 2000\n", "X, y = datasets.make_circles(n_samples=N_Samples, factor=.5, noise=.2)\n", "noisy_circles = pd.DataFrame({'X_0':X[:,0],'X_1':X[:,1], 'y':y})\n", "\n", "# Set up the spectral clustering model; fit it to the dataset and plot the results below.\n", "spectral = SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity=\"nearest_neighbors\")\n", "\n", "# Drop the ground-truth labels so that only the two features are clustered.\n", "# drop returns a new dataframe, so the result must be assigned back.\n", "noisy_circles = noisy_circles.drop(columns='y')\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s1", "l1", "hint" ] }, "source": [ "

Use `.fit` and `.labels_` to extract the cluster assignments.

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "l1", "ans" ] }, "outputs": [], "source": [ "spectral.fit(noisy_circles)\n", "noisy_circles['spectral'] = spectral.labels_\n", "g=sns.pairplot(x_vars=\"X_0\", y_vars=\"X_1\", hue = \"spectral\", data = noisy_circles)\n", "g.fig.set_size_inches(14, 6)\n", "sns.despine()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "hid", "l1" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "\n", "try:\n", " ref_assert_var = False\n", " import numpy as np\n", " \n", " spectral_ = SpectralClustering(n_clusters=2, eigen_solver='arpack', affinity=\"nearest_neighbors\")\n", " spectral_.fit(noisy_circles)\n", " \n", " if (len(noisy_circles['spectral']) == len(spectral_.labels_)):\n", " ref_assert_var = True\n", " out = g\n", " else:\n", " ref_assert_var = False\n", " \n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "else:\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l2", "content", "s2" ] }, "source": [ "\n", "\n", "\n", "


\n", "## The Algorithm\n", "\n", "* Project your data into $\\mathbb{R}^{n}$\n", "* Form an affinity matrix $A$ from a Gaussian kernel or a graph adjacency matrix; for the Gaussian kernel, \n", "$$A_{i,j}=\\exp\\left(-\\frac{\\lVert x_{i}-x_{j}\\rVert^{2}}{2\\sigma^{2}}\\right)$$\n", "* Construct the graph Laplacian from $A$, for example $L=D-A$, where $D$ is the diagonal degree matrix with $D_{i,i}=\\sum_{j}A_{i,j}$\n", "* Solve the eigenvalue problem $L v=\\lambda v$\n", "* Select the $k$ eigenvectors $\\{ v_{i}, i=1,\\dots,k \\}$ corresponding to the $k$ smallest eigenvalues $\\{ \\lambda_{i}, i=1,\\dots,k \\}$ to define a $k$-dimensional subspace $P^{t}LP$, where the columns of $P$ are the selected eigenvectors\n", "* Compute clusters in this subspace using k-means" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "tags": [ "l2", "ce", "s2" ] }, "outputs": [], "source": [ "# This section will contain code examples that you can run to generate results as well as quizzes." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l2", "s2", "hint" ] }, "source": [ "

#
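
The steps listed under The Algorithm can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a symmetrised k-nearest-neighbour graph as the affinity matrix and the unnormalised Laplacian L = D - A. The helper name spectral_sketch and the settings (n_neighbors=10, 300 samples, noise=0.05) are hypothetical choices made for this example, and library implementations such as SpectralClustering may differ in details such as normalisation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph


def spectral_sketch(X, n_clusters=2, n_neighbors=10, seed=0):
    # Hypothetical helper for illustration, not a library function.
    # Step 1: affinity matrix A from a symmetrised k-nearest-neighbour graph.
    knn = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=False)
    A = 0.5 * (knn + knn.T).toarray()

    # Step 2: unnormalised graph Laplacian L = D - A (D holds the node degrees).
    degree = A.sum(axis=1)
    L = np.diag(degree) - A

    # Step 3: eigh returns eigenvalues in ascending order, so the first
    # n_clusters columns are the eigenvectors of the smallest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    embedding = eigenvectors[:, :n_clusters]

    # Step 4: k-means on the rows of the spectral embedding.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(embedding), eigenvalues


# Assumed toy data: two concentric circles with light noise.
X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)
labels, eigenvalues = spectral_sketch(X)
```

The smallest eigenvalue of a graph Laplacian is zero, so `eigenvalues[0]` should be numerically close to 0. Comparing `labels` with `y`, up to a swap of the two label values, gives a quick check of the clustering.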

" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l2", "s2", "ans" ] }, "outputs": [], "source": [ "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l2", "hid", "s2" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "\n", "try:\n", " ref_assert_var = True\n", " \n", " \n", " \n", " \n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "else:\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "\n", "assert ref_tmp_var" ] } ], "metadata": { "executed_sections": [], "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "rf_version": 1 }, "nbformat": 4, "nbformat_minor": 2 }