{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Dimensionality Reduction with Neighborhood Components Analysis\n\nSample usage of Neighborhood Components Analysis for dimensionality reduction.\n\nThis example compares different (linear) dimensionality reduction methods\napplied on the Digits data set. The data set contains images of digits from\n0 to 9 with approximately 180 samples of each class. Each image is of\ndimension 8x8 = 64, and is reduced to a two-dimensional data point.\n\nPrincipal Component Analysis (PCA) applied to this data identifies the\ncombination of attributes (principal components, or directions in the\nfeature space) that account for the most variance in the data. Here we\nplot the different samples on the 2 first principal components.\n\nLinear Discriminant Analysis (LDA) tries to identify attributes that\naccount for the most variance *between classes*. In particular,\nLDA, in contrast to PCA, is a supervised method, using known class labels.\n\nNeighborhood Components Analysis (NCA) tries to find a feature space such\nthat a stochastic nearest neighbor algorithm will give the best accuracy.\nLike LDA, it is a supervised method.\n\nOne can see that NCA enforces a clustering of the data that is visually\nmeaningful despite the large reduction in dimension.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause\n\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom sklearn import datasets\nfrom sklearn.decomposition import PCA\nfrom sklearn.discriminant_analysis import LinearDiscriminantAnalysis\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nn_neighbors = 3\nrandom_state = 0\n\n# Load Digits dataset\nX, y = datasets.load_digits(return_X_y=True)\n\n# Split into train/test\nX_train, X_test, y_train, y_test = train_test_split(\n X, y, test_size=0.5, stratify=y, random_state=random_state\n)\n\ndim = len(X[0])\nn_classes = len(np.unique(y))\n\n# Reduce dimension to 2 with PCA\npca = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))\n\n# Reduce dimension to 2 with LinearDiscriminantAnalysis\nlda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))\n\n# Reduce dimension to 2 with NeighborhoodComponentAnalysis\nnca = make_pipeline(\n StandardScaler(),\n NeighborhoodComponentsAnalysis(n_components=2, random_state=random_state),\n)\n\n# Use a nearest neighbor classifier to evaluate the methods\nknn = KNeighborsClassifier(n_neighbors=n_neighbors)\n\n# Make a list of the methods to be compared\ndim_reduction_methods = [(\"PCA\", pca), (\"LDA\", lda), (\"NCA\", nca)]\n\n# plt.figure()\nfor i, (name, model) in enumerate(dim_reduction_methods):\n plt.figure()\n # plt.subplot(1, 3, i + 1, aspect=1)\n\n # Fit the method's model\n model.fit(X_train, y_train)\n\n # Fit a nearest neighbor classifier on the embedded training set\n knn.fit(model.transform(X_train), y_train)\n\n # Compute the nearest neighbor accuracy on the embedded test set\n acc_knn = knn.score(model.transform(X_test), y_test)\n\n # Embed the data set in 2 dimensions using the fitted model\n X_embedded = model.transform(X)\n\n # Plot the projected points and show the evaluation score\n 
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=30, cmap=\"Set1\")\n plt.title(\n \"{}, KNN (k={})\\nTest accuracy = {:.2f}\".format(name, n_neighbors, acc_knn)\n )\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }