{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Swiss Roll And Swiss-Hole Reduction\nThis notebook compares two popular non-linear dimensionality reduction\ntechniques, t-distributed Stochastic Neighbor Embedding (t-SNE) and\nLocally Linear Embedding (LLE), on the classic Swiss Roll dataset.\nThen, we explore how both techniques handle the addition of a hole\nin the data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Swiss Roll\n\nWe start by generating the Swiss Roll dataset.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nfrom sklearn import datasets, manifold\n\nsr_points, sr_color = datasets.make_swiss_roll(n_samples=1500, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's take a look at our data:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig = plt.figure(figsize=(8, 6))\nax = fig.add_subplot(111, projection=\"3d\")\nfig.add_axes(ax)\nax.scatter(\n    sr_points[:, 0], sr_points[:, 1], sr_points[:, 2], c=sr_color, s=50, alpha=0.8\n)\nax.set_title(\"Swiss Roll in Ambient Space\")\nax.view_init(azim=-66, elev=12)\n_ = ax.text2D(0.8, 0.05, s=\"n_samples=1500\", transform=ax.transAxes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computing the LLE and t-SNE embeddings, we find that LLE seems to unroll the\nSwiss Roll pretty effectively. t-SNE, on the other hand, preserves\nthe general structure of the data but poorly represents the\ncontinuous nature of our original data. 
Instead, it seems to unnecessarily\nclump sections of points together.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sr_lle, sr_err = manifold.locally_linear_embedding(\n sr_points, n_neighbors=12, n_components=2\n)\n\nsr_tsne = manifold.TSNE(n_components=2, perplexity=40, random_state=0).fit_transform(\n sr_points\n)\n\nfig, axs = plt.subplots(figsize=(8, 8), nrows=2)\naxs[0].scatter(sr_lle[:, 0], sr_lle[:, 1], c=sr_color)\naxs[0].set_title(\"LLE Embedding of Swiss Roll\")\naxs[1].scatter(sr_tsne[:, 0], sr_tsne[:, 1], c=sr_color)\n_ = axs[1].set_title(\"t-SNE Embedding of Swiss Roll\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
LLE seems to be stretching the points from the center (purple)\n of the Swiss Roll. However, this is simply a byproduct\n of how the data was generated. There is a higher density of points near the\n center of the roll, which ultimately affects how LLE reconstructs the\n data in a lower dimension.