{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Manifold Learning\n",
    "\n",
    "One weakness of PCA is that it cannot detect non-linear features.  A set\n",
    "of algorithms known as *Manifold Learning* have been developed to address\n",
    "this deficiency.  A canonical dataset used in Manifold learning is the\n",
    "*S-curve*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import make_s_curve\n",
    "X, y = make_s_curve(n_samples=1000)\n",
    "\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "ax = plt.axes(projection='3d')\n",
    "\n",
    "ax.scatter3D(X[:, 0], X[:, 1], X[:, 2], c=y)\n",
    "ax.view_init(10, -60);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "This is a 2-dimensional dataset embedded in three dimensions, but it is embedded\n",
    "in such a way that PCA cannot discover the underlying data orientation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from sklearn.decomposition import PCA\n",
    "X_pca = PCA(n_components=2).fit_transform(X)\n",
    "plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Manifold learning algorithms, however, available in the ``sklearn.manifold``\n",
    "submodule, are able to recover the underlying 2-dimensional manifold:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from sklearn.manifold import Isomap\n",
    "\n",
    "iso = Isomap(n_neighbors=15, n_components=2)\n",
    "X_iso = iso.fit_transform(X)\n",
    "plt.scatter(X_iso[:, 0], X_iso[:, 1], c=y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Manifold learning on the digits data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We can apply manifold learning techniques to much higher dimensional datasets, for example the digits data that we saw before:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits\n",
    "digits = load_digits()\n",
    "\n",
    "fig, axes = plt.subplots(2, 5, figsize=(10, 5),\n",
    "                         subplot_kw={'xticks':(), 'yticks': ()})\n",
    "for ax, img in zip(axes.ravel(), digits.images):\n",
    "    ax.imshow(img, interpolation=\"none\", cmap=\"gray\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We can visualize the dataset using a linear technique, such as PCA. We saw this already provides some intuition about the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# build a PCA model\n",
    "pca = PCA(n_components=2)\n",
    "pca.fit(digits.data)\n",
    "# transform the digits data onto the first two principal components\n",
    "digits_pca = pca.transform(digits.data)\n",
    "colors = [\"#476A2A\", \"#7851B8\", \"#BD3430\", \"#4A2D4E\", \"#875525\",\n",
    "          \"#A83683\", \"#4E655E\", \"#853541\", \"#3A3120\",\"#535D8E\"]\n",
    "plt.figure(figsize=(10, 10))\n",
    "plt.xlim(digits_pca[:, 0].min(), digits_pca[:, 0].max() + 1)\n",
    "plt.ylim(digits_pca[:, 1].min(), digits_pca[:, 1].max() + 1)\n",
    "for i in range(len(digits.data)):\n",
    "    # actually plot the digits as text instead of using scatter\n",
    "    plt.text(digits_pca[i, 0], digits_pca[i, 1], str(digits.target[i]),\n",
    "             color = colors[digits.target[i]],\n",
    "             fontdict={'weight': 'bold', 'size': 9})\n",
    "plt.xlabel(\"first principal component\")\n",
    "plt.ylabel(\"second principal component\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Using a more powerful, nonlinear techinque can provide much better visualizations, though.\n",
    "Here, we are using the t-SNE manifold learning method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "from sklearn.manifold import TSNE\n",
    "tsne = TSNE(random_state=42)\n",
    "# use fit_transform instead of fit, as TSNE has no transform method:\n",
    "digits_tsne = tsne.fit_transform(digits.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 10))\n",
    "plt.xlim(digits_tsne[:, 0].min(), digits_tsne[:, 0].max() + 1)\n",
    "plt.ylim(digits_tsne[:, 1].min(), digits_tsne[:, 1].max() + 1)\n",
    "for i in range(len(digits.data)):\n",
    "    # actually plot the digits as text instead of using scatter\n",
    "    plt.text(digits_tsne[i, 0], digits_tsne[i, 1], str(digits.target[i]),\n",
    "             color = colors[digits.target[i]],\n",
    "             fontdict={'weight': 'bold', 'size': 9})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "t-SNE has a somewhat longer runtime that other manifold learning algorithms, but the result is quite striking. Keep in mind that this algorithm is purely unsupervised, and does not know about the class labels. Still it is able to separate the classes very well (though the classes four, one and nine have been split into multiple groups)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "      <li>\n",
    "      Compare the results of applying isomap to the digits dataset to the results of PCA and t-SNE. Which result do you think looks best?\n",
    "      </li>\n",
    "      <li>\n",
    "      Given how well t-SNE separated the classes, one might be tempted to use this processing for classification. Try training a K-nearest neighbor classifier on digits data transformed with t-SNE, and compare to the accuracy on using the dataset without any transformation.\n",
    "      </li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# %load solutions/21A_isomap_digits.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# %load solutions/21B_tsne_classification.py"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}