{ "metadata": { "name": "", "signature": "sha256:5cf0791b3e444d1918be698dc1538ecc65a66c17ddb412d87ffeb7aa5d36980d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Breakout: EigenFaces\n", "\n", "In this breakout, we'll use *Principal Component Analysis* (PCA) to explore the *faces* dataset that we saw earlier." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats\n", "\n", "# use seaborn plotting defaults\n", "import seaborn as sns; sns.set()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use this code to load the data:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.datasets import fetch_lfw_people\n", "faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)\n", "\n", "X, y = faces.data, faces.target" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Compute a PCA of the data\n", "\n", "- Compute a Principal Component Analysis of the data, using all components\n", "- Plot the cumulative explained variance ratio. How many components do we need to recover 90% of the variance?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Plot the \"eigenfaces\"\n", "\n", "The mean of the data (found in the ``mean_`` attribute) and each component of the data (found in the rows of the ``components_`` attribute) can be reshaped and interpreted as an image.\n", "\n", "- Display the mean face using ``plt.imshow``\n", "- Display the first few \"eigenfaces\" (given by the rows of the ``components_`` matrix)\n", "\n", "You'll have to play around with the colormap and grid settings to make this look OK." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Plot the reconstructed faces\n", "\n", "For several faces, plot the true image plus the reconstruction (computed using ``inverse_transform``) for several different values of ``n_components``. (You might even use IPython's interactive functions to make this exploration easier.)\n", "\n", "Does the 90% variance choice seem to correspond to a good visual representation of each picture?\n", "\n", "**Note:** As you experiment with this, you may want to use ``RandomizedPCA`` rather than ``PCA`` for this task. ``RandomizedPCA`` is an approximate method with the same interface as ``PCA``, but it runs much more quickly. In newer scikit-learn releases, the same solver is available as ``PCA(svd_solver='randomized')``." ] } ], "metadata": {} } ] }