{
 "metadata": {
  "name": "",
  "signature": "sha256:139a6df8a3ab63d10b6ff341c458833eea9ae1300d5256bf357328f714cfe785"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Machine Learning Breakout: Facial Recognition\n",
      "\n",
      "This exercise will walk you through the process of using machine learning for facial recognition."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from __future__ import print_function, division\n",
      "\n",
      "%matplotlib inline\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "import pandas as pd\n",
      "\n",
      "# use seaborn for better matplotlib styles\n",
      "import seaborn; seaborn.set(style='white')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 1. Fetch & explore the data\n",
      "\n",
      "The data we'll use is a number of snapshots of the faces of world leaders. We'll fetch the data as follows:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.datasets import fetch_lfw_people\n",
      "faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Explore this data, which is laid out very similarly to the digits data we saw earlier. How many **samples** are there? How many **features**? How many **classes**, or **targets**?\n",
      "- Use subplots and ``plt.imshow`` to plot several of the images. How many pixels are in each image?\n",
      "- Use ``sklearn.model_selection.train_test_split`` to split the data into a training set and a test set."
     ]
    },
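    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a starting point for the exploration above, here is a minimal sketch rather than a reference solution. The helper names ``summarize_dataset`` and ``plot_gallery`` are made up for this illustration, and the usage comments assume that ``faces`` exposes ``data``, ``target``, ``target_names``, and ``images`` attributes, much like the digits data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Illustrative sketch: the helper names below are hypothetical, not part of scikit-learn.\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "\n",
      "def summarize_dataset(data, target):\n",
      "    # Return (n_samples, n_features, n_classes) for a feature matrix and its labels.\n",
      "    n_samples, n_features = data.shape\n",
      "    n_classes = len(np.unique(target))\n",
      "    return n_samples, n_features, n_classes\n",
      "\n",
      "\n",
      "def plot_gallery(images, titles, n_rows=2, n_cols=4):\n",
      "    # Show the first n_rows * n_cols images in a grid of subplots.\n",
      "    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,\n",
      "                             figsize=(2 * n_cols, 2.5 * n_rows))\n",
      "    for ax, image, title in zip(axes.ravel(), images, titles):\n",
      "        ax.imshow(image, cmap='gray')\n",
      "        ax.set_title(title, fontsize=8)\n",
      "        ax.set_xticks([])\n",
      "        ax.set_yticks([])\n",
      "    return fig\n",
      "\n",
      "\n",
      "# Usage with the `faces` object fetched above:\n",
      "# print(summarize_dataset(faces.data, faces.target))\n",
      "# print(faces.images.shape)  # (n_samples, height, width); pixels per image = height * width\n",
      "# plot_gallery(faces.images, faces.target_names[faces.target])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },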
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 2. Projecting the data\n",
      "\n",
      "Let's use some dimensionality reduction routines to try to understand the data.\n",
      "Just a warning: you'll probably find that, unlike in the case of the handwritten digits, the projections will be a bit too jumbled to gain much insight. Still, it's always a useful step in understanding your data!\n",
      "\n",
      "- Project the data to two dimensions with Principal Component Analysis and scatter-plot the results.\n",
      "- Project the data to two dimensions with Isomap and scatter-plot the results."
     ]
    },
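    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The two projections above could be sketched as follows. This is only one possible approach: ``project_2d`` and ``scatter_projection`` are hypothetical helper names, and Isomap is left at scikit-learn's default settings rather than tuned values."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Illustrative sketch: the helper names below are hypothetical.\n",
      "import matplotlib.pyplot as plt\n",
      "from sklearn.decomposition import PCA\n",
      "from sklearn.manifold import Isomap\n",
      "\n",
      "\n",
      "def project_2d(data, method='pca'):\n",
      "    # Reduce the feature matrix to two dimensions with PCA or Isomap.\n",
      "    if method == 'pca':\n",
      "        model = PCA(n_components=2)\n",
      "    elif method == 'isomap':\n",
      "        model = Isomap(n_components=2)\n",
      "    else:\n",
      "        raise ValueError('method must be \"pca\" or \"isomap\"')\n",
      "    return model.fit_transform(data)\n",
      "\n",
      "\n",
      "def scatter_projection(projected, target, title=''):\n",
      "    # Scatter-plot the two projected coordinates, colored by class index.\n",
      "    fig, ax = plt.subplots()\n",
      "    points = ax.scatter(projected[:, 0], projected[:, 1],\n",
      "                        c=target, cmap='viridis', s=15)\n",
      "    fig.colorbar(points, ax=ax, label='class index')\n",
      "    ax.set_title(title)\n",
      "    return fig\n",
      "\n",
      "\n",
      "# Usage with the `faces` object fetched above:\n",
      "# scatter_projection(project_2d(faces.data, 'pca'), faces.target, 'PCA')\n",
      "# scatter_projection(project_2d(faces.data, 'isomap'), faces.target, 'Isomap')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },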
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## 3. Classification of unknown images\n",
      "\n",
      "Here we'll perform a classification task on our data. Given a training set, we want to build a classifier that accurately predicts the labels of the test set.\n",
      "\n",
      "- Start by splitting your data into a training set and a test set (you can use ``sklearn.model_selection.train_test_split``).\n",
      "- We'll use a support vector classifier (``sklearn.svm.SVC``) to classify the data. Import this and instantiate the estimator.\n",
      "- Perform an initial fit on the training data, predict the test labels, and use ``sklearn.metrics.accuracy_score`` to see how well you're doing.\n",
      "- The estimator can be tuned to improve the fit by adjusting the hyperparameters of ``SVC``. Look at the ``SVC`` docstring and try some choices for ``kernel``, ``C``, and ``gamma``. What's the best accuracy you can find?\n",
      "- For this best estimator, print the ``sklearn.metrics.classification_report`` and ``sklearn.metrics.confusion_matrix``, and plot some of the images with their true and predicted labels. How well does it do?"
     ]
    }
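    ,
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to approach the steps above is sketched below, assuming a scikit-learn version that provides ``sklearn.model_selection`` and the ``gamma='scale'`` option. The helper ``fit_and_score`` is a hypothetical name, and the parameter values in the usage comments are arbitrary starting points, not recommended settings."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Illustrative sketch: `fit_and_score` is a hypothetical helper, not a scikit-learn function.\n",
      "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n",
      "from sklearn.model_selection import train_test_split\n",
      "from sklearn.svm import SVC\n",
      "\n",
      "\n",
      "def fit_and_score(data, target, kernel='rbf', C=1.0, gamma='scale', random_state=0):\n",
      "    # Split the data, fit an SVC on the training part, and score it on the test part.\n",
      "    X_train, X_test, y_train, y_test = train_test_split(\n",
      "        data, target, random_state=random_state)\n",
      "    clf = SVC(kernel=kernel, C=C, gamma=gamma)\n",
      "    clf.fit(X_train, y_train)\n",
      "    y_pred = clf.predict(X_test)\n",
      "    return clf, y_test, y_pred, accuracy_score(y_test, y_pred)\n",
      "\n",
      "\n",
      "# Usage with the `faces` object fetched above (values chosen arbitrarily):\n",
      "# for C in [1, 10, 100]:\n",
      "#     clf, y_test, y_pred, acc = fit_and_score(faces.data, faces.target,\n",
      "#                                              kernel='linear', C=C)\n",
      "#     print(C, acc)\n",
      "# print(classification_report(y_test, y_pred, target_names=faces.target_names))\n",
      "# print(confusion_matrix(y_test, y_pred))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }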
   ],
   "metadata": {}
  }
 ]
}