{ "metadata": { "name": "", "signature": "sha256:139a6df8a3ab63d10b6ff341c458833eea9ae1300d5256bf357328f714cfe785" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Machine Learning Breakout: Facial Recognition\n", "\n", "This exercise will walk you through the process of using machine learning for facial recognition." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import print_function, division\n", "\n", "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "# use seaborn for better matplotlib styles\n", "import seaborn; seaborn.set(style='white')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Fetch & explore the data\n", "\n", "The data we'll use is a collection of snapshots of the faces of world leaders. We'll fetch the data as follows:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.datasets import fetch_lfw_people\n", "faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Explore this data, which is laid out very similarly to the digits data we saw earlier. How many **samples** are there? How many **features**? How many **classes**, or **targets**?\n", "- Use subplots and ``plt.imshow`` to plot several of the images. How many pixels are in each image?\n", "- Use ``sklearn.model_selection.train_test_split`` (``sklearn.cross_validation.train_test_split`` in scikit-learn releases before 0.18) to split the data into a training set and a test set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Projecting the Data\n", "\n", "Let's use some dimensionality reduction routines to try to understand the data.\n", "Just a warning: you'll probably find that, unlike in the case of the handwritten digits, the projections will be a bit too jumbled to gain much insight. Still, it's always a useful step in understanding your data!\n", "\n", "- Project the data to two dimensions with Principal Component Analysis and scatter-plot the results.\n", "- Project the data to two dimensions with Isomap and scatter-plot the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Classification of unknown images\n", "\n", "Here we'll perform a classification task on our data. Given a training set, we want to build a classifier that will accurately predict the labels of the test set.\n", "\n", "- Start by splitting your data into a training set and a test set (you can use ``sklearn.model_selection.train_test_split``, or ``sklearn.cross_validation.train_test_split`` in scikit-learn releases before 0.18).\n", "- We'll use a support vector classifier (``sklearn.svm.SVC``) to classify the data. Import this and instantiate the estimator.\n", "- Perform an initial fit on the training data, predict the test labels, and use ``sklearn.metrics.accuracy_score`` to see how well you're doing.\n", "- The estimator can be tuned to make the fit better. We'll do this by adjusting the hyperparameters of ``SVC``. Look at the ``SVC`` docstring and try some choices for ``kernel``, ``C``, and ``gamma``. What's the best accuracy you can find?\n", "- For this best estimator, print the ``sklearn.metrics.classification_report`` and ``sklearn.metrics.confusion_matrix``, and plot some of the images with their true and predicted labels. How well does it do?" ] } ], "metadata": {} } ] }