{ "metadata": { "name": "07A_in_depth_with_SVMs" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline\n", "import numpy as np\n", "import pylab as pl" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "In depth with SVMs: Support Vector Machines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVM stands for \"support vector machines\". They are efficient and easy to use estimators.\n", "They come in two kinds: SVCs, Support Vector Classifiers, for classification problems, and SVRs, Support Vector Regressors, for regression problems." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import svm" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Linear SVMs: some intuitions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To develop our intuitions, let us look at a very simple classification problem: classifying irises based on sepal length and width. We only use 2 features to enable easy visualization." ] }, { "cell_type": "code", "collapsed": false, "input": [ "svc = svm.SVC(kernel='linear')\n", "from sklearn import datasets\n", "iris = datasets.load_iris()\n", "X = iris.data[:, :2]\n", "y = iris.target\n", "svc.fit(X, y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visualize the prediction, we evaluate it on a grid of points:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from matplotlib.colors import ListedColormap\n", "# Create color maps for 3-class classification problem, as with iris\n", "cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])\n", "cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])\n", "\n", "def plot_estimator(estimator, X, y):\n", " estimator.fit(X, y)\n", " x_min, x_max = X[:, 0].min() - .1, X[:, 0].max() + .1\n", " y_min, y_max = X[:, 1].min() - .1, X[:, 1].max() + .1\n", " xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),\n", " np.linspace(y_min, y_max, 100))\n", " Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])\n", "\n", " # Put the result into a color plot\n", " Z = Z.reshape(xx.shape)\n", " pl.figure()\n", " pl.pcolormesh(xx, yy, Z, cmap=cmap_light)\n", "\n", " # Plot also the training points\n", " pl.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)\n", " pl.axis('tight')\n", " pl.axis('off')\n", " pl.tight_layout()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "plot_estimator(svc, X, y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, `kernel=\"linear\"` gives linear decision frontiers: the frontier between two classes is a line.\n", "\n", "How does multi-class work? With the `SVC` object, it is done by combining \"one versus one\" decisions on each pair of classes.\n", "\n", "**LinearSVC**: for linear kernels, there is another object, the `LinearSVC` that uses a different algorithm. On some data it may be faster (for instance sparse data, as in text mining). It uses a \"one versus all\" strategy for multi-class problems." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "plot_estimator(svm.LinearSVC(), X, y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVRs (Support Vector Regression) work like SVCs, but for regression rather than classification." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Support vectors and regularisation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Support vectors**: The way a support vector machine works is by finding a decision boundary separating the 2 classes that is spanned by a small number of training samples, called \"support vectors\". These samples lie closest to the other class, and can thus be considered as most representative samples in terms of the two-class discrimination problem.\n", "\n", "To make visualization even simpler, let us consider a 2 class problem, for instance using classes 1 and 2 in the iris dataset. These 2 classes are not well linearly separable, which makes it an interesting problem.\n", "\n", "The indices of the support vectors for each class can be found in the `support_vectors_` attribute. We highlight them in the following figure." ] }, { "cell_type": "code", "collapsed": false, "input": [ "X, y = X[np.in1d(y, [1, 2])], y[np.in1d(y, [1, 2])]\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Regularization**: Considering only the discriminant samples is a form of regularization. Indeed, it forces the model to be simpler in how it combines observed structures.\n", "\n", "This regularization can be tuned with the *C* parameter:\n", "\n", "- Low C values: many support vectors... Decision frontier = mean(class A) - mean(class B)\n", "- High C values: small number of support vectors: Decision frontier fully driven by most disriminant samples" ] }, { "cell_type": "code", "collapsed": false, "input": [ "svc = svm.SVC(kernel='linear', C=1e3)\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)\n", "pl.title('High C values: small number of support vectors')\n", "\n", "svc = svm.SVC(kernel='linear', C=1e-3)\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)\n", "pl.title('Low C values: high number of support vectors')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One nice features of SVMs is that on many datasets, the default value 'C=1' works well.\n", "\n", "**Practical note: Normalizing data** For many estimators, including the SVMs, having datasets with unit standard deviation for each feature is often important to get good prediction." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Kernels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One appealling aspect of SVMs is that they can easily be used to build non linear decision frontiers using **kernels**. Kernel define the 'building blocks' that are assembled to form a decision rule.\n", "\n", "- **linear** will give linear decision frontiers. 
, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Kernels" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "One appealing aspect of SVMs is that they can easily be used to build non-linear decision frontiers using **kernels**. Kernels define the 'building blocks' that are assembled to form a decision rule.\n", "\n", "- **linear** gives linear decision frontiers. It is the most computationally efficient approach and the one that requires the least amount of data.\n", "\n", "- **poly** gives decision frontiers that are polynomial. The order of the polynomial is given by the `degree` argument.\n", "\n", "- **rbf** uses 'radial basis functions' centered at each support vector to assemble the decision frontier. The size of the RBFs, controlled by the `gamma` argument, ultimately determines the smoothness of the decision frontier. RBFs are the most flexible approach, but also the one that requires the largest amount of data." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "svc = svm.SVC(kernel='linear')\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)\n", "pl.title('Linear kernel')\n", "\n", "svc = svm.SVC(kernel='poly', degree=4)\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)\n", "pl.title('Polynomial kernel')\n", "\n", "svc = svm.SVC(kernel='rbf', gamma=1e2)\n", "plot_estimator(svc, X, y)\n", "pl.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1], s=80, facecolors='none', zorder=10)\n", "pl.title('RBF kernel')" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the RBF kernel is more flexible and fits our training data best. But remember that minimizing training error is not a goal per se: we have to watch out for overfitting." ] },
{ "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Exercise: tune an SVM on the digits dataset" ] },
{ "cell_type": "code", "collapsed": false, "input": [ "from sklearn import datasets\n", "digits = datasets.load_digits()\n", "X, y = digits.data, digits.target\n", "# ... now all that is left to do is the prediction" ], "language": "python", "metadata": {}, "outputs": [] },
{ "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }
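, { "cell_type": "markdown", "metadata": {}, "source": [ "One possible way to approach the exercise is sketched below: a hand-rolled train/test split and a small manual grid over `C` and `gamma` (the candidate values are arbitrary illustrative choices; scikit-learn also provides dedicated cross-validation and grid-search helpers)." ] },
{ "cell_type": "code", "collapsed": false, "input": [ "# Sketch of one possible solution: a simple manual grid search on a held-out split\n", "n_train = len(X) // 2  # first half for training, second half for testing\n", "X_train, X_test = X[:n_train], X[n_train:]\n", "y_train, y_test = y[:n_train], y[n_train:]\n", "\n", "best_score, best_params = 0, None\n", "for C in [0.1, 1, 10, 100]:\n", "    for gamma in [0.0001, 0.001, 0.01]:\n", "        clf = svm.SVC(kernel='rbf', C=C, gamma=gamma).fit(X_train, y_train)\n", "        score = clf.score(X_test, y_test)  # accuracy on data the model has not seen\n", "        if score > best_score:\n", "            best_score, best_params = score, (C, gamma)\n", "\n", "print('best test accuracy: %.3f with C=%s, gamma=%s' % (best_score, best_params[0], best_params[1]))" ], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }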