{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "SVM.ipynb", "version": "0.3.2", "provenance": [] }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" }, "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" } }, "cells": [ { "metadata": { "id": "rx7xM35VlNk0", "colab_type": "text" }, "cell_type": "markdown", "source": [ "# Support vector machine classifier with $\\ell_1$-regularization" ] }, { "metadata": { "id": "UBeCbr-dlNk2", "colab_type": "text" }, "cell_type": "markdown", "source": [ "In this example we use CVXPY to train a SVM classifier with $\\ell_1$-regularization.\n", "We are given data $(x_i,y_i)$, $i=1,\\ldots, m$. The $x_i \\in {\\bf R}^n$ are feature vectors, while the $y_i \\in \\{\\pm 1\\}$ are associated boolean outcomes.\n", "Our goal is to construct a good linear classifier $\\hat y = {\\rm sign}(\\beta^T x - v)$.\n", "We find the parameters $\\beta,v$ by minimizing the (convex) function\n", "\n", "$$\n", "f(\\beta,v) = (1/m) \\sum_i \\left(1 - y_i ( \\beta^T x_i-v) \\right)_+ + \\lambda\n", "\\| \\beta\\|_1\n", "$$\n", "\n", "The first term is the average hinge loss. The second term shrinks the coefficients in $\\beta$ and encourages sparsity.\n", "The scalar $\\lambda \\geq 0$ is a (regularization) parameter.\n", "Minimizing $f(\\beta,v)$ simultaneously selects features and fits the classifier." ] }, { "metadata": { "id": "7PlzylpWlNk3", "colab_type": "text" }, "cell_type": "markdown", "source": [ "### Example\n", "\n", "In the following code we generate data with $n=20$ features by randomly choosing $x_i$ and a sparse $\\beta_{\\mathrm{true}} \\in {\\bf R}^n$.\n", "We then set $y_i = {\\rm sign}(\\beta_{\\mathrm{true}}^T x_i -v_{\\mathrm{true}} - z_i)$, where the $z_i$ are i.i.d. normal random variables.\n", "We divide the data into training and test sets with $m=1000$ examples each." ] }, { "metadata": { "id": "DxXlfuzLlNk3", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# Generate data for SVM classifier with L1 regularization.\n", "from __future__ import division\n", "import numpy as np\n", "np.random.seed(1)\n", "n = 20\n", "m = 1000\n", "TEST = m\n", "DENSITY = 0.2\n", "beta_true = np.random.randn(n,1)\n", "idxs = np.random.choice(range(n), int((1-DENSITY)*n), replace=False)\n", "for idx in idxs:\n", " beta_true[idx] = 0\n", "offset = 0\n", "sigma = 45\n", "X = np.random.normal(0, 5, size=(m,n))\n", "Y = np.sign(X.dot(beta_true) + offset + np.random.normal(0,sigma,size=(m,1)))\n", "X_test = np.random.normal(0, 5, size=(TEST,n))\n", "Y_test = np.sign(X_test.dot(beta_true) + offset + np.random.normal(0,sigma,size=(TEST,1)))" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "dezeHwQDlNk6", "colab_type": "text" }, "cell_type": "markdown", "source": [ "We next formulate the optimization problem using CVXPY." 
] }, { "metadata": { "id": "gp278JAclNk7", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# Form SVM with L1 regularization problem.\n", "import cvxpy as cp\n", "beta = cp.Variable((n,1))\n", "v = cp.Variable()\n", "loss = cp.sum(cp.pos(1 - cp.multiply(Y, X*beta - v)))\n", "reg = cp.norm(beta, 1)\n", "lambd = cp.Parameter(nonneg=True)\n", "prob = cp.Problem(cp.Minimize(loss/m + lambd*reg))" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "v7clejmAlNk9", "colab_type": "text" }, "cell_type": "markdown", "source": [ "We solve the optimization problem for a range of $\\lambda$ to compute a trade-off curve.\n", "We then plot the train and test error over the trade-off curve.\n", "A reasonable choice of $\\lambda$ is the value that minimizes the test error." ] }, { "metadata": { "id": "URUg5liflNk-", "colab_type": "code", "colab": {} }, "cell_type": "code", "source": [ "# Compute a trade-off curve and record train and test error.\n", "TRIALS = 100\n", "train_error = np.zeros(TRIALS)\n", "test_error = np.zeros(TRIALS)\n", "lambda_vals = np.logspace(-2, 0, TRIALS)\n", "beta_vals = []\n", "for i in range(TRIALS):\n", " lambd.value = lambda_vals[i]\n", " prob.solve()\n", " train_error[i] = (np.sign(X.dot(beta_true) + offset) != np.sign(X.dot(beta.value) - v.value)).sum()/m\n", " test_error[i] = (np.sign(X_test.dot(beta_true) + offset) != np.sign(X_test.dot(beta.value) - v.value)).sum()/TEST\n", " beta_vals.append(beta.value)" ], "execution_count": 0, "outputs": [] }, { "metadata": { "id": "giaE3nRDlNlA", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 383 }, "outputId": "2d9bd5f7-d7db-4fc6-87fc-b92fcc2ff8d8" }, "cell_type": "code", "source": [ "# Plot the train and test error over the trade-off curve.\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "%config InlineBackend.figure_format = 'svg'\n", "\n", "plt.plot(lambda_vals, train_error, label=\"Train error\")\n", "plt.plot(lambda_vals, test_error, label=\"Test error\")\n", "plt.xscale('log')\n", "plt.legend(loc='upper left')\n", "plt.xlabel(r\"$\\lambda$\", fontsize=16)\n", "plt.show()" ], "execution_count": 8, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n" }, "metadata": { "tags": [] } } ] }, { "metadata": { "id": "nLca9p-ulNlF", "colab_type": "text" }, "cell_type": "markdown", "source": [ "We also plot the regularization path, or the $\\beta_i$ versus $\\lambda$. Notice that the $\\beta_i$ do not necessarily decrease monotonically as $\\lambda$ increases.\n", "4 features remain non-zero longer for larger $\\lambda$ than the rest, which suggests that these features are the most important. In fact $\\beta_{\\mathrm{true}}$ had 4 non-zero values." ] }, { "metadata": { "id": "9FfvOlADlNlF", "colab_type": "code", "colab": { "base_uri": "https://localhost:8080/", "height": 383 }, "outputId": "b904b3a7-6c5d-4511-8095-06c84ff9acd2" }, "cell_type": "code", "source": [ "# Plot the regularization path for beta.\n", "for i in range(n):\n", " plt.plot(lambda_vals, [wi[i,0] for wi in beta_vals])\n", "plt.xlabel(r\"$\\lambda$\", fontsize=16)\n", "plt.xscale(\"log\")" ], "execution_count": 9, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n" }, "metadata": { "tags": [] } } ] } ] }