{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "NYJWmpEfR9lJ", "nbgrader": { "cell_type": "markdown", "checksum": "65581aa445e47361f83d64ec69310261", "grade": false, "grade_id": "cell-92665c32235efd5a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Linear Support Vector Machine (SVM)\n", "\n", "We've now seen how to optimise analytic functions using PyTorch's optimisers, and in the previous labs and exercises we played with training simple machine learning models with hand-coded gradient descent. Let's put everything together and implement a Soft-Margin Linear Support Vector Machine, which we'll train on some artifically generated data using a range of optimisers." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ba82988994ec42e9cc63d42b118605ce", "grade": false, "grade_id": "cell-51814571a361e8a4", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "# We're going to use a library called celluloid to make animations that work on colab\n", "try: \n", " from celluloid import Camera\n", "except:\n", " !pip install celluloid\n", "\n", "from IPython.display import HTML\n", "import torch\n", "import torch.optim as optim" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "a31Upx80S0Wf", "nbgrader": { "cell_type": "markdown", "checksum": "ef09680740eca5e908063bd1cbc2f8b5", "grade": false, "grade_id": "cell-ad34b4924532e881", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## SVM Recap\n", "\n", "Recall that an SVM tries to find the maximum margin hyperplane which separates the data classes. For a soft margin SVM\n", "where $\\textbf{x}$ is our data, we minimize:\n", "\n", "\\begin{equation}\n", "\\left[\\frac 1 n \\sum_{i=1}^n \\max\\left(0, 1 - y_i(\\textbf{w}\\cdot \\textbf{x}_i + b)\\right) \\right] + \\lambda\\lVert \\textbf{w} \\rVert^2\n", "\\end{equation}\n", "\n", "We can formulate this as an optimization over our weights $\\textbf{w}$ and bias $b$, where we minimize the\n", "hinge loss subject to a level 2 weight decay term. The hinge loss for some model outputs\n", "$z = \\textbf{w}\\textbf{x} + b$ with targets $y$ is given by:\n", "\n", "\\begin{equation}\n", "\\ell(y,z) = \\max\\left(0, 1 - yz \\right)\n", "\\end{equation}\n", "\n", "First, complete the following function to implement the hinge loss for batches of predictions `y_pred` and targets `y_true`. You should return the mean of the hinge loss across the batch. Note that this is a binary problem with labels are chosen to be $\\{-1,1\\}$." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "id": "a-0v2QecS6YP", "nbgrader": { "cell_type": "code", "checksum": "4a913c1bb199d596ad24ad4408c9eeb0", "grade": false, "grade_id": "cell-420f491f3b45382b", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "def hinge_loss(y_pred, y_true):\n", "    # YOUR CODE HERE\n", "    raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "75107b63da640aa87fdbbd284dacb5e2", "grade": false, "grade_id": "cell-5057286d33fba508", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## Defining the SVM\n", "\n", "Defining the SVM is pretty simple: it's just a basic linear classifier, like a Perceptron; what distinguishes it is the loss. We'll wrap it up in a function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "editable": false, "id": "27hRy0i8Sze4", "nbgrader": { "cell_type": "code", "checksum": "8fecc13d5366fa71717a03cf02f1c883", "grade": false, "grade_id": "cell-41c9a1a8a2140213", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "def svm(x, w, b):\n", "    # row-wise dot product between the weights and each sample, plus the bias\n", "    h = (w * x).sum(1) + b\n", "    return h" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "diJonMlwS7z4", "nbgrader": { "cell_type": "markdown", "checksum": "8477a6a51fa5c79348734e5f43c648f1", "grade": false, "grade_id": "cell-92a878d686b53c98", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Creating Synthetic Data\n", "-----------------------------------------------\n", "\n", "Now for some data; 1024 samples should do the trick. We normalise here so that our random init is in the same space as\n", "the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "editable": false, "id": "U4U7FpoiS946", "nbgrader": { "cell_type": "code", "checksum": "d2e3a87b86eb49d5a5420123397df03d", "grade": false, "grade_id": "cell-210ee9436a431b1d", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.datasets import make_blobs\n", "\n", "X, Y = make_blobs(n_samples=1024, centers=2, cluster_std=1.2, random_state=1)\n", "X = (X - X.mean()) / X.std()\n", "Y[np.where(Y == 0)] = -1  # relabel the classes from {0, 1} to {-1, 1}\n", "X, Y = torch.FloatTensor(X), torch.FloatTensor(Y)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "bb5c51de1529fce62d95430e304b787f", "grade": false, "grade_id": "cell-a5d0c7fc6368409c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "For the first time, we're going to do proper mini-batch gradient descent. As such, we actually need to be able to produce batches of data. PyTorch has the concept of datasets (which represent entire collections of data) and data loaders (which allow us to iterate batches of data from a dataset). 
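\n", "\n", "Once we have the loader (it's created in the next cell), a quick sanity check on a single batch might look like this sketch (the shapes assume the 2-d blob data and batch size of 32 used below):\n", "\n", "```python\n", "xb, yb = next(iter(dataloader))\n", "print(xb.shape, yb.shape)  # torch.Size([32, 2]) torch.Size([32])\n", "```\n", "\n", "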
Together, the dataset and loader let the framework do all the hard work for us:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "3a3dc4137d085f28bb284143c91736f8", "grade": false, "grade_id": "cell-e65bbcc750fada1a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "from torch.utils import data\n", "\n", "dataset = data.TensorDataset(X, Y)  # create your dataset\n", "dataloader = data.DataLoader(dataset, batch_size=32, shuffle=True)  # create your dataloader" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "OR_iJX_6TJRF", "nbgrader": { "cell_type": "markdown", "checksum": "2653001f82321a71bde17f19a62975c6", "grade": false, "grade_id": "cell-ba76c22eaa870aa9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Visualizing the Training\n", "----------------------------------------\n", "\n", "We now aim to create a nice visualisation, such as the one below, that shows what happens as our SVM learns.\n", "\n", "![svmgif](https://raw.githubusercontent.com/ecs-vlc/torchbearer/master/docs/_static/img/svm_fit.gif)\n", "\n", "The code for the visualisation (using [pyplot](https://matplotlib.org/api/pyplot_api.html)) is a bit ugly but we'll\n", "try to explain it to some degree. First, we need a mesh grid `xy` over the range of our data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "editable": false, "id": "9WWuOIt5TeAA", "nbgrader": { "cell_type": "code", "checksum": "2b0a1274964e386c9899b503c6db3e88", "grade": false, "grade_id": "cell-527fa37ff55bde6c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "delta = 0.01\n", "x = np.arange(X[:, 0].min(), X[:, 0].max(), delta)\n", "y = np.arange(X[:, 1].min(), X[:, 1].max(), delta)\n", "x, y = np.meshgrid(x, y)\n", "xy = list(map(np.ravel, [x, y]))  # flatten the grid into two coordinate vectors (a 2 x N array when stacked)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "Wm9gBsuzTy7t", "nbgrader": { "cell_type": "markdown", "checksum": "fde84120db02f77f649b1ac393a38591", "grade": false, "grade_id": "cell-8e0141bbdb2b155e", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now things get a little strange. We start by evaluating our model over the mesh grid from earlier.\n", "\n", "For our outputs $z \in \textbf{Z}$, we can make some observations about the decision boundary. First, we are\n", "outside the margin if $z \lt -1$ or $z \gt 1$. Conversely, we are inside the margin where\n", "$-1 \le z \le 1$.\n", "\n", "This whole process is shown in the function below, which we call at the end of every epoch: we quantise $z$ into four bands (beyond the margin on either side, and between each margin and the decision boundary) so that `contourf` draws them as distinct regions. The `camera` takes snapshots of the current plot and is used later to render a video."
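, "\n", "If you haven't met `celluloid` before, the pattern is simply: draw a frame, `snap()` it, repeat, then `animate()`. A minimal self-contained sketch (the data plotted here is arbitrary, purely for illustration):\n", "\n", "```python\n", "import matplotlib.pyplot as plt\n", "from celluloid import Camera\n", "\n", "fig = plt.figure()\n", "camera = Camera(fig)\n", "for i in range(10):\n", "    plt.plot([0, 1], [0, i], c='blue')  # draw this frame's content\n", "    camera.snap()  # store the current state of the figure as a frame\n", "anim = camera.animate()  # stitch the stored frames into an animation\n", "```"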
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "editable": false, "id": "QEcC8BsoTzQ9", "nbgrader": { "cell_type": "code", "checksum": "785e22b2bb4b7f867908edc9792e8536", "grade": false, "grade_id": "cell-6efabcf77a2e0515", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "import matplotlib\n", "import matplotlib.pyplot as plt\n", "\n", "def draw_margin(w, b, camera):\n", "    w = w.data.numpy()\n", "    b = b.data.numpy()\n", "\n", "    # quantise z into four bands: beyond the margin on either side, and inside it on either side of the boundary\n", "    z = (w.dot(xy) + b).reshape(x.shape)\n", "    z[np.where(z > 1.)] = 4\n", "    z[np.where((z > 0.) & (z <= 1.))] = 3\n", "    z[np.where((z > -1.) & (z <= 0.))] = 2\n", "    z[np.where(z <= -1.)] = 1\n", "\n", "    plt.scatter(x=X[:, 0], y=X[:, 1], c=\"black\", s=10)\n", "    plt.contourf(x, y, z, cmap=plt.cm.jet, alpha=0.5)\n", "    camera.snap()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "GAdaOug0S_Nf", "nbgrader": { "cell_type": "markdown", "checksum": "3a0668d3f64bd7f21a64312127013411", "grade": false, "grade_id": "cell-f15891d25a807ea4", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Since we don't know that our data is linearly separable, we would like to use a soft-margin SVM: an SVM for\n", "which the data does not all have to be outside of the margin. The hinge loss permits (but penalises) points that\n", "fall inside the margin, and the weight decay term $\lambda\lVert \textbf{w} \rVert^2$ in the above equation controls\n", "the trade-off between a wide margin and those violations. This term is called weight decay because its gradient\n", "corresponds to subtracting some amount ($2\lambda\textbf{w}$) from our weights at each step.\n", "\n", "Most PyTorch optimisers actually have weight decay built in to them as an option (`weight_decay=...`), so it's trivial to incorporate this.\n", "\n", "At this point we are ready to create and train our model. We've written most of the code, but you'll need to implement the forward and backward pass:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "deletable": false, "id": "gpKBohTtTHdr", "nbgrader": { "cell_type": "code", "checksum": "531b2437be629976814f210aeaf54246", "grade": false, "grade_id": "cell-1631d2d34dd5f1d3", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# Set up drawing\n", "fig = plt.figure(figsize=(5, 5))\n", "camera = Camera(fig)\n", "\n", "w = torch.randn(1, 2, requires_grad=True)\n", "b = torch.randn(1, requires_grad=True)\n", "\n", "opt = optim.SGD([w, b], lr=0.1, weight_decay=0.01)\n", "\n", "for epoch in range(50):\n", "    for batch in dataloader:\n", "        opt.zero_grad()\n", "        # YOUR CODE HERE\n", "        raise NotImplementedError()\n", "        opt.step()\n", "    draw_margin(w, b, camera)\n", "\n", "# create the animation and display it\n", "anim = camera.animate()\n", "plt.close()\n", "HTML(anim.to_html5_video())" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "deletable": false, "editable": false, "id": "tZwqaO7pT0le", "nbgrader": { "cell_type": "markdown", "checksum": "f8be0f68a23e13fddd5f2ca815055f66", "grade": false, "grade_id": "cell-5b71776c13df59d7", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now do some further experiments. Which optimiser and parameters get you to a good solution the quickest? Do you notice that when the model is near a solution it jitters around upon each step? Can you add some kind of learning rate decay or schedule from the `torch.optim.lr_scheduler` package to reduce the learning rate over time?"
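, "\n", "As a hint, here is a minimal sketch of one possible approach (the choice of `StepLR` and its `step_size`/`gamma` values is illustrative rather than a prescribed answer):\n", "\n", "```python\n", "from torch.optim.lr_scheduler import StepLR\n", "\n", "# assumes `opt` is the optimiser from the training loop above\n", "scheduler = StepLR(opt, step_size=10, gamma=0.5)  # halve the lr every 10 epochs\n", "\n", "for epoch in range(50):\n", "    for batch in dataloader:\n", "        ...  # zero_grad / forward / backward / opt.step() as before\n", "    scheduler.step()  # decay the learning rate once per epoch\n", "```"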
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "9078d7fbc64e6cd09b31c42ce52e25b8", "grade": false, "grade_id": "cell-e3497d120884f361", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] } ], "metadata": { "colab": { "name": "3_2_SVM.ipynb", "provenance": [], "version": "0.3.2" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }