{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from fastai import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this part of the lecture we explain Stochastic Gradient Descent (SGD) which is an **optimization** method commonly used in neural networks. We will illustrate the concepts with concrete examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of linear regression is to fit a line to a set of points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n=100" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0.1695, 1.0000],\n", " [-0.3731, 1.0000],\n", " [ 0.4746, 1.0000],\n", " [ 0.7718, 1.0000],\n", " [ 0.5793, 1.0000]])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = torch.ones(n,2) \n", "x[:,0].uniform_(-1.,1)\n", "x[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([3., 2.])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = tensor(3.,2); a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = x@a + torch.rand(n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0], y);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You want to find **parameters** (weights) `a` such that you minimize the *error* between the points and the line `x@a`. Note that here `a` is unknown. For a regression problem the most common *error function* or *loss function* is the **mean squared error**. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mse(y_hat, y): return ((y_hat-y)**2).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we believe `a = (-1.0,1.0)` then we can compute `y_hat` which is our *prediction* and then compute our error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = tensor(-1.,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(7.0485)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_hat = x@a\n", "mse(y_hat, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0],y)\n", "plt.scatter(x[:,0],y_hat);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we have specified the *model* (linear regression) and the *evaluation criteria* (or *loss function*). Now we need to handle *optimization*; that is, how do we find the best values for `a`? How do we find the best *fitting* linear regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to find the values of `a` that minimize `mse_loss`.\n", "\n", "**Gradient descent** is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved by taking steps in the negative direction of the function gradient.\n", "\n", "Here is gradient descent implemented in [PyTorch](http://pytorch.org/)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Parameter containing:\n", "tensor([-1., 1.], requires_grad=True)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = nn.Parameter(a); a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def update():\n", "    y_hat = x@a  # forward pass: predictions for the current a\n", "    loss = mse(y_hat, y)\n", "    if t % 10 == 0: print(loss)  # t is the global loop counter\n", "    loss.backward()  # populates a.grad\n", "    with torch.no_grad():\n", "        a.sub_(lr * a.grad)  # gradient step, not tracked by autograd\n", "        a.grad.zero_()  # reset, since backward() accumulates gradients" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(7.0485, grad_fn=)\n", "tensor(1.5014, grad_fn=)\n", "tensor(0.4738, grad_fn=)\n", "tensor(0.1954, grad_fn=)\n", "tensor(0.1185, grad_fn=)\n", "tensor(0.0972, grad_fn=)\n", "tensor(0.0913, grad_fn=)\n", "tensor(0.0897, grad_fn=)\n", "tensor(0.0892, grad_fn=)\n", "tensor(0.0891, grad_fn=)\n" ] } ], "source": [ "lr = 1e-1\n", "for t in range(100): update()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0],y)\n", "plt.scatter(x[:,0],x@a);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Animate it!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import animation, rc\n", "rc('animation', html='jshtml')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", " \n", "
\n", " \n", "
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", " Once \n", " Loop \n", " Reflect \n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = nn.Parameter(tensor(-1.,1))\n", "\n", "fig = plt.figure()\n", "plt.scatter(x[:,0], y, c='orange')\n", "line, = plt.plot(x[:,0], x@a)\n", "plt.close()\n", "\n", "def animate(i):\n", " update()\n", " line.set_ydata(x@a)\n", " return line,\n", "\n", "animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, we don't calculate on the whole file at once, but we use *mini-batches*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vocab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Learning rate\n", "- Epoch\n", "- Minibatch\n", "- SGD\n", "- Model / Architecture\n", "- Parameters\n", "- Loss function\n", "\n", "For classification problems, we use *cross entropy loss*, also known as *negative log likelihood loss*. This penalizes incorrect confident predictions, and correct unconfident predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 1 }