{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Practical Deep Learning for Coders, v3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 2_sgd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from fastai.basics import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this part of the lecture we will explain Stochastic Gradient Descent (SGD), an **optimization** method commonly used in neural networks. We will illustrate the concepts with concrete examples.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在这部分,我们将会解释随机梯度下降算法(SGD),它在神经网络应用中是常用的**优化**算法。我们将通过实例来解释其原理和概念。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression problem 线性回归问题" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of linear regression is to fit a line to a set of points.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "线性回归的目标是将一条直线拟合到一组点。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n=100" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-0.1957, 1.0000],\n", " [ 0.1826, 1.0000],\n", " [-0.1008, 1.0000],\n", " [-0.1449, 1.0000],\n", " [ 0.7091, 1.0000]])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = torch.ones(n,2) \n", "x[:,0].uniform_(-1.,1)\n", "x[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([3., 2.])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = tensor(3.,2); a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = x@a + torch.rand(n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0], y);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You want to find **parameters** (weights) `a` such that you minimize the *error* between the points and the line `x@a`. Note that here `a` is unknown. For a regression problem the most common *error function* or *loss function* is the **mean squared error**.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "你希望找到这样的 **参数**(权重) `a`,使得数据点和直线`x@a`之间的 *误差* 尽可能小。需要注意的是这里`a`是未知的。对于回归问题最常用的 *误差函数* 或者说 *损失函数* 是 **均方误差(MSE)** 。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mse(y_hat, y): return ((y_hat-y)**2).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we believe `a = (-1.0,1.0)` then we can compute `y_hat` which is our *prediction* and then compute our error.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "假设我们取`a = (-1.0,1.0)`,那么我们就可以计算 *预测值* `y_hat` ,随后我们可以算出误差来。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = tensor(-1.,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(7.9356)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_hat = x@a\n", "mse(y_hat, y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0],y)\n", "plt.scatter(x[:,0],y_hat);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So far we have specified the *model* (linear regression) and the *evaluation criteria* (or *loss function*). Now we need to handle *optimization*; that is, how do we find the best values for `a`? How do we find the best *fitting* linear regression.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "到现在我们已经指定了 *模型* 的类型(线性回归),以及 *评估标准* (或者说 *损失函数* ),接下来我们需要处理 *优化* 过程;即,我们如何才能找到最优的`a`呢?我们如何才能找到 *拟合* 最好的线性回归模型呢?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent 梯度下降" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We would like to find the values of `a` that minimize `mse_loss`.
\n", "\n", "我们希望找到最小化`mse`值的`a`的值。\n", "\n", "**Gradient descent** is an algorithm that minimizes functions. Given a function defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function. This iterative minimization is achieved by repeatedly taking a small step in the direction of the negative gradient of the function.
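
To make the update rule concrete, here is a minimal plain-Python sketch (an illustration under stated assumptions, not the lesson's own implementation). It assumes a line `a0*x + a1` fitted to synthetic data generated like the data earlier in this notebook, and it writes out the partial derivatives of the mean squared error by hand instead of relying on autograd. The names `xs`, `ys`, `loss_and_grad`, `a0` and `a1` are hypothetical.

```python
import random

random.seed(0)
# Synthetic data from the line y = 3x + 2 plus uniform noise in [0, 1),
# mirroring the setup used earlier in this notebook.
xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + 2.0 + random.random() for x in xs]

def loss_and_grad(a0, a1):
    # Mean squared error of the line a0*x + a1 and its two partial derivatives.
    n = len(xs)
    errs = [a0 * x + a1 - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    g0 = sum(2 * e * x for e, x in zip(errs, xs)) / n  # d(loss)/d(a0)
    g1 = sum(2 * e for e in errs) / n                  # d(loss)/d(a1)
    return loss, g0, g1

a0, a1, lr = -1.0, 1.0, 0.1
for step in range(100):
    loss, g0, g1 = loss_and_grad(a0, a1)
    # Step against the gradient, scaled by the learning rate.
    a0 -= lr * g0
    a1 -= lr * g1

print(a0, a1, loss)
```

Because the noise is uniform on [0, 1) with mean 0.5, the fitted intercept should settle near 2.5 rather than 2, while the slope should approach 3.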
\n", "\n", "**梯度下降** 是一个用于优化函数的算法。给定一个由一组参数决定的函数,梯度下降从一组初始的参数值开始,不断向能够最小化函数值的参数值迭代。这个迭代式最小化的结果是,通过向函数梯度的负方向不断递进而得到的。\n", "\n", "Here is gradient descent implemented in [PyTorch](http://pytorch.org/).
\n", "\n", "这里是 [PyTorch](http://pytorch.org/)中梯度下降算法实施的细节。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Parameter containing:\n", "tensor([-1., 1.], requires_grad=True)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = nn.Parameter(a); a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def update():\n", "    y_hat = x@a\n", "    loss = mse(y_hat, y)\n", "    if t % 10 == 0: print(loss)  # report the loss every 10 steps\n", "    loss.backward()  # compute d(loss)/d(a) into a.grad\n", "    with torch.no_grad():\n", "        a.sub_(lr * a.grad)  # step against the gradient\n", "        a.grad.zero_()  # reset, since backward() accumulates gradients" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(7.9356, grad_fn=)\n", "tensor(1.4609, grad_fn=)\n", "tensor(0.4824, grad_fn=)\n", "tensor(0.1995, grad_fn=)\n", "tensor(0.1147, grad_fn=)\n", "tensor(0.0893, grad_fn=)\n", "tensor(0.0816, grad_fn=)\n", "tensor(0.0793, grad_fn=)\n", "tensor(0.0786, grad_fn=)\n", "tensor(0.0784, grad_fn=)\n" ] } ], "source": [ "lr = 1e-1\n", "for t in range(100): update()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(x[:,0],y)\n", "plt.scatter(x[:,0],x@a);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Animate it! 过程动画化" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import animation, rc\n", "rc('animation', html='jshtml')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = nn.Parameter(tensor(-1.,1))\n", "\n", "fig = plt.figure()\n", "plt.scatter(x[:,0], y, c='orange')\n", "line, = plt.plot(x[:,0], x@a)\n", "plt.close()\n", "\n", "def animate(i):\n", "    update()\n", "    line.set_ydata(x@a)\n", "    return line,\n", "\n", "animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, we don't compute the loss on the whole dataset at once; instead, we use *mini-batches*.
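
A minimal plain-Python sketch of mini-batch updates follows (an illustration only, not the lesson's code). It assumes the same kind of synthetic line-fitting data as earlier in this notebook; the batch size, learning rate and epoch count are arbitrary illustrative values, and `batch_grad`, `xs`, `ys`, `a0` and `a1` are hypothetical names.

```python
import random

random.seed(0)
# Synthetic data from the line y = 3x + 2 plus uniform noise in [0, 1).
xs = [random.uniform(-1.0, 1.0) for _ in range(100)]
ys = [3.0 * x + 2.0 + random.random() for x in xs]

def batch_grad(a0, a1, idxs):
    # Gradient of the mean squared error computed on one mini-batch only.
    errs = [a0 * xs[i] + a1 - ys[i] for i in idxs]
    g0 = sum(2 * e * xs[i] for e, i in zip(errs, idxs)) / len(idxs)
    g1 = sum(2 * e for e in errs) / len(idxs)
    return g0, g1

a0, a1 = -1.0, 1.0
lr, batch_size, epochs = 0.1, 10, 20  # illustrative values, not tuned
for epoch in range(epochs):
    order = list(range(len(xs)))
    random.shuffle(order)  # visit the data in a new random order each epoch
    for start in range(0, len(order), batch_size):
        g0, g1 = batch_grad(a0, a1, order[start:start + batch_size])
        # One parameter update per mini-batch rather than per full pass.
        a0 -= lr * g0
        a1 -= lr * g1

print(a0, a1)
```

Each update sees only 10 of the 100 points, so the gradient is a noisy estimate of the full-dataset gradient; the parameters still drift toward the same region, but they jitter around it instead of settling exactly.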
\n", "\n", "实际上,我们并没有立刻计算整个数据集,相反,我们采用 *mini-batches(小批次)* 的策略。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vocab 术语" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Learning rate 学习率\n", "- Epoch 轮次\n", "- Minibatch 小批次\n", "- SGD 随机梯度下降法\n", "- Model / Architecture 模型/架构\n", "- Parameters 参数\n", "- Loss function 损失函数\n", "\n", "For classification problems, we use *cross entropy loss*, also known as *negative log likelihood loss*. It penalizes predictions that are confident but wrong, as well as predictions that are correct but unconfident.
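
A small numeric sketch of that behaviour, in plain Python with made-up probabilities (the function name `cross_entropy` and the example values are assumptions for illustration): for a single example, the loss is the negative log of the probability assigned to the true class.

```python
import math

def cross_entropy(probs, target):
    # Negative log of the probability the model assigned to the true class.
    return -math.log(probs[target])

# Hypothetical two-class predictions; the true class is index 0 in each case.
print(cross_entropy([0.95, 0.05], 0))  # confident and correct: small loss
print(cross_entropy([0.55, 0.45], 0))  # correct but unconfident: larger loss
print(cross_entropy([0.05, 0.95], 0))  # confident and wrong: largest loss
```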
\n", "\n", "对于分类问题,我们使用 *交叉熵损失* ,也被称为 *负对数似然损失* 。该损失函数将惩罚那些置信高的错误预测和置信低的正确预测。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 1 }