{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deep Learning with Pytorch(简介)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "python 优先的计算包,主要针对用户的两类需求:\n", "- 作为numpy的替代品, 提供GPU加速\n", "- 作为深度学习框架, 提供最大的灵活性和速度\n", "\n", "本节目标\n", "- 理解PyTorch的核心数据结构 Tensor 和Variable,以及高阶的神经网络接口(nn).\n", "- 训练一个神经网络进行图片分类(mnist+cifar10)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<torch._C.Generator at 0x7f34ac1fdcc0>" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import torch\n", "import torch.autograd as autograd\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "from torch.autograd import Variable\n", "\n", "torch.manual_seed(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Torch's tensor library(数据结构)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tensors张量 就像numpy的ndarrays(多维数组), 但是PyTorch的Tensor可以使用GPU进行加速。\n", "\n", "- 一维张量==列向量\n", "- 二维张量==矩阵\n", "- 三维张量==就叫张量了\n", "- 四维张量以上就只是数字了\n", "\n", "张量是运算的基本数据类型,学习数据类型只需要知道增删改查\n", "\n", "[文档在此](http://pytorch.org/docs/master/torch.html#tensors)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 增:创建张量" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true, "scrolled": true }, "outputs": [], "source": [ "# Create a torch.Tensor object with the given data. It is a 1D vector\n", "V_data = [1., 2., 3.]\n", "V = torch.Tensor(V_data) # 列向量\n", "print(V)\n", "# Index into V and get a scalar标量\n", "print(V[0])\n", "\n", "\n", "# Creates a matrix\n", "M_data = [[1., 2., 3.], [4., 5., 6]]\n", "M = torch.Tensor(M_data)\n", "print(M)\n", "# Index into M and get a vector向量\n", "print(M[0])\n", "\n", "\n", "# Create a 3D tensor of size 2x2x2.\n", "T_data = [[[1.,2.], [3.,4.]],\n", " [[5.,6.], [7.,8.]]]\n", "T = torch.Tensor(T_data)\n", "print(T)\n", "# Index into T and get a matrix矩阵\n", "print(T[0])\n", "\n", "print(\n", "torch.randn((3, 4, 5)), # 3x4x5的随机张量\n", "torch.eye(5),\n", "torch.ones(5),\n", "torch.zeros(4,4),\n", "torch.from_numpy(np.array([1, 2, 3])),\n", "torch.linspace(start=-10, end=10, steps=5),\n", "torch.logspace(start=-10, end=10, steps=5)\n", ")\n", "\n", "print(\n", "torch.ones_like(T),\n", "torch.zeros_like(T),\n", ") # 不知道为啥报错,应该有这个属性的" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**创建GPU: CUDA Tensors**\n", "\n", "Tensor可以通过`.cuda` 函数转为GPU的Tensor,享受GPU加速" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# 如果GPU可用的话,就执行下一步\n", "if torch.cuda.is_available():\n", " x = x.cuda()\n", " y = y.cuda()\n", " x + y" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 删:删除数据等" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# 重新创建删除元素后的新张量吧,少年!\n", "# torch.split()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 改:张量运算、合并、变形、转换等Operations with Tensors" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### 运算(以加法运算的几种使用方式为例)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "> **Note**: 函数名后面跟着 `_` 的函数会修改tensor 本身\n", "\n", ">比如: `x.copy_(y)`, `x.t_()`, 会改变 x.\n", "\n", ">但是`x.t()` 返回一个新的 矩阵, 而x的数据不变" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "x = torch.Tensor([ 1., 2., 3. ])\n", "y = torch.Tensor([ 4., 5., 6. ])\n", "z = x + y # 运算后还是torch.Tensor #----------------0\n", "# 调用add()方法\n", "z = torch.add(x, y) #----------------1\n", "# 指定加法结果的输出目标为z\n", "z = torch.Tensor(3)\n", "torch.add(x, y, out=z) #----------------2\n", "# !!!!in-place加法!!\n", "y.add(x) # 普通加法, y不变 #----------------3\n", "print('y=',y)\n", "y.add_(x) # in-place 加法, y变了 #----------------4\n", "print('y=',y)\n", "print('z=',z)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### 合并(按第几维合并)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# By default, it concatenates along the first axis (concatenates rows)默认按第一列合并\n", "x_1 = torch.randn(2, 5)\n", "y_1 = torch.randn(3, 5)\n", "z_1 = torch.cat([x_1, y_1]) # 等价于torch.cat([x_1, y_1], 0), 按第一列合并\n", "print('z_1=',z_1)\n", "\n", "# Concatenate columns:\n", "x_2 = torch.randn(2, 3)\n", "y_2 = torch.randn(2, 5)\n", "z_2 = torch.cat([x_2, y_2], 1) # second arg specifies which axis to concat along按第二列合并\n", "print('z_2=',z_2)\n", "\n", "# If your tensors are not compatible, torch will complain. Uncomment to see the error\n", "# torch.cat([x_1, x_2]) # 没有对齐,不要作死去合并" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### 变形(reshape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "x = torch.randn(2, 3, 4)\n", "print('x=',x)\n", "print(x.view(2, 12)) # Reshape to 2 rows, 12 columns\n", "print(x.view(2, -1)) # Same as above. If one of the dimensions is -1, its size can be inferred\n", "# 有-1自动推断-1应该代表的数,这里-1代表3x4=12\n", "print(x.view(1, -1))\n", "# 这里-1代表2x3x4=24" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 转换(转换成numpy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tensor和numpy对象共享内存, 所以他们之间的转换很快,而且不会消耗太多的额外资源.\n", "\n", "但这也意味着,`其中一个变了,另外一个也会随之改变`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# 4.1 torch Tensor -> numpy Array\n", "a = torch.ones(5)\n", "b = a.numpy()\n", "print('a=',a,'b=',b)\n", "a.add_(1) #以`_` 结尾的函数 是in-place 会修改自身!!\n", "print('a=',a,'b=',b) # tensor变了,numpy 的array也变了,因为他们共享内存。\n", "print('\\n\\n')\n", "# 4.2 numpy Array -> torch Tensor\n", "a = np.ones(5)\n", "b = torch.from_numpy(a)\n", "np.add(a, 1, out=a)\n", "print('a=',a,'b=',b) # array变了,tensor 也变了,因为他们共享内存。" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 查:张量属性、张量内的数据等" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**NOTE**: `torch.Size` 是一个tuple对象的子类, 所以它支持tuple的所有操作, 比如`x.size()[0]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true, "scrolled": false }, "outputs": [], "source": [ "print(\n", " 'T.size()=', T.size(),\n", " 'T.size()[1]=', T.size()[1],\n", " 'T.sort()=', T.sort(),\n", " 'T.sign()=', T.sign(),\n", " \n", "# Tensor具有和numpy类似的选取(indexing)操作 (standard numpy-like indexing with all bells and whistles)\n", " 'T[:,1]=', T[:,1]\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Computation Graphs and Automatic Differentiation(核心)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Autograd: 自动微分(求导)**\n", "\n", "在Tensor上的所有操作,autograd都能为他们自动提供微分, autograd采用define-by-run的运行机制(注意和define-and-run的区别),意味着反向传播的过程取决于你怎么定义代码(好抽象),即你每次计算都可以提供一个不一样的操作(不像TensorFlow预先定义好一个图,然后不能改,要运行好几次)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Variable:变量" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "autograd.Variable是autograd中的核心类, 它简单的封装了Tensor,并支持几乎所有Tensor操作(即tensor支持的操作,你基本都能直接用在Variable上, Tensor在被封装为Variable之后, 可以调用它的.backward操作实现反向传播,自动计算所有梯度)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "import torch.autograd as autograd\n", "from torch.autograd import Variable\n", "# Variables wrap tensor objects\n", "# autograd.Variable是autograd中的核心类, 它简单的封装了Tensor\n", "x = autograd.Variable( torch.Tensor([1., 2., 3]), requires_grad=True )\n", "# You can access the data with the .data attribute\n", "print(x.data)\n", "\n", "# You can also do all the same operations you did with tensors with Variables.支持几乎所有Tensor操作\n", "y = autograd.Variable( torch.Tensor([4., 5., 6]), requires_grad=True )\n", "z = x + y\n", "print(z.data)\n", "\n", "# BUT z knows something extra.可以调用它的.backward操作实现反向传播,自动计算所有梯度\n", "print(z.grad_fn)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "\n", "通过`.data` 属性,可以访问Variable所包含的Tensor,`.grad` 可以访问对应的梯度(也是个Variable,而不是Tensor)\n", "> 注意:.grad 和.data的形状一样, 并且.grad是**累加的(accumulated)**, 意味着每一次运行反向传播,梯度都会加上之前的梯度, 所以运行zero_grad很有必要." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true, "scrolled": false }, "outputs": [], "source": [ "x = Variable(torch.ones(2, 2), requires_grad = True)\n", "y = x.mean()\n", "y.backward()\n", "x.grad # y = x.mean-> y = 0.25 * (x[0][0] + x[0][1] + x[1][0] + x[1][1]) " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# .grad随y.backward()累加\n", "print('x.data=', x.data)\n", "y.backward()\n", "print('x.grad=', x.grad)\n", "y.backward()\n", "print('x.grad=', x.grad)\n", "\n", "# x.grad.data先置0再y.backward(),不累加\n", "x.grad.data.zero_()\n", "y.backward()\n", "print('x.grad=', x.grad)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Function:动态图的实现原理" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "autograd中的另一个比较重要的类是Function, 这个类在你实现具有自动求导功能的函数时很有用(注:在实际使用中尽量用nn.module替代)\n", "\n", "`Variable` 和 `Function` 彼此相互联系, 建立起无环图, 记住计算历史(对应到图,就是每个节点都会记住哪个节点,或者哪条边指向`自己`)\n", "这些信息保存在`.creator`属性中, 它指向一个`Function`对象, 这个Function对象的输出是这个Variable自己\n", "> (由用户创建的Variable的creator 是None 比如 `assert Variable(Tensor(3,4)).creator is None`.\n", "\n", "> 注: 关于计算图和方向传播的相关知识,极力推荐: http://colah.github.io/posts/2015-08-Backprop/\n", "\n", "如果 Variable是一个标量(只包含一个数,而非向量之类的), 你在反向传播时候可以不指定参数,默认是1.否则你必须指定一个梯度,这个梯度和Variable具有相同的形状.\n", "\n", "$$\n", "\\frac{d z}{d x} = \\frac{d z}{d y} \\frac{d y}{d x} \n", "$$ \n", "假设这个Variable是y, 那么反向传播要求的是dz/dx, .backward要传进来的参数就是dz/dy" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "x = autograd.Variable(torch.ones(2,2), requires_grad = True)\n", "print('x=', x)\n", "y = x + 2\n", "print('y=', y)\n", "print('x.grad_fn=', x.grad_fn)\n", "print('y.grad_fn=', y.grad_fn)\n", "\n", "z = y * y * 3\n", "print('z=', z)\n", "\n", "out = z.mean()\n", "print('out=', out)\n", "# let's backprop now\n", "out.backward() # out.backward() 和 out.backward(torch.Tensor([1.0])) 等价\n", "# 打印梯度: d(out)/dx\n", "print('x.grad=', x.grad)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "矩阵大小都是`4.5`. 显而易见, 在纸上稍微 写一下就知道了\n", "假设 `out` *Variable* 是\"$o$\". \n", "则有\n", "$$\\begin{align}\n", "o &= \\frac{1}{4}\\sum_i z_i\\\\\n", "z_i &= 3(x_i+2)^2\\\\\n", "z_i\\bigr\\rvert_{x_i=1} &= 27\\\\\n", "\\frac{\\partial o}{\\partial x_i} &= \\frac{3}{2}(x_i+2)\\\\\n", "\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} &= \\frac{9}{2} = 4.5\n", "\\end{align}$$." ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "默认情况下,求导计算将刷新计算图中包含的所有内部缓冲区,因此如果您想在计算图的某部分图上执行两次反向传播,则需要在定义时传入``retain_variables = True``" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "x = Variable(torch.ones(2, 2), requires_grad=True)\n", "y = x + 2\n", "y.backward(torch.ones(2, 2), retain_graph=True)\n", "# 打印梯度: d(out)/dx\n", "print('x.grad=', x.grad)\n", "z = y * y\n", "print('z=', z)\n", "\n", "gradient = torch.randn(2, 2)\n", "\n", "# this would fail if we didn't specify that we want to retain variables\n", "y.backward(gradient)\n", "\n", "print('x.grad=', x.grad)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "> **有了AutoGrad可以做很多疯狂的事情:**\n", "\n", "> 更多关于 `Variable` 和 `Function` 的文档: [pytorch.org/docs/autograd.html](http://pytorch.org/docs/autograd.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "x = torch.randn(3)\n", "x = Variable(x, requires_grad = True)\n", "y = x * 2\n", "while y.data.norm() < 1000:\n", " y = y * 2\n", "print('y=',y)\n", "gradients = torch.FloatTensor([0.1, 1.0, 0.0001])\n", "y.backward(gradients)\n", "print('x.grad=', x.grad)\n", "print('y=',y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Network(神经网络)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`torch.nn` 包用来创建神经网络.\n", "\n", "`nn` 构建于 `autograd` 之上,来定义和运行神经网络. 这个`nn` 和lua torch的`nn` 接口相似, 但是实现几乎完全不一样.\n", "\n", "nn.Module可以近似为一个网络(?), 包含很多layers, 调用他的forward(input), 可以返回前向传播的结果output\n", " \n", "\n", "来用LeNet + Mnist练练手吧:\n", "\n", "这是一个基础的前向传播(feed-forward)的网络: 接收输入,并通过一层一层的传递到最后,给出输出.\n", "\n", "流程如下(大多数的神经网络训练流程都是这样):\n", "\n", "- 定义网络结构和参数\n", "- 在不同的数据中迭代\n", " - 数据预处理,并输入到网络中\n", " - 计算loss(output和目标距离的偏差)\n", " - 反向传播梯度(误差)\n", " - 更新网络参数(使用最基础的SGD `weight = weight + learning_rate * gradient`)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 定义网络" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch.nn as nn\n", "import torch.nn.functional as F\n", "\n", "class Net(nn.Module):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.conv1 = nn.Conv2d(1, 6, 5) # 1通道:因为是黑白图片, 6个输出channels, 5x5 的卷积核\n", " self.conv2 = nn.Conv2d(6, 16, 5) #6通道(因为上面的输出是6通道的)\n", " self.fc1 = nn.Linear(16*5*5, 120) # 仿射层, 其实就是: y = Wx + b\n", " self.fc2 = nn.Linear(120, 84)# 同上\n", " self.fc3 = nn.Linear(84, 10)\n", "\n", " def forward(self, x):\n", " x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) \n", " # Max pooling over a (2, 2) window ,也可以直接写成2,如下\n", " x = F.max_pool2d(F.relu(self.conv2(x)), 2) \n", " # If the size is a square you can only specify a single number\n", " x = x.view(-1, self.num_flat_features(x))\n", " #x = x.view(x.size()[0], -1) # 用这个应该也行,\n", "# if x.sum()>100: return x \n", " \n", " x = F.relu(self.fc1(x))\n", " x = F.relu(self.fc2(x))\n", " x = self.fc3(x)\n", "# if x[0]>0 : return 1\n", " \n", " return x\n", " \n", " # x的大小是N*H*W*C \n", " # N 是batch_size, 我们要的是features_size,也就是一张图片?的所有像素\n", " def num_flat_features(self, x):\n", " size = x.size()[1:] \n", "\n", " num_features = 1\n", " for s in size:\n", " num_features *= s\n", " return num_features\n", "\n", "net = Net()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 其实我更喜欢这种写法:\n", "class MyNet(nn.Module):\n", " def __init__(self):\n", " super(MyNet, self).__init__()\n", " self.main = nn.Sequential(\n", " # input is (nc) x 64 x 64\n", " nn.Conv2d(2, 2, 4, 2, 1, bias=False),\n", " nn.LeakyReLU(0.2, inplace=True),\n", " nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),\n", " nn.Sigmoid()\n", " )\n", " def forward(self, input):\n", " return self.main(input)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 或者更加简化:\n", "my_net = nn.Sequential(\n", " # input is (nc) x 64 x 64\n", " nn.Conv2d(1, 2, 4, 2, 1, bias=False),\n", " nn.LeakyReLU(0.2, inplace=True),\n", " nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),\n", " nn.Sigmoid()\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(net)\n", "print(MyNet())\n", "print(my_net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在nn.module的子类中定义了forward函数, `backward` 函数会自动被实现(利用`autograd`)\n", "\n", "在`forward` 函数中你可以使用任何的Tensor支持的函数, 还可以使用if,for循环,print,log等等. 标准python是怎么写的, 你就可以怎么写的.\n", "\n", "\n", "可学习的参数通过`net.parameters()`返回" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params = list(net.parameters())\n", "print(len(params))\n", "print(params[0].size()) # conv1's .weight" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**记住!!!: forward的输入和输出都是 `autograd.Variable`, 因为只有Variable才有自动求导功能,Tensor是没有的**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "input = Variable(torch.randn(1, 1, 32, 32))\n", "out = net(input)\n", "print('out=', out)\n", "#net.zero_grad() # 所有参数的梯度清零\n", "out.backward(torch.ones(1, 10), retain_graph=True) # 反向传播\n", "# out.backward(torch.randn(1, 10)) #如果不清零,再次反向传播会如何? 梯度叠加 (把randn改成ones可以更明显的看到效果)\n", "print('out=', out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **NOTE**: `torch.nn` 只支持 mini-batches\n", "\n", "> 不支持一次只输入一个样本, 一次必须是一个batch. 但如果你一定要输入一个样本的话,用 `input.unsqueeze(0)` .(伪装成只有一个样本的batch)\n", " \n", "比如 `nn.Conv2d` 输入必须是一个 4D Tensor , 形如: `nSamples x nChannels x Height x Width`. 可以把nsample设为1, 但是形状不能是`nChannels x Height x Width`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Module有很多属性,可以查看权重、参数等等" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(net)\n", "\n", "for param in net.parameters():\n", " print(type(param.data), param.size())\n", " print(list(param.data)) \n", "\n", "print(net.state_dict().keys())\n", "#参数的keys\n", "\n", "for key in net.state_dict():#模型参数\n", " print(key, 'corresponds to', list(net.state_dict()[key]))" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## 损失函数" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "损失函数的定义: A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.\n", "\n", "nn中常用的损失函数: [several different loss functions under the nn package](http://pytorch.org/docs/nn.html#loss-functions).\n", "\n", "最简单的loss: `nn.MSELoss` 计算均方误差" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "output = net(input)\n", "# target = Variable(torch.range(1, 10)) # a dummy target, for example\n", "# torch.range is deprecated in favor of torch.arange and will be removed in 0.3.\n", "# arange generates values in [start; end), not [start; end].\n", "target = Variable(torch.arange(1, 11)) # a dummy target, for example\n", "criterion = nn.MSELoss()\n", "loss = criterion(output, target)\n", "loss" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "现在如果对 `loss` 进行反向传播的溯源(使用它的`.creator` 属性),你会看到它的计算图看起来像这样:\n", "\n", "```\n", "input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d \n", " -> view -> linear -> relu -> linear -> relu -> linear \n", " -> MSELoss\n", " -> loss\n", "```\n", "\n", "所以,当我们调用`loss.backward()`, 这个图动态生成, 自动微分,图中参数(Parameter)会自动计算他们的导数,并与当前的导数相加(所以zero_grad很有必要)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# For illustration, let us follow a few steps backward\n", "print(loss.grad_fn) # MSELoss\n", "print(loss.grad_fn.next_functions[0][0]) # Linear\n", "print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU/Threshold" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# 运行.backward, 来看看调用之前和调用之后的grad\n", "# now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.\n", "net.zero_grad() # zeroes the gradient buffers of all parameters\n", "print('conv1.bias.grad -- before backward')\n", "print(net.conv1.bias.grad)\n", "loss.backward() # 这个cell只能运行一次,如再运行,提示要加retain_graph=True\n", "print('conv1.bias.grad -- after backward')\n", "print(net.conv1.bias.grad)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "> **NOTE**:`nn` 包中包含大量神经网络中会用到的函数和工具 详细文档见 http://pytorch.org/docs/nn.html" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## 优化器" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "最简单的梯度下降法: 随机梯度下降(SGD):\n", "> `weight = weight - learning_rate * gradient`\n", "\n", "很容易实现\n", "\n", "```python\n", "learning_rate = 0.01\n", "for f in net.parameters():\n", " f.data.sub_(f.grad.data * learning_rate)# inplace\n", "```\n", "\n", "torch.optim 中包含着许多常用的优化方法, 比如RMSProp,Adam等等, 非常易于使用. 而且参照着他们的代码, 实现自己的优化方法也是相当的简单. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "import torch.optim as optim\n", "# create your optimizer\n", "optimizer = optim.SGD(net.parameters(), lr = 0.01)\n", "\n", "# in your training loop:\n", "optimizer.zero_grad() # zero the gradient buffers\n", "output = net(input)\n", "loss = criterion(output, target)\n", "loss.backward()\n", "optimizer.step() # Does the update" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 数据加载与预处理" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "通常来说, 当你要处理图片, 文本,语音甚至视频数据时候, 你必须使用标准的Python工具来加载数据,并转为numpy 数组, 然后再转为torch.Tensor\n", " \n", "\n", "- For images, Pillow, OpenCV很有用 \n", "- For audio, scipy and librosa \n", "- For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.\n", "\n", "当然最好用的还是torch提供的vison包,叫做`torchvision`, 实现了常用的图像数据加载功能 比如Imagenet,CIFAR10,MNIST等等, 以及常用的数据转换操作, 这位数据加载带来了极大的方便, 并可避免撰写重复代码.\n", " \n", "来看看CIFAR10, 它有10个类别: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.\n", " 图片大小: 3x32x32, i.e. 3-通道彩色 32x32分辨率" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Excise(练习:训练图片分类器)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "步骤如下: \n", "\n", "\n", "1. 使用torchvision加载并预处理CIFAR10\n", "1. 定义网络\n", "1. 定义损失函数\n", "1. 训练网络(+ 更新网络参数)\n", "1. 测试网络\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 加载和预处理CIFAR10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**ImageFolder**\n", "\n", "\n", "ImageFolder\n", "假设图片的文件夹如下所式 :\n", "```\n", "root/good/xxx.png\n", "root/good/xxy.png\n", "root/good/xxz.png\n", "\n", "root/bad/123.png\n", "root/bad/nsdf3.png\n", "root/bad/asd932_.png\n", "```\n", "\n", "通过如下代码可加载:\n", "\n", "\n", "------\n", "\n", "```python\n", "val_dataset = MyImageFolder('/root',\n", " transform=transforms.Compose([transforms.Scale(opt.image_size),\n", " # transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),\n", " transforms.ToTensor(),\n", " transforms.Normalize([0.5]*3,[0.5]*3)\n", " ]), loader=my_loader)\n", "\n", "val_dataloader=t.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)\n", "```\n", "其中my_loader 用来加载指定路径的图片到内存\n", "my_collate 用来对dataset加载的数据进行检查\n", "transforms 包含两大类的操作:\n", "1. PIL的Image对象的操作\n", "2. Torch 的Tensor对象的操作\n", "\n", "还可以利用transforms.Lambda 传入任意的函数进行操作\n", "\n", "\n", "比如要在传入的图片中进行随机旋转(每次)\n", "```python\n", "val_dataset = MyImageFolder('/root',\n", " transform=transforms.Compose([transforms.Scale(opt.image_size),\n", " transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),\n", " transforms.ToTensor(),\n", " transforms.Normalize([0.5]*3,[0.5]*3)\n", " ]), loader=my_loader)\n", "\n", "val_dataloader=t.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)\n", "```\n", "\n", "-----\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import torchvision\n", "import torchvision.transforms as transforms" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Files already downloaded and verified\n", "Files already downloaded and verified\n" ] } ], "source": [ "# torchvision dataset 的输出默认都是 PILImage: range [0, 1].\n", "# 通过transform 来把他们转成[-1,1]\n", "\n", "transform=transforms.Compose([transforms.ToTensor(),\n", " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n", " ])\n", "trainset = torchvision.datasets.CIFAR10(root='./Data/', train=True, download=True, transform=transform)\n", "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, \n", " shuffle=True, num_workers=2)\n", "\n", "testset = torchvision.datasets.CIFAR10(root='./Data/', train=False, download=True, transform=transform)\n", "testloader = torch.utils.data.DataLoader(testset, batch_size=4, \n", " shuffle=False, num_workers=2)\n", "classes = ('plane', 'car', 'bird', 'cat',\n", " 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**来看几张图片**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# functions to show an image\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "%matplotlib inline\n", "def imshow(img):\n", " img = img / 2 + 0.5 # unnormalize\n", " npimg = img.numpy()\n", " plt.imshow(np.transpose(npimg, (1,2,0)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# show some random training images\n", "dataiter = iter(trainloader)\n", "images, labels = dataiter.next()\n", "print(images.size())\n", "# print images\n", "imshow(torchvision.utils.make_grid(images))\n", "# print labels\n", "print(' '.join('%5s'%classes[labels[j]] for j in range(4)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 定义CNN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** 直接拷贝上面的LeNet+Mnist网络,然后修改第一个参数为3通道 (因为mnist是黑白, 而cifar是32)\n", "\n", "提示: You only have to change the first layer, change the number 1 to be 3." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Net0(nn.Module):\n", " def __init__(self):\n", " super(Net0, self).__init__()\n", " self.conv1 = nn.Conv2d(3, 6, 5)\n", " self.conv2 = nn.Conv2d(6, 16, 5)\n", " self.fc1 = nn.Linear(16*5*5, 120)\n", " self.fc2 = nn.Linear(120, 84)\n", " self.fc3 = nn.Linear(84, 10)\n", "\n", " def forward(self, x):\n", " x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) \n", " x = F.max_pool2d(F.relu(self.conv2(x)), 2) \n", " x = x.view(-1, self.num_flat_features(x))\n", " x = F.relu(self.fc1(x))\n", " x = F.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x\n", " \n", " def num_flat_features(self, x):\n", " size = x.size()[1:] \n", "\n", " num_features = 1\n", " for s in size:\n", " num_features *= s\n", " return num_features\n", "\n", "net0 = Net0()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Module):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.conv1 = nn.Conv2d(3, 6, 5)\n", " self.pool = nn.MaxPool2d(2,2)\n", " self.conv2 = nn.Conv2d(6, 16, 5)\n", " self.fc1 = nn.Linear(16*5*5, 120)\n", " self.fc2 = nn.Linear(120, 84)\n", " self.fc3 = nn.Linear(84, 10)\n", "\n", " def forward(self, x):\n", " x = self.pool(F.relu(self.conv1(x)))\n", " x = self.pool(F.relu(self.conv2(x)))\n", " x = x.view(-1, 16*5*5)\n", " x = F.relu(self.fc1(x))\n", " x = F.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x\n", "\n", "net = Net()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 定义损失函数和优化器(loss和optimizer)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torch import optim\n", "criterion = nn.CrossEntropyLoss() # use a Classification Cross-Entropy loss 交叉熵损失函数\n", "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 训练网络\n", "\n", "写一个for循环, 不断地\n", "- 输入数据\n", "- 计算损失函数\n", "- 更新参数" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for epoch in range(2): # loop over the dataset multiple times \n", " \n", " running_loss = 0.0\n", " for i, data in enumerate(trainloader, 0):#(当所有的数据都被输入了一遍的时候,循环就会结束,所以需要上面的for epoch in)\n", " # get the inputs\n", " inputs, labels = data\n", " \n", " # wrap them in Variable\n", " inputs, labels = Variable(inputs), Variable(labels)\n", " \n", " # zero the parameter gradients\n", " optimizer.zero_grad()\n", " \n", " # forward + backward + optimize\n", " outputs = net(inputs)\n", " loss = criterion(outputs, labels)\n", " loss.backward() \n", " optimizer.step()\n", " \n", " # print statistics\n", " running_loss += loss.data[0]\n", " if i % 2000 == 1999: # print every 2000 mini-batches\n", " print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss / 2000))\n", " running_loss = 0.0\n", "print('Finished Training')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "我们训练了2个epoch(也就是每张图片都输入训练了两次)\n", "\n", "可以看看网络有没有效果(测试的图片输入到网络中, 计算它的label, 然后和实际的label进行比较)\n", "\n", "先来看看测试集中的一张图片." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataiter = iter(testloader)\n", "images, labels = dataiter.next()\n", "\n", "# print images\n", "imshow(torchvision.utils.make_grid(images))\n", "print('GroundTruth: ', ' '.join('%5s'%classes[labels[j]] for j in range(4)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "计算label" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 计算图片在每个类别上的分数(能量?)\n", "outputs = net(Variable(images))\n", "\n", "# the outputs are energies for the 10 classes. \n", "# Higher the energy for a class, the more the network \n", "# thinks that the image is of the particular class\n", "\n", "# 得分最高的那个类\n", "_, predicted = torch.max(outputs.data, 1)\n", "print('Predicted: ', ' '.join('%5s'% classes[predicted[j]] for j in range(4)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "还行,至少比随机预测好, 接下来看看在这个测试集的准确率" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "correct = 0\n", "total = 0\n", "for data in testloader:\n", " images, labels = data\n", " outputs = net(Variable(images))\n", " _, predicted = torch.max(outputs.data, 1)\n", " total += labels.size(0)\n", " correct += (predicted == labels).sum()\n", "\n", "print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "远比随机猜测(准确率10%)好,说明网络学到了点东西 " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class_correct = list(0. for i in range(10))\n", "class_total = list(0. for i in range(10))\n", "for data in testloader:\n", " images, labels = data\n", " outputs = net(Variable(images))\n", " _, predicted = torch.max(outputs.data, 1)\n", " c = (predicted == labels).squeeze()\n", " for i in range(4):\n", " label = labels[i]\n", " class_correct[label] += c[i]\n", " class_total[label] += 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(10):\n", " print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**经典的网络已经根据论文定义好了,拿来就可以用**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torchvision.models as models\n", "alexnet = models.alexnet(pretrained=True) #已经根据论文定义好了模型, 并且有训练好的参数\n", "#可以很方便的进行Finetune和特征提取" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training on the GPU\n", "就像我们之前把Tensor从CPU转到GPU一样, 模型也可以很简单的从CPU转到GPU, \n", "这会把所有的模型参数和buffer转成CUDA tensor" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "net.cuda()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "### tensor example\n", "x_cpu = torch.randn(10, 20)\n", "w_cpu = torch.randn(20, 10)\n", "# direct transfer to the GPU\n", "x_gpu = x_cpu.cuda()\n", "w_gpu = w_cpu.cuda()\n", "result_gpu = x_gpu @ w_gpu\n", "# get back from GPU to CPU\n", "result_cpu = result_gpu.cpu()\n", "\n", "### model example\n", "model = model.cuda()\n", "# train step\n", "inputs = Variable(inputs.cuda())\n", "outputs = model(inputs)\n", "# get back from GPU to CPU\n", "outputs = outputs.cpu()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "如果你觉得在GPU上没有比CPU提速很多, 不要着急, 那是因为这个网络实在太小了\n", " \n", "**Exercise:** 增加网络的深度和宽度 , 看看提速如何\n", "(第一个 `nn.Conv2d`的第二个参数,第二个 `nn.Conv2d`的第一个参数, 需要一样(you know Why) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "因为有些时候我们想在 CPU 和 GPU 中运行相同的模型,而无需改动代码,我们会需要一种封装:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Trainer:\n", " def __init__(self, model, use_cuda=False, gpu_idx=0):\n", " self.use_cuda = use_cuda\n", " self.gpu_idx = gpu_idx\n", " self.model = self.to_gpu(model)\n", "\n", " def to_gpu(self, tensor):\n", " if self.use_cuda:\n", " return tensor.cuda(self.gpu_idx)\n", " else:\n", " return tensor\n", "\n", " def from_gpu(self, tensor):\n", " if self.use_cuda:\n", " return tensor.cpu()\n", " else:\n", " return tensor\n", "\n", " def train(self, inputs):\n", " inputs = self.to_gpu(inputs)\n", " outputs = self.model(inputs)\n", " outputs = self.from_gpu(outputs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 最终架构" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "这里有一段用于解读的伪代码:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class ImagesDataset(torch.utils.data.Dataset):\n", " pass\n", "\n", "class Net(nn.Module):\n", " pass\n", "\n", "model = Net()\n", "optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n", "scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)\n", "criterion = torch.nn.MSELoss()\n", "\n", "dataset = ImagesDataset(path_to_images)\n", "data_loader = torch.utils.data.DataLoader(dataset, batch_size=10)\n", "\n", "train = True\n", "for epoch in range(epochs):\n", " if train:\n", " lr_scheduler.step()\n", "\n", " for inputs, labels in data_loader:\n", " inputs = Variable(to_gpu(inputs))\n", " labels = Variable(to_gpu(labels))\n", "\n", " outputs = model(inputs)\n", " loss = criterion(outputs, labels)\n", " if train:\n", " optimizer.zero_grad()\n", " loss.backward()\n", " optimizer.step()\n", "\n", " if not train:\n", " save_best_model(epoch_validation_accuracy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next(后续学习)\n", "\n", "- [训练网络来打游戏(强化学习)](https://goo.gl/uGOksc)\n", "- [在IMAGENET上训练ResNet](https://github.com/pytorch/examples/tree/master/imagenet)\n", "- [DCGAN](https://github.com/pytorch/examples/tree/master/dcgan)\n", "- [LSTM+language model](https://github.com/pytorch/examples/tree/master/word_language_model)\n", "- [更多官方example](https://github.com/pytorch/examples)\n", "- [更多 tutorials](https://github.com/pytorch/tutorials)\n", "- [论坛](https://discuss.pytorch.org/)\n", "- [ Slack](http://pytorch.slack.com/messages/beginner/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 如何finetune" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "0. 如何进行参数初始化\n", "1. 如何加载和保存模型\n", "2. 如何给不同的网络层设置不同的学习率\n", "3. 如何冻结某些网络层的学习率\n", "4. 如何从某一层中提取feature" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 如何进行参数初始化(使用 torch.nn.init )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "def initNetParams(net):\n", " '''Init net parameters.'''\n", " for m in net.modules():\n", " if isinstance(m, nn.Conv2d):\n", " init.xavier_uniform(m.weight)\n", " if m.bias:\n", " init.constant(m.bias, 0)\n", " elif isinstance(m, nn.BatchNorm2d):\n", " init.constant(m.weight, 1)\n", " init.constant(m.bias, 0)\n", " elif isinstance(m, nn.Linear):\n", " init.normal(m.weight, std=1e-3)\n", " if m.bias:\n", " init.constant(m.bias, 0)\n", "\n", "initNetParams(net)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 如何加载和保存模型(torch.save(),torch.load(‘.pth’))" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### 保存ConvNet\n", "\n", "使用torch.save()对网络结构和模型参数的保存,有两种保存方式:\n", "\n", "- 保存整个神经网络的的结构信息和模型参数信息,save的对象是网络net;\n", "- 保存神经网络的训练模型参数,save的对象是net.state_dict()。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "torch.save(net1, 'net.pkl') # 保存整个神经网络的结构和模型参数 \n", "torch.save(net1.state_dict(), 'net_params.pkl') # 只保存神经网络的模型参数 " ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### 加载ConvNet\n", "\n", "对应上面两种保存方式,重载方式也有两种。\n", "\n", "- 对应第一种完整网络结构信息,重载的时候通过torch.load(‘.pth’)直接初始化新的神经网络对象即可。\n", "- 对应第二种只保存模型参数信息,需要首先导入对应的网络,通过net.load_state_dict(torch.load('.pth'))完成模型参数的重载。\n", "\n", "在网络比较大的时候,第一种方法会花费较多的时间,所占的存储空间也比较大。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "# 保存和加载整个模型 \n", "torch.save(model_object, 'model.pth') \n", "model = torch.load('model.pth') \n", "\n", "# 仅保存和加载模型参数 \n", "torch.save(model_object.state_dict(), 'params.pth') \n", "model_object.load_state_dict(torch.load('params.pth')) " ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### 如何给不同的网络层设置不同的学习率(给Optimizer传dict)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "`Optimizer`也支持为每个参数单独设置选项。若想这么做,不要直接传入`Variable`的`iterable`,而是传入`dict`的`iterable`。每一个`dict`都分别定 义了一组参数,并且包含一个`param`键,这个键对应参数的列表。其他的键应该`optimizer`所接受的其他参数的关键字相匹配,并且会被用于对这组参数的 优化。\n", "\n", "> 注意:\n", "\n", "> 您仍然可以将选项作为关键字参数传递。它们将被用作默认值,在不覆盖它们的组中。当您只想改变一个选项,同时保持参数组之间的所有其他选项一致时,这很有用。\n", "\n", "例如,当我们想指定每一层的学习率时,这是非常有用的:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hidden": true }, "outputs": [], "source": [ "optim.SGD([{'params': model.base.parameters()},\n", " {'params': model.classifier.parameters(), 'lr': 1e-3}\n", " ], lr=1e-2, momentum=0.9)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "这意味着`model.base`参数将使用默认的学习速率`1e-2`,`model.classifier`参数将使用学习速率`1e-3`,并且`0.9`的`momentum`将会被用于所有的参数。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 如何冻结某些网络层的学习率(TODO)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 如何从某一层中提取feature(TODO)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "navigate_num": "#000000", "navigate_text": "#333333", "running_highlight": "#FF0000", "selected_highlight": "#FFD700", "sidebar_border": "#EEEEEE", "wrapper_background": "#FFFFFF" }, "moveMenuLeft": true, "nav_menu": { "height": "512px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": false, "threshold": 4, "toc_cell": false, "toc_position": { "height": "259px", "left": "1px", "right": "20px", "top": "132px", "width": "212px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }