{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Deep Learning with Pytorch(简介)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "python 优先的计算包,主要针对用户的两类需求:\n",
    "- 作为numpy的替代品, 提供GPU加速\n",
    "- 作为深度学习框架, 提供最大的灵活性和速度\n",
    "\n",
    "本节目标\n",
    "- 理解PyTorch的核心数据结构 Tensor 和Variable,以及高阶的神经网络接口(nn).\n",
    "- 训练一个神经网络进行图片分类(mnist+cifar10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch._C.Generator at 0x7f34ac1fdcc0>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import torch\n",
    "import torch.autograd as autograd\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torch.optim as optim\n",
    "from torch.autograd import Variable\n",
    "\n",
    "torch.manual_seed(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Torch's tensor library(数据结构)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensors张量 就像numpy的ndarrays(多维数组), 但是PyTorch的Tensor可以使用GPU进行加速。\n",
    "\n",
    "- 一维张量==列向量\n",
    "- 二维张量==矩阵\n",
    "- 三维张量==就叫张量了\n",
    "- 四维张量以上就只是数字了\n",
    "\n",
    "张量是运算的基本数据类型,学习数据类型只需要知道增删改查\n",
    "\n",
    "[文档在此](http://pytorch.org/docs/master/torch.html#tensors)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 增:创建张量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Create a torch.Tensor object with the given data.  It is a 1D vector\n",
    "V_data = [1., 2., 3.]\n",
    "V = torch.Tensor(V_data) # 列向量\n",
    "print(V)\n",
    "# Index into V and get a scalar标量\n",
    "print(V[0])\n",
    "\n",
    "\n",
    "# Creates a matrix\n",
    "M_data = [[1., 2., 3.], [4., 5., 6]]\n",
    "M = torch.Tensor(M_data)\n",
    "print(M)\n",
    "# Index into M and get a vector向量\n",
    "print(M[0])\n",
    "\n",
    "\n",
    "# Create a 3D tensor of size 2x2x2.\n",
    "T_data = [[[1.,2.], [3.,4.]],\n",
    "          [[5.,6.], [7.,8.]]]\n",
    "T = torch.Tensor(T_data)\n",
    "print(T)\n",
    "# Index into T and get a matrix矩阵\n",
    "print(T[0])\n",
    "\n",
    "print(\n",
    "torch.randn((3, 4, 5)), # 3x4x5的随机张量\n",
    "torch.eye(5),\n",
    "torch.ones(5),\n",
    "torch.zeros(4,4),\n",
    "torch.from_numpy(np.array([1, 2, 3])),\n",
    "torch.linspace(start=-10, end=10, steps=5),\n",
    "torch.logspace(start=-10, end=10, steps=5)\n",
    ")\n",
    "\n",
    "print(\n",
    "torch.ones_like(T),\n",
    "torch.zeros_like(T),\n",
    ") # 不知道为啥报错,应该有这个属性的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "**创建GPU: CUDA Tensors**\n",
    "\n",
    "Tensor可以通过`.cuda` 函数转为GPU的Tensor,享受GPU加速"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 如果GPU可用的话,就执行下一步\n",
    "if torch.cuda.is_available():\n",
    "    x = x.cuda()\n",
    "    y = y.cuda()\n",
    "    x + y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 删:删除数据等"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 重新创建删除元素后的新张量吧,少年!\n",
    "# torch.split()"
   ]
  },
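  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "A minimal sketch of 'deleting' data (the values below are just an illustration): select the parts you want to keep into a new tensor, e.g. with `index_select`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 'delete' row 1 of M by selecting the remaining rows into a new tensor\n",
    "M = torch.Tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])\n",
    "keep = torch.LongTensor([0, 2])            # indices of the rows to keep\n",
    "M_without_row1 = M.index_select(0, keep)   # a new tensor; M itself is unchanged\n",
    "print(M_without_row1)"
   ]
  },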
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 改:张量运算、合并、变形、转换等Operations with Tensors"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "#### 运算(以加法运算的几种使用方式为例)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "> **Note**: 函数名后面跟着 `_` 的函数会修改tensor 本身\n",
    "\n",
    ">比如:  `x.copy_(y)`, `x.t_()`, 会改变 x.\n",
    "\n",
    ">但是`x.t()` 返回一个新的 矩阵, 而x的数据不变"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "x = torch.Tensor([ 1., 2., 3. ])\n",
    "y = torch.Tensor([ 4., 5., 6. ])\n",
    "z = x + y # 运算后还是torch.Tensor #----------------0\n",
    "# 调用add()方法\n",
    "z = torch.add(x, y)              #----------------1\n",
    "# 指定加法结果的输出目标为z\n",
    "z = torch.Tensor(3)\n",
    "torch.add(x, y, out=z)           #----------------2\n",
    "# !!!!in-place加法!!\n",
    "y.add(x) # 普通加法, y不变         #----------------3\n",
    "print('y=',y)\n",
    "y.add_(x) # in-place 加法, y变了  #----------------4\n",
    "print('y=',y)\n",
    "print('z=',z)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "#### 合并(按第几维合并)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# By default, it concatenates along the first axis (concatenates rows)默认按第一列合并\n",
    "x_1 = torch.randn(2, 5)\n",
    "y_1 = torch.randn(3, 5)\n",
    "z_1 = torch.cat([x_1, y_1]) # 等价于torch.cat([x_1, y_1], 0), 按第一列合并\n",
    "print('z_1=',z_1)\n",
    "\n",
    "# Concatenate columns:\n",
    "x_2 = torch.randn(2, 3)\n",
    "y_2 = torch.randn(2, 5)\n",
    "z_2 = torch.cat([x_2, y_2], 1) # second arg specifies which axis to concat along按第二列合并\n",
    "print('z_2=',z_2)\n",
    "\n",
    "# If your tensors are not compatible, torch will complain.  Uncomment to see the error\n",
    "# torch.cat([x_1, x_2])  # 没有对齐,不要作死去合并"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "#### 变形(reshape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "x = torch.randn(2, 3, 4)\n",
    "print('x=',x)\n",
    "print(x.view(2, 12)) # Reshape to 2 rows, 12 columns\n",
    "print(x.view(2, -1)) # Same as above.  If one of the dimensions is -1, its size can be inferred\n",
    "# 有-1自动推断-1应该代表的数,这里-1代表3x4=12\n",
    "print(x.view(1, -1))\n",
    "# 这里-1代表2x3x4=24"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 转换(转换成numpy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tensor和numpy对象共享内存, 所以他们之间的转换很快,而且不会消耗太多的额外资源.\n",
    "\n",
    "但这也意味着,`其中一个变了,另外一个也会随之改变`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# 4.1 torch Tensor -> numpy Array\n",
    "a = torch.ones(5)\n",
    "b = a.numpy()\n",
    "print('a=',a,'b=',b)\n",
    "a.add_(1) #以`_` 结尾的函数 是in-place 会修改自身!!\n",
    "print('a=',a,'b=',b) # tensor变了,numpy 的array也变了,因为他们共享内存。\n",
    "print('\\n\\n')\n",
    "# 4.2 numpy Array -> torch Tensor\n",
    "a = np.ones(5)\n",
    "b = torch.from_numpy(a)\n",
    "np.add(a, 1, out=a)\n",
    "print('a=',a,'b=',b) # array变了,tensor 也变了,因为他们共享内存。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 查:张量属性、张量内的数据等"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "**NOTE**: `torch.Size` 是一个tuple对象的子类, 所以它支持tuple的所有操作, 比如`x.size()[0]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "print(\n",
    "    'T.size()=', T.size(),\n",
    "    'T.size()[1]=', T.size()[1],\n",
    "    'T.sort()=', T.sort(),\n",
    "    'T.sign()=', T.sign(),\n",
    "    \n",
    "# Tensor具有和numpy类似的选取(indexing)操作   (standard numpy-like indexing with all bells and whistles)\n",
    "    'T[:,1]=', T[:,1]\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Computation Graphs and Automatic Differentiation(核心)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Autograd: 自动微分(求导)**\n",
    "\n",
    "在Tensor上的所有操作,autograd都能为他们自动提供微分, autograd采用define-by-run的运行机制(注意和define-and-run的区别),意味着反向传播的过程取决于你怎么定义代码(好抽象),即你每次计算都可以提供一个不一样的操作(不像TensorFlow预先定义好一个图,然后不能改,要运行好几次)"
   ]
  },
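  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A tiny sketch of what define-by-run means in practice (it uses the `Variable` class, which the next section introduces properly): ordinary Python control flow decides what gets computed, and backward() differentiates whatever actually ran."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = Variable(torch.randn(1), requires_grad=True)\n",
    "if x.data[0] > 0:        # an ordinary Python branch decides what gets computed\n",
    "    y = x * 2\n",
    "else:\n",
    "    y = x ** 2\n",
    "y.backward()             # differentiates whichever branch actually ran\n",
    "print('x.grad=', x.grad) # 2 if the first branch ran, 2*x otherwise"
   ]
  },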
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## Variable:变量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "autograd.Variable是autograd中的核心类, 它简单的封装了Tensor,并支持几乎所有Tensor操作(即tensor支持的操作,你基本都能直接用在Variable上, Tensor在被封装为Variable之后, 可以调用它的.backward操作实现反向传播,自动计算所有梯度)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import torch.autograd as autograd\n",
    "from torch.autograd import Variable\n",
    "# Variables wrap tensor objects\n",
    "# autograd.Variable是autograd中的核心类, 它简单的封装了Tensor\n",
    "x = autograd.Variable( torch.Tensor([1., 2., 3]), requires_grad=True )\n",
    "# You can access the data with the .data attribute\n",
    "print(x.data)\n",
    "\n",
    "# You can also do all the same operations you did with tensors with Variables.支持几乎所有Tensor操作\n",
    "y = autograd.Variable( torch.Tensor([4., 5., 6]), requires_grad=True )\n",
    "z = x + y\n",
    "print(z.data)\n",
    "\n",
    "# BUT z knows something extra.可以调用它的.backward操作实现反向传播,自动计算所有梯度\n",
    "print(z.grad_fn)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "![Variable](Variable.png)\n",
    "通过`.data` 属性,可以访问Variable所包含的Tensor,`.grad` 可以访问对应的梯度(也是个Variable,而不是Tensor)\n",
    "> 注意:.grad 和.data的形状一样, 并且.grad是**累加的(accumulated)**, 意味着每一次运行反向传播,梯度都会加上之前的梯度, 所以运行zero_grad很有必要."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "x = Variable(torch.ones(2, 2), requires_grad = True)\n",
    "y = x.mean()\n",
    "y.backward()\n",
    "x.grad # y = x.mean-> y = 0.25 * (x[0][0] + x[0][1] + x[1][0] + x[1][1]) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# .grad随y.backward()累加\n",
    "print('x.data=', x.data)\n",
    "y.backward()\n",
    "print('x.grad=', x.grad)\n",
    "y.backward()\n",
    "print('x.grad=', x.grad)\n",
    "\n",
    "# x.grad.data先置0再y.backward(),不累加\n",
    "x.grad.data.zero_()\n",
    "y.backward()\n",
    "print('x.grad=', x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## Function:动态图的实现原理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "autograd中的另一个比较重要的类是Function, 这个类在你实现具有自动求导功能的函数时很有用(注:在实际使用中尽量用nn.module替代)\n",
    "\n",
    "`Variable` 和 `Function` 彼此相互联系, 建立起无环图, 记住计算历史(对应到图,就是每个节点都会记住哪个节点,或者哪条边指向`自己`)\n",
    "这些信息保存在`.creator`属性中, 它指向一个`Function`对象, 这个Function对象的输出是这个Variable自己\n",
    "> (由用户创建的Variable的creator 是None 比如 `assert Variable(Tensor(3,4)).creator is None`.\n",
    "\n",
    "> 注: 关于计算图和方向传播的相关知识,极力推荐: http://colah.github.io/posts/2015-08-Backprop/\n",
    "\n",
    "如果 Variable是一个标量(只包含一个数,而非向量之类的), 你在反向传播时候可以不指定参数,默认是1.否则你必须指定一个梯度,这个梯度和Variable具有相同的形状.\n",
    "\n",
    "$$\n",
    "\\frac{d z}{d x} = \\frac{d z}{d y}    \\frac{d y}{d x}  \n",
    "$$ \n",
    "假设这个Variable是y, 那么反向传播要求的是dz/dx, .backward要传进来的参数就是dz/dy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "x = autograd.Variable(torch.ones(2,2), requires_grad = True)\n",
    "print('x=', x)\n",
    "y = x + 2\n",
    "print('y=', y)\n",
    "print('x.grad_fn=', x.grad_fn)\n",
    "print('y.grad_fn=', y.grad_fn)\n",
    "\n",
    "z = y * y * 3\n",
    "print('z=', z)\n",
    "\n",
    "out = z.mean()\n",
    "print('out=', out)\n",
    "# let's backprop now\n",
    "out.backward() # out.backward() 和 out.backward(torch.Tensor([1.0])) 等价\n",
    "# 打印梯度: d(out)/dx\n",
    "print('x.grad=', x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "矩阵大小都是`4.5`. 显而易见, 在纸上稍微 写一下就知道了\n",
    "假设 `out` *Variable* 是\"$o$\".  \n",
    "则有\n",
    "$$\\begin{align}\n",
    "o &= \\frac{1}{4}\\sum_i z_i\\\\\n",
    "z_i &= 3(x_i+2)^2\\\\\n",
    "z_i\\bigr\\rvert_{x_i=1} &= 27\\\\\n",
    "\\frac{\\partial o}{\\partial x_i} &= \\frac{3}{2}(x_i+2)\\\\\n",
    "\\frac{\\partial o}{\\partial x_i}\\bigr\\rvert_{x_i=1} &= \\frac{9}{2} = 4.5\n",
    "\\end{align}$$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "默认情况下,求导计算将刷新计算图中包含的所有内部缓冲区,因此如果您想在计算图的某部分图上执行两次反向传播,则需要在定义时传入``retain_variables = True``"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "x = Variable(torch.ones(2, 2), requires_grad=True)\n",
    "y = x + 2\n",
    "y.backward(torch.ones(2, 2), retain_graph=True)\n",
    "# 打印梯度: d(out)/dx\n",
    "print('x.grad=', x.grad)\n",
    "z = y * y\n",
    "print('z=', z)\n",
    "\n",
    "gradient = torch.randn(2, 2)\n",
    "\n",
    "# this would fail if we didn't specify that we want to retain variables\n",
    "y.backward(gradient)\n",
    "\n",
    "print('x.grad=', x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "> **有了AutoGrad可以做很多疯狂的事情:**\n",
    "\n",
    "> 更多关于 `Variable` 和 `Function` 的文档: [pytorch.org/docs/autograd.html](http://pytorch.org/docs/autograd.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "x = torch.randn(3)\n",
    "x = Variable(x, requires_grad = True)\n",
    "y = x * 2\n",
    "while y.data.norm() < 1000:\n",
    "    y = y * 2\n",
    "print('y=',y)\n",
    "gradients = torch.FloatTensor([0.1, 1.0, 0.0001])\n",
    "y.backward(gradients)\n",
    "print('x.grad=', x.grad)\n",
    "print('y=',y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Network(神经网络)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`torch.nn` 包用来创建神经网络.\n",
    "\n",
    "`nn` 构建于 `autograd` 之上,来定义和运行神经网络. 这个`nn` 和lua torch的`nn` 接口相似, 但是实现几乎完全不一样.\n",
    "\n",
    "nn.Module可以近似为一个网络(?), 包含很多layers, 调用他的forward(input), 可以返回前向传播的结果output\n",
    " \n",
    "\n",
    "来用LeNet + Mnist练练手吧:\n",
    "\n",
    "这是一个基础的前向传播(feed-forward)的网络: 接收输入,并通过一层一层的传递到最后,给出输出.\n",
    "\n",
    "流程如下(大多数的神经网络训练流程都是这样):\n",
    "\n",
    "- 定义网络结构和参数\n",
    "- 在不同的数据中迭代\n",
    "    - 数据预处理,并输入到网络中\n",
    "    - 计算loss(output和目标距离的偏差)\n",
    "    - 反向传播梯度(误差)\n",
    "    - 更新网络参数(使用最基础的SGD `weight = weight + learning_rate * gradient`)\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义网络"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(1, 6, 5) # 1通道:因为是黑白图片, 6个输出channels,  5x5 的卷积核\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5) #6通道(因为上面的输出是6通道的)\n",
    "        self.fc1   = nn.Linear(16*5*5, 120) # 仿射层, 其实就是: y = Wx + b\n",
    "        self.fc2   = nn.Linear(120, 84)# 同上\n",
    "        self.fc3   = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) \n",
    "        # Max pooling over a (2, 2) window ,也可以直接写成2,如下\n",
    "        x = F.max_pool2d(F.relu(self.conv2(x)), 2) \n",
    "        # If the size is a square you can only specify a single number\n",
    "        x = x.view(-1, self.num_flat_features(x))\n",
    "        #x = x.view(x.size()[0], -1) # 用这个应该也行,\n",
    "#         if x.sum()>100: return x \n",
    "        \n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "#        if x[0]>0 : return 1\n",
    "        \n",
    "        return x\n",
    "    \n",
    "    # x的大小是N*H*W*C \n",
    "    # N 是batch_size, 我们要的是features_size,也就是一张图片?的所有像素\n",
    "    def num_flat_features(self, x):\n",
    "        size = x.size()[1:] \n",
    "\n",
    "        num_features = 1\n",
    "        for s in size:\n",
    "            num_features *= s\n",
    "        return num_features\n",
    "\n",
    "net = Net()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 其实我更喜欢这种写法:\n",
    "class MyNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(MyNet, self).__init__()\n",
    "        self.main = nn.Sequential(\n",
    "            # input is (nc) x 64 x 64\n",
    "            nn.Conv2d(2, 2, 4, 2, 1, bias=False),\n",
    "            nn.LeakyReLU(0.2, inplace=True),\n",
    "            nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),\n",
    "            nn.Sigmoid()\n",
    "        )\n",
    "    def forward(self, input):\n",
    "        return self.main(input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 或者更加简化:\n",
    "my_net = nn.Sequential(\n",
    "            # input is (nc) x 64 x 64\n",
    "            nn.Conv2d(1, 2, 4, 2, 1, bias=False),\n",
    "            nn.LeakyReLU(0.2, inplace=True),\n",
    "            nn.Conv2d(3 * 8, 1, 4, 1, 0, bias=False),\n",
    "            nn.Sigmoid()\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(net)\n",
    "print(MyNet())\n",
    "print(my_net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在nn.module的子类中定义了forward函数, `backward` 函数会自动被实现(利用`autograd`)\n",
    "\n",
    "在`forward` 函数中你可以使用任何的Tensor支持的函数, 还可以使用if,for循环,print,log等等. 标准python是怎么写的, 你就可以怎么写的.\n",
    "\n",
    "\n",
    "可学习的参数通过`net.parameters()`返回"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = list(net.parameters())\n",
    "print(len(params))\n",
    "print(params[0].size()) # conv1's .weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**记住!!!: forward的输入和输出都是 `autograd.Variable`, 因为只有Variable才有自动求导功能,Tensor是没有的**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input = Variable(torch.randn(1, 1, 32, 32))\n",
    "out = net(input)\n",
    "print('out=', out)\n",
    "#net.zero_grad() # 所有参数的梯度清零\n",
    "out.backward(torch.ones(1, 10), retain_graph=True) # 反向传播\n",
    "# out.backward(torch.randn(1, 10)) #如果不清零,再次反向传播会如何? 梯度叠加 (把randn改成ones可以更明显的看到效果)\n",
    "print('out=', out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **NOTE**: `torch.nn` 只支持 mini-batches\n",
    "\n",
    "> 不支持一次只输入一个样本, 一次必须是一个batch. 但如果你一定要输入一个样本的话,用 `input.unsqueeze(0)` .(伪装成只有一个样本的batch)\n",
    " \n",
    "比如 `nn.Conv2d` 输入必须是一个 4D Tensor , 形如: `nSamples x nChannels x Height x Width`. 可以把nsample设为1, 但是形状不能是`nChannels x Height x Width`\n"
   ]
  },
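  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick illustration of `unsqueeze(0)` (the tensor here is a random stand-in for a real sample):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "single = torch.randn(1, 32, 32) # nChannels x Height x Width -- no batch dimension\n",
    "batch = single.unsqueeze(0)     # 1 x nChannels x Height x Width -- a batch of size 1\n",
    "print(single.size(), batch.size())"
   ]
  },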
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Module有很多属性,可以查看权重、参数等等"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(net)\n",
    "\n",
    "for param in net.parameters():\n",
    "     print(type(param.data), param.size())\n",
    "     print(list(param.data)) \n",
    "\n",
    "print(net.state_dict().keys())\n",
    "#参数的keys\n",
    "\n",
    "for key in net.state_dict():#模型参数\n",
    "    print(key, 'corresponds to', list(net.state_dict()[key]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 损失函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "损失函数的定义: A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.\n",
    "\n",
    "nn中常用的损失函数: [several different loss functions under the nn package](http://pytorch.org/docs/nn.html#loss-functions).\n",
    "\n",
    "最简单的loss: `nn.MSELoss` 计算均方误差"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "output = net(input)\n",
    "# target = Variable(torch.range(1, 10))  # a dummy target, for example\n",
    "# torch.range is deprecated in favor of torch.arange and will be removed in 0.3.\n",
    "# arange generates values in [start; end), not [start; end].\n",
    "target = Variable(torch.arange(1, 11))  # a dummy target, for example\n",
    "criterion = nn.MSELoss()\n",
    "loss = criterion(output, target)\n",
    "loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "现在如果对 `loss` 进行反向传播的溯源(使用它的`.creator` 属性),你会看到它的计算图看起来像这样:\n",
    "\n",
    "```\n",
    "input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d  \n",
    "      -> view -> linear -> relu -> linear -> relu -> linear \n",
    "      -> MSELoss\n",
    "      -> loss\n",
    "```\n",
    "\n",
    "所以,当我们调用`loss.backward()`, 这个图动态生成, 自动微分,图中参数(Parameter)会自动计算他们的导数,并与当前的导数相加(所以zero_grad很有必要)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# For illustration, let us follow a few steps backward\n",
    "print(loss.grad_fn) # MSELoss\n",
    "print(loss.grad_fn.next_functions[0][0]) # Linear\n",
    "print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU/Threshold"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 运行.backward, 来看看调用之前和调用之后的grad\n",
    "# now we shall call loss.backward(), and have a look at conv1's bias gradients before and after the backward.\n",
    "net.zero_grad() # zeroes the gradient buffers of all parameters\n",
    "print('conv1.bias.grad -- before backward')\n",
    "print(net.conv1.bias.grad)\n",
    "loss.backward() # 这个cell只能运行一次,如再运行,提示要加retain_graph=True\n",
    "print('conv1.bias.grad -- after backward')\n",
    "print(net.conv1.bias.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "> **NOTE**:`nn` 包中包含大量神经网络中会用到的函数和工具 详细文档见 http://pytorch.org/docs/nn.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "## 优化器"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "最简单的梯度下降法: 随机梯度下降(SGD):\n",
    "> `weight = weight - learning_rate * gradient`\n",
    "\n",
    "很容易实现\n",
    "\n",
    "```python\n",
    "learning_rate = 0.01\n",
    "for f in net.parameters():\n",
    "    f.data.sub_(f.grad.data * learning_rate)# inplace\n",
    "```\n",
    "\n",
    "torch.optim 中包含着许多常用的优化方法, 比如RMSProp,Adam等等, 非常易于使用. 而且参照着他们的代码, 实现自己的优化方法也是相当的简单. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "# create your optimizer\n",
    "optimizer = optim.SGD(net.parameters(), lr = 0.01)\n",
    "\n",
    "# in your training loop:\n",
    "optimizer.zero_grad() # zero the gradient buffers\n",
    "output = net(input)\n",
    "loss = criterion(output, target)\n",
    "loss.backward()\n",
    "optimizer.step() # Does the update"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 数据加载与预处理"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通常来说, 当你要处理图片, 文本,语音甚至视频数据时候, 你必须使用标准的Python工具来加载数据,并转为numpy 数组, 然后再转为torch.Tensor\n",
    " \n",
    "\n",
    "- For images, Pillow, OpenCV很有用 \n",
    "- For audio,  scipy and librosa  \n",
    "- For text, either raw Python or Cython based loading, or NLTK and SpaCy are useful.\n",
    "\n",
    "当然最好用的还是torch提供的vison包,叫做`torchvision`, 实现了常用的图像数据加载功能 比如Imagenet,CIFAR10,MNIST等等, 以及常用的数据转换操作, 这位数据加载带来了极大的方便, 并可避免撰写重复代码.\n",
    " \n",
    "来看看CIFAR10, 它有10个类别: 'airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'.\n",
    " 图片大小:  3x32x32, i.e. 3-通道彩色 32x32分辨率"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Excise(练习:训练图片分类器)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "步骤如下: \n",
    "\n",
    "\n",
    "1. 使用torchvision加载并预处理CIFAR10\n",
    "1. 定义网络\n",
    "1. 定义损失函数\n",
    "1. 训练网络(+ 更新网络参数)\n",
    "1. 测试网络\n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 加载和预处理CIFAR10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**ImageFolder**\n",
    "\n",
    "\n",
    "ImageFolder\n",
    "假设图片的文件夹如下所式 :\n",
    "```\n",
    "root/good/xxx.png\n",
    "root/good/xxy.png\n",
    "root/good/xxz.png\n",
    "\n",
    "root/bad/123.png\n",
    "root/bad/nsdf3.png\n",
    "root/bad/asd932_.png\n",
    "```\n",
    "\n",
    "通过如下代码可加载:\n",
    "\n",
    "\n",
    "------\n",
    "\n",
    "```python\n",
    "val_dataset = MyImageFolder('/root',\n",
    "                transform=transforms.Compose([transforms.Scale(opt.image_size),\n",
    "                                             # transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),\n",
    "                                              transforms.ToTensor(),\n",
    "                                              transforms.Normalize([0.5]*3,[0.5]*3)\n",
    "                                             ]), loader=my_loader)\n",
    "\n",
    "val_dataloader=t.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)\n",
    "```\n",
    "其中my_loader 用来加载指定路径的图片到内存\n",
    "my_collate 用来对dataset加载的数据进行检查\n",
    "transforms 包含两大类的操作:\n",
    "1. PIL的Image对象的操作\n",
    "2. Torch 的Tensor对象的操作\n",
    "\n",
    "还可以利用transforms.Lambda 传入任意的函数进行操作\n",
    "\n",
    "\n",
    "比如要在传入的图片中进行随机旋转(每次)\n",
    "```python\n",
    "val_dataset = MyImageFolder('/root',\n",
    "                transform=transforms.Compose([transforms.Scale(opt.image_size),\n",
    "                                              transforms.Lambda(lambda image:image.rotate(random.randint(0,359))),\n",
    "                                              transforms.ToTensor(),\n",
    "                                              transforms.Normalize([0.5]*3,[0.5]*3)\n",
    "                                             ]), loader=my_loader)\n",
    "\n",
    "val_dataloader=t.utils.data.DataLoader(val_dataset,opt.batch_size,True,num_workers=opt.workers, collate_fn=my_collate)\n",
    "```\n",
    "\n",
    "-----\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision\n",
    "import torchvision.transforms as transforms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "Files already downloaded and verified\n"
     ]
    }
   ],
   "source": [
    "# torchvision dataset 的输出默认都是 PILImage: range [0, 1].\n",
    "# 通过transform 来把他们转成[-1,1]\n",
    "\n",
    "transform=transforms.Compose([transforms.ToTensor(),\n",
    "                              transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n",
    "                             ])\n",
    "trainset = torchvision.datasets.CIFAR10(root='./Data/', train=True, download=True, transform=transform)\n",
    "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, \n",
    "                                          shuffle=True, num_workers=2)\n",
    "\n",
    "testset = torchvision.datasets.CIFAR10(root='./Data/', train=False, download=True, transform=transform)\n",
    "testloader = torch.utils.data.DataLoader(testset, batch_size=4, \n",
    "                                          shuffle=False, num_workers=2)\n",
    "classes = ('plane', 'car', 'bird', 'cat',\n",
    "           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**来看几张图片**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# functions to show an image\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "%matplotlib inline\n",
    "def imshow(img):\n",
    "    img = img / 2 + 0.5 # unnormalize\n",
    "    npimg = img.numpy()\n",
    "    plt.imshow(np.transpose(npimg, (1,2,0)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# show some random training images\n",
    "dataiter = iter(trainloader)\n",
    "images, labels = dataiter.next()\n",
    "print(images.size())\n",
    "# print images\n",
    "imshow(torchvision.utils.make_grid(images))\n",
    "# print labels\n",
    "print(' '.join('%5s'%classes[labels[j]] for j in range(4)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义CNN"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** 直接拷贝上面的LeNet+Mnist网络,然后修改第一个参数为3通道 (因为mnist是黑白, 而cifar是32)\n",
    "\n",
    "提示: You only have to change the first layer, change the number 1 to be 3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Net0(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net0, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(3, 6, 5)\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
    "        self.fc1   = nn.Linear(16*5*5, 120)\n",
    "        self.fc2   = nn.Linear(120, 84)\n",
    "        self.fc3   = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) \n",
    "        x = F.max_pool2d(F.relu(self.conv2(x)), 2) \n",
    "        x = x.view(-1, self.num_flat_features(x))\n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "        return x\n",
    "    \n",
    "    def num_flat_features(self, x):\n",
    "        size = x.size()[1:] \n",
    "\n",
    "        num_features = 1\n",
    "        for s in size:\n",
    "            num_features *= s\n",
    "        return num_features\n",
    "\n",
    "net0 = Net0()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(3, 6, 5)\n",
    "        self.pool  = nn.MaxPool2d(2,2)\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
    "        self.fc1   = nn.Linear(16*5*5, 120)\n",
    "        self.fc2   = nn.Linear(120, 84)\n",
    "        self.fc3   = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.pool(F.relu(self.conv1(x)))\n",
    "        x = self.pool(F.relu(self.conv2(x)))\n",
    "        x = x.view(-1, 16*5*5)\n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "        return x\n",
    "\n",
    "net = Net()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 定义损失函数和优化器(loss和optimizer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch import optim\n",
    "criterion = nn.CrossEntropyLoss() # use a Classification Cross-Entropy loss 交叉熵损失函数\n",
    "optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练网络\n",
    "\n",
    "写一个for循环, 不断地\n",
    "- 输入数据\n",
    "- 计算损失函数\n",
    "- 更新参数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for epoch in range(2): # loop over the dataset multiple times \n",
    "    \n",
    "    running_loss = 0.0\n",
    "    for i, data in enumerate(trainloader, 0):#(当所有的数据都被输入了一遍的时候,循环就会结束,所以需要上面的for epoch in)\n",
    "        # get the inputs\n",
    "        inputs, labels = data\n",
    "        \n",
    "        # wrap them in Variable\n",
    "        inputs, labels = Variable(inputs), Variable(labels)\n",
    "        \n",
    "        # zero the parameter gradients\n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        # forward + backward + optimize\n",
    "        outputs = net(inputs)\n",
    "        loss = criterion(outputs, labels)\n",
    "        loss.backward()        \n",
    "        optimizer.step()\n",
    "        \n",
    "        # print statistics\n",
    "        running_loss += loss.data[0]\n",
    "        if i % 2000 == 1999: # print every 2000 mini-batches\n",
    "            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss / 2000))\n",
    "            running_loss = 0.0\n",
    "print('Finished Training')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们训练了2个epoch(也就是每张图片都输入训练了两次)\n",
    "\n",
    "可以看看网络有没有效果(测试的图片输入到网络中, 计算它的label, 然后和实际的label进行比较)\n",
    "\n",
    "先来看看测试集中的一张图片."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataiter = iter(testloader)\n",
    "images, labels = dataiter.next()\n",
    "\n",
    "# print images\n",
    "imshow(torchvision.utils.make_grid(images))\n",
    "print('GroundTruth: ', ' '.join('%5s'%classes[labels[j]] for j in range(4)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "计算label"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 计算图片在每个类别上的分数(能量?)\n",
    "outputs = net(Variable(images))\n",
    "\n",
    "# the outputs are energies for the 10 classes. \n",
    "# Higher the energy for a class, the more the network \n",
    "# thinks that the image is of the particular class\n",
    "\n",
    "# 得分最高的那个类\n",
    "_, predicted = torch.max(outputs.data, 1)\n",
    "print('Predicted: ', ' '.join('%5s'% classes[predicted[j]] for j in range(4)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "还行,至少比随机预测好, 接下来看看在这个测试集的准确率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "correct = 0\n",
    "total = 0\n",
    "for data in testloader:\n",
    "    images, labels = data\n",
    "    outputs = net(Variable(images))\n",
    "    _, predicted = torch.max(outputs.data, 1)\n",
    "    total += labels.size(0)\n",
    "    correct += (predicted == labels).sum()\n",
    "\n",
    "print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "远比随机猜测(准确率10%)好,说明网络学到了点东西 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class_correct = list(0. for i in range(10))\n",
    "class_total = list(0. for i in range(10))\n",
    "for data in testloader:\n",
    "    images, labels = data\n",
    "    outputs = net(Variable(images))\n",
    "    _, predicted = torch.max(outputs.data, 1)\n",
    "    c = (predicted == labels).squeeze()\n",
    "    for i in range(4):\n",
    "        label = labels[i]\n",
    "        class_correct[label] += c[i]\n",
    "        class_total[label] += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(10):\n",
    "    print('Accuracy of %5s : %2d %%' % (classes[i], 100 * class_correct[i] / class_total[i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**经典的网络已经根据论文定义好了,拿来就可以用**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision.models as models\n",
    "alexnet = models.alexnet(pretrained=True)  #已经根据论文定义好了模型, 并且有训练好的参数\n",
    "#可以很方便的进行Finetune和特征提取"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training on the GPU\n",
    "就像我们之前把Tensor从CPU转到GPU一样, 模型也可以很简单的从CPU转到GPU, \n",
    "这会把所有的模型参数和buffer转成CUDA tensor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "net.cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "### tensor example\n",
    "x_cpu = torch.randn(10, 20)\n",
    "w_cpu = torch.randn(20, 10)\n",
    "# direct transfer to the GPU\n",
    "x_gpu = x_cpu.cuda()\n",
    "w_gpu = w_cpu.cuda()\n",
    "result_gpu = x_gpu @ w_gpu\n",
    "# get back from GPU to CPU\n",
    "result_cpu = result_gpu.cpu()\n",
    "\n",
    "### model example\n",
    "model = model.cuda()\n",
    "# train step\n",
    "inputs = Variable(inputs.cuda())\n",
    "outputs = model(inputs)\n",
    "# get back from GPU to CPU\n",
    "outputs = outputs.cpu()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果你觉得在GPU上没有比CPU提速很多, 不要着急, 那是因为这个网络实在太小了\n",
    " \n",
    "**Exercise:** 增加网络的深度和宽度 , 看看提速如何\n",
    "(第一个 `nn.Conv2d`的第二个参数,第二个 `nn.Conv2d`的第一个参数, 需要一样(you know Why)  "
   ]
  },
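  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible answer to the exercise, as a sketch (the channel counts 32/64 and the 256-unit hidden layer are arbitrary choices, not the required ones): widen both conv layers, keeping conv1's out_channels equal to conv2's in_channels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class WideNet(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(WideNet, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(3, 32, 5)   # 32 output channels instead of 6\n",
    "        self.pool  = nn.MaxPool2d(2, 2)\n",
    "        self.conv2 = nn.Conv2d(32, 64, 5)  # in_channels matches conv1's out_channels\n",
    "        self.fc1   = nn.Linear(64*5*5, 256)\n",
    "        self.fc2   = nn.Linear(256, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.pool(F.relu(self.conv1(x)))\n",
    "        x = self.pool(F.relu(self.conv2(x)))\n",
    "        x = x.view(-1, 64*5*5)\n",
    "        x = F.relu(self.fc1(x))\n",
    "        return self.fc2(x)\n",
    "\n",
    "wide_net = WideNet().cuda() if torch.cuda.is_available() else WideNet()"
   ]
  },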
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为有些时候我们想在 CPU 和 GPU 中运行相同的模型,而无需改动代码,我们会需要一种封装:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Trainer:\n",
    "    def __init__(self, model, use_cuda=False, gpu_idx=0):\n",
    "        self.use_cuda = use_cuda\n",
    "        self.gpu_idx = gpu_idx\n",
    "        self.model = self.to_gpu(model)\n",
    "\n",
    "    def to_gpu(self, tensor):\n",
    "        if self.use_cuda:\n",
    "            return tensor.cuda(self.gpu_idx)\n",
    "        else:\n",
    "            return tensor\n",
    "\n",
    "    def from_gpu(self, tensor):\n",
    "        if self.use_cuda:\n",
    "            return tensor.cpu()\n",
    "        else:\n",
    "            return tensor\n",
    "\n",
    "    def train(self, inputs):\n",
    "        inputs = self.to_gpu(inputs)\n",
    "        outputs = self.model(inputs)\n",
    "        outputs = self.from_gpu(outputs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 最终架构"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](640.png)\n",
    "\n",
    "这里有一段用于解读的伪代码:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ImagesDataset(torch.utils.data.Dataset):\n",
    "    pass\n",
    "\n",
    "class Net(nn.Module):\n",
    "    pass\n",
    "\n",
    "model = Net()\n",
    "optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
    "scheduler = lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)\n",
    "criterion = torch.nn.MSELoss()\n",
    "\n",
    "dataset = ImagesDataset(path_to_images)\n",
    "data_loader = torch.utils.data.DataLoader(dataset, batch_size=10)\n",
    "\n",
    "train = True\n",
    "for epoch in range(epochs):\n",
    "    if train:\n",
    "        lr_scheduler.step()\n",
    "\n",
    "    for inputs, labels in data_loader:\n",
    "        inputs = Variable(to_gpu(inputs))\n",
    "        labels = Variable(to_gpu(labels))\n",
    "\n",
    "        outputs = model(inputs)\n",
    "        loss = criterion(outputs, labels)\n",
    "        if train:\n",
    "            optimizer.zero_grad()\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "\n",
    "    if not train:\n",
    "        save_best_model(epoch_validation_accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Next(后续学习)\n",
    "\n",
    "- [训练网络来打游戏(强化学习)](https://goo.gl/uGOksc)\n",
    "- [在IMAGENET上训练ResNet](https://github.com/pytorch/examples/tree/master/imagenet)\n",
    "- [DCGAN](https://github.com/pytorch/examples/tree/master/dcgan)\n",
    "- [LSTM+language model](https://github.com/pytorch/examples/tree/master/word_language_model)\n",
    "- [更多官方example](https://github.com/pytorch/examples)\n",
    "- [更多 tutorials](https://github.com/pytorch/tutorials)\n",
    "- [论坛](https://discuss.pytorch.org/)\n",
    "- [ Slack](http://pytorch.slack.com/messages/beginner/)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 如何finetune"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "0. 如何进行参数初始化\n",
    "1. 如何加载和保存模型\n",
    "2. 如何给不同的网络层设置不同的学习率\n",
    "3. 如何冻结某些网络层的学习率\n",
    "4. 如何从某一层中提取feature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 如何进行参数初始化(使用 torch.nn.init )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "def initNetParams(net):\n",
    "    '''Init net parameters.'''\n",
    "    for m in net.modules():\n",
    "        if isinstance(m, nn.Conv2d):\n",
    "            init.xavier_uniform(m.weight)\n",
    "            if m.bias:\n",
    "                init.constant(m.bias, 0)\n",
    "        elif isinstance(m, nn.BatchNorm2d):\n",
    "            init.constant(m.weight, 1)\n",
    "            init.constant(m.bias, 0)\n",
    "        elif isinstance(m, nn.Linear):\n",
    "            init.normal(m.weight, std=1e-3)\n",
    "            if m.bias:\n",
    "                init.constant(m.bias, 0)\n",
    "\n",
    "initNetParams(net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 如何加载和保存模型(torch.save(),torch.load(‘.pth’))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "#### 保存ConvNet\n",
    "\n",
    "使用torch.save()对网络结构和模型参数的保存,有两种保存方式:\n",
    "\n",
    "- 保存整个神经网络的的结构信息和模型参数信息,save的对象是网络net;\n",
    "- 保存神经网络的训练模型参数,save的对象是net.state_dict()。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "torch.save(net1, 'net.pkl')  # 保存整个神经网络的结构和模型参数    \n",
    "torch.save(net1.state_dict(), 'net_params.pkl') # 只保存神经网络的模型参数    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "#### 加载ConvNet\n",
    "\n",
    "对应上面两种保存方式,重载方式也有两种。\n",
    "\n",
    "- 对应第一种完整网络结构信息,重载的时候通过torch.load(‘.pth’)直接初始化新的神经网络对象即可。\n",
    "- 对应第二种只保存模型参数信息,需要首先导入对应的网络,通过net.load_state_dict(torch.load('.pth'))完成模型参数的重载。\n",
    "\n",
    "在网络比较大的时候,第一种方法会花费较多的时间,所占的存储空间也比较大。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "# 保存和加载整个模型  \n",
    "torch.save(model_object, 'model.pth')  \n",
    "model = torch.load('model.pth')  \n",
    "\n",
    "# 仅保存和加载模型参数  \n",
    "torch.save(model_object.state_dict(), 'params.pth')  \n",
    "model_object.load_state_dict(torch.load('params.pth')) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "heading_collapsed": true
   },
   "source": [
    "### 如何给不同的网络层设置不同的学习率(给Optimizer传dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "`Optimizer`也支持为每个参数单独设置选项。若想这么做,不要直接传入`Variable`的`iterable`,而是传入`dict`的`iterable`。每一个`dict`都分别定 义了一组参数,并且包含一个`param`键,这个键对应参数的列表。其他的键应该`optimizer`所接受的其他参数的关键字相匹配,并且会被用于对这组参数的 优化。\n",
    "\n",
    "> 注意:\n",
    "\n",
    "> 您仍然可以将选项作为关键字参数传递。它们将被用作默认值,在不覆盖它们的组中。当您只想改变一个选项,同时保持参数组之间的所有其他选项一致时,这很有用。\n",
    "\n",
    "例如,当我们想指定每一层的学习率时,这是非常有用的:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hidden": true
   },
   "outputs": [],
   "source": [
    "optim.SGD([{'params': model.base.parameters()},\n",
    "           {'params': model.classifier.parameters(), 'lr': 1e-3}\n",
    "          ], lr=1e-2, momentum=0.9)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hidden": true
   },
   "source": [
    "这意味着`model.base`参数将使用默认的学习速率`1e-2`,`model.classifier`参数将使用学习速率`1e-3`,并且`0.9`的`momentum`将会被用于所有的参数。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如何冻结某些网络层的学习率(TODO)"
   ]
  },
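  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch for this TODO (my assumption of the intended approach): set `requires_grad = False` on the parameters you want frozen, and hand only the remaining parameters to the optimizer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# freeze conv1: its parameters receive no gradients and are never updated\n",
    "for param in net.conv1.parameters():\n",
    "    param.requires_grad = False\n",
    "\n",
    "# pass only the still-trainable parameters to the optimizer\n",
    "trainable_params = [p for p in net.parameters() if p.requires_grad]\n",
    "optimizer = optim.SGD(trainable_params, lr=0.001, momentum=0.9)"
   ]
  },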
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如何从某一层中提取feature(TODO)"
   ]
  },
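  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch for this TODO (my assumption of the intended approach): register a forward hook on the layer of interest and stash its output during a forward pass."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = {}\n",
    "\n",
    "def save_features(module, input, output):\n",
    "    features['conv2'] = output\n",
    "\n",
    "handle = net.conv2.register_forward_hook(save_features)\n",
    "# any forward pass fills the dict (if net was moved to the GPU above, call .cuda() on this input too)\n",
    "_ = net(Variable(torch.randn(1, 3, 32, 32)))\n",
    "handle.remove() # detach the hook when done\n",
    "print(features['conv2'].size())"
   ]
  }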
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "navigate_num": "#000000",
    "navigate_text": "#333333",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700",
    "sidebar_border": "#EEEEEE",
    "wrapper_background": "#FFFFFF"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "512px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": false,
   "threshold": 4,
   "toc_cell": false,
   "toc_position": {
    "height": "259px",
    "left": "1px",
    "right": "20px",
    "top": "132px",
    "width": "212px"
   },
   "toc_section_display": "block",
   "toc_window_display": true,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}