{ "cells": [ { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [], "source": [ "import pickle,gzip,math,os,time,shutil,torch,matplotlib as mpl,numpy as np,matplotlib.pyplot as plt\n", "from pathlib import Path\n", "from torch import tensor,nn\n", "import torch.nn.functional as F\n", "from fastcore.test import test_close\n", "\n", "torch.set_printoptions(precision=2, linewidth=140, sci_mode=False)\n", "torch.manual_seed(1)\n", "mpl.rcParams['image.cmap'] = 'gray'\n", "\n", "path_data = Path('data')\n", "path_gz = path_data/'mnist.pkl.gz'\n", "with gzip.open(path_gz, 'rb') as f: ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')\n", "x_train, y_train, x_valid, y_valid = map(tensor, [x_train, y_train, x_valid, y_valid])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial setup" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Data" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "hidden": true }, "outputs": [], "source": [ "n,m = x_train.shape\n", "c = y_train.max()+1\n", "nh = 50" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "hidden": true }, "outputs": [], "source": [ "class Model(nn.Module):\n", " def __init__(self, n_in, nh, n_out):\n", " super().__init__()\n", " self.layers = [nn.Linear(n_in,nh), nn.ReLU(), nn.Linear(nh,n_out)]\n", " \n", " def __call__(self, x):\n", " for l in self.layers: x = l(x)\n", " return x" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "torch.Size([50000, 10])" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = Model(m, nh, 10)\n", "pred = model(x_train)\n", "pred.shape" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Cross entropy loss" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "First, we will need to compute the softmax of our activations. This is defined by:\n", "\n", "$$\\hbox{softmax(x)}_{i} = \\frac{e^{x_{i}}}{e^{x_{0}} + e^{x_{1}} + \\cdots + e^{x_{n-1}}}$$\n", "\n", "or more concisely:\n", "\n", "$$\\hbox{softmax(x)}_{i} = \\frac{e^{x_{i}}}{\\sum_{0 \\leq j \\leq n-1} e^{x_{j}}}$$ \n", "\n", "In practice, we will need the log of the softmax when we calculate the loss." ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "hidden": true }, "outputs": [], "source": [ "def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor([[-2.37, -2.49, -2.36, ..., -2.31, -2.28, -2.22],\n", " [-2.37, -2.44, -2.44, ..., -2.27, -2.26, -2.16],\n", " [-2.48, -2.33, -2.28, ..., -2.30, -2.30, -2.27],\n", " ...,\n", " [-2.33, -2.52, -2.34, ..., -2.31, -2.21, -2.16],\n", " [-2.38, -2.38, -2.33, ..., -2.29, -2.26, -2.17],\n", " [-2.33, -2.55, -2.36, ..., -2.29, -2.27, -2.16]], grad_fn=)" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "log_softmax(pred)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Note that the formula \n", "\n", "$$\\log \\left ( \\frac{a}{b} \\right ) = \\log(a) - \\log(b)$$ \n", "\n", "gives a simplification when we compute the log softmax, which was previously defined as `(x.exp()/(x.exp().sum(-1,keepdim=True))).log()`" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "hidden": true }, "outputs": [], "source": [ "def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Then, there is a way to compute the log of the sum of exponentials in a more stable way, called the [LogSumExp trick](https://en.wikipedia.org/wiki/LogSumExp). The idea is to use the following formula:\n", "\n", "$$\\log \\left ( \\sum_{j=1}^{n} e^{x_{j}} \\right ) = \\log \\left ( e^{a} \\sum_{j=1}^{n} e^{x_{j}-a} \\right ) = a + \\log \\left ( \\sum_{j=1}^{n} e^{x_{j}-a} \\right )$$\n", "\n", "where a is the maximum of the $x_{j}$." ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "hidden": true }, "outputs": [], "source": [ "def logsumexp(x):\n", " m = x.max(-1)[0]\n", " return m + (x-m[:,None]).exp().sum(-1).log()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "This way, we will avoid an overflow when taking the exponential of a big activation. In PyTorch, this is already implemented for us. " ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "hidden": true }, "outputs": [], "source": [ "def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor([[-2.37, -2.49, -2.36, ..., -2.31, -2.28, -2.22],\n", " [-2.37, -2.44, -2.44, ..., -2.27, -2.26, -2.16],\n", " [-2.48, -2.33, -2.28, ..., -2.30, -2.30, -2.27],\n", " ...,\n", " [-2.33, -2.52, -2.34, ..., -2.31, -2.21, -2.16],\n", " [-2.38, -2.38, -2.33, ..., -2.29, -2.26, -2.17],\n", " [-2.33, -2.55, -2.36, ..., -2.29, -2.27, -2.16]], grad_fn=)" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_close(logsumexp(pred), pred.logsumexp(-1))\n", "sm_pred = log_softmax(pred)\n", "sm_pred" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The cross entropy loss for some target $x$ and some prediction $p(x)$ is given by:\n", "\n", "$$ -\\sum x\\, \\log p(x) $$\n", "\n", "But since our $x$s are 1-hot encoded, this can be rewritten as $-\\log(p_{i})$ where i is the index of the desired target.\n", "\n", "This can be done using numpy-style [integer array indexing](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html#integer-array-indexing). Note that PyTorch supports all the tricks in the advanced indexing methods discussed in that link." ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor([5, 0, 4])" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_train[:3]" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(tensor(-2.20, grad_fn=),\n", " tensor(-2.37, grad_fn=),\n", " tensor(-2.36, grad_fn=))" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sm_pred[0,5],sm_pred[1,0],sm_pred[2,4]" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor([-2.20, -2.37, -2.36], grad_fn=)" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sm_pred[[0,1,2], y_train[:3]]" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "hidden": true }, "outputs": [], "source": [ "def nll(input, target): return -input[range(target.shape[0]), target].mean()" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor(2.30, grad_fn=)" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loss = nll(sm_pred, y_train)\n", "loss" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Then use PyTorch's implementation." ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "hidden": true }, "outputs": [], "source": [ "test_close(F.nll_loss(F.log_softmax(pred, -1), y_train), loss, 1e-3)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "In PyTorch, `F.log_softmax` and `F.nll_loss` are combined in one optimized function, `F.cross_entropy`." ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "hidden": true }, "outputs": [], "source": [ "test_close(F.cross_entropy(pred, y_train), loss, 1e-3)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Basic training loop" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Basically the training loop repeats over the following steps:\n", "- get the output of the model on a batch of inputs\n", "- compare the output to the labels we have and compute a loss\n", "- calculate the gradients of the loss with respect to every parameter of the model\n", "- update said parameters with those gradients to make them a little bit better" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "hidden": true }, "outputs": [], "source": [ "loss_func = F.cross_entropy" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(tensor([-0.09, -0.21, -0.08, 0.10, -0.04, 0.08, -0.04, -0.03, 0.01, 0.06], grad_fn=),\n", " torch.Size([64, 10]))" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bs=64 # batch size\n", "\n", "xb = x_train[0:bs] # a mini-batch from x\n", "preds = model(xb) # predictions\n", "preds[0], preds.shape" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor(2.30, grad_fn=)" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yb = y_train[0:bs]\n", "loss_func(preds, yb)" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor([3, 9, 3, 8, 5, 9, 3, 9, 3, 9, 5, 3, 9, 9, 3, 9, 9, 5, 8, 7, 9, 5, 3, 8, 9, 5, 9, 5, 5, 9, 3, 5, 9, 7, 5, 7, 9, 9, 3, 9, 3, 5, 3, 8,\n", " 3, 5, 9, 5, 9, 5, 3, 9, 3, 8, 9, 5, 9, 5, 9, 5, 8, 8, 9, 8])" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "torch.argmax(preds, dim=1)" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "hidden": true }, "outputs": [], "source": [ "def accuracy(out, yb): return (torch.argmax(out, dim=1)==yb).float().mean()" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor(0.09)" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "accuracy(preds, yb)" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "hidden": true }, "outputs": [], "source": [ "lr = 0.5 # learning rate\n", "epochs = 3 # how many epochs to train for" ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.3036487102508545 0.09375\n", "0.12374822050333023 0.96875\n", "0.09232541173696518 0.96875\n" ] } ], "source": [ "for epoch in range(epochs):\n", " for i in range(0, n, bs):\n", " s = slice(i, min(n,i+bs))\n", " xb,yb = x_train[s],y_train[s]\n", " preds = model(xb)\n", " loss = loss_func(preds, yb)\n", " loss.backward()\n", " if i==0: print(loss.item(), accuracy(preds, yb).item())\n", " with torch.no_grad():\n", " for l in model.layers:\n", " if hasattr(l, 'weight'):\n", " l.weight -= l.weight.grad * lr\n", " l.bias -= l.bias.grad * lr\n", " l.weight.grad.zero_()\n", " l.bias .grad.zero_()" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Using parameters and optim" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### Parameters" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Use `nn.Module.__setattr__`:" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "hidden": true }, "outputs": [], "source": [ "class Model(nn.Module):\n", " def __init__(self, n_in, nh, n_out):\n", " super().__init__()\n", " self.l1 = nn.Linear(n_in,nh)\n", " self.l2 = nn.Linear(nh,n_out)\n", " self.relu = nn.ReLU()\n", " \n", " def __call__(self, x): return self.l2(self.relu(self.l1(x)))" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "hidden": true }, "outputs": [], "source": [ "model = Model(m, nh, 10)" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "l1: Linear(in_features=784, out_features=50, bias=True)\n", "l2: Linear(in_features=50, out_features=10, bias=True)\n", "relu: ReLU()\n" ] } ], "source": [ "for name,l in model.named_children(): print(f\"{name}: {l}\")" ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "Model(\n", " (l1): Linear(in_features=784, out_features=50, bias=True)\n", " (l2): Linear(in_features=50, out_features=10, bias=True)\n", " (relu): ReLU()\n", ")" ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "Linear(in_features=784, out_features=50, bias=True)" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.l1" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "hidden": true }, "outputs": [], "source": [ "def fit():\n", " for epoch in range(epochs):\n", " for i in range(0, n, bs):\n", " s = slice(i, min(n,i+bs))\n", " xb,yb = x_train[s],y_train[s]\n", " preds = model(xb)\n", " loss = loss_func(preds, yb)\n", " loss.backward()\n", " if i==0: print(loss.item(), accuracy(preds, yb).item())\n", " with torch.no_grad():\n", " for p in model.parameters(): p -= p.grad * lr\n", " model.zero_grad()" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.309433937072754 0.0625\n", "0.20068740844726562 0.953125\n", "0.18196897208690643 0.9375\n" ] } ], "source": [ "fit()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Behind the scenes, PyTorch overrides the `__setattr__` function in `nn.Module` so that the submodules you define are properly registered as parameters of the model." ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "hidden": true }, "outputs": [], "source": [ "class DummyModule():\n", " def __init__(self, n_in, nh, n_out):\n", " self._modules = {}\n", " self.l1 = nn.Linear(n_in,nh)\n", " self.l2 = nn.Linear(nh,n_out)\n", " \n", " def __setattr__(self,k,v):\n", " if not k.startswith(\"_\"): self._modules[k] = v\n", " super().__setattr__(k,v)\n", " \n", " def __repr__(self): return f'{self._modules}'\n", " \n", " def parameters(self):\n", " for l in self._modules.values():\n", " for p in l.parameters(): yield p" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "{'l1': Linear(in_features=784, out_features=50, bias=True), 'l2': Linear(in_features=50, out_features=10, bias=True)}" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mdl = DummyModule(m,nh,10)\n", "mdl" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "[torch.Size([50, 784]),\n", " torch.Size([50]),\n", " torch.Size([10, 50]),\n", " torch.Size([10])]" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[o.shape for o in mdl.parameters()]" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### Registering modules" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We can use the original `layers` approach, but we have to register the modules." ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "hidden": true }, "outputs": [], "source": [ "layers = [nn.Linear(m,nh), nn.ReLU(), nn.Linear(nh,10)]" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "hidden": true }, "outputs": [], "source": [ "class Model(nn.Module):\n", " def __init__(self, layers):\n", " super().__init__()\n", " self.layers = layers\n", " for i,l in enumerate(self.layers): self.add_module(f'layer_{i}', l)\n", " \n", " def __call__(self, x):\n", " for l in self.layers: x = l(x)\n", " return x" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "hidden": true }, "outputs": [], "source": [ "model = Model(layers)" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "Model(\n", " (layer_0): Linear(in_features=784, out_features=50, bias=True)\n", " (layer_1): ReLU()\n", " (layer_2): Linear(in_features=50, out_features=10, bias=True)\n", ")" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### nn.ModuleList" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "`nn.ModuleList` does this for us." ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "hidden": true }, "outputs": [], "source": [ "class SequentialModel(nn.Module):\n", " def __init__(self, layers):\n", " super().__init__()\n", " self.layers = nn.ModuleList(layers)\n", " \n", " def __call__(self, x):\n", " for l in self.layers: x = l(x)\n", " return x" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "hidden": true }, "outputs": [], "source": [ "model = SequentialModel(layers)" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "SequentialModel(\n", " (layers): ModuleList(\n", " (0): Linear(in_features=784, out_features=50, bias=True)\n", " (1): ReLU()\n", " (2): Linear(in_features=50, out_features=10, bias=True)\n", " )\n", ")" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.3222596645355225 0.03125\n", "0.14386768639087677 0.96875\n", "0.08797654509544373 0.96875\n" ] }, { "data": { "text/plain": [ "(tensor(0.02, grad_fn=), tensor(1.))" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fit()\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### nn.Sequential" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "`nn.Sequential` is a convenient class which does the same as the above:" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "hidden": true }, "outputs": [], "source": [ "model = nn.Sequential(nn.Linear(m,nh), nn.ReLU(), nn.Linear(nh,10))" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.3087124824523926 0.09375\n", "0.20316630601882935 0.90625\n", "0.20330585539340973 0.921875\n" ] }, { "data": { "text/plain": [ "(tensor(0.01, grad_fn=), tensor(1.))" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fit()\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): Linear(in_features=784, out_features=50, bias=True)\n", " (1): ReLU()\n", " (2): Linear(in_features=50, out_features=10, bias=True)\n", ")" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true, "hidden": true }, "source": [ "### optim" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Let's replace our previous manually coded optimization step:\n", "\n", "```python\n", "with torch.no_grad():\n", " for p in model.parameters(): p -= p.grad * lr\n", " model.zero_grad()\n", "```\n", "\n", "and instead use just:\n", "\n", "```python\n", "opt.step()\n", "opt.zero_grad()\n", "```" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "hidden": true }, "outputs": [], "source": [ "class Optimizer():\n", " def __init__(self, params, lr=0.5): self.params,self.lr=list(params),lr\n", " \n", " def step(self):\n", " with torch.no_grad():\n", " for p in self.params: p -= p.grad * self.lr\n", "\n", " def zero_grad(self):\n", " for p in self.params: p.grad.data.zero_()" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "hidden": true }, "outputs": [], "source": [ "model = nn.Sequential(nn.Linear(m,nh), nn.ReLU(), nn.Linear(nh,10))" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "hidden": true }, "outputs": [], "source": [ "opt = Optimizer(model.parameters())" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.30017352104187 0.0625\n", "0.13068024814128876 0.96875\n", "0.11748667806386948 0.96875\n" ] } ], "source": [ "for epoch in range(epochs):\n", " for i in range(0, n, bs):\n", " s = slice(i, min(n,i+bs))\n", " xb,yb = x_train[s],y_train[s]\n", " preds = model(xb)\n", " loss = loss_func(preds, yb)\n", " loss.backward()\n", " if i==0: print(loss.item(), accuracy(preds, yb).item())\n", "\n", " opt.step()\n", " opt.zero_grad()" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "PyTorch already provides this exact functionality in `optim.SGD` (it also handles stuff like momentum, which we'll look at later)" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "hidden": true }, "outputs": [], "source": [ "from torch import optim" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "hidden": true }, "outputs": [], "source": [ "def get_model():\n", " model = nn.Sequential(nn.Linear(m,nh), nn.ReLU(), nn.Linear(nh,10))\n", " return model, optim.SGD(model.parameters(), lr=lr)" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "tensor(2.30, grad_fn=)" ] }, "execution_count": 173, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model,opt = get_model()\n", "loss_func(model(xb), yb)" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.312685012817383 0.078125\n", "0.21422098577022552 0.90625\n", "0.17829009890556335 0.921875\n" ] } ], "source": [ "for epoch in range(epochs):\n", " for i in range(0, n, bs):\n", " s = slice(i, min(n,i+bs))\n", " xb,yb = x_train[s],y_train[s]\n", " preds = model(xb)\n", " loss = loss_func(preds, yb)\n", " loss.backward()\n", " if i==0: print(loss.item(), accuracy(preds, yb).item())\n", " opt.step()\n", " opt.zero_grad()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset and DataLoader" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Dataset" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "It's clunky to iterate through minibatches of x and y values separately:\n", "\n", "```python\n", " xb = x_train[s]\n", " yb = y_train[s]\n", "```\n", "\n", "Instead, let's do these two steps together, by introducing a `Dataset` class:\n", "\n", "```python\n", " xb,yb = train_ds[s]\n", "```" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "hidden": true }, "outputs": [], "source": [ "class Dataset():\n", " def __init__(self, x, y): self.x,self.y = x,y\n", " def __len__(self): return len(self.x)\n", " def __getitem__(self, i): return self.x[i],self.y[i]" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "hidden": true }, "outputs": [], "source": [ "train_ds,valid_ds = Dataset(x_train, y_train),Dataset(x_valid, y_valid)\n", "assert len(train_ds)==len(x_train)\n", "assert len(valid_ds)==len(x_valid)" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(tensor([[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]),\n", " tensor([5, 0, 4, 1, 9]))" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xb,yb = train_ds[0:5]\n", "assert xb.shape==(5,28*28)\n", "assert yb.shape==(5,)\n", "xb,yb" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "hidden": true }, "outputs": [], "source": [ "model,opt = get_model()" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "hidden": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.2979648113250732 0.09375\n", "0.21198976039886475 0.953125\n", "0.17290501296520233 0.921875\n" ] } ], "source": [ "for epoch in range(epochs):\n", " for i in range(0, n, bs):\n", " xb,yb = train_ds[i:min(n,i+bs)]\n", " pred = model(xb)\n", " loss = loss_func(pred, yb)\n", " if i==0: print(loss.item(), accuracy(pred, yb).item())\n", "\n", " loss.backward()\n", " opt.step()\n", " opt.zero_grad()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DataLoader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previously, our loop iterated over batches (xb, yb) like this:\n", "\n", "```python\n", "for i in range(0, n, bs):\n", " xb,yb = train_ds[i:min(n,i+bs)]\n", " ...\n", "```\n", "\n", "Let's make our loop much cleaner, using a data loader:\n", "\n", "```python\n", "for xb,yb in train_dl:\n", " ...\n", "```" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [], "source": [ "class DataLoader():\n", " def __init__(self, ds, bs): self.ds,self.bs = ds,bs\n", " def __iter__(self):\n", " for i in range(0, len(self.ds), self.bs): yield self.ds[i:i+self.bs]" ] }, { "cell_type": "code", "execution_count": 181, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, bs)\n", "valid_dl = DataLoader(valid_ds, bs)" ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([64, 784])" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xb,yb = next(iter(valid_dl))\n", "xb.shape" ] }, { "cell_type": "code", "execution_count": 183, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([3, 8, 6, 9, 6, 4, 5, 3, 8, 4, 5, 2, 3, 8, 4, 8, 1, 5, 0, 5, 9, 7, 4, 1, 0, 3, 0, 6, 2, 9, 9, 4, 1, 3, 6, 8, 0, 7, 7, 6, 8, 9, 0, 3,\n", " 8, 3, 7, 7, 8, 4, 4, 1, 2, 9, 8, 1, 1, 0, 6, 6, 5, 0, 1, 1])" ] }, "execution_count": 183, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yb" ] }, { "cell_type": "code", "execution_count": 184, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(3)" ] }, "execution_count": 184, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAANeElEQVR4nO3df6hc9ZnH8c9HTTExQaNBTdJo2hv/2GUxZhVZMSzVYnFFiBVcGnDJxsCtUKHVVVayQkUphGVbBf+IpBiSXbuWmtg1VCWKhPUXFOOP1djY+INsEnNzgwY0otKNPvvHPVmuyT3fuZlfZ/Y+7xdcZuY8c855GPLJOTPfM/N1RAjA1HdS0w0A6A/CDiRB2IEkCDuQBGEHkjilnzuzzUf/QI9FhCda3tGR3fbVtv9o+13bd3ayLQC95XbH2W2fLGmXpKsk7ZP0sqTlEfGHwjoc2YEe68WR/VJJ70bE+xHxJ0m/lrSsg+0B6KFOwj5f0t5xj/dVy77G9rDt7ba3d7AvAB3q5AO6iU4VjjtNj4h1ktZJnMYDTerkyL5P0oJxj78paX9n7QDolU7C/rKkC2x/y/Y3JP1A0pbutAWg29o+jY+II7ZvkbRV0smS1kfEW13rDEBXtT301tbOeM8O9FxPLqoB8P8HYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJ9HXKZrRn8eLFxfqtt95aWxsaGiquO2PGjGJ99erVxfrpp59erD/11FO1tcOHDxfXRXdxZAeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJJjFdQDMnDmzWN+zZ0+xfsYZZ3Sxm+764IMPamul6wMkadOmTd1uJ4W6WVw7uqjG9m5JhyV9KelIRFzSyfYA9E43rqC7IiI+7MJ2APQQ79mBJDoNe0h62vYrtocneoLtYdvbbW/vcF8AOtDpafzlEbHf9tmSnrH9dkQ8N/4JEbFO0jqJD+iAJnV0ZI+I/dXtQUm/lXRpN5oC0H1th932abZnHb0v6XuSdnSrMQDd1fY4u+1va+xoLo29Hfj3iPhZi3U4jZ/ArFmzivUnn3yyWP/oo49qa6+99lpx3SVLlhTr559/frG+YMGCYn369Om1tdHR0eK6l112WbHeav2suj7OHhHvSyr/qgKAgcHQG5AEYQeSIOxAEoQdSIKwA0nwFVd0ZM6cOcX6HXfc0VZNklauXFmsb9y4sVjPqm7ojSM7kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBlM3oyIcfln9r9MUXX6yttRpnb/X1W8bZTwxHdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgnF2dGT27NnF+urVq9ve9rx589peF8fjyA4kQdiBJAg7kARhB5Ig7EAShB1IgrADSfC78ShavLg8Ue+jjz5arC9atKi2tmvXruK6V111VbG+d+/eYj2rtn833vZ62wdt7xi37Ezbz9h+p7otX1kBoHGTOY3fIOnqY5bdKenZiLhA0rPVYwADrGXYI+I5SYeOWbxM0tHfBNoo6brutgWg29q9Nv6ciBiRpIgYsX123RNtD0sabnM/ALqk51+EiYh1ktZJfEAHNKndobdR23Mlqbo92L2WAPRCu2HfImlFdX+FpMe70w6AXmk5zm77EUnfkTRH0qikn0r6D0m/kXSepD2SboiIYz/Em2hbnMYPmBUrVhTr99xzT7G+YMGCYv3zzz+vrV177bXFdbdt21asY2J14+wt37NHxPKa0nc76ghAX3G5LJAEYQeSIOxAEoQdSIKwA0nwU9JTwMyZM2trt99+e3Hdu+66q1g/6aTy8eDQofKI69KlS2trb7/9dnFddBdHdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgnH2KWDDhg21teuvv76jbW/atKlYv//++4t1xtIHB0d2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCcfYpYGhoqGfbXrt2bbH+0ksv9Wzf6C6O7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBOPsU8DTTz9dW1u8eHHPti21Hodfs2ZNbW3//v1t9YT2tDyy215v+6DtHeOW3W37A9uvV3/X9LZNAJ2azGn8BklXT7D8voi4qPp7srttAei2lmGPiOcklef4ATDwOvmA7hbbb1Sn+bPrnmR72PZ229s72BeADrUb9rWShiRdJGlE0s/rnhgR6yLikoi4pM19AeiCtsIeEaMR8WVEfCXpl5Iu7W5bALqtrbDbnjvu4fcl7ah7LoDB4IgoP8F+RNJ3JM2RNCrpp9XjiySFpN2SfhgRIy13Zpd3hrZMnz69tvbwww8X17344ouL9fPOO6+tno46cOBAbW3lypXFdbdu3drRvrOKCE+0vOVFNRGxfILFD3XcEYC+4nJZIAnCDiRB2IEkCDuQBGEHkmg59NbVnTH01nennnpqsX7KKeUBmU8++aSb7XzNF198UazfdtttxfqDDz7YzXamjLqhN47sQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AE4+wouvDCC4v1++67r1i/4oor2t73nj17ivWFCxe2ve2pjHF2IDnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCcfYBMGPGjGL9s88+61MnJ2727NqZvyRJ69evr60tW7aso33Pnz+/WB8Zafnr5lMS4+xAcoQdSIKwA0kQdiAJwg4kQdiBJAg7kETLWVzRuaGhoWL9hRdeKNafeOKJYn3Hjh21tVZjzatWrSrWp02bVqy3GutetGhRsV7y3nvvFetZx9Hb1fLIbnuB7W22d9p+y/aPq+Vn2n7G9jvVbfnqCgCNmsxp/BFJ/xARfybpryT9yPafS7pT0rMRcYGkZ6vHAAZUy7BHxEhEvFrdPyxpp6T5kpZJ2lg9baOk63rUI4AuOKH37LYXSloi6feSzomIEWnsPwTbZ9esMyxpuMM+AXRo0mG3PVPSZkk/iYhP7AmvtT9ORKyTtK7aBl+EARoyqaE329M0FvRfRcRj1eJR23Or+lxJB3vTIoBuaHlk99gh/CFJOyPiF+NKWyStkLSmun28Jx1OATfccEOxfu655xbrN910UzfbOSGtzuA6+Yr0p59+WqzffPPNbW8bx5vMafzlkv5O0pu2X6+WrdZYyH9je5WkPZLK/6IBNKpl2CPiBUl1/71/t7vtAOgVLpcFkiDsQBKEHUiCsANJEHYgCb7i2gdnnXVW0y30zObNm4v1e++9t7Z28GD5OqwDBw601RMmxpEdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JgyuY+aPVzzFdeeWWxfuONNxbr8+bNq619/PHHxXVbeeCBB4r1559/vlg/cuRIR/vHiWPKZiA5wg4kQdiBJAg7kARhB5Ig7EAShB1IgnF2YIphnB1IjrADSRB2IAnCDiRB2IEkCDuQBGEHkmgZdtsLbG+zvdP2W7Z/XC2/2/YHtl+v/q7pfbsA2tXyohrbcyXNjYhXbc+S9Iqk6yT9raRPI+JfJr0zLqoBeq7uoprJzM8+Immkun/Y9k5J87vbHoBeO6H37LYXSloi6ffVoltsv2F7ve3ZNesM295ue3tnrQLoxKSvjbc9U9J/SvpZRDxm+xxJH0oKSfdq7FT/phbb4DQe6LG60/hJhd32NEm/k7Q1In4xQX2hpN9FxF+02A5hB3qs7S/C2LakhyTtHB/06oO7o74vaUenTQLoncl8Gr9U0vOS3pT0VbV4taTlki7S2Gn8bkk/rD7MK22LIzvQYx2dxncLYQd6j++zA8kRdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkmj5g5Nd9qGk/x73eE61bBANam+D2pdEb+3qZm/n1xX6+n3243Zub4+ISxproGBQexvUviR6a1e/euM0HkiCsANJNB32dQ3vv2RQexvUviR6a1dfemv0PTuA/mn6yA6gTwg7kEQjYbd9te0/2n7X9p1N9FDH9m7bb1bTUDc6P101h95B2zvGLTvT9jO236luJ5xjr6HeBmIa78I0442+dk1Pf9739+y2T5a0S9JVkvZJelnS8oj4Q18bqWF7t6RLIqLxCzBs/7WkTyX969GptWz/s6RDEbGm+o9ydkT844D0drdOcBrvHvVWN83436vB166b05+3o4kj+6WS3o2I9yPiT5J+LWlZA30MvIh4TtKhYxYvk7Sxur9RY/9Y+q6mt4EQESMR8Wp1/7Cko9OMN/raFfrqiybCPl/S3nGP92mw5nsPSU/bfsX2cNPNTOCco9NsVbdnN9zPsVpO491Px0wzPjCvXTvTn3eqibBPNDXNII3/XR4RfynpbyT9qDpdxeSslTSksTkARyT9vMlmqmnGN0v6SUR80mQv403QV19etybCvk/SgnGPvylpfwN9TCgi9le3ByX9VmNvOwbJ6NEZdKvbgw33838iYjQivoyIryT9Ug2+dtU045sl/SoiHqsWN/7aTdRXv163JsL+sqQLbH/L9jck/UDSlgb6OI7t06oPTmT7NEnf0+BNRb1F0orq/gpJjzfYy9cMyjTeddOMq+HXrvHpzyOi73+SrtHYJ/LvSfqnJnqo6evbkv6r+nur6d4kPaKx07r/0dgZ0SpJZ0l6VtI71e2ZA9Tbv2lsau83NBasuQ31tlRjbw3fkPR69XdN069doa++vG5cLgskwRV0QBKEHUiCsANJEHYgCcIOJEHYgSQIO5DE/wLwpj8ONnyk5wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.imshow(xb[0].view(28,28))\n", "yb[0]" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [], "source": [ "model,opt = get_model()" ] }, { "cell_type": "code", "execution_count": 186, "metadata": {}, "outputs": [], "source": [ "def fit():\n", " for epoch in range(epochs):\n", " for xb,yb in train_dl:\n", " pred = model(xb)\n", " loss = loss_func(pred, yb)\n", " loss.backward()\n", " opt.step()\n", " opt.zero_grad()" ] }, { "cell_type": "code", "execution_count": 187, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor(0.05, grad_fn=), tensor(1.))" ] }, "execution_count": 187, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fit()\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random sampling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want our training set to be in a random order, and that order should differ each iteration. But the validation set shouldn't be randomized." ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [], "source": [ "import random" ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [], "source": [ "class Sampler():\n", " def __init__(self, ds, shuffle=False): self.n,self.shuffle = len(ds),shuffle\n", " def __iter__(self):\n", " res = list(range(self.n))\n", " if self.shuffle: random.shuffle(res)\n", " return iter(res)" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [], "source": [ "from itertools import islice" ] }, { "cell_type": "code", "execution_count": 191, "metadata": {}, "outputs": [], "source": [ "ss = Sampler(train_ds)" ] }, { "cell_type": "code", "execution_count": 192, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "1\n", "2\n", "3\n", "4\n" ] } ], "source": [ "it = iter(ss)\n", "for o in range(5): print(next(it))" ] }, { "cell_type": "code", "execution_count": 193, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4]" ] }, "execution_count": 193, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(islice(ss, 5))" ] }, { "cell_type": "code", "execution_count": 194, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2468, 34785, 22293, 22313, 36680]" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ss = Sampler(train_ds, shuffle=True)\n", "list(islice(ss, 5))" ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [], "source": [ "import fastcore.all as fc" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [], "source": [ "class BatchSampler():\n", " def __init__(self, sampler, bs, drop_last=False): fc.store_attr()\n", " def __iter__(self): yield from fc.chunked(iter(self.sampler), self.bs, drop_last=self.drop_last)" ] }, { "cell_type": "code", "execution_count": 197, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[[3338, 3713, 47999, 33349],\n", " [1382, 28497, 19584, 35095],\n", " [33760, 20524, 1959, 7968],\n", " [40952, 25061, 32207, 20443],\n", " [11419, 11479, 45286, 40070]]" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batchs = BatchSampler(ss, 4)\n", "list(islice(batchs, 5))" ] }, { "cell_type": "code", "execution_count": 198, "metadata": {}, "outputs": [], "source": [ "def collate(b):\n", " xs,ys = zip(*b)\n", " return torch.stack(xs),torch.stack(ys)" ] }, { "cell_type": "code", "execution_count": 199, "metadata": {}, "outputs": [], "source": [ "class DataLoader():\n", " def __init__(self, ds, batchs, collate_fn=collate): fc.store_attr()\n", " def __iter__(self): yield from (self.collate_fn(self.ds[i] for i in b) for b in self.batchs)" ] }, { "cell_type": "code", "execution_count": 200, "metadata": {}, "outputs": [], "source": [ "train_samp = BatchSampler(Sampler(train_ds, shuffle=True ), bs)\n", "valid_samp = BatchSampler(Sampler(valid_ds, shuffle=False), bs)" ] }, { "cell_type": "code", "execution_count": 201, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, batchs=train_samp, collate_fn=collate)\n", "valid_dl = DataLoader(valid_ds, batchs=valid_samp, collate_fn=collate)" ] }, { "cell_type": "code", "execution_count": 202, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor(3)" ] }, "execution_count": 202, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAANeElEQVR4nO3df6hc9ZnH8c9HTTExQaNBTdJo2hv/2GUxZhVZMSzVYnFFiBVcGnDJxsCtUKHVVVayQkUphGVbBf+IpBiSXbuWmtg1VCWKhPUXFOOP1djY+INsEnNzgwY0otKNPvvHPVmuyT3fuZlfZ/Y+7xdcZuY8c855GPLJOTPfM/N1RAjA1HdS0w0A6A/CDiRB2IEkCDuQBGEHkjilnzuzzUf/QI9FhCda3tGR3fbVtv9o+13bd3ayLQC95XbH2W2fLGmXpKsk7ZP0sqTlEfGHwjoc2YEe68WR/VJJ70bE+xHxJ0m/lrSsg+0B6KFOwj5f0t5xj/dVy77G9rDt7ba3d7AvAB3q5AO6iU4VjjtNj4h1ktZJnMYDTerkyL5P0oJxj78paX9n7QDolU7C/rKkC2x/y/Y3JP1A0pbutAWg29o+jY+II7ZvkbRV0smS1kfEW13rDEBXtT301tbOeM8O9FxPLqoB8P8HYQeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCsANJ9HXKZrRn8eLFxfqtt95aWxsaGiquO2PGjGJ99erVxfrpp59erD/11FO1tcOHDxfXRXdxZAeSIOxAEoQdSIKwA0kQdiAJwg4kQdiBJJjFdQDMnDmzWN+zZ0+xfsYZZ3Sxm+764IMPamul6wMkadOmTd1uJ4W6WVw7uqjG9m5JhyV9KelIRFzSyfYA9E43rqC7IiI+7MJ2APQQ79mBJDoNe0h62vYrtocneoLtYdvbbW/vcF8AOtDpafzlEbHf9tmSnrH9dkQ8N/4JEbFO0jqJD+iAJnV0ZI+I/dXtQUm/lXRpN5oC0H1th932abZnHb0v6XuSdnSrMQDd1fY4u+1va+xoLo29Hfj3iPhZi3U4jZ/ArFmzivUnn3yyWP/oo49qa6+99lpx3SVLlhTr559/frG+YMGCYn369Om1tdHR0eK6l112WbHeav2suj7OHhHvSyr/qgKAgcHQG5AEYQeSIOxAEoQdSIKwA0nwFVd0ZM6cOcX6HXfc0VZNklauXFmsb9y4sVjPqm7ojSM7kARhB5Ig7EAShB1IgrADSRB2IAnCDiTBlM3oyIcfln9r9MUXX6yttRpnb/X1W8bZTwxHdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgnF2dGT27NnF+urVq9ve9rx589peF8fjyA4kQdiBJAg7kARhB5Ig7EAShB1IgrADSfC78ShavLg8Ue+jjz5arC9atKi2tmvXruK6V111VbG+d+/eYj2rtn833vZ62wdt7xi37Ezbz9h+p7otX1kBoHGTOY3fIOnqY5bdKenZiLhA0rPVYwADrGXYI+I5SYeOWbxM0tHfBNoo6brutgWg29q9Nv6ciBiRpIgYsX123RNtD0sabnM/ALqk51+EiYh1ktZJfEAHNKndobdR23Mlqbo92L2WAPRCu2HfImlFdX+FpMe70w6AXmk5zm77EUnfkTRH0qikn0r6D0m/kXSepD2SboiIYz/Em2hbnMYPmBUrVhTr99xzT7G+YMGCYv3zzz+vrV177bXFdbdt21asY2J14+wt37NHxPKa0nc76ghAX3G5LJAEYQeSIOxAEoQdSIKwA0nwU9JTwMyZM2trt99+e3Hdu+66q1g/6aTy8eDQofKI69KlS2trb7/9dnFddBdHdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgnH2KWDDhg21teuvv76jbW/atKlYv//++4t1xtIHB0d2IAnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCcfYpYGhoqGfbXrt2bbH+0ksv9Wzf6C6O7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBOPsU8DTTz9dW1u8eHHPti21Hodfs2ZNbW3//v1t9YT2tDyy215v+6DtHeOW3W37A9uvV3/X9LZNAJ2azGn8BklXT7D8voi4qPp7srttAei2lmGPiOcklef4ATDwOvmA7hbbb1Sn+bPrnmR72PZ229s72BeADrUb9rWShiRdJGlE0s/rnhgR6yLikoi4pM19AeiCtsIeEaMR8WVEfCXpl5Iu7W5bALqtrbDbnjvu4fcl7ah7LoDB4IgoP8F+RNJ3JM2RNCrpp9XjiySFpN2SfhgRIy13Zpd3hrZMnz69tvbwww8X17344ouL9fPOO6+tno46cOBAbW3lypXFdbdu3drRvrOKCE+0vOVFNRGxfILFD3XcEYC+4nJZIAnCDiRB2IEkCDuQBGEHkmg59NbVnTH01nennnpqsX7KKeUBmU8++aSb7XzNF198UazfdtttxfqDDz7YzXamjLqhN47sQBKEHUiCsANJEHYgCcIOJEHYgSQIO5AE4+wouvDCC4v1++67r1i/4oor2t73nj17ivWFCxe2ve2pjHF2IDnCDiRB2IEkCDuQBGEHkiDsQBKEHUiCcfYBMGPGjGL9s88+61MnJ2727NqZvyRJ69evr60tW7aso33Pnz+/WB8Zafnr5lMS4+xAcoQdSIKwA0kQdiAJwg4kQdiBJAg7kETLWVzRuaGhoWL9hRdeKNafeOKJYn3Hjh21tVZjzatWrSrWp02bVqy3GutetGhRsV7y3nvvFetZx9Hb1fLIbnuB7W22d9p+y/aPq+Vn2n7G9jvVbfnqCgCNmsxp/BFJ/xARfybpryT9yPafS7pT0rMRcYGkZ6vHAAZUy7BHxEhEvFrdPyxpp6T5kpZJ2lg9baOk63rUI4AuOKH37LYXSloi6feSzomIEWnsPwTbZ9esMyxpuMM+AXRo0mG3PVPSZkk/iYhP7AmvtT9ORKyTtK7aBl+EARoyqaE329M0FvRfRcRj1eJR23Or+lxJB3vTIoBuaHlk99gh/CFJOyPiF+NKWyStkLSmun28Jx1OATfccEOxfu655xbrN910UzfbOSGtzuA6+Yr0p59+WqzffPPNbW8bx5vMafzlkv5O0pu2X6+WrdZYyH9je5WkPZLK/6IBNKpl2CPiBUl1/71/t7vtAOgVLpcFkiDsQBKEHUiCsANJEHYgCb7i2gdnnXVW0y30zObNm4v1e++9t7Z28GD5OqwDBw601RMmxpEdSIKwA0kQdiAJwg4kQdiBJAg7kARhB5JgyuY+aPVzzFdeeWWxfuONNxbr8+bNq619/PHHxXVbeeCBB4r1559/vlg/cuRIR/vHiWPKZiA5wg4kQdiBJAg7kARhB5Ig7EAShB1IgnF2YIphnB1IjrADSRB2IAnCDiRB2IEkCDuQBGEHkmgZdtsLbG+zvdP2W7Z/XC2/2/YHtl+v/q7pfbsA2tXyohrbcyXNjYhXbc+S9Iqk6yT9raRPI+JfJr0zLqoBeq7uoprJzM8+Immkun/Y9k5J87vbHoBeO6H37LYXSloi6ffVoltsv2F7ve3ZNesM295ue3tnrQLoxKSvjbc9U9J/SvpZRDxm+xxJH0oKSfdq7FT/phbb4DQe6LG60/hJhd32NEm/k7Q1In4xQX2hpN9FxF+02A5hB3qs7S/C2LakhyTtHB/06oO7o74vaUenTQLoncl8Gr9U0vOS3pT0VbV4taTlki7S2Gn8bkk/rD7MK22LIzvQYx2dxncLYQd6j++zA8kRdiAJwg4kQdiBJAg7kARhB5Ig7EAShB1IgrADSRB2IAnCDiRB2IEkCDuQBGEHkmj5g5Nd9qGk/x73eE61bBANam+D2pdEb+3qZm/n1xX6+n3243Zub4+ISxproGBQexvUviR6a1e/euM0HkiCsANJNB32dQ3vv2RQexvUviR6a1dfemv0PTuA/mn6yA6gTwg7kEQjYbd9te0/2n7X9p1N9FDH9m7bb1bTUDc6P101h95B2zvGLTvT9jO236luJ5xjr6HeBmIa78I0442+dk1Pf9739+y2T5a0S9JVkvZJelnS8oj4Q18bqWF7t6RLIqLxCzBs/7WkTyX969GptWz/s6RDEbGm+o9ydkT844D0drdOcBrvHvVWN83436vB166b05+3o4kj+6WS3o2I9yPiT5J+LWlZA30MvIh4TtKhYxYvk7Sxur9RY/9Y+q6mt4EQESMR8Wp1/7Cko9OMN/raFfrqiybCPl/S3nGP92mw5nsPSU/bfsX2cNPNTOCco9NsVbdnN9zPsVpO491Px0wzPjCvXTvTn3eqibBPNDXNII3/XR4RfynpbyT9qDpdxeSslTSksTkARyT9vMlmqmnGN0v6SUR80mQv403QV19etybCvk/SgnGPvylpfwN9TCgi9le3ByX9VmNvOwbJ6NEZdKvbgw33838iYjQivoyIryT9Ug2+dtU045sl/SoiHqsWN/7aTdRXv163JsL+sqQLbH/L9jck/UDSlgb6OI7t06oPTmT7NEnf0+BNRb1F0orq/gpJjzfYy9cMyjTeddOMq+HXrvHpzyOi73+SrtHYJ/LvSfqnJnqo6evbkv6r+nur6d4kPaKx07r/0dgZ0SpJZ0l6VtI71e2ZA9Tbv2lsau83NBasuQ31tlRjbw3fkPR69XdN069doa++vG5cLgskwRV0QBKEHUiCsANJEHYgCcIOJEHYgSQIO5DE/wLwpj8ONnyk5wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "xb,yb = next(iter(valid_dl))\n", "plt.imshow(xb[0].view(28,28))\n", "yb[0]" ] }, { "cell_type": "code", "execution_count": 203, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(torch.Size([64, 784]), torch.Size([64]))" ] }, "execution_count": 203, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xb.shape,yb.shape" ] }, { "cell_type": "code", "execution_count": 204, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor(0.04, grad_fn=), tensor(1.))" ] }, "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model,opt = get_model()\n", "fit()\n", "\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multiprocessing DataLoader" ] }, { "cell_type": "code", "execution_count": 205, "metadata": {}, "outputs": [], "source": [ "import torch.multiprocessing as mp\n", "from fastcore.basics import store_attr" ] }, { "cell_type": "code", "execution_count": 206, "metadata": {}, "outputs": [], "source": [ "class DataLoader():\n", " def __init__(self, ds, batchs, n_workers=1, collate_fn=collate): fc.store_attr()\n", " def __iter__(self):\n", " with mp.Pool(self.n_workers) as ex: yield from ex.map(self.ds.__getitem__, iter(self.batchs))" ] }, { "cell_type": "code", "execution_count": 207, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, batchs=train_samp, collate_fn=collate, n_workers=2)\n", "it = iter(train_dl)" ] }, { "cell_type": "code", "execution_count": 208, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(torch.Size([64, 784]), torch.Size([64]))" ] }, "execution_count": 208, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xb,yb = next(it)\n", "xb.shape,yb.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### PyTorch DataLoader" ] }, { "cell_type": "code", "execution_count": 209, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import DataLoader, SequentialSampler, RandomSampler, BatchSampler" ] }, { "cell_type": "code", "execution_count": 210, "metadata": {}, "outputs": [], "source": [ "train_samp = BatchSampler(RandomSampler(train_ds), bs, drop_last=False)\n", "valid_samp = BatchSampler(SequentialSampler(valid_ds), bs, drop_last=False)" ] }, { "cell_type": "code", "execution_count": 211, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, batch_sampler=train_samp, collate_fn=collate)\n", "valid_dl = DataLoader(valid_ds, batch_sampler=valid_samp, collate_fn=collate)" ] }, { "cell_type": "code", "execution_count": 212, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor(0.09, grad_fn=), tensor(0.98))" ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model,opt = get_model()\n", "fit()\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch can auto-generate the BatchSampler for us:" ] }, { "cell_type": "code", "execution_count": 213, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, bs, sampler=RandomSampler(train_ds), collate_fn=collate)\n", "valid_dl = DataLoader(valid_ds, bs, sampler=SequentialSampler(valid_ds), collate_fn=collate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch can also generate the Sequential/RandomSamplers too:" ] }, { "cell_type": "code", "execution_count": 214, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, bs, shuffle=True, drop_last=True, num_workers=2)\n", "valid_dl = DataLoader(valid_ds, bs, shuffle=False, num_workers=2)" ] }, { "cell_type": "code", "execution_count": 215, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor(0.05, grad_fn=), tensor(0.98))" ] }, "execution_count": 215, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model,opt = get_model()\n", "fit()\n", "\n", "loss_func(model(xb), yb), accuracy(model(xb), yb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our dataset actually already knows how to sample a batch of indices all at once:" ] }, { "cell_type": "code", "execution_count": 216, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(tensor([[0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.],\n", " [0., 0., 0., ..., 0., 0., 0.]]),\n", " tensor([9, 1, 3]))" ] }, "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_ds[[4,6,7]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...that means that we can actually skip the batch_sampler and collate_fn entirely:" ] }, { "cell_type": "code", "execution_count": 217, "metadata": {}, "outputs": [], "source": [ "train_dl = DataLoader(train_ds, sampler=train_samp)\n", "valid_dl = DataLoader(valid_ds, sampler=valid_samp)" ] }, { "cell_type": "code", "execution_count": 218, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(torch.Size([1, 64, 784]), torch.Size([1, 64]))" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xb,yb = next(iter(train_dl))\n", "xb.shape,yb.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You **always** should also have a [validation set](http://www.fast.ai/2017/11/13/validation-sets/), in order to identify if you are overfitting.\n", "\n", "We will calculate and print the validation loss at the end of each epoch.\n", "\n", "(Note that we always call `model.train()` before training, and `model.eval()` before inference, because these are used by layers such as `nn.BatchNorm2d` and `nn.Dropout` to ensure appropriate behaviour for these different phases.)" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n", " for epoch in range(epochs):\n", " model.train()\n", " for xb,yb in train_dl:\n", " loss = loss_func(model(xb), yb)\n", " loss.backward()\n", " opt.step()\n", " opt.zero_grad()\n", "\n", " model.eval()\n", " with torch.no_grad():\n", " tot_loss,tot_acc,count = 0.,0.,0\n", " for xb,yb in valid_dl:\n", " pred = model(xb)\n", " n = len(xb)\n", " count += n\n", " tot_loss += loss_func(pred, yb).item()*n\n", " tot_acc += accuracy (pred,yb).item()*n\n", " print(epoch, tot_loss/count, tot_acc/count)\n", " return tot_loss/count, tot_acc/count" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [], "source": [ "def get_dls(train_ds, valid_ds, bs, **kwargs):\n", " return (DataLoader(train_ds, batch_size=bs, shuffle=True, **kwargs),\n", " DataLoader(valid_ds, batch_size=bs*2, **kwargs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, our whole process of obtaining the data loaders and fitting the model can be run in 3 lines of code:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 0.2599395067691803 0.9152\n", "1 0.1375726773455739 0.9599\n", "2 0.10632649689242243 0.9696\n", "3 0.11931819728948176 0.9643\n", "4 0.15087997979782522 0.9549\n" ] } ], "source": [ "train_dl,valid_dl = get_dls(train_ds, valid_ds, bs)\n", "model,opt = get_model()\n", "loss,acc = fit(5, model, loss_func, opt, train_dl, valid_dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }