{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "! [ -e /content ] && pip install -Uqq fastbook\n", "import fastbook\n", "fastbook.setup_book()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "from fastai.vision.all import *\n", "from fastbook import *\n", "\n", "matplotlib.rc('image', cmap='Greys')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Under the Hood: Training a Digit Classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pixels: The Foundations of Computer Vision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sidebar: Tenacity and Deep Learning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## End sidebar" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = untar_data(URLs.MNIST_SAMPLE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "Path.BASE_PATH = path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path.ls()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(path/'train').ls()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "threes = (path/'train'/'3').ls().sorted()\n", "sevens = (path/'train'/'7').ls().sorted()\n", "threes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "im3_path = threes[1]\n", "im3 = Image.open(im3_path)\n", "im3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "array(im3)[4:10,4:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tensor(im3)[4:10,4:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "im3_t = tensor(im3)\n", "df = pd.DataFrame(im3_t[4:15,4:22])\n", "df.style.set_properties(**{'font-size':'6pt'}).background_gradient('Greys')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First Try: Pixel Similarity" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "seven_tensors = [tensor(Image.open(o)) for o in sevens]\n", "three_tensors = [tensor(Image.open(o)) for o in threes]\n", "len(three_tensors),len(seven_tensors)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(three_tensors[1]);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stacked_sevens = torch.stack(seven_tensors).float()/255\n", "stacked_threes = torch.stack(three_tensors).float()/255\n", "stacked_threes.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(stacked_threes.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stacked_threes.ndim" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean3 = stacked_threes.mean(0)\n", "show_image(mean3);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean7 = stacked_sevens.mean(0)\n", "show_image(mean7);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_3 = stacked_threes[1]\n", "show_image(a_3);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dist_3_abs = (a_3 - mean3).abs().mean()\n", "dist_3_sqr = ((a_3 - mean3)**2).mean().sqrt()\n", "dist_3_abs,dist_3_sqr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dist_7_abs = (a_3 - mean7).abs().mean()\n", "dist_7_sqr = ((a_3 - mean7)**2).mean().sqrt()\n", "dist_7_abs,dist_7_sqr" 
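] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To feel the difference between these two measures, here is a toy example with made-up numbers (not MNIST data): L1 averages the absolute errors directly, while RMSE squares them first, so a single large error dominates the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Toy tensors, for illustration only: two exact matches and one error of 4\n", "a = tensor([1., 2., 3.])\n", "b = tensor([1., 2., 7.])\n", "(a-b).abs().mean(), ((a-b)**2).mean().sqrt()"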
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "F.l1_loss(a_3.float(),mean7), F.mse_loss(a_3,mean7).sqrt()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NumPy Arrays and PyTorch Tensors" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = [[1,2,3],[4,5,6]]\n", "arr = array(data)\n", "tns = tensor(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "arr # numpy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns # pytorch" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns[:,1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns[1,1:3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns+1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns.type()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tns*1.5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing Metrics Using Broadcasting" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_3_tens = torch.stack([tensor(Image.open(o)) \n", " for o in (path/'valid'/'3').ls()])\n", "valid_3_tens = valid_3_tens.float()/255\n", "valid_7_tens = torch.stack([tensor(Image.open(o)) \n", " for o in (path/'valid'/'7').ls()])\n", "valid_7_tens = valid_7_tens.float()/255\n", "valid_3_tens.shape,valid_7_tens.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mnist_distance(a,b): return (a-b).abs().mean((-1,-2))\n", "mnist_distance(a_3, mean3)"
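] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because mnist_distance takes the mean over only the last two axes, it works equally well on a whole stack of images. That relies on broadcasting; here is a minimal sketch with tiny made-up tensors (nothing to do with MNIST):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative tensors: x has shape [3,2]; y has shape [2]\n", "x = tensor([[1.,2.],[3.,4.],[5.,6.]])\n", "y = tensor([10.,20.])\n", "# y is treated as shape [1,2] and expanded to [3,2], without copying data\n", "(x - y).shape"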
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_3_dist = mnist_distance(valid_3_tens, mean3)\n", "valid_3_dist, valid_3_dist.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tensor([1,2,3]) + tensor(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(valid_3_tens-mean3).shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def is_3(x): return mnist_distance(x,mean3) < mnist_distance(x,mean7)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "is_3(a_3), is_3(a_3).float()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "is_3(valid_3_tens)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "accuracy_3s = is_3(valid_3_tens).float().mean()\n", "accuracy_7s = (1 - is_3(valid_7_tens).float()).mean()\n", "\n", "accuracy_3s,accuracy_7s,(accuracy_3s+accuracy_7s)/2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stochastic Gradient Descent (SGD)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gv('''\n", "init->predict->loss->gradient->step->stop\n", "step->predict[label=repeat]\n", "''')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(x): return x**2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_function(f, 'x', 'x**2')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_function(f, 'x', 'x**2')\n", "plt.scatter(-1.5, f(-1.5), color='red');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculating Gradients" ] }, { "cell_type": "code", "execution_count": null, "metadata": {},
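 "outputs": [], "source": [ "# An illustrative check with toy numbers (not from the text): for f(x) = x**2,\n", "# calculus gives a slope of 2*x, so at x=3 the slope should be 6. A central\n", "# finite difference approximates that same slope numerically.\n", "eps = 1e-4\n", "(f(3.+eps) - f(3.-eps)) / (2*eps)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyTorch can compute that same derivative for us automatically. First we tag a tensor so that its gradient is tracked:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {},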
"outputs": [], "source": [ "xt = tensor(3.).requires_grad_()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yt = f(xt)\n", "yt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yt.backward()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xt.grad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xt = tensor([3.,4.,10.]).requires_grad_()\n", "xt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(x): return (x**2).sum()\n", "\n", "yt = f(xt)\n", "yt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "yt.backward()\n", "xt.grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stepping With a Learning Rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### An End-to-End SGD Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "time = torch.arange(0,20).float(); time" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "speed = torch.randn(20)*3 + 0.75*(time-9.5)**2 + 1\n", "plt.scatter(time,speed);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f(t, params):\n", " a,b,c = params\n", " return a*(t**2) + (b*t) + c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mse(preds, targets): return ((preds-targets)**2).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 1: Initialize the parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params = torch.randn(3).requires_grad_()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"#hide\n", "orig_params = params.clone()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 2: Calculate the predictions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = f(time, params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def show_preds(preds, ax=None):\n", " if ax is None: ax=plt.subplots()[1]\n", " ax.scatter(time, speed)\n", " ax.scatter(time, to_np(preds), color='red')\n", " ax.set_ylim(-300,100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_preds(preds)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3: Calculate the loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loss = mse(preds, speed)\n", "loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 4: Calculate the gradients" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loss.backward()\n", "params.grad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params.grad * 1e-5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 5: Step the weights. 
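" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before doing this for our three parameters, here is the update rule in isolation on a throwaway tensor (an illustrative sketch only): subtract the gradient times the learning rate, then clear the gradient so it does not accumulate into the next step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch on a throwaway tensor p, not our model's parameters\n", "p = tensor([1.,2.]).requires_grad_()\n", "(p**2).sum().backward()      # p.grad is now 2*p\n", "p.data -= 0.1 * p.grad.data  # the step: p <- p - lr*grad\n", "p.grad = None                # reset so the next backward starts fresh\n", "p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the same update for our actual parameters: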
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lr = 1e-5\n", "params.data -= lr * params.grad.data\n", "params.grad = None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = f(time,params)\n", "mse(preds, speed)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_preds(preds)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def apply_step(params, prn=True):\n", " preds = f(time, params)\n", " loss = mse(preds, speed)\n", " loss.backward()\n", " params.data -= lr * params.grad.data\n", " params.grad = None\n", " if prn: print(loss.item())\n", " return preds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 6: Repeat the process " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(10): apply_step(params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#hide\n", "params = orig_params.detach().requires_grad_()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axs = plt.subplots(1,4,figsize=(12,3))\n", "for ax in axs: show_preds(apply_step(params, False), ax)\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 7: stop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summarizing Gradient Descent" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gv('''\n", "init->predict->loss->gradient->step->stop\n", "step->predict[label=repeat]\n", "''')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The MNIST Loss Function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_x = torch.cat([stacked_threes, 
stacked_sevens]).view(-1, 28*28)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_y = tensor([1]*len(threes) + [0]*len(sevens)).unsqueeze(1)\n", "train_x.shape,train_y.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dset = list(zip(train_x,train_y))\n", "x,y = dset[0]\n", "x.shape,y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)\n", "valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)\n", "valid_dset = list(zip(valid_x,valid_y))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def init_params(size, std=1.0): return (torch.randn(size)*std).requires_grad_()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weights = init_params((28*28,1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bias = init_params(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(train_x[0]*weights.T).sum() + bias" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def linear1(xb): return xb@weights + bias\n", "preds = linear1(train_x)\n", "preds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "corrects = (preds>0.0).float() == train_y\n", "corrects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "corrects.float().mean().item()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "with torch.no_grad(): weights[0] *= 1.0001" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = linear1(train_x)\n", 
"((preds>0.0).float() == train_y).float().mean().item()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trgts = tensor([1,0,1])\n", "prds = tensor([0.9, 0.4, 0.2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mnist_loss(predictions, targets):\n", " return torch.where(targets==1, 1-predictions, predictions).mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "torch.where(trgts==1, 1-prds, prds)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_loss(prds,trgts)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mnist_loss(tensor([0.9, 0.4, 0.8]),trgts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sigmoid" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sigmoid(x): return 1/(1+torch.exp(-x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_function(torch.sigmoid, title='Sigmoid', min=-4, max=4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mnist_loss(predictions, targets):\n", " predictions = predictions.sigmoid()\n", " return torch.where(targets==1, 1-predictions, predictions).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SGD and Mini-Batches" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coll = range(15)\n", "dl = DataLoader(coll, batch_size=5, shuffle=True)\n", "list(dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = L(enumerate(string.ascii_lowercase))\n", "ds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dl = DataLoader(ds, batch_size=6, 
shuffle=True)\n", "list(dl)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Putting It All Together" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weights = init_params((28*28,1))\n", "bias = init_params(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dl = DataLoader(dset, batch_size=256)\n", "xb,yb = first(dl)\n", "xb.shape,yb.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_dl = DataLoader(valid_dset, batch_size=256)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch = train_x[:4]\n", "batch.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preds = linear1(batch)\n", "preds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loss = mnist_loss(preds, train_y[:4])\n", "loss" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loss.backward()\n", "weights.grad.shape,weights.grad.mean(),bias.grad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def calc_grad(xb, yb, model):\n", " preds = model(xb)\n", " loss = mnist_loss(preds, yb)\n", " loss.backward()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "calc_grad(batch, train_y[:4], linear1)\n", "weights.grad.mean(),bias.grad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "calc_grad(batch, train_y[:4], linear1)\n", "weights.grad.mean(),bias.grad" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weights.grad.zero_()\n", "bias.grad.zero_();" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def 
train_epoch(model, lr, params):\n", " for xb,yb in dl:\n", " calc_grad(xb, yb, model)\n", " for p in params:\n", " p.data -= p.grad*lr\n", " p.grad.zero_()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(preds>0.0).float() == train_y[:4]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def batch_accuracy(xb, yb):\n", " preds = xb.sigmoid()\n", " correct = (preds>0.5) == yb\n", " return correct.float().mean()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_accuracy(linear1(batch), train_y[:4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def validate_epoch(model):\n", " accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]\n", " return round(torch.stack(accs).mean().item(), 4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "validate_epoch(linear1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lr = 1.\n", "params = weights,bias\n", "train_epoch(linear1, lr, params)\n", "validate_epoch(linear1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(20):\n", " train_epoch(linear1, lr, params)\n", " print(validate_epoch(linear1), end=' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating an Optimizer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "linear_model = nn.Linear(28*28,1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w,b = linear_model.parameters()\n", "w.shape,b.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class BasicOptim:\n", " def __init__(self,params,lr): self.params,self.lr = list(params),lr\n", "\n", " 
def step(self, *args, **kwargs):\n", " for p in self.params: p.data -= p.grad.data * self.lr\n", "\n", " def zero_grad(self, *args, **kwargs):\n", " for p in self.params: p.grad = None" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "opt = BasicOptim(linear_model.parameters(), lr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_epoch(model):\n", " for xb,yb in dl:\n", " calc_grad(xb, yb, model)\n", " opt.step()\n", " opt.zero_grad()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "validate_epoch(linear_model)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_model(model, epochs):\n", " for i in range(epochs):\n", " train_epoch(model)\n", " print(validate_epoch(model), end=' ')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_model(linear_model, 20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "linear_model = nn.Linear(28*28,1)\n", "opt = SGD(linear_model.parameters(), lr)\n", "train_model(linear_model, 20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dls = DataLoaders(dl, valid_dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,\n", " loss_func=mnist_loss, metrics=batch_accuracy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(10, lr=lr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding a Nonlinearity" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def simple_net(xb): \n", " res = xb@w1 + b1\n", " res = res.max(tensor(0.0))\n", " res = res@w2 + b2\n", " return 
res" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w1 = init_params((28*28,30))\n", "b1 = init_params(30)\n", "w2 = init_params((30,1))\n", "b2 = init_params(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_function(F.relu)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "simple_net = nn.Sequential(\n", " nn.Linear(28*28,30),\n", " nn.ReLU(),\n", " nn.Linear(30,1)\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn = Learner(dls, simple_net, opt_func=SGD,\n", " loss_func=mnist_loss, metrics=batch_accuracy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(40, 0.1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(L(learn.recorder.values).itemgot(2));" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.recorder.values[-1][2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Going Deeper" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dls = ImageDataLoaders.from_folder(path)\n", "learn = vision_learner(dls, resnet18, pretrained=False,\n", " loss_func=F.cross_entropy, metrics=accuracy)\n", "learn.fit_one_cycle(1, 0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jargon Recap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Questionnaire" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. How is a grayscale image represented on a computer? How about a color image?\n", "1. How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?\n", "1. Explain how the \"pixel similarity\" approach to classifying digits works.\n", "1. What is a list comprehension? 
Create one now that selects odd numbers from a list and doubles them.\n", "1. What is a \"rank-3 tensor\"?\n", "1. What is the difference between tensor rank and shape? How do you get the rank from the shape?\n", "1. What are RMSE and L1 norm?\n", "1. How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?\n", "1. Create a 3×3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom-right four numbers.\n", "1. What is broadcasting?\n", "1. Are metrics generally calculated using the training set, or the validation set? Why?\n", "1. What is SGD?\n", "1. Why does SGD use mini-batches?\n", "1. What are the seven steps in SGD for machine learning?\n", "1. How do we initialize the weights in a model?\n", "1. What is \"loss\"?\n", "1. Why can't we always use a high learning rate?\n", "1. What is a \"gradient\"?\n", "1. Do you need to know how to calculate gradients yourself?\n", "1. Why can't we use accuracy as a loss function?\n", "1. Draw the sigmoid function. What is special about its shape?\n", "1. What is the difference between a loss function and a metric?\n", "1. What is the function to calculate new weights using a learning rate?\n", "1. What does the `DataLoader` class do?\n", "1. Write pseudocode showing the basic steps taken in each epoch for SGD.\n", "1. Create a function that, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?\n", "1. What does `view` do in PyTorch?\n", "1. What are the \"bias\" parameters in a neural network? Why do we need them?\n", "1. What does the `@` operator do in Python?\n", "1. What does the `backward` method do?\n", "1. Why do we have to zero the gradients?\n", "1. What information do we have to pass to `Learner`?\n", "1. Show Python or pseudocode for the basic steps of a training loop.\n", "1. What is \"ReLU\"? 
Draw a plot of it for values from `-2` to `+2`.\n", "1. What is an \"activation function\"?\n", "1. What's the difference between `F.relu` and `nn.ReLU`?\n", "1. The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further Research" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.\n", "1. Complete all the steps in this chapter using the full MNIST datasets (that is, for all digits, not just 3s and 7s). This is a significant project and will take you quite a bit of time to complete! You'll need to do some of your own research to figure out how to overcome some obstacles you'll meet on the way." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "jupytext": { "split_at_heading": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }