{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "<div>\n", "<img src=\"https://discuss.pytorch.org/uploads/default/original/2X/3/35226d9fbc661ced1c5d17e374638389178c3176.png\" width=\"400\" style=\"margin: 50px auto; display: block; position: relative; left: -30px;\" />\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "<!--NAVIGATION-->\n", "# < [Modules](5-Modules.ipynb) | Convolutional Neural Networks | [Transfer Learning](7-Transfer-Learning.ipynb) >\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convolutional Neural Networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we first give a short introduction to convolutions, which are a prerequisite for understanding how CNNs work. \n", "Then, we will write code to create a CNN, load the data and train our model. \n", "\n", "<p>\n", "<img src=\"figures/sprinkle.jpg\" width=\"250\" style=\"margin-left: auto;margin-right: auto;display: block;\" />\n", "</p>\n", "\n", "### Table of Contents\n", "\n", "#### 1. [What is a convolution ?](#What-is-a-convolution-?) \n", "#### 2. [Building a CNN](#Building-a-CNN)\n", "#### 3. [Training our CNN](#Training-our-CNN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! pip -q install colorama\n", "! 
wget -q https://github.com/theevann/amld-pytorch-workshop/raw/master/figures/image-city.jpg" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "import colorama\n", "from collections import OrderedDict\n", "from matplotlib import pyplot as plt \n", "\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "\n", "torch.set_printoptions(precision=3)" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true" }, "source": [ "# What is a convolution ?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note: Below are **intuitive and practical** explanations that are not intended to be mathematically rigorous. This is deliberate, to make them easier to understand._" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true" }, "source": [ "## 1D Convolution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A convolution is an operation between two signals. \n", "In deep learning, one signal is usually the **input** (audio signal, image, ...), while the other is called the **filter** or **kernel**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the convolution output of an input vector and a kernel:\n", "- **Slide the kernel** over every possible position in the input\n", "- For each position, compute the **element-wise product** between the kernel and the corresponding part of the input\n", "- **Sum** the results of the element-wise product to obtain one output value\n", "\n", "Without padding and with a stride of 1, an input of length $n$ and a kernel of length $k$ therefore produce an output of length $n - k + 1$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can perform this operation in PyTorch using the `torch.nn.functional.conv1d` function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "input = torch.Tensor([1,4,-1,0,2,-2,1,3,3,1]).view(1,1,-1) # Size: (Batch size, Num channels, Input size)\n", "kernel = torch.Tensor([1,2,0,-1]).view(1,1,-1) # Size: (Num output channels, Num input channels, Kernel size)\n", "\n", "torch.nn.functional.conv1d(input, kernel)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true" }, "source": [ "## 2D Convolution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The extension to 2D is straightforward: the operation is the same, but now both the input and the kernel are 2-dimensional, and the kernel slides along both the height and the width of the input." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can perform this operation in PyTorch using the `torch.nn.functional.conv2d` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Input Size: (Batch size, Num channels, Input height, Input width)\n", "input = torch.Tensor([[3,3,2,1,0], [0,0,1,3,1], [3,1,2,2,3], [2,0,0,2,2], [2,0,0,0,1]]).view(1,1,5,5)\n", "\n", "# Kernel Size: (Num output channels, Num input channels, Kernel height, Kernel width)\n", "kernel = torch.Tensor([[0,1,2],[2,2,0],[0,1,2]]).view(1,1,3,3)\n", "\n", "torch.nn.functional.conv2d(input, kernel)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true" }, "source": [ "## Multiple channels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The input can have multiple channels: a color image, for instance, usually has 3 input channels (RGB). \n", "Therefore, the kernel also has channels, one for each input channel. The results computed on each channel are summed to produce a single output channel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/conv-2d-in-channels.gif\" alt=\"drawing\" width=\"700\"/>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use multiple different kernels. 
The number of output channels corresponds to the number of kernels you use." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<img src=\"figures/conv-2d-out-channels.gif\" alt=\"drawing\" width=\"400\"/>" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true", "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "## Trying on a real image" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "\n", "image = Image.open(\"image-city.jpg\")\n", "print(image.size) # PIL reports (width, height)\n", "image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Your Turn !**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Fill in the code cell below to perform the convolution of the above image with a kernel.** \n", "We will use an edge-detection filter as the kernel (it responds to horizontal changes in intensity, i.e. to vertical edges):\n", "\n", "$$\\begin{bmatrix}\n", "-1 & 0 & 1 \\\\\n", "-1 & 0 & 1 \\\\\n", "-1 & 0 & 1 \n", "\\end{bmatrix}\n", "$$\n", "\n", "Pay attention to the dimensions of the input and the kernel:\n", "- Input Size: **Batch size** x **Num channels** x **Input height** x **Input width**\n", "- Kernel Size: **Num output channels** x **Num input channels** x **Kernel height** x **Kernel width**\n", "\n", "Note:\n", "- There are 3 input channels (since we have an RGB image)\n", "- As output, we want only one channel (i.e. we define only one kernel)\n", "- Here, the batch size will be 1." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# %load -r 1-8 solutions/solution_5.py\n", "from torchvision.transforms.functional import to_tensor, to_pil_image\n", "\n", "image_tensor = to_tensor(image) # shape (3, H, W), values in [0, 1]\n", "input = image_tensor.unsqueeze(0) # add a batch dimension: (1, 3, H, W)\n", "kernel = torch.Tensor([-1,0,1]).expand(1,3,3,3) # 1 kernel, 3 input channels, each 3x3 slice has rows [-1, 0, 1]\n", "\n", "out = torch.nn.functional.conv2d(input, kernel) # shape (1, 1, H-2, W-2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "norm_out = (out - out.min()) / (out.max() - out.min()) # Map output to [0,1] for visualisation purposes\n", "to_pil_image(norm_out.squeeze())" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": "true" }, "source": [ "## Convolutions in `nn.Module`\n", "\n", "When using a convolution layer inside an `nn.Module`, we typically use the `nn.Conv2d` module instead of the functional form. \n", "`nn.Conv2d` creates and stores its kernels itself, as a learnable `weight` parameter (plus a learnable `bias` by default)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conv_1 = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=(3,3))\n", "\n", "print(\"Convolution\", conv_1)\n", "print(\"Kernel size: \", conv_1.weight.shape) # First two dimensions are: Num output channels and Num input channels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fake 5x5 input with 3 channels\n", "input = torch.randn(1, 3, 5, 5) # batch_size, num_channels, height, width\n", "\n", "out = conv_1(input)\n", "print(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Animation credits: Francois Fleuret, Vincent Dumoulin_" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a CNN " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the MNIST classification dataset again as our learning task. 
However, this time we will try to solve it using Convolutional Neural Networks. Let's build the LeNet-5 CNN with PyTorch !" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining the LeNet-5 architecture" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. \"Gradient-based learning applied to document recognition.\" Proceedings of the IEEE, 86(11):2278-2324, November 1998.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Note: The *Gaussian connections* in the last layer were used to estimate the lack of fit. In our implementation, we use a cross-entropy loss function instead, as is common nowadays. Similarly, ReLU activations are used instead of tanh._\n", "\n", "**Architecture Details**\n", "\n", "+ Convolutional part:\n", "\n", "\n", "| Layer | Name | Input channels | Output channels | Kernel | Stride |\n", "| ----------- | :--: | :------------: | :-------------: | :----: | :----: |\n", "| Convolution | C1 | 1 | 6 | 5x5 | 1 |\n", "| ReLU | | 6 | 6 | | |\n", "| MaxPooling | S2 | 6 | 6 | 2x2 | 2 |\n", "| Convolution | C3 | 6 | 16 | 5x5 | 1 |\n", "| ReLU | | 16 | 16 | | |\n", "| MaxPooling | S4 | 16 | 16 | 2x2 | 2 |\n", "| Convolution | C5 | 16 | 120 | 5x5 | 1 |\n", "| ReLU | | 120 | 120 | | |\n", "\n", "For a 32x32 input, C5 outputs a 120x1x1 feature map, which is flattened into a vector of size 120 before the fully connected part.\n", "\n", "+ Fully Connected part:\n", "\n", "| Layer | Name | Input size | Output size |\n", "| ---------- | :--: | :--------: | :---------: |\n", "| Linear | F5 | 120 | 84 |\n", "| ReLU | | 84 | 84 |\n", "| Linear | F6 | 84 | 10 |\n", "| LogSoftmax | | 10 | 10 |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **Your turn !**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a PyTorch module for the LeNet-5 model. 
\n", "You may need the following modules: `nn.Sequential`, `nn.Conv2d`, `nn.ReLU`, `nn.MaxPool2d`, `nn.Linear`, `nn.LogSoftmax`, `nn.Flatten`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# %load -s LeNet5 solutions/solution_5.py\n", "\n", "class LeNet5(nn.Module):\n", "    def __init__(self):\n", "        super(LeNet5, self).__init__()\n", "\n", "        # YOUR TURN\n", "        \n", "    def forward(self, imgs):\n", "        \n", "        # YOUR TURN\n", "        \n", "        return output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An extensive list of all available layer types can be found at https://pytorch.org/docs/stable/nn.html." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Print a network summary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conv_net = LeNet5()\n", "print(conv_net)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Retrieve trainable parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "named_params = list(conv_net.named_parameters())\n", "print(\"len(params): %s\\n\" % len(named_params))\n", "\n", "for name, param in named_params:\n", "    print(\"%s:\\t%s\" % (name, param.shape))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Feed the network a random input" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "input = torch.randn(1, 1, 32, 32) # batch_size, num_channels, height, width\n", "out = conv_net(input)\n", "print(\"Log-Probabilities: \\n%s\\n\" % out)\n", "print(\"Probabilities: \\n%s\\n\" % torch.exp(out))\n", "print(\"out.shape: \\n%s\" % (out.shape,))" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training our CNN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train function \n", "\n", "Similarly to 
the previous notebook, we define a train function `train_cnn`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_cnn(model, train_loader, test_loader, device, num_epochs=3, lr=1e-3):\n", "\n", "    # define an optimizer and a loss function\n", "    optimizer = torch.optim.Adam(model.parameters(), lr=lr)\n", "    # CrossEntropyLoss applies log_softmax internally. log_softmax is idempotent, so applying it\n", "    # to the output of the model's LogSoftmax layer gives the same loss as nn.NLLLoss would\n", "    criterion = torch.nn.CrossEntropyLoss()\n", "\n", "    for epoch in range(num_epochs):\n", "        print(\"=\" * 40, \"Starting epoch %d\" % (epoch + 1), \"=\" * 40)\n", "        \n", "        model.train() # Good practice: accuracy() below switches the model to eval mode.\n", "        # In our example it has no effect: only modules such as nn.Dropout and nn.BatchNorm behave differently\n", "        \n", "        # dataloader returns batches of images for 'data' and a tensor with their respective labels in 'labels'\n", "        for batch_idx, (data, labels) in enumerate(train_loader):\n", "            data, labels = data.to(device), labels.to(device)\n", "\n", "            optimizer.zero_grad()\n", "            output = model(data)\n", "            loss = criterion(output, labels)\n", "            loss.backward()\n", "            optimizer.step()\n", "            \n", "            if batch_idx % 40 == 0:\n", "                print(\"Batch %d/%d, Loss=%.4f\" % (batch_idx, len(train_loader), loss.item()))\n", "        \n", "        # Compute the train and test accuracy at the end of each epoch\n", "        train_acc = accuracy(model, train_loader, device)\n", "        test_acc = accuracy(model, test_loader, device)\n", "        \n", "        print(colorama.Fore.GREEN, \"\\nAccuracy on training: %.2f%%\" % (100*train_acc))\n", "        print(\"Accuracy on test: %.2f%%\" % (100*test_acc), colorama.Fore.RESET)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Test function\n", "We also define an `accuracy` function to evaluate our model's accuracy on the train and test data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def accuracy(model, dataloader, device):\n", "    \"\"\" Computes the model's accuracy on the data provided by 'dataloader'\n", "    \"\"\"\n", "    model.eval()\n", "    \n", "    num_correct = 0\n", "    num_samples = 0\n", "    with torch.no_grad(): # deactivates autograd, reduces memory usage and speeds up computations\n", "        for data, labels in dataloader:\n", "            data, labels = data.to(device), labels.to(device)\n", "\n", "            predictions = model(data).max(1)[1] # indices of the maxima along the class dimension, i.e. the predicted labels\n", "            num_correct += (predictions == labels).sum().item()\n", "            num_samples += predictions.shape[0]\n", "    \n", "    return num_correct / num_samples" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "### Loading the train and test data with *`dataloaders`*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torchvision import datasets, transforms\n", "\n", "transformations = transforms.Compose([\n", "    transforms.Resize((32, 32)), # MNIST digits are 28x28; LeNet-5 expects 32x32 inputs\n", "    transforms.ToTensor()\n", "])\n", "\n", "train_data = datasets.MNIST('./data',\n", "                            train=True,\n", "                            download=True,\n", "                            transform=transformations)\n", "\n", "test_data = datasets.MNIST('./data',\n", "                           train=False,\n", "                           download=True,\n", "                           transform=transformations)\n", "\n", "train_loader = torch.utils.data.DataLoader(train_data, batch_size=256, shuffle=True)\n", "test_loader = torch.utils.data.DataLoader(test_data, batch_size=1024, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start the training!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "conv_net.to(device)\n", "\n", "train_cnn(conv_net, train_loader, test_loader, device, lr=2e-3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's look at some of the model's predictions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def visualize_predictions(model, dataloader, device):\n", "    data, labels = next(iter(dataloader))\n", "    data, labels = data[:10].to(device), labels[:10]\n", "\n", "    model.eval()\n", "    with torch.no_grad(): # no gradients are needed for inference\n", "        predictions = model(data).max(1)[1]\n", "    \n", "    predictions, data = predictions.cpu(), data.cpu()\n", "    \n", "    plt.figure(figsize=(16,9))\n", "    for i in range(10):\n", "        img = data.squeeze(1)[i]\n", "        plt.subplot(1, 10, i+1)\n", "        plt.imshow(img, cmap=\"gray\", interpolation=\"none\")\n", "        plt.xlabel(predictions[i].item(), fontsize=18)\n", "        plt.xticks([])\n", "        plt.yticks([])\n", "\n", "visualize_predictions(conv_net, test_loader, device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] }, { "cell_type": "markdown", "metadata": { "toc-hr-collapsed": true }, "source": [ "<!--NAVIGATION-->\n", "# < [Modules](5-Modules.ipynb) | Convolutional Neural Networks | [Transfer Learning](7-Transfer-Learning.ipynb) >\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }