{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional Neural Networks in PyTorch\n", "> In this third chapter, we introduce convolutional neural networks, learning how to train them and how to use them to make predictions. This is the Summary of lecture \"Introduction to Deep Learning with PyTorch\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, PyTorch, Deep_Learning]\n", "- image: images/alexnet.png " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torchvision\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (8, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolution operator\n", "- Problems with the Fully-connected nn\n", " - Do you need to consider all the relations between the features?\n", " - Fully connected nn are big and so very computationally inefficient\n", " - They have so many parameters, and so overfit\n", "- Main idea of CNN\n", " - Units are connected with only a few units from the previous layer\n", " - Units share weights\n", "![convolution](image/convolutions.png)\n", "- Convolving operation\n", "![convolution2](image/convolution2.png)\n", "- Activation map\n", "![act](image/act_map.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convolution operator - OOP way\n", "Let's kick off this chapter by using convolution operator from the `torch.nn` package. You are going to create a random tensor which will represent your image and random filters to convolve the image with. Then you'll apply those images." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([10, 6, 28, 28])\n" ] } ], "source": [ "# Create 10 random images of shape (1, 28, 28)\n", "images = torch.rand(10, 1, 28, 28)\n", "\n", "# Build 6 conv. filters\n", "conv_filters = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=(3, 3), stride=1, padding=1)\n", "\n", "# Convolve the image with the filters\n", "output_feature = conv_filters(images)\n", "print(output_feature.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convolution operator - Functional way\n", "While I and most of PyTorch practitioners love the `torch.nn` package (OOP way), other practitioners prefer building neural network models in a more functional way, using `torch.nn.functional`. More importantly, it is possible to mix the concepts and use both libraries at the same time (we have already done it in the previous chapter). You are going to build the same neural network you built in the previous exercise, but this time using the functional way." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([10, 6, 28, 28])\n" ] } ], "source": [ "# Create 10 random images\n", "images = torch.rand(10, 1, 28, 28)\n", "\n", "# Create 6 filters\n", "filters = torch.rand(6, 1, 3, 3)\n", "\n", "# Convolve the image with the filters\n", "output_feature = F.conv2d(images, filters, stride=1, padding=1)\n", "print(output_feature.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pooling operators\n", "- For feature selection,\n", "![pooling](image/pooling.png)\n", "- Max-Pooling\n", "![maxpool](image/maxpool.png)\n", "- Average-Pooling\n", "![avgpool](image/avgpool.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Max-pooling operator\n", "Here you are going to practice using max-pooling in both OOP and functional way, and see for yourself that the produced results are the same." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[[[0.3719, 0.5626, 0.0467, 0.5783, 0.0541, 0.5374],\n", " [0.0660, 0.8718, 0.4926, 0.1291, 0.3391, 0.9095],\n", " [0.5611, 0.2622, 0.2592, 0.3187, 0.6624, 0.9951],\n", " [0.8349, 0.6743, 0.9794, 0.6889, 0.0306, 0.4627],\n", " [0.4803, 0.0634, 0.7857, 0.5005, 0.7090, 0.6187],\n", " [0.8773, 0.8505, 0.9799, 0.3615, 0.1887, 0.3500]]]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "im = torch.rand(1, 1, 6, 6)\n", "im" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[[0.8718, 0.5783, 0.9095],\n", " [0.8349, 0.9794, 0.9951],\n", " [0.8773, 0.9799, 0.7090]]]])\n", "tensor([[[[0.8718, 0.5783, 0.9095],\n", " [0.8349, 0.9794, 0.9951],\n", " [0.8773, 0.9799, 0.7090]]]])\n" ] } ], "source": [ "# Build a pooling operator with size 2\n", "max_pooling = nn.MaxPool2d(2)\n", "\n", "# Apply the pooling operator\n", "output_feature = max_pooling(im)\n", "\n", "# Use pooling operator in the image\n", "output_feature_F = F.max_pool2d(im, 2)\n", "\n", "# Print the results of both cases\n", "print(output_feature)\n", "print(output_feature_F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Average-pooling operator\n", "After coding the max-pooling operator, you are now going to code the average-pooling operator. You just need to replace max-pooling with average pooling.\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[[0.4681, 0.3117, 0.4601],\n", " [0.5831, 0.5615, 0.5377],\n", " [0.5679, 0.6569, 0.4666]]]])\n", "tensor([[[[0.4681, 0.3117, 0.4601],\n", " [0.5831, 0.5615, 0.5377],\n", " [0.5679, 0.6569, 0.4666]]]])\n" ] } ], "source": [ "# Build a pooling operator with size 2\n", "avg_pooling = nn.AvgPool2d(2)\n", "\n", "# Apply the pooling operator\n", "output_feature = avg_pooling(im)\n", "\n", "# Use pooling operator in the image\n", "output_feature_F = F.avg_pool2d(im, 2)\n", "\n", "# Print the results of both cases\n", "print(output_feature)\n", "print(output_feature_F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolutional Neural Networks\n", "- AlexNet\n", "![alexnet](image/alexnet.png)\n", "- Implementation in pytorch\n", "```python\n", "class AlexNet(nn.Module):\n", " def __init__(self, num_classes=1000):\n", " super(AlexNet, self).__init__()\n", " self.conv1 = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)\n", " self.relu = nn.ReLU(inplace=True)\n", " self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2)\n", " self.conv2 = nn.Conv2d(64, 192, kernel_size=5, padding=2)\n", " self.conv3 = nn.Conv2d(192, 384, kernel_size=3, padding=1)\n", " self.conv4 = nn.Conv2d(384, 256, kernel_size=3, padding=1)\n", " self.conv5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)\n", " self.avgpool = nn.AdaptiveAvgPool2d((6, 6))\n", " self.fc1 = nn.Linear(256 * 6 * 6, 4096)\n", " self.fc2 = nn.Linear(4096, 4096)\n", " self.fc3 = nn.Linear(4096, num_classes)\n", " \n", " def forward(self, x):\n", " x = self.relu(self.conv1(x))\n", " x = self.maxpool(x)\n", " x = self.relu(self.conv2(x))\n", " x = self.maxpool(x)\n", " x = self.relu(self.conv3(x))\n", " x = self.relu(self.conv4(x))\n", " x = self.relu(self.conv5(x))\n", " x = self.maxpool(x)\n", " x = self.avgpool(x)\n", " x = x.view(x.size(0), 256 * 6 * 6)\n", " x = self.relu(self.fc1(x))\n", " x = self.relu(self.fc2(x))\n", " return self.fc3(x)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Your first CNN - __init__ method\n", "You are going to build your first convolutional neural network. You're going to use the MNIST dataset as the dataset, which is made of handwritten digits from 0 to 9. The convolutional neural network is going to have 2 convolutional layers, each followed by a ReLU nonlinearity, and a fully connected layer. Remember that each pooling layer halves both the height and the width of the image, so by using 2 pooling layers, the height and width are 1/4 of the original sizes. MNIST images have shape (1, 28, 28)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Module):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " \n", " # Instantiate two convolutional layers\n", " self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)\n", " self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)\n", " \n", " # Instantiate the ReLU nonlinearity\n", " self.relu = nn.ReLU(inplace=True)\n", " \n", " # Instantiate a max pooling layer\n", " self.pool = nn.MaxPool2d(kernel_size=2, stride=2)\n", " \n", " # Instantiate a fully connected layer\n", " self.fc = nn.Linear(49 * 10, 10)\n", " \n", " def forward(self, x):\n", " # Apply conv followed by relu, then in next line pool\n", " x = self.relu(self.conv1(x))\n", " x = self.pool(x)\n", " \n", " # Apply conv followed by relu, then in next line pool\n", " x = self.relu(self.conv2(x))\n", " x = self.pool(x)\n", " \n", " # Prepare the image for the fully connected layer\n", " x = x.view(-1, 7 * 7 * 10)\n", " \n", " # Apply the fully connected layer and return the result\n", " return self.fc(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training Convolutional Neural Networks\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training CNNs\n", "Similarly to what you did in Chapter 2, you are going to train a neural network. This time however, you will train the CNN you built in the previous lesson, instead of a fully connected network. The packages you need have been imported for you and the network (called `net`) instantiated. The cross-entropy loss function (called `criterion`) and the Adam optimizer (called `optimizer`) are also available. We have subsampled the training set so that the training goes faster, and you are going to use a single epoch.\n", "\n" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "import torchvision.transforms as transforms\n", "\n", "# Transform the data to torch tensors and normalize it\n", "transform = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.1307), (0.3081))\n", "])\n", "\n", "# Preparing the training and test set\n", "trainset = torchvision.datasets.MNIST('mnist', train=True, transform=transform)\n", "testset = torchvision.datasets.MNIST('mnist', train=False, transform=transform)\n", "\n", "# Prepare loader\n", "train_loader = torch.utils.data.DataLoader(trainset, batch_size=1, shuffle=True, num_workers=0)\n", "test_loader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False, num_workers=0)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "import torch.optim as optim\n", "\n", "net = Net()\n", "optimizer = optim.Adam(net.parameters(), lr=3e-4)\n", "criterion = nn.CrossEntropyLoss()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "for i, data in enumerate(train_loader, 0):\n", " inputs, labels = data\n", " optimizer.zero_grad()\n", " \n", " # Compute the forward pass\n", " outputs = net(inputs)\n", " \n", " # Compute the loss function\n", " loss = criterion(outputs, labels)\n", " \n", " # Compute the gradients\n", " loss.backward()\n", " \n", " # Update the weights\n", " optimizer.step()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using CNNs to make predictions\n", "Building and training neural networks is a very exciting job (trust me, I do it every day)! However, the main utility of neural networks is to make predictions. This is the entire reason why the field of deep learning has bloomed in the last few years, as neural networks predictions are extremely accurate. On this exercise, we are going to use the convolutional neural network you already trained in order to make predictions on the MNIST dataset.\n", "\n", "Remember that `torch.max()` takes two arguments: -`output.data` - the tensor which contains the data.\n", "\n", "Either 1 to do argmax or 0 to do max." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Yipes, your net made the right prediction tensor([7])\n", "Yipes, your net made the right prediction tensor([2])\n", "Yipes, your net made the right prediction tensor([1])\n", "Yipes, your net made the right prediction tensor([0])\n", "Yipes, your net made the right prediction tensor([4])\n", "Yipes, your net made the right prediction tensor([1])\n", "Yipes, your net made the right prediction tensor([4])\n", "Yipes, your net made the right prediction tensor([9])\n", "Yipes, your net made the right prediction tensor([5])\n", "Yipes, your net made the right prediction tensor([9])\n", "Yipes, your net made the right prediction tensor([0])\n", "Yipes, your net made the right prediction tensor([6])\n" ] } ], "source": [ "net.eval()\n", "# Iterate over the data in the test_loader\n", "for i, data in enumerate(test_loader):\n", " # Get the image and label from data\n", " image, label = data\n", " \n", " # Make a forward pass in the net with your image\n", " output = net(image)\n", " \n", " # Argmax the results of the net\n", " _, predicted = torch.max(output.data, 1)\n", " \n", " if predicted == label:\n", " print(\"Yipes, your net made the right prediction \" + str(predicted))\n", " else:\n", " print(\"Your net prediction was \" + str(predicted) + \", but the correct label is: \" + str(label))\n", " \n", " if i > 10:\n", " break" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }