{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Convolutional Neural Networks in PyTorch\n", "> In this last chapter, we learn how to make neural networks work well in practice, using concepts like regularization, batch-normalization and transfer learning. This is the Summary of lecture \"Introduction to Deep Learning with PyTorch\", via datacamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, PyTorch, Deep_Learning]\n", "- image: images/earlystopping.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "import torch.nn.functional as F\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rcParams['figure.figsize'] = (8, 8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The sequential module\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sequential module\n", "Having learned about the sequential module, now is the time to see how you can convert a neural network that doesn't use sequential modules to one that uses them. We are giving the code to build the network in the usual way, and you are going to write the code for the same network using sequential modules.\n", "\n", "```python\n", "class Net(nn.Module):\n", " def __init__(self, num_classes):\n", " super(Net, self).__init__()\n", "\n", " self.conv1 = nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1)\n", " self.conv2 = nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1)\n", " self.conv3 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1)\n", " self.conv4 = nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1)\n", "\n", " self.relu = nn.ReLU()\n", "\n", " self.pool = nn.MaxPool2d(2, 2)\n", "\n", " self.fc1 = nn.Linear(7 * 7 * 40, 1024)\n", " self.fc2 = nn.Linear(1024, 2048)\n", " self.fc3 = nn.Linear(2048, 10) \n", " \n", " def forward():\n", " x = self.relu(self.conv1(x))\n", " x = self.relu(self.pool(self.conv2(x)))\n", " x = self.relu(self.conv3(x))\n", " x = self.relu(self.pool(self.conv4(x)))\n", " x = x.view(-1, 7 * 7 * 40)\n", " x = self.relu(self.fc1(x))\n", " x = self.relu(self.fc2(x))\n", " x = self.fc3(x)\n", " return x\n", "```\n", "\n", "We want the pooling layer to be used after the second and fourth convolutional layers, while the relu nonlinearity needs to be used after each layer except the last (fully-connected) layer. 
For the number of filters (kernels), stride, padding, number of channels and number of units, use the same numbers as above.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Module):\n", "    def __init__(self):\n", "        super(Net, self).__init__()\n", "        \n", "        # Declare all the layers for feature extraction\n", "        self.features = nn.Sequential(\n", "            nn.Conv2d(in_channels=1, out_channels=5, kernel_size=3, padding=1),\n", "            nn.ReLU(inplace=True),\n", "            nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3, padding=1),\n", "            nn.MaxPool2d(2, 2),\n", "            nn.ReLU(inplace=True),\n", "            nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, padding=1),\n", "            nn.ReLU(inplace=True),\n", "            nn.Conv2d(in_channels=20, out_channels=40, kernel_size=3, padding=1),\n", "            nn.MaxPool2d(2, 2),\n", "            nn.ReLU(inplace=True)\n", "        )\n", "        \n", "        # Declare all the layers for classification\n", "        self.classifier = nn.Sequential(\n", "            nn.Linear(7 * 7 * 40, 1024),\n", "            nn.ReLU(inplace=True),\n", "            nn.Linear(1024, 2048),\n", "            nn.ReLU(inplace=True),\n", "            nn.Linear(2048, 10)\n", "        )\n", "        \n", "    def forward(self, x):\n", "        # Apply the feature extractor to the input\n", "        x = self.features(x)\n", "        \n", "        # Flatten the channel and spatial dimensions into one\n", "        x = x.view(-1, 7 * 7 * 40)\n", "        \n", "        # Classify the image\n", "        x = self.classifier(x)\n", "        return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The problem of overfitting\n", "- Overfitting\n", "![overfit](image/overfit.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Validation set\n", "You saw the need for a validation set in the previous video. The problem is that datasets are typically not separated into training, validation and test sets, so it is your job as a data scientist to split them. The easiest (and most common) way of doing so is to split the dataset randomly; in PyTorch, that can be done with the `SubsetRandomSampler` object. You are going to split the training part of the MNIST dataset into training and validation sets. After randomly shuffling the dataset, use the first 55000 points for training and the remaining 5000 points for validation."
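, "\n", "\n", "As a side note (not part of the original exercise), PyTorch also provides `torch.utils.data.random_split`, which does the same job without an explicit sampler. A minimal sketch, assuming the `datasets` import and the `transform` defined in the cell below:\n", "\n", "```python\n", "from torch.utils.data import DataLoader, random_split\n", "\n", "# Randomly split the 60000 training points into 55000 train / 5000 validation\n", "train_set, val_set = random_split(\n", "    datasets.MNIST('mnist', download=True, train=True, transform=transform),\n", "    [55000, 5000])\n", "\n", "train_loader = DataLoader(train_set, batch_size=64, shuffle=True)\n", "val_loader = DataLoader(val_set, batch_size=64, shuffle=False)\n", "```"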
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/MNIST/raw/train-images-idx3-ubyte.gz\n", "Extracting mnist/MNIST/raw/train-images-idx3-ubyte.gz to mnist/MNIST/raw\n", "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/MNIST/raw/train-labels-idx1-ubyte.gz\n", "Extracting mnist/MNIST/raw/train-labels-idx1-ubyte.gz to mnist/MNIST/raw\n", "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw/t10k-images-idx3-ubyte.gz\n", "Extracting mnist/MNIST/raw/t10k-images-idx3-ubyte.gz to mnist/MNIST/raw\n", "Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz\n", "Extracting mnist/MNIST/raw/t10k-labels-idx1-ubyte.gz to mnist/MNIST/raw\n", "Processing...\n", "Done!\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/conda-bld/pytorch_1591914855613/work/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor.
This type of warning will be suppressed for the rest of this program.\n" ] } ], "source": [ "import torchvision.datasets as datasets\n", "import torchvision.transforms as transforms\n", "\n", "# Shuffle the indices of the 60000 MNIST training points\n", "indices = np.arange(60000)\n", "np.random.shuffle(indices)\n", "\n", "# Normalize with the mean and standard deviation of MNIST\n", "transform = transforms.Compose([\n", "    transforms.ToTensor(),\n", "    transforms.Normalize((0.1307, ), (0.3081, ))\n", "])\n", "\n", "# Build the train loader; shuffle stays False because the sampler already randomizes the order\n", "train_loader = torch.utils.data.DataLoader(\n", "    datasets.MNIST('mnist', download=True, train=True, transform=transform),\n", "    batch_size=64, shuffle=False, sampler=torch.utils.data.SubsetRandomSampler(indices[:55000])\n", ")\n", "\n", "# Build the validation loader from the remaining 5000 indices\n", "val_loader = torch.utils.data.DataLoader(\n", "    datasets.MNIST('mnist', download=True, train=True, transform=transform),\n", "    batch_size=64, shuffle=False, sampler=torch.utils.data.SubsetRandomSampler(indices[55000:])\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Regularization techniques\n", "- L2-regularization\n", "$$ C = -\frac{1}{n} \sum_{x,j} \left[ y_j \ln a_j^L + (1 - y_j) \ln (1 - a_j^L) \right] + \frac{\lambda}{2n} \sum_w w^2 $$\n", "- Dropout\n", "![dropout](image/dropout.png)\n", "- Batch Normalization\n", "- Early-stopping\n", "![early](image/earlystopping.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### L2-regularization\n", "You are going to implement each of the regularization techniques explained in the previous video. In doing so, you will also review important concepts studied throughout the course. You will start with l2-regularization, one of the most widely used regularization techniques in machine learning. As you saw in the video, l2-regularization simply penalizes large weights, thus forcing the network to use only small weights.\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Instantiate the network\n", "model = Net()\n", "\n", "# Instantiate the cross-entropy loss\n", "criterion = nn.CrossEntropyLoss()\n", "\n", "# Instantiate the Adam optimizer; weight_decay adds the l2 penalty\n", "optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=0.001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dropout\n", "You saw that dropout is an effective technique to avoid overfitting. Typically, dropout is applied in fully-connected neural networks, or in the fully-connected layers of a convolutional neural network. You are now going to implement dropout and use it on a small fully-connected neural network.\n", "\n", "For the first hidden layer use 200 units, for the second hidden layer use 500 units, and for the output layer use 10 units (one for each class). For the activation function, use ReLU. Use `nn.Dropout()` with a dropout probability of 0.5 between the first and second hidden layers. Use the sequential module, with the order being: fully-connected, activation, dropout, fully-connected, activation, fully-connected."
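, "\n", "\n", "One practical detail worth noting (a side remark, not part of the exercise): dropout is only active in training mode, so you must toggle `model.train()` and `model.eval()` yourself. A minimal sketch, using the `Net` you will define below on a hypothetical batch of flattened MNIST images:\n", "\n", "```python\n", "model = Net()\n", "x = torch.randn(16, 28 * 28)  # hypothetical batch of 16 flattened images\n", "\n", "model.train()  # dropout active: zeroes each unit with probability 0.5\n", "out = model(x)\n", "\n", "model.eval()   # dropout disabled for evaluation\n", "out = model(x)\n", "```"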
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Module):\n", "    def __init__(self):\n", "        super(Net, self).__init__()\n", "        \n", "        # Define all the parameters of the net\n", "        self.classifier = nn.Sequential(\n", "            nn.Linear(28 * 28, 200),\n", "            nn.ReLU(inplace=True),\n", "            nn.Dropout(p=0.5),\n", "            nn.Linear(200, 500),\n", "            nn.ReLU(inplace=True),\n", "            nn.Linear(500, 10)\n", "        )\n", "        \n", "    def forward(self, x):\n", "        # Do the forward pass\n", "        return self.classifier(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Batch-normalization\n", "Dropout is used to regularize fully-connected layers. Batch-normalization is used to make the training of convolutional neural networks more efficient, while at the same time having regularization effects. You are going to implement the `__init__` method of a small convolutional neural network, with batch-normalization. The feature extraction part of the CNN will contain the following modules (in order): convolution, max-pool, relu, batch-norm, convolution, max-pool, relu, batch-norm.\n", "\n", "The first convolutional layer will contain 10 output channels, while the second will contain 20 output channels. As always, we are going to use the MNIST dataset, with images having shape (28, 28) in grayscale format (1 channel). In all cases, the `kernel_size` of the filters should be 3, the `stride` should be 1 and the `padding` should be 1." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Module):\n", "    def __init__(self):\n", "        super(Net, self).__init__()\n", "        \n", "        # Implement the sequential module for feature extraction\n", "        self.features = nn.Sequential(\n", "            nn.Conv2d(in_channels=1, out_channels=10, kernel_size=3, stride=1, padding=1),\n", "            nn.MaxPool2d(2, 2), nn.ReLU(inplace=True), nn.BatchNorm2d(10),\n", "            nn.Conv2d(in_channels=10, out_channels=20, kernel_size=3, stride=1, padding=1),\n", "            nn.MaxPool2d(2, 2), nn.ReLU(inplace=True), nn.BatchNorm2d(20)\n", "        )\n", "        \n", "        # Implement the fully connected layer for classification\n", "        self.fc = nn.Linear(in_features=20 * 7 * 7, out_features=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transfer learning\n", "- Transfer learning\n", "![transfer](image/transfer.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Torchvision module\n", "You have already finetuned a net you pretrained yourself. In practice, though, it is very common to finetune CNNs that someone else (typically the library's developers) has pretrained on ImageNet. Big networks still take a lot of time to train on large datasets, and you may not be able to afford training a large network on a dataset of 1.2 million images on your laptop.\n", "\n", "Instead, you can simply download the network and finetune it on your dataset. That's what you will do right now. You are going to assume that you have a personal dataset containing the images from all of your last 7 holidays. You want to build a neural network that can classify each image depending on the holiday it comes from. However, since the dataset is so small, you need to use the finetuning technique."
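, "\n", "\n", "One suggested follow-up once the head is replaced (a sketch under the assumption that you train with Adam, as earlier in this chapter): since every pretrained parameter is frozen, you can pass only the parameters of the new `fc` layer to the optimizer:\n", "\n", "```python\n", "# Only the freshly created head requires gradients after the freeze\n", "optimizer = optim.Adam(model.fc.parameters(), lr=3e-4)\n", "```"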
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import torchvision\n", "\n", "# Download a ResNet18 pretrained on ImageNet\n", "model = torchvision.models.resnet18(pretrained=True)\n", "\n", "# Freeze all the pretrained layers; the new head created below stays trainable\n", "for param in model.parameters():\n", "    param.requires_grad = False\n", "    \n", "# Replace the classification head: ResNet18 produces 512 features, and we need 7 output units (one per holiday)\n", "model.fc = nn.Linear(512, 7)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }