{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UEBilEjLj5wY"
   },
   "source": [
    "Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.\n",
    "- Author: Sebastian Raschka\n",
    "- GitHub Repository: https://github.com/rasbt/deeplearning-models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 119
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 536,
     "status": "ok",
     "timestamp": 1524974472601,
     "user": {
      "displayName": "Sebastian Raschka",
      "photoUrl": "//lh6.googleusercontent.com/-cxK6yOSQ6uE/AAAAAAAAAAI/AAAAAAAAIfw/P9ar_CHsKOQ/s50-c-k-no/photo.jpg",
      "userId": "118404394130788869227"
     },
     "user_tz": 240
    },
    "id": "GOzuY8Yvj5wb",
    "outputId": "c19362ce-f87a-4cc2-84cc-8d7b4b9e6007"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sebastian Raschka \n",
      "\n",
      "CPython 3.6.8\n",
      "IPython 7.2.0\n",
      "\n",
      "torch 1.0.0\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -a 'Sebastian Raschka' -v -p torch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "rH4XmErYj5wm"
   },
   "source": [
    "# Model Zoo -- ResNet-34 CIFAR-10 Classifier with Pinned Memory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an example notebook comparing the speed of model training with and without using page-locked memory. Page-locked memory can be enabled by setting `pin_memory=True` in PyTorch's `DataLoader` class (disabled by default).\n",
    "\n",
    "Theoretically, pinning the memory should speed up the data transfer rate but minimizing the data transfer cost between CPU and the CUDA device; hence, enabling `pin_memory=True` should make the model training faster by some small margin.\n",
    "\n",
    "> Host (CPU) data allocations are pageable by default. The GPU cannot access data directly from pageable host memory, so when a data transfer from pageable host memory to device memory is invoked, the CUDA driver must first allocate a temporary page-locked, or “pinned”, host array, copy the host data to the pinned array, and then transfer the data from the pinned array to device memory, as illustrated below... (Source: https://devblogs.nvidia.com/how-optimize-data-transfers-cuda-cc/)\n",
    "\n",
    "\n",
    "After the Model preamble, this Notebook is divided into too subsections, \"Training Without Pinned Memory\" and \"Training with Pinned Memory\" to investigate whether there is a noticable training time difference when toggling `pin_memory` on and off."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Network Architecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The network in this notebook is an implementation of the ResNet-34 [1] architecture on the MNIST digits dataset (http://yann.lecun.com/exdb/mnist/) to train a handwritten digit classifier.  \n",
    "\n",
    "\n",
    "References\n",
    "    \n",
    "- [1] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). ([CVPR Link](https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html))\n",
    "\n",
    "- [2] http://yann.lecun.com/exdb/mnist/\n",
    "\n",
    "![](../images/resnets/resnet34/resnet34-arch.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following figure illustrates residual blocks with skip connections such that the input passed via the shortcut matches the dimensions of the main path's output, which allows the network to learn identity functions.\n",
    "\n",
    "![](../images/resnets/resnet-ex-1-1.png)\n",
    "\n",
    "\n",
    "The ResNet-34 architecture actually uses residual blocks with skip connections such that the input passed via the shortcut matches is resized to dimensions of the main path's output. Such a residual block is illustrated below:\n",
    "\n",
    "![](../images/resnets/resnet-ex-1-2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a more detailed explanation see the other notebook, [resnet-ex-1.ipynb](resnet-ex-1.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "MkoGLH_Tj5wn"
   },
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "id": "ORj09gnrj5wp"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "from torchvision import datasets\n",
    "from torchvision import transforms\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "from PIL import Image\n",
    "\n",
    "\n",
    "if torch.cuda.is_available():\n",
    "    torch.backends.cudnn.deterministic = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "I6hghKPxj5w0"
   },
   "source": [
    "## Model Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 85
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 23936,
     "status": "ok",
     "timestamp": 1524974497505,
     "user": {
      "displayName": "Sebastian Raschka",
      "photoUrl": "//lh6.googleusercontent.com/-cxK6yOSQ6uE/AAAAAAAAAAI/AAAAAAAAIfw/P9ar_CHsKOQ/s50-c-k-no/photo.jpg",
      "userId": "118404394130788869227"
     },
     "user_tz": 240
    },
    "id": "NnT0sZIwj5wu",
    "outputId": "55aed925-d17e-4c6a-8c71-0d9b3bde5637"
   },
   "outputs": [],
   "source": [
    "##########################\n",
    "### SETTINGS\n",
    "##########################\n",
    "\n",
    "# Hyperparameters\n",
    "RANDOM_SEED = 1\n",
    "LEARNING_RATE = 0.001\n",
    "BATCH_SIZE = 256\n",
    "NUM_EPOCHS = 10\n",
    "\n",
    "# Architecture\n",
    "NUM_FEATURES = 28*28\n",
    "NUM_CLASSES = 10\n",
    "\n",
    "# Other\n",
    "DEVICE = \"cuda:1\"\n",
    "GRAYSCALE = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following code cell that implements the ResNet-34 architecture is a derivative of the code provided at https://pytorch.org/docs/0.4.0/_modules/torchvision/models/resnet.html."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "##########################\n",
    "### MODEL\n",
    "##########################\n",
    "\n",
    "\n",
    "def conv3x3(in_planes, out_planes, stride=1):\n",
    "    \"\"\"3x3 convolution with padding\"\"\"\n",
    "    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\n",
    "                     padding=1, bias=False)\n",
    "\n",
    "\n",
    "class BasicBlock(nn.Module):\n",
    "    expansion = 1\n",
    "\n",
    "    def __init__(self, inplanes, planes, stride=1, downsample=None):\n",
    "        super(BasicBlock, self).__init__()\n",
    "        self.conv1 = conv3x3(inplanes, planes, stride)\n",
    "        self.bn1 = nn.BatchNorm2d(planes)\n",
    "        self.relu = nn.ReLU(inplace=True)\n",
    "        self.conv2 = conv3x3(planes, planes)\n",
    "        self.bn2 = nn.BatchNorm2d(planes)\n",
    "        self.downsample = downsample\n",
    "        self.stride = stride\n",
    "\n",
    "    def forward(self, x):\n",
    "        residual = x\n",
    "\n",
    "        out = self.conv1(x)\n",
    "        out = self.bn1(out)\n",
    "        out = self.relu(out)\n",
    "\n",
    "        out = self.conv2(out)\n",
    "        out = self.bn2(out)\n",
    "\n",
    "        if self.downsample is not None:\n",
    "            residual = self.downsample(x)\n",
    "\n",
    "        out += residual\n",
    "        out = self.relu(out)\n",
    "\n",
    "        return out\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "class ResNet(nn.Module):\n",
    "\n",
    "    def __init__(self, block, layers, num_classes, grayscale):\n",
    "        self.inplanes = 64\n",
    "        if grayscale:\n",
    "            in_dim = 1\n",
    "        else:\n",
    "            in_dim = 3\n",
    "        super(ResNet, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(in_dim, 64, kernel_size=7, stride=2, padding=3,\n",
    "                               bias=False)\n",
    "        self.bn1 = nn.BatchNorm2d(64)\n",
    "        self.relu = nn.ReLU(inplace=True)\n",
    "        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n",
    "        self.layer1 = self._make_layer(block, 64, layers[0])\n",
    "        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)\n",
    "        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)\n",
    "        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)\n",
    "        self.avgpool = nn.AvgPool2d(7, stride=1)\n",
    "        self.fc = nn.Linear(512 * block.expansion, num_classes)\n",
    "\n",
    "        for m in self.modules():\n",
    "            if isinstance(m, nn.Conv2d):\n",
    "                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n",
    "                m.weight.data.normal_(0, (2. / n)**.5)\n",
    "            elif isinstance(m, nn.BatchNorm2d):\n",
    "                m.weight.data.fill_(1)\n",
    "                m.bias.data.zero_()\n",
    "\n",
    "    def _make_layer(self, block, planes, blocks, stride=1):\n",
    "        downsample = None\n",
    "        if stride != 1 or self.inplanes != planes * block.expansion:\n",
    "            downsample = nn.Sequential(\n",
    "                nn.Conv2d(self.inplanes, planes * block.expansion,\n",
    "                          kernel_size=1, stride=stride, bias=False),\n",
    "                nn.BatchNorm2d(planes * block.expansion),\n",
    "            )\n",
    "\n",
    "        layers = []\n",
    "        layers.append(block(self.inplanes, planes, stride, downsample))\n",
    "        self.inplanes = planes * block.expansion\n",
    "        for i in range(1, blocks):\n",
    "            layers.append(block(self.inplanes, planes))\n",
    "\n",
    "        return nn.Sequential(*layers)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.conv1(x)\n",
    "        x = self.bn1(x)\n",
    "        x = self.relu(x)\n",
    "        x = self.maxpool(x)\n",
    "\n",
    "        x = self.layer1(x)\n",
    "        x = self.layer2(x)\n",
    "        x = self.layer3(x)\n",
    "        x = self.layer4(x)\n",
    "        # because MNIST is already 1x1 here:\n",
    "        # disable avg pooling\n",
    "        #x = self.avgpool(x)\n",
    "        \n",
    "        x = x.view(x.size(0), -1)\n",
    "        logits = self.fc(x)\n",
    "        probas = F.softmax(logits, dim=1)\n",
    "        return logits, probas\n",
    "\n",
    "\n",
    "\n",
    "def resnet34(num_classes):\n",
    "    \"\"\"Constructs a ResNet-34 model.\"\"\"\n",
    "    model = ResNet(block=BasicBlock, \n",
    "                   layers=[3, 4, 6, 3],\n",
    "                   num_classes=NUM_CLASSES,\n",
    "                   grayscale=GRAYSCALE)\n",
    "    return model\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RAodboScj5w6"
   },
   "source": [
    "## Training without Pinned Memory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "Image batch dimensions: torch.Size([256, 3, 32, 32])\n",
      "Image label dimensions: torch.Size([256])\n",
      "Image batch dimensions: torch.Size([256, 3, 32, 32])\n",
      "Image label dimensions: torch.Size([256])\n"
     ]
    }
   ],
   "source": [
    "##########################\n",
    "### CIFAR-10 Dataset\n",
    "##########################\n",
    "\n",
    "\n",
    "# Note transforms.ToTensor() scales input images\n",
    "# to 0-1 range\n",
    "train_dataset = datasets.CIFAR10(root='data', \n",
    "                                 train=True, \n",
    "                                 transform=transforms.ToTensor(),\n",
    "                                 download=True)\n",
    "\n",
    "test_dataset = datasets.CIFAR10(root='data', \n",
    "                                train=False, \n",
    "                                transform=transforms.ToTensor())\n",
    "\n",
    "\n",
    "train_loader = DataLoader(dataset=train_dataset, \n",
    "                          batch_size=BATCH_SIZE, \n",
    "                          num_workers=8,\n",
    "                          shuffle=True)\n",
    "\n",
    "test_loader = DataLoader(dataset=test_dataset, \n",
    "                         batch_size=BATCH_SIZE,\n",
    "                         num_workers=8,\n",
    "                         shuffle=False)\n",
    "\n",
    "# Checking the dataset\n",
    "for images, labels in train_loader:  \n",
    "    print('Image batch dimensions:', images.shape)\n",
    "    print('Image label dimensions:', labels.shape)\n",
    "    break\n",
    "\n",
    "# Checking the dataset\n",
    "for images, labels in train_loader:  \n",
    "    print('Image batch dimensions:', images.shape)\n",
    "    print('Image label dimensions:', labels.shape)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "id": "_lza9t_uj5w1"
   },
   "outputs": [],
   "source": [
    "torch.manual_seed(RANDOM_SEED)\n",
    "\n",
    "model = resnet34(NUM_CLASSES)\n",
    "model.to(DEVICE)\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 1547
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2384585,
     "status": "ok",
     "timestamp": 1524976888520,
     "user": {
      "displayName": "Sebastian Raschka",
      "photoUrl": "//lh6.googleusercontent.com/-cxK6yOSQ6uE/AAAAAAAAAAI/AAAAAAAAIfw/P9ar_CHsKOQ/s50-c-k-no/photo.jpg",
      "userId": "118404394130788869227"
     },
     "user_tz": 240
    },
    "id": "Dzh3ROmRj5w7",
    "outputId": "5f8fd8c9-b076-403a-b0b7-fd2d498b48d7",
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch: 001/010 | Batch 0000/0196 | Cost: 2.6021\n",
      "Epoch: 001/010 | Batch 0150/0196 | Cost: 1.3961\n",
      "Epoch: 001/010 | Train: 45.084%\n",
      "Time elapsed: 0.26 min\n",
      "Epoch: 002/010 | Batch 0000/0196 | Cost: 1.1228\n",
      "Epoch: 002/010 | Batch 0150/0196 | Cost: 1.0426\n",
      "Epoch: 002/010 | Train: 56.166%\n",
      "Time elapsed: 0.52 min\n",
      "Epoch: 003/010 | Batch 0000/0196 | Cost: 0.9980\n",
      "Epoch: 003/010 | Batch 0150/0196 | Cost: 0.8279\n",
      "Epoch: 003/010 | Train: 66.702%\n",
      "Time elapsed: 0.80 min\n",
      "Epoch: 004/010 | Batch 0000/0196 | Cost: 0.6384\n",
      "Epoch: 004/010 | Batch 0150/0196 | Cost: 0.7103\n",
      "Epoch: 004/010 | Train: 65.330%\n",
      "Time elapsed: 1.08 min\n",
      "Epoch: 005/010 | Batch 0000/0196 | Cost: 0.6308\n",
      "Epoch: 005/010 | Batch 0150/0196 | Cost: 0.5913\n",
      "Epoch: 005/010 | Train: 79.636%\n",
      "Time elapsed: 1.36 min\n",
      "Epoch: 006/010 | Batch 0000/0196 | Cost: 0.4409\n",
      "Epoch: 006/010 | Batch 0150/0196 | Cost: 0.5557\n",
      "Epoch: 006/010 | Train: 76.456%\n",
      "Time elapsed: 1.62 min\n",
      "Epoch: 007/010 | Batch 0000/0196 | Cost: 0.4778\n",
      "Epoch: 007/010 | Batch 0150/0196 | Cost: 0.4815\n",
      "Epoch: 007/010 | Train: 65.890%\n",
      "Time elapsed: 1.89 min\n",
      "Epoch: 008/010 | Batch 0000/0196 | Cost: 0.3782\n",
      "Epoch: 008/010 | Batch 0150/0196 | Cost: 0.4339\n",
      "Epoch: 008/010 | Train: 85.200%\n",
      "Time elapsed: 2.16 min\n",
      "Epoch: 009/010 | Batch 0000/0196 | Cost: 0.3083\n",
      "Epoch: 009/010 | Batch 0150/0196 | Cost: 0.3290\n",
      "Epoch: 009/010 | Train: 78.108%\n",
      "Time elapsed: 2.42 min\n",
      "Epoch: 010/010 | Batch 0000/0196 | Cost: 0.2229\n",
      "Epoch: 010/010 | Batch 0150/0196 | Cost: 0.1945\n",
      "Epoch: 010/010 | Train: 87.384%\n",
      "Time elapsed: 2.70 min\n",
      "Total Training Time: 2.70 min\n",
      "Test accuracy: 70.67%\n",
      "Total Time: 2.71 min\n"
     ]
    }
   ],
   "source": [
    "def compute_accuracy(model, data_loader, device):\n",
    "    correct_pred, num_examples = 0, 0\n",
    "    for i, (features, targets) in enumerate(data_loader):\n",
    "            \n",
    "        features = features.to(device)\n",
    "        targets = targets.to(device)\n",
    "\n",
    "        logits, probas = model(features)\n",
    "        _, predicted_labels = torch.max(probas, 1)\n",
    "        num_examples += targets.size(0)\n",
    "        correct_pred += (predicted_labels == targets).sum()\n",
    "    return correct_pred.float()/num_examples * 100\n",
    "    \n",
    "\n",
    "start_time = time.time()\n",
    "for epoch in range(NUM_EPOCHS):\n",
    "    \n",
    "    model.train()\n",
    "    for batch_idx, (features, targets) in enumerate(train_loader):\n",
    "        \n",
    "        features = features.to(DEVICE)\n",
    "        targets = targets.to(DEVICE)\n",
    "            \n",
    "        ### FORWARD AND BACK PROP\n",
    "        logits, probas = model(features)\n",
    "        cost = F.cross_entropy(logits, targets)\n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        cost.backward()\n",
    "        \n",
    "        ### UPDATE MODEL PARAMETERS\n",
    "        optimizer.step()\n",
    "        \n",
    "        ### LOGGING\n",
    "        if not batch_idx % 150:\n",
    "            print ('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f' \n",
    "                   %(epoch+1, NUM_EPOCHS, batch_idx, \n",
    "                     len(train_loader), cost))\n",
    "\n",
    "        \n",
    "\n",
    "    model.eval()\n",
    "    with torch.set_grad_enabled(False): # save memory during inference\n",
    "        print('Epoch: %03d/%03d | Train: %.3f%%' % (\n",
    "              epoch+1, NUM_EPOCHS, \n",
    "              compute_accuracy(model, train_loader, device=DEVICE)))\n",
    "        \n",
    "    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))\n",
    "    \n",
    "print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))\n",
    "\n",
    "\n",
    "with torch.set_grad_enabled(False): # save memory during inference\n",
    "    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader, device=DEVICE)))\n",
    "    \n",
    "print('Total Time: %.2f min' % ((time.time() - start_time)/60))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training with Pinned Memory"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Files already downloaded and verified\n",
      "Image batch dimensions: torch.Size([256, 3, 32, 32])\n",
      "Image label dimensions: torch.Size([256])\n",
      "Image batch dimensions: torch.Size([256, 3, 32, 32])\n",
      "Image label dimensions: torch.Size([256])\n"
     ]
    }
   ],
   "source": [
    "##########################\n",
    "### CIFAR-10 Dataset\n",
    "##########################\n",
    "\n",
    "\n",
    "# Note transforms.ToTensor() scales input images\n",
    "# to 0-1 range\n",
    "train_dataset = datasets.CIFAR10(root='data', \n",
    "                                 train=True, \n",
    "                                 transform=transforms.ToTensor(),\n",
    "                                 download=True)\n",
    "\n",
    "test_dataset = datasets.CIFAR10(root='data', \n",
    "                                train=False, \n",
    "                                transform=transforms.ToTensor())\n",
    "\n",
    "\n",
    "train_loader = DataLoader(dataset=train_dataset, \n",
    "                          batch_size=BATCH_SIZE, \n",
    "                          pin_memory=True,\n",
    "                          shuffle=True)\n",
    "\n",
    "test_loader = DataLoader(dataset=test_dataset, \n",
    "                         batch_size=BATCH_SIZE,\n",
    "                         pin_memory=True,\n",
    "                         shuffle=False)\n",
    "\n",
    "# Checking the dataset\n",
    "for images, labels in train_loader:  \n",
    "    print('Image batch dimensions:', images.shape)\n",
    "    print('Image label dimensions:', labels.shape)\n",
    "    break\n",
    "\n",
    "# Checking the dataset\n",
    "for images, labels in train_loader:  \n",
    "    print('Image batch dimensions:', images.shape)\n",
    "    print('Image label dimensions:', labels.shape)\n",
    "    break"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "id": "_lza9t_uj5w1"
   },
   "outputs": [],
   "source": [
    "torch.manual_seed(RANDOM_SEED)\n",
    "\n",
    "model = resnet34(NUM_CLASSES)\n",
    "model.to(DEVICE)\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 1547
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 2384585,
     "status": "ok",
     "timestamp": 1524976888520,
     "user": {
      "displayName": "Sebastian Raschka",
      "photoUrl": "//lh6.googleusercontent.com/-cxK6yOSQ6uE/AAAAAAAAAAI/AAAAAAAAIfw/P9ar_CHsKOQ/s50-c-k-no/photo.jpg",
      "userId": "118404394130788869227"
     },
     "user_tz": 240
    },
    "id": "Dzh3ROmRj5w7",
    "outputId": "5f8fd8c9-b076-403a-b0b7-fd2d498b48d7",
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch: 001/010 | Batch 0000/0196 | Cost: 2.6021\n",
      "Epoch: 001/010 | Batch 0150/0196 | Cost: 1.3961\n",
      "Epoch: 001/010 | Train: 45.084%\n",
      "Time elapsed: 0.39 min\n",
      "Epoch: 002/010 | Batch 0000/0196 | Cost: 1.1228\n",
      "Epoch: 002/010 | Batch 0150/0196 | Cost: 1.0426\n",
      "Epoch: 002/010 | Train: 56.166%\n",
      "Time elapsed: 0.77 min\n",
      "Epoch: 003/010 | Batch 0000/0196 | Cost: 0.9980\n",
      "Epoch: 003/010 | Batch 0150/0196 | Cost: 0.8279\n",
      "Epoch: 003/010 | Train: 66.702%\n",
      "Time elapsed: 1.16 min\n",
      "Epoch: 004/010 | Batch 0000/0196 | Cost: 0.6384\n",
      "Epoch: 004/010 | Batch 0150/0196 | Cost: 0.7103\n",
      "Epoch: 004/010 | Train: 65.330%\n",
      "Time elapsed: 1.55 min\n",
      "Epoch: 005/010 | Batch 0000/0196 | Cost: 0.6308\n",
      "Epoch: 005/010 | Batch 0150/0196 | Cost: 0.5913\n",
      "Epoch: 005/010 | Train: 79.636%\n",
      "Time elapsed: 1.94 min\n",
      "Epoch: 006/010 | Batch 0000/0196 | Cost: 0.4409\n",
      "Epoch: 006/010 | Batch 0150/0196 | Cost: 0.5557\n",
      "Epoch: 006/010 | Train: 76.456%\n",
      "Time elapsed: 2.33 min\n",
      "Epoch: 007/010 | Batch 0000/0196 | Cost: 0.4778\n",
      "Epoch: 007/010 | Batch 0150/0196 | Cost: 0.4815\n",
      "Epoch: 007/010 | Train: 65.890%\n",
      "Time elapsed: 2.71 min\n",
      "Epoch: 008/010 | Batch 0000/0196 | Cost: 0.3782\n",
      "Epoch: 008/010 | Batch 0150/0196 | Cost: 0.4339\n",
      "Epoch: 008/010 | Train: 85.200%\n",
      "Time elapsed: 3.10 min\n",
      "Epoch: 009/010 | Batch 0000/0196 | Cost: 0.3083\n",
      "Epoch: 009/010 | Batch 0150/0196 | Cost: 0.3290\n",
      "Epoch: 009/010 | Train: 78.108%\n",
      "Time elapsed: 3.49 min\n",
      "Epoch: 010/010 | Batch 0000/0196 | Cost: 0.2229\n",
      "Epoch: 010/010 | Batch 0150/0196 | Cost: 0.1945\n",
      "Epoch: 010/010 | Train: 87.384%\n",
      "Time elapsed: 3.88 min\n",
      "Total Training Time: 3.88 min\n",
      "Test accuracy: 70.67%\n",
      "Total Time: 3.91 min\n"
     ]
    }
   ],
   "source": [
    "def compute_accuracy(model, data_loader, device):\n",
    "    correct_pred, num_examples = 0, 0\n",
    "    for i, (features, targets) in enumerate(data_loader):\n",
    "            \n",
    "        features = features.to(device)\n",
    "        targets = targets.to(device)\n",
    "\n",
    "        logits, probas = model(features)\n",
    "        _, predicted_labels = torch.max(probas, 1)\n",
    "        num_examples += targets.size(0)\n",
    "        correct_pred += (predicted_labels == targets).sum()\n",
    "    return correct_pred.float()/num_examples * 100\n",
    "    \n",
    "\n",
    "start_time = time.time()\n",
    "for epoch in range(NUM_EPOCHS):\n",
    "    \n",
    "    model.train()\n",
    "    for batch_idx, (features, targets) in enumerate(train_loader):\n",
    "        \n",
    "        features = features.to(DEVICE)\n",
    "        targets = targets.to(DEVICE)\n",
    "            \n",
    "        ### FORWARD AND BACK PROP\n",
    "        logits, probas = model(features)\n",
    "        cost = F.cross_entropy(logits, targets)\n",
    "        optimizer.zero_grad()\n",
    "        \n",
    "        cost.backward()\n",
    "        \n",
    "        ### UPDATE MODEL PARAMETERS\n",
    "        optimizer.step()\n",
    "        \n",
    "        ### LOGGING\n",
    "        if not batch_idx % 150:\n",
    "            print ('Epoch: %03d/%03d | Batch %04d/%04d | Cost: %.4f' \n",
    "                   %(epoch+1, NUM_EPOCHS, batch_idx, \n",
    "                     len(train_loader), cost))\n",
    "\n",
    "        \n",
    "\n",
    "    model.eval()\n",
    "    with torch.set_grad_enabled(False): # save memory during inference\n",
    "        print('Epoch: %03d/%03d | Train: %.3f%%' % (\n",
    "              epoch+1, NUM_EPOCHS, \n",
    "              compute_accuracy(model, train_loader, device=DEVICE)))\n",
    "        \n",
    "    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))\n",
    "    \n",
    "print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))\n",
    "\n",
    "\n",
    "with torch.set_grad_enabled(False): # save memory during inference\n",
    "    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader, device=DEVICE)))\n",
    "    \n",
    "print('Total Time: %.2f min' % ((time.time() - start_time)/60))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on the training time without and with `pin_memory=True`, there doesn't seem to be a speed-up when using page-locked (or \"pinned\") memory -- in fact, pinning the memory even slowed down the training. (I reran the code in the opposite order, i.e., `pin_memory=True` first, and got the same results.) This could be due to the relatively small dataset size, batch size, and hardware configuration that I was using:\n",
    "\n",
    "- Processor: Intel Xeon® Processor E5-2650 v4 (12 core)\n",
    "- GPU: NVIDIA GeForce GTX 1080Ti\n",
    "- Motherboard: ASUS X99-E-10G WS with PCI-E Gen3 X16 port\n",
    "- Memory: 128 GB DDR4 RAM\n",
    "- Storage: SSD"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "numpy       1.15.4\n",
      "pandas      0.23.4\n",
      "torch       1.0.0\n",
      "PIL.Image   5.3.0\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%watermark -iv"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "default_view": {},
   "name": "convnet-vgg16.ipynb",
   "provenance": [],
   "version": "0.3.2",
   "views": {}
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "371px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}