{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.\n", "- Author: Sebastian Raschka\n", "- GitHub Repository: https://github.com/rasbt/deeplearning-models" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sebastian Raschka \n", "\n", "CPython 3.6.8\n", "IPython 7.2.0\n", "\n", "torch 1.0.0\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -a 'Sebastian Raschka' -v -p torch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Runs on CPU or GPU (if available)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Zoo -- Weight Sharing Within a Layer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some exotic research projects, you may want to share the weights in certain layers. For this example, suppose you want to share the weights across all output units but want to have unique bias units for each output unit.\n", "\n", "The illustration below shows the last hidden layer and the output layer of a regular multilayer neural network:\n", "\n", "![](../images/weight-sharing/weight-sharing-1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What we are trying to achieve is to have the same weight for each output unit, i.e., \n", "\n", "![](../images/weight-sharing/weight-sharing-2.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One approach to achive this is to share the weight columns in the weight matrix of the hidden layer that connects to the output layer. A more efficient approach is to replace the matrix-matrix multiplication with shared weights by a matrix-vector multiplication that produces a single output unit, which we can then duplicate before adding the bias vector.\n", "\n", "In other words, the first step is to modify the hidden layer such that it only contains a single vector:\n", "\n", "```python\n", " \n", " # Replace this by the uncommented code below:\n", " #self.linear_1 = torch.nn.Linear(7*7*8, num_classes)\n", " \n", " # Use only a weight vector instead of weight matrix:\n", " self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)\n", " \n", " # Define bias manually:\n", " self.linear_1_bias = torch.nn.Parameter(torch.tensor(torch.zeros(num_classes),\n", " dtype=self.linear_1.weight.dtype))\n", "```\n", "\n", "Next, in the `forward` method, we compute the single output and duplicate it over the number of classes, then we add the weights:\n", "\n", "```python\n", "\n", " # Duplicate outputs over all output units\n", " logits = self.linear_1(out.view(-1, 7*7*8))\n", " ones = torch.ones(num_classes, dtype=logits.dtype)\n", " ones = logits\n", " \n", " # then manually add bias\n", " logits = logits + self.linear_1_bias\n", "```\n", "\n", "The following code in this notebook illustrates this using a convnet and the 10-class MNIST dataset. \n", "\n", "**The classification performance will obviously poor, because in this case weight sharing is not ideal, but this is more meant as a technical reference/demo, not a real-world use case for this dataset**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "import torch\n", "import torch.nn.functional as F\n", "from torchvision import datasets\n", "from torchvision import transforms\n", "from torch.utils.data import DataLoader\n", "\n", "\n", "if torch.cuda.is_available():\n", " torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Settings and Dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image batch dimensions: torch.Size([128, 1, 28, 28])\n", "Image label dimensions: torch.Size([128])\n" ] } ], "source": [ "##########################\n", "### SETTINGS\n", "##########################\n", "\n", "# Device\n", "device = torch.device(\"cuda:3\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "# Hyperparameters\n", "random_seed = 1\n", "learning_rate = 0.1\n", "num_epochs = 10\n", "batch_size = 128\n", "\n", "# Architecture\n", "num_classes = 10\n", "\n", "\n", "##########################\n", "### MNIST DATASET\n", "##########################\n", "\n", "# Note transforms.ToTensor() scales input images\n", "# to 0-1 range\n", "train_dataset = datasets.MNIST(root='data', \n", " train=True, \n", " transform=transforms.ToTensor(),\n", " download=True)\n", "\n", "test_dataset = datasets.MNIST(root='data', \n", " train=False, \n", " transform=transforms.ToTensor())\n", "\n", "\n", "train_loader = DataLoader(dataset=train_dataset, \n", " batch_size=batch_size, \n", " shuffle=True)\n", "\n", "test_loader = DataLoader(dataset=test_dataset, \n", " batch_size=batch_size, \n", " shuffle=False)\n", "\n", "# Checking the dataset\n", "for images, labels in train_loader: \n", " print('Image batch dimensions:', images.shape)\n", " print('Image label dimensions:', labels.shape)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/raschka/.local/lib/python3.6/site-packages/ipykernel_launcher.py:46: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n" ] } ], "source": [ "##########################\n", "### MODEL\n", "##########################\n", "\n", "\n", "class ConvNet(torch.nn.Module):\n", "\n", " def __init__(self, num_classes):\n", " super(ConvNet, self).__init__()\n", " \n", " # calculate same padding:\n", " # (w - k + 2*p)/s + 1 = o\n", " # => p = (s(o-1) - w + k)/2\n", " \n", " # 28x28x1 => 28x28x4\n", " self.conv_1 = torch.nn.Conv2d(in_channels=1,\n", " out_channels=4,\n", " kernel_size=(3, 3),\n", " stride=(1, 1),\n", " padding=1) # (1(28-1) - 28 + 3) / 2 = 1\n", " # 28x28x4 => 14x14x4\n", " self.pool_1 = torch.nn.MaxPool2d(kernel_size=(2, 2),\n", " stride=(2, 2),\n", " padding=0) # (2(14-1) - 28 + 2) = 0 \n", " # 14x14x4 => 14x14x8\n", " self.conv_2 = torch.nn.Conv2d(in_channels=4,\n", " out_channels=8,\n", " kernel_size=(3, 3),\n", " stride=(1, 1),\n", " padding=1) # (1(14-1) - 14 + 3) / 2 = 1 \n", " # 14x14x8 => 7x7x8 \n", " self.pool_2 = torch.nn.MaxPool2d(kernel_size=(2, 2),\n", " stride=(2, 2),\n", " padding=0) # (2(7-1) - 14 + 2) = 0\n", " \n", " ##############################################################################\n", " ### WEIGHT SHARING IN LAST LAYER\n", " \n", " #self.linear_1 = torch.nn.Linear(7*7*8, num_classes)\n", " \n", " # Use only a weight vector instead of weight matrix:\n", " self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)\n", " \n", " # Define bias manually:\n", " self.linear_1_bias = torch.nn.Parameter(torch.tensor(torch.zeros(num_classes),\n", " dtype=self.linear_1.weight.dtype))\n", " ##############################################################################\n", " \n", " def forward(self, x):\n", " out = self.conv_1(x)\n", " out = F.relu(out)\n", " out = self.pool_1(out)\n", "\n", " out = self.conv_2(out)\n", " out = F.relu(out)\n", " out = self.pool_2(out)\n", " \n", " ##############################################################################\n", " ### WEIGHT SHARING IN LAST LAYER\n", " \n", " # Duplicate outputs over all output units\n", " logits = self.linear_1(out.view(-1, 7*7*8))\n", " \n", " # then manually add bias\n", " logits = logits + self.linear_1_bias\n", " ############################################################################## \n", " \n", " probas = F.softmax(logits, dim=1)\n", " return logits, probas\n", "\n", " \n", "torch.manual_seed(random_seed)\n", "model = ConvNet(num_classes=num_classes)\n", "\n", "model = model.to(device)\n", "\n", "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 001/010 | Batch 000/469 | Cost: 2.3026\n", "Epoch: 001/010 | Batch 050/469 | Cost: 2.2990\n", "Epoch: 001/010 | Batch 100/469 | Cost: 2.3003\n", "Epoch: 001/010 | Batch 150/469 | Cost: 2.2959\n", "Epoch: 001/010 | Batch 200/469 | Cost: 2.3057\n", "Epoch: 001/010 | Batch 250/469 | Cost: 2.2986\n", "Epoch: 001/010 | Batch 300/469 | Cost: 2.3015\n", "Epoch: 001/010 | Batch 350/469 | Cost: 2.3060\n", "Epoch: 001/010 | Batch 400/469 | Cost: 2.3028\n", "Epoch: 001/010 | Batch 450/469 | Cost: 2.2964\n", "Epoch: 001/010 training accuracy: 11.24%\n", "Time elapsed: 0.20 min\n", "Epoch: 002/010 | Batch 000/469 | Cost: 2.2972\n", "Epoch: 002/010 | Batch 050/469 | Cost: 2.3077\n", "Epoch: 002/010 | Batch 100/469 | Cost: 2.3085\n", "Epoch: 002/010 | Batch 150/469 | Cost: 2.3044\n", "Epoch: 002/010 | Batch 200/469 | Cost: 2.2997\n", "Epoch: 002/010 | Batch 250/469 | Cost: 2.2986\n", "Epoch: 002/010 | Batch 300/469 | Cost: 2.2935\n", "Epoch: 002/010 | Batch 350/469 | Cost: 2.3029\n", "Epoch: 002/010 | Batch 400/469 | Cost: 2.3011\n", "Epoch: 002/010 | Batch 450/469 | Cost: 2.3057\n", "Epoch: 002/010 training accuracy: 11.24%\n", "Time elapsed: 0.41 min\n", "Epoch: 003/010 | Batch 000/469 | Cost: 2.3035\n", "Epoch: 003/010 | Batch 050/469 | Cost: 2.3138\n", "Epoch: 003/010 | Batch 100/469 | Cost: 2.3072\n", "Epoch: 003/010 | Batch 150/469 | Cost: 2.3000\n", "Epoch: 003/010 | Batch 200/469 | Cost: 2.3023\n", "Epoch: 003/010 | Batch 250/469 | Cost: 2.2958\n", "Epoch: 003/010 | Batch 300/469 | Cost: 2.2970\n", "Epoch: 003/010 | Batch 350/469 | Cost: 2.3000\n", "Epoch: 003/010 | Batch 400/469 | Cost: 2.3005\n", "Epoch: 003/010 | Batch 450/469 | Cost: 2.2929\n", "Epoch: 003/010 training accuracy: 11.24%\n", "Time elapsed: 0.61 min\n", "Epoch: 004/010 | Batch 000/469 | Cost: 2.2920\n", "Epoch: 004/010 | Batch 050/469 | Cost: 2.2986\n", "Epoch: 004/010 | Batch 100/469 | Cost: 2.3005\n", "Epoch: 004/010 | Batch 150/469 | Cost: 2.2923\n", "Epoch: 004/010 | Batch 200/469 | Cost: 2.3051\n", "Epoch: 004/010 | Batch 250/469 | Cost: 2.3060\n", "Epoch: 004/010 | Batch 300/469 | Cost: 2.3073\n", "Epoch: 004/010 | Batch 350/469 | Cost: 2.3064\n", "Epoch: 004/010 | Batch 400/469 | Cost: 2.3052\n", "Epoch: 004/010 | Batch 450/469 | Cost: 2.2975\n", "Epoch: 004/010 training accuracy: 11.24%\n", "Time elapsed: 0.81 min\n", "Epoch: 005/010 | Batch 000/469 | Cost: 2.3027\n", "Epoch: 005/010 | Batch 050/469 | Cost: 2.3005\n", "Epoch: 005/010 | Batch 100/469 | Cost: 2.2946\n", "Epoch: 005/010 | Batch 150/469 | Cost: 2.2892\n", "Epoch: 005/010 | Batch 200/469 | Cost: 2.2958\n", "Epoch: 005/010 | Batch 250/469 | Cost: 2.3036\n", "Epoch: 005/010 | Batch 300/469 | Cost: 2.3003\n", "Epoch: 005/010 | Batch 350/469 | Cost: 2.3015\n", "Epoch: 005/010 | Batch 400/469 | Cost: 2.3057\n", "Epoch: 005/010 | Batch 450/469 | Cost: 2.2972\n", "Epoch: 005/010 training accuracy: 11.24%\n", "Time elapsed: 1.01 min\n", "Epoch: 006/010 | Batch 000/469 | Cost: 2.3029\n", "Epoch: 006/010 | Batch 050/469 | Cost: 2.2979\n", "Epoch: 006/010 | Batch 100/469 | Cost: 2.3027\n", "Epoch: 006/010 | Batch 150/469 | Cost: 2.3028\n", "Epoch: 006/010 | Batch 200/469 | Cost: 2.3010\n", "Epoch: 006/010 | Batch 250/469 | Cost: 2.3025\n", "Epoch: 006/010 | Batch 300/469 | Cost: 2.3054\n", "Epoch: 006/010 | Batch 350/469 | Cost: 2.2972\n", "Epoch: 006/010 | Batch 400/469 | Cost: 2.3037\n", "Epoch: 006/010 | Batch 450/469 | Cost: 2.3064\n", "Epoch: 006/010 training accuracy: 11.24%\n", "Time elapsed: 1.22 min\n", "Epoch: 007/010 | Batch 000/469 | Cost: 2.2983\n", "Epoch: 007/010 | Batch 050/469 | Cost: 2.2979\n", "Epoch: 007/010 | Batch 100/469 | Cost: 2.3077\n", "Epoch: 007/010 | Batch 150/469 | Cost: 2.3047\n", "Epoch: 007/010 | Batch 200/469 | Cost: 2.2998\n", "Epoch: 007/010 | Batch 250/469 | Cost: 2.2993\n", "Epoch: 007/010 | Batch 300/469 | Cost: 2.2966\n", "Epoch: 007/010 | Batch 350/469 | Cost: 2.2967\n", "Epoch: 007/010 | Batch 400/469 | Cost: 2.2916\n", "Epoch: 007/010 | Batch 450/469 | Cost: 2.3016\n", "Epoch: 007/010 training accuracy: 11.24%\n", "Time elapsed: 1.42 min\n", "Epoch: 008/010 | Batch 000/469 | Cost: 2.2992\n", "Epoch: 008/010 | Batch 050/469 | Cost: 2.2953\n", "Epoch: 008/010 | Batch 100/469 | Cost: 2.3018\n", "Epoch: 008/010 | Batch 150/469 | Cost: 2.3053\n", "Epoch: 008/010 | Batch 200/469 | Cost: 2.2983\n", "Epoch: 008/010 | Batch 250/469 | Cost: 2.3089\n", "Epoch: 008/010 | Batch 300/469 | Cost: 2.3048\n", "Epoch: 008/010 | Batch 350/469 | Cost: 2.3065\n", "Epoch: 008/010 | Batch 400/469 | Cost: 2.3037\n", "Epoch: 008/010 | Batch 450/469 | Cost: 2.2966\n", "Epoch: 008/010 training accuracy: 11.24%\n", "Time elapsed: 1.62 min\n", "Epoch: 009/010 | Batch 000/469 | Cost: 2.3060\n", "Epoch: 009/010 | Batch 050/469 | Cost: 2.2945\n", "Epoch: 009/010 | Batch 100/469 | Cost: 2.3037\n", "Epoch: 009/010 | Batch 150/469 | Cost: 2.3064\n", "Epoch: 009/010 | Batch 200/469 | Cost: 2.3043\n", "Epoch: 009/010 | Batch 250/469 | Cost: 2.2991\n", "Epoch: 009/010 | Batch 300/469 | Cost: 2.2945\n", "Epoch: 009/010 | Batch 350/469 | Cost: 2.2999\n", "Epoch: 009/010 | Batch 400/469 | Cost: 2.3131\n", "Epoch: 009/010 | Batch 450/469 | Cost: 2.3079\n", "Epoch: 009/010 training accuracy: 11.24%\n", "Time elapsed: 1.82 min\n", "Epoch: 010/010 | Batch 000/469 | Cost: 2.2977\n", "Epoch: 010/010 | Batch 050/469 | Cost: 2.2996\n", "Epoch: 010/010 | Batch 100/469 | Cost: 2.2979\n", "Epoch: 010/010 | Batch 150/469 | Cost: 2.3038\n", "Epoch: 010/010 | Batch 200/469 | Cost: 2.3062\n", "Epoch: 010/010 | Batch 250/469 | Cost: 2.2935\n", "Epoch: 010/010 | Batch 300/469 | Cost: 2.3027\n", "Epoch: 010/010 | Batch 350/469 | Cost: 2.3006\n", "Epoch: 010/010 | Batch 400/469 | Cost: 2.3051\n", "Epoch: 010/010 | Batch 450/469 | Cost: 2.3077\n", "Epoch: 010/010 training accuracy: 11.24%\n", "Time elapsed: 2.02 min\n", "Total Training Time: 2.02 min\n" ] } ], "source": [ "def compute_accuracy(model, data_loader):\n", " correct_pred, num_examples = 0, 0\n", " for features, targets in data_loader:\n", " features = features.to(device)\n", " targets = targets.to(device)\n", " logits, probas = model(features)\n", " _, predicted_labels = torch.max(probas, 1)\n", " num_examples += targets.size(0)\n", " correct_pred += (predicted_labels == targets).sum()\n", " return correct_pred.float()/num_examples * 100\n", " \n", "\n", "start_time = time.time()\n", "for epoch in range(num_epochs):\n", " model = model.train()\n", " for batch_idx, (features, targets) in enumerate(train_loader):\n", " \n", " features = features.to(device)\n", " targets = targets.to(device)\n", "\n", " ### FORWARD AND BACK PROP\n", " logits, probas = model(features)\n", " cost = F.cross_entropy(logits, targets)\n", " optimizer.zero_grad()\n", " \n", " cost.backward()\n", " \n", " ### UPDATE MODEL PARAMETERS\n", " optimizer.step()\n", " \n", " ### LOGGING\n", " if not batch_idx % 50:\n", " print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' \n", " %(epoch+1, num_epochs, batch_idx, \n", " len(train_loader), cost))\n", " \n", " model = model.eval()\n", " with torch.set_grad_enabled(False): # save memory during inference\n", " print('Epoch: %03d/%03d training accuracy: %.2f%%' % (\n", " epoch+1, num_epochs, \n", " compute_accuracy(model, train_loader)))\n", " \n", " print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))\n", " \n", "print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that bias units updated correctly (should be all different):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Parameter containing:\n", "tensor([-0.0202, 0.1097, -0.0029, 0.0253, -0.0269, -0.1026, -0.0194, 0.0572,\n", " -0.0027, -0.0175], device='cuda:3', requires_grad=True)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.linear_1_bias" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test accuracy: 11.35%\n" ] } ], "source": [ "with torch.set_grad_enabled(False): # save memory during inference\n", " print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The classification performance is obviously poor, because in this case weight sharing is not ideal, but this is more meant as a technical reference/demo, not a real-world use case for this dataset**" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }