{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.\n", "- Author: Sebastian Raschka\n", "- GitHub Repository: https://github.com/rasbt/deeplearning-models" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sebastian Raschka \n", "\n", "CPython 3.6.8\n", "IPython 7.2.0\n", "\n", "torch 1.0.0\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -a 'Sebastian Raschka' -v -p torch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Runs on CPU or GPU (if available)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Zoo -Standardizing Images" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook provides an example for working with standardized images, that is, images where the image pixels in each image has mean zero and unit variance across the channel. \n", "\n", "The general equation for z-score standardization is computed as \n", "\n", "$$x' = \\frac{x_i - \\mu}{\\sigma}$$\n", "\n", "where $\\mu$ is the mean and $\\sigma$ is the standard deviation of the training set, respectively. Then $x_i'$ is the scaled feature feature value, and $x_i$ is the original feature value.\n", "\n", "I.e, for grayscale images, we would obtain 1 mean and 1 standard deviation. For RGB images (3 color channels), we would obtain 3 mean values and 3 standard deviations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "import torch\n", "import torch.nn.functional as F\n", "from torchvision import datasets\n", "from torchvision import transforms\n", "from torch.utils.data import DataLoader\n", "\n", "\n", "if torch.cuda.is_available():\n", " torch.backends.cudnn.deterministic = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Settings and Dataset" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "##########################\n", "### SETTINGS\n", "##########################\n", "\n", "# Device\n", "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "# Hyperparameters\n", "random_seed = 1\n", "learning_rate = 0.05\n", "num_epochs = 10\n", "batch_size = 128\n", "\n", "# Architecture\n", "num_classes = 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute the Mean and Standard Deviation for Normalization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to determine the mean and standard deviation for each color channel in the training set. Since we assume the entire dataset does not fit into the computer memory all at once, we do this in an incremental fashion, as shown below." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean: tensor([0.1307])\n", "Std Dev: tensor([0.3077])\n" ] } ], "source": [ "##############################\n", "### PRELIMINARY DATALOADER\n", "##############################\n", "\n", "\n", "train_dataset = datasets.MNIST(root='data', \n", " train=True, \n", " transform=transforms.ToTensor(),\n", " download=True)\n", "\n", "train_loader = DataLoader(dataset=train_dataset, \n", " batch_size=batch_size, \n", " shuffle=False)\n", "\n", "train_mean = []\n", "train_std = []\n", "\n", "for i, image in enumerate(train_loader, 0):\n", " numpy_image = image[0].numpy()\n", " \n", " batch_mean = np.mean(numpy_image, axis=(0, 2, 3))\n", " batch_std = np.std(numpy_image, axis=(0, 2, 3))\n", " \n", " train_mean.append(batch_mean)\n", " train_std.append(batch_std)\n", "\n", "train_mean = torch.tensor(np.mean(train_mean, axis=0))\n", "train_std = torch.tensor(np.mean(train_std, axis=0))\n", "\n", "print('Mean:', train_mean)\n", "print('Std Dev:', train_std)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that**\n", "\n", "- For RGB images (3 color channels), we would get 3 means and 3 standard deviations.\n", "- The transforms.ToTensor() method converts images to [0, 1] range, which is why the mean and standard deviation values are below 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Standardized Dataset Loader" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use a custom transform function to standardize the dataset according the the mean and standard deviation we computed above." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "custom_transform = transforms.Compose([transforms.ToTensor(),\n", " transforms.Normalize(mean=train_mean, std=train_std)])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "##########################\n", "### MNIST DATASET\n", "##########################\n", "\n", "# Note transforms.ToTensor() scales input images\n", "# to 0-1 range\n", "train_dataset = datasets.MNIST(root='data', \n", " train=True, \n", " transform=custom_transform,\n", " download=True)\n", "\n", "test_dataset = datasets.MNIST(root='data', \n", " train=False, \n", " transform=custom_transform)\n", "\n", "\n", "train_loader = DataLoader(dataset=train_dataset, \n", " batch_size=batch_size, \n", " shuffle=True)\n", "\n", "test_loader = DataLoader(dataset=test_dataset, \n", " batch_size=batch_size, \n", " shuffle=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the dataset can be loaded:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Image batch dimensions: torch.Size([128, 1, 28, 28])\n", "Image label dimensions: torch.Size([128])\n" ] } ], "source": [ "# Checking the dataset\n", "for images, labels in train_loader: \n", " print('Image batch dimensions:', images.shape)\n", " print('Image label dimensions:', labels.shape)\n", " break" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the given batch, check that the channel means and standard deviations are roughly 0 and 1, respectively:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Channel mean: tensor(0.0035)\n", "Channel std: tensor(1.0037)\n" ] } ], 
"source": [ "print('Channel mean:', torch.mean(images[:, 0, :, :]))\n", "print('Channel std:', torch.std(images[:, 0, :, :]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "##########################\n", "### MODEL\n", "##########################\n", "\n", "\n", "class ConvNet(torch.nn.Module):\n", "\n", " def __init__(self, num_classes):\n", " super(ConvNet, self).__init__()\n", " \n", " # calculate same padding:\n", " # (w - k + 2*p)/s + 1 = o\n", " # => p = (s(o-1) - w + k)/2\n", " \n", " # 28x28x1 => 28x28x4\n", " self.conv_1 = torch.nn.Conv2d(in_channels=1,\n", " out_channels=4,\n", " kernel_size=(3, 3),\n", " stride=(1, 1),\n", " padding=1) # (1(28-1) - 28 + 3) / 2 = 1\n", " # 28x28x4 => 14x14x4\n", " self.pool_1 = torch.nn.MaxPool2d(kernel_size=(2, 2),\n", " stride=(2, 2),\n", " padding=0) # (2(14-1) - 28 + 2) = 0 \n", " # 14x14x4 => 14x14x8\n", " self.conv_2 = torch.nn.Conv2d(in_channels=4,\n", " out_channels=8,\n", " kernel_size=(3, 3),\n", " stride=(1, 1),\n", " padding=1) # (1(14-1) - 14 + 3) / 2 = 1 \n", " # 14x14x8 => 7x7x8 \n", " self.pool_2 = torch.nn.MaxPool2d(kernel_size=(2, 2),\n", " stride=(2, 2),\n", " padding=0) # (2(7-1) - 14 + 2) = 0\n", " \n", " self.linear_1 = torch.nn.Linear(7*7*8, num_classes)\n", "\n", " \n", " def forward(self, x):\n", " out = self.conv_1(x)\n", " out = F.relu(out)\n", " out = self.pool_1(out)\n", "\n", " out = self.conv_2(out)\n", " out = F.relu(out)\n", " out = self.pool_2(out)\n", " \n", " logits = self.linear_1(out.view(-1, 7*7*8))\n", " probas = F.softmax(logits, dim=1)\n", " return logits, probas\n", "\n", " \n", "torch.manual_seed(random_seed)\n", "model = ConvNet(num_classes=num_classes)\n", "\n", "model = model.to(device)\n", "\n", "optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 001/010 | Batch 000/469 | Cost: 2.3170\n", "Epoch: 001/010 | Batch 050/469 | Cost: 0.9476\n", "Epoch: 001/010 | Batch 100/469 | Cost: 0.4007\n", "Epoch: 001/010 | Batch 150/469 | Cost: 0.2662\n", "Epoch: 001/010 | Batch 200/469 | Cost: 0.3218\n", "Epoch: 001/010 | Batch 250/469 | Cost: 0.2300\n", "Epoch: 001/010 | Batch 300/469 | Cost: 0.1494\n", "Epoch: 001/010 | Batch 350/469 | Cost: 0.1837\n", "Epoch: 001/010 | Batch 400/469 | Cost: 0.2072\n", "Epoch: 001/010 | Batch 450/469 | Cost: 0.1541\n", "Epoch: 001/010 training accuracy: 95.68%\n", "Time elapsed: 0.27 min\n", "Epoch: 002/010 | Batch 000/469 | Cost: 0.1077\n", "Epoch: 002/010 | Batch 050/469 | Cost: 0.1394\n", "Epoch: 002/010 | Batch 100/469 | Cost: 0.1593\n", "Epoch: 002/010 | Batch 150/469 | Cost: 0.2174\n", "Epoch: 002/010 | Batch 200/469 | Cost: 0.1093\n", "Epoch: 002/010 | Batch 250/469 | Cost: 0.1314\n", "Epoch: 002/010 | Batch 300/469 | Cost: 0.1019\n", "Epoch: 002/010 | Batch 350/469 | Cost: 0.1102\n", "Epoch: 002/010 | Batch 400/469 | Cost: 0.1028\n", "Epoch: 002/010 | Batch 450/469 | Cost: 0.0678\n", "Epoch: 002/010 training accuracy: 96.50%\n", "Time elapsed: 0.54 min\n", "Epoch: 003/010 | Batch 000/469 | Cost: 0.0713\n", "Epoch: 003/010 | Batch 050/469 | Cost: 0.1317\n", "Epoch: 003/010 | Batch 100/469 | Cost: 0.1434\n", "Epoch: 003/010 | Batch 150/469 | Cost: 0.0452\n", "Epoch: 003/010 | Batch 200/469 | Cost: 0.0783\n", "Epoch: 003/010 | 
Batch 250/469 | Cost: 0.2011\n", "Epoch: 003/010 | Batch 300/469 | Cost: 0.1132\n", "Epoch: 003/010 | Batch 350/469 | Cost: 0.0930\n", "Epoch: 003/010 | Batch 400/469 | Cost: 0.0842\n", "Epoch: 003/010 | Batch 450/469 | Cost: 0.1059\n", "Epoch: 003/010 training accuracy: 96.39%\n", "Time elapsed: 0.81 min\n", "Epoch: 004/010 | Batch 000/469 | Cost: 0.1334\n", "Epoch: 004/010 | Batch 050/469 | Cost: 0.1208\n", "Epoch: 004/010 | Batch 100/469 | Cost: 0.0962\n", "Epoch: 004/010 | Batch 150/469 | Cost: 0.1293\n", "Epoch: 004/010 | Batch 200/469 | Cost: 0.0977\n", "Epoch: 004/010 | Batch 250/469 | Cost: 0.0504\n", "Epoch: 004/010 | Batch 300/469 | Cost: 0.0801\n", "Epoch: 004/010 | Batch 350/469 | Cost: 0.0968\n", "Epoch: 004/010 | Batch 400/469 | Cost: 0.2138\n", "Epoch: 004/010 | Batch 450/469 | Cost: 0.0946\n", "Epoch: 004/010 training accuracy: 97.52%\n", "Time elapsed: 1.08 min\n", "Epoch: 005/010 | Batch 000/469 | Cost: 0.0960\n", "Epoch: 005/010 | Batch 050/469 | Cost: 0.0132\n", "Epoch: 005/010 | Batch 100/469 | Cost: 0.1012\n", "Epoch: 005/010 | Batch 150/469 | Cost: 0.0437\n", "Epoch: 005/010 | Batch 200/469 | Cost: 0.0386\n", "Epoch: 005/010 | Batch 250/469 | Cost: 0.0461\n", "Epoch: 005/010 | Batch 300/469 | Cost: 0.1063\n", "Epoch: 005/010 | Batch 350/469 | Cost: 0.0972\n", "Epoch: 005/010 | Batch 400/469 | Cost: 0.0912\n", "Epoch: 005/010 | Batch 450/469 | Cost: 0.0633\n", "Epoch: 005/010 training accuracy: 97.84%\n", "Time elapsed: 1.35 min\n", "Epoch: 006/010 | Batch 000/469 | Cost: 0.0771\n", "Epoch: 006/010 | Batch 050/469 | Cost: 0.0245\n", "Epoch: 006/010 | Batch 100/469 | Cost: 0.0373\n", "Epoch: 006/010 | Batch 150/469 | Cost: 0.0459\n", "Epoch: 006/010 | Batch 200/469 | Cost: 0.1140\n", "Epoch: 006/010 | Batch 250/469 | Cost: 0.0465\n", "Epoch: 006/010 | Batch 300/469 | Cost: 0.0166\n", "Epoch: 006/010 | Batch 350/469 | Cost: 0.0145\n", "Epoch: 006/010 | Batch 400/469 | Cost: 0.0621\n", "Epoch: 006/010 | Batch 450/469 | Cost: 0.0570\n", "Epoch: 006/010 training accuracy: 97.94%\n", "Time elapsed: 1.62 min\n", "Epoch: 007/010 | Batch 000/469 | Cost: 0.1529\n", "Epoch: 007/010 | Batch 050/469 | Cost: 0.0283\n", "Epoch: 007/010 | Batch 100/469 | Cost: 0.0463\n", "Epoch: 007/010 | Batch 150/469 | Cost: 0.0623\n", "Epoch: 007/010 | Batch 200/469 | Cost: 0.0637\n", "Epoch: 007/010 | Batch 250/469 | Cost: 0.0718\n", "Epoch: 007/010 | Batch 300/469 | Cost: 0.0098\n", "Epoch: 007/010 | Batch 350/469 | Cost: 0.0853\n", "Epoch: 007/010 | Batch 400/469 | Cost: 0.0958\n", "Epoch: 007/010 | Batch 450/469 | Cost: 0.0633\n", "Epoch: 007/010 training accuracy: 98.17%\n", "Time elapsed: 1.89 min\n", "Epoch: 008/010 | Batch 000/469 | Cost: 0.1090\n", "Epoch: 008/010 | Batch 050/469 | Cost: 0.0963\n", "Epoch: 008/010 | Batch 100/469 | Cost: 0.1356\n", "Epoch: 008/010 | Batch 150/469 | Cost: 0.0197\n", "Epoch: 008/010 | Batch 200/469 | Cost: 0.0714\n", "Epoch: 008/010 | Batch 250/469 | Cost: 0.0509\n", "Epoch: 008/010 | Batch 300/469 | Cost: 0.0830\n", "Epoch: 008/010 | Batch 350/469 | Cost: 0.0872\n", "Epoch: 008/010 | Batch 400/469 | Cost: 0.0888\n", "Epoch: 008/010 | Batch 450/469 | Cost: 0.1107\n", "Epoch: 008/010 training accuracy: 97.96%\n", "Time elapsed: 2.15 min\n", "Epoch: 009/010 | Batch 000/469 | Cost: 0.0666\n", "Epoch: 009/010 | Batch 050/469 | Cost: 0.0787\n", "Epoch: 009/010 | Batch 100/469 | Cost: 0.1526\n", "Epoch: 009/010 | Batch 150/469 | Cost: 0.0501\n", "Epoch: 009/010 | Batch 200/469 | Cost: 0.0628\n", "Epoch: 009/010 | Batch 250/469 | Cost: 0.1503\n", "Epoch: 009/010 | 
Batch 300/469 | Cost: 0.0475\n", "Epoch: 009/010 | Batch 350/469 | Cost: 0.0390\n", "Epoch: 009/010 | Batch 400/469 | Cost: 0.0298\n", "Epoch: 009/010 | Batch 450/469 | Cost: 0.0184\n", "Epoch: 009/010 training accuracy: 98.25%\n", "Time elapsed: 2.42 min\n", "Epoch: 010/010 | Batch 000/469 | Cost: 0.0119\n", "Epoch: 010/010 | Batch 050/469 | Cost: 0.0582\n", "Epoch: 010/010 | Batch 100/469 | Cost: 0.0242\n", "Epoch: 010/010 | Batch 150/469 | Cost: 0.0256\n", "Epoch: 010/010 | Batch 200/469 | Cost: 0.0234\n", "Epoch: 010/010 | Batch 250/469 | Cost: 0.0455\n", "Epoch: 010/010 | Batch 300/469 | Cost: 0.0744\n", "Epoch: 010/010 | Batch 350/469 | Cost: 0.1547\n", "Epoch: 010/010 | Batch 400/469 | Cost: 0.0181\n", "Epoch: 010/010 | Batch 450/469 | Cost: 0.0622\n", "Epoch: 010/010 training accuracy: 98.18%\n", "Time elapsed: 2.69 min\n", "Total Training Time: 2.69 min\n" ] } ], "source": [ "def compute_accuracy(model, data_loader):\n", "    correct_pred, num_examples = 0, 0\n", "    for features, targets in data_loader:\n", "        features = features.to(device)\n", "        targets = targets.to(device)\n", "        logits, probas = model(features)\n", "        _, predicted_labels = torch.max(probas, 1)\n", "        num_examples += targets.size(0)\n", "        correct_pred += (predicted_labels == targets).sum()\n", "    return correct_pred.float()/num_examples * 100\n", "\n", "\n", "start_time = time.time()\n", "for epoch in range(num_epochs):\n", "    model = model.train()\n", "    for batch_idx, (features, targets) in enumerate(train_loader):\n", "\n", "        features = features.to(device)\n", "        targets = targets.to(device)\n", "\n", "        ### FORWARD AND BACK PROP\n", "        logits, probas = model(features)\n", "        cost = F.cross_entropy(logits, targets)\n", "        optimizer.zero_grad()\n", "\n", "        cost.backward()\n", "\n", "        ### UPDATE MODEL PARAMETERS\n", "        optimizer.step()\n", "\n", "        ### LOGGING\n", "        if not batch_idx % 50:\n", "            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f'\n", "                  % (epoch+1, num_epochs, batch_idx,\n", "                     len(train_loader), cost))\n", "\n", "    model = model.eval()\n", "    print('Epoch: %03d/%03d training accuracy: %.2f%%' % (\n", "          epoch+1, num_epochs,\n", "          compute_accuracy(model, train_loader)))\n", "\n", "    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))\n", "\n", "print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test accuracy: 98.07%\n" ] } ], "source": [ "print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "numpy 1.15.4\n", "torch 1.0.0\n", "\n" ] } ], "source": [ "%watermark -iv" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }