{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classify \"Quick, draw!\" drawings\n", "\n", "**How to implement an image classifier in PyTorch?**\n", "\n", "* Data pipeline - DataSet, Data Augmentation\n", "* Implementation - PyTorch Modules\n", "* ConvNets - Convolution, learn Kernels, Pooling\n", "\n", "\"Can a neural network learn to recognize doodling?\" - [quickdraw.withgoogle.com][quickdraw]\n", "\n", "\n", " \n", "\n", "\n", "[quickdraw]:https://quickdraw.withgoogle.com/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "print(\"Torch version:\", torch.__version__)\n", "\n", "import torchvision\n", "print(\"Torchvision version:\", torchvision.__version__)\n", "\n", "import numpy as np\n", "print(\"Numpy version:\", np.__version__)\n", "\n", "import matplotlib\n", "print(\"Matplotlib version:\", matplotlib.__version__)\n", "\n", "import PIL\n", "print(\"PIL version:\", PIL.__version__)\n", "\n", "import IPython\n", "print(\"IPython version:\", IPython.__version__)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup Matplotlib\n", "%matplotlib inline\n", "#%config InlineBackend.figure_format = 'retina' # If you have a retina screen\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A QuickDraw DataSet\n", "\n", "More about PyTorch - [Data Loading and Processing Tutorial][dataloading-tutorial] by Sasank Chilamkurthy\n", "\n", "* **How to encapsulate a data set?** - DataSet\n", "* **How to perform Data augmentation?** - Transformation pipelines\n", "\n", "Download **Numpy bitmap files .npy** - [npy files from Google Cloud][quickdraw-npy] / [GitHub repository][quickdraw-github]\n", "\n", "Possible classes: [Airplanes][npy-planes] - [Cars][npy-cars] - [Cats][npy-cats] - [Ships][npy-ships]\n", "\n", "\n", " \n", "\n", "\n", "[dataloading-tutorial]:https://pytorch.org/tutorials/beginner/data_loading_tutorial.html\n", "[quickdraw-npy]:https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap\n", "[quickdraw-github]:https://github.com/googlecreativelab/quickdraw-dataset\n", "[npy-planes]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/airplane.npy\n", "[npy-cars]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/car.npy\n", "[npy-cats]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cat.npy\n", "[npy-ships]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cruise%20ship.npy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If your are on mybinder.org, you can use this code to download the data\n", "!mkdir -p data\n", "!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/aircraft%20carrier.npy\" -O \"data/airplane.npy\" -q --show-progress\n", "!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/car.npy\" -O \"data/car.npy\" -q --show-progress\n", "!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cat.npy\" -O \"data/cat.npy\" -q --show-progress\n", "!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cruise%20ship.npy\" -O \"data/cruise ship.npy\" -q --show-progress" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Collect 
files\n", "npy_files = [\n", " os.path.join('data', 'airplane.npy'),\n", " os.path.join('data', 'car.npy'),\n", " os.path.join('data', 'cat.npy'),\n", " os.path.join('data', 'cruise ship.npy'),\n", "]\n", "classes = ['plane', 'car', 'cat', 'ship']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "\n", "# Create a class for our data set\n", "class QuickDraw(torch.utils.data.Dataset):\n", " def __init__(self, npy_files, transform=None):\n", " # Open .npy files\n", " self.X_list = [np.load(f, mmap_mode='r') for f in npy_files]\n", " self.lengths = [len(X) for X in self.X_list]\n", " \n", " # Transformation pipeline\n", " self.transform = transform\n", "\n", " def __len__(self):\n", " return sum(self.lengths)\n", " \n", " def get_pixels(self, idx):\n", " for label, (X, l) in enumerate(zip(self.X_list, self.lengths)):\n", " if idx < l:\n", " return X[idx], label\n", " idx -= l\n", "\n", " def __getitem__(self, idx):\n", " # Get image\n", " img, label = self.get_pixels(idx)\n", " pil_img = Image.fromarray(255 - img.reshape(28, 28)) # White background\n", "\n", " # Transform image\n", " processed_img = self.transform(pil_img) if self.transform else pil_img\n", " \n", " return processed_img, label\n", " \n", "# Create the data set\n", "dataset = QuickDraw(npy_files)\n", "print('Size:', len(dataset))\n", "dataset[0][0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torchvision import transforms\n", "\n", "# Data augmentation\n", "t = transforms.Compose([\n", " transforms.RandomAffine(degrees=25, translate=(0.1, 0.1), shear=5, fillcolor=255),\n", " transforms.RandomHorizontalFlip() \n", "])\n", "\n", "dataset = QuickDraw(npy_files, t)\n", "print('Size:', len(dataset))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get first image\n", "dataset[0][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data loaders\n", "\n", "The next steps of the Data Pipeline in PyTorch\n", "\n", "* **Data Samplers** - Train, Validation samplers\n", "* **Data Loaders** - Combine DataSet, DataSampler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data.sampler import SubsetRandomSampler\n", "\n", "# Define train/validation sets\n", "idx = np.arange(len(dataset)) # idx: 0 .. 
(n_images - 1)\n", "np.random.shuffle(idx) # shuffle\n", "\n", "# Create train/validation samplers\n", "valid_size = 500\n", "train_sampler = SubsetRandomSampler(idx[:-valid_size])\n", "valid_sampler = SubsetRandomSampler(idx[-valid_size:])\n", "\n", "print('Train set:', len(train_sampler))\n", "print('Validation set:', len(valid_sampler))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from torch.utils.data import DataLoader\n", "\n", "# Data augmentation for the \"training\" set\n", "train_t = transforms.Compose([\n", " transforms.RandomHorizontalFlip(),\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.829,), (0.326,)) # Computed on the train set\n", "])\n", "valid_t = transforms.Compose([\n", " transforms.ToTensor(),\n", " transforms.Normalize((0.829,), (0.326,))\n", "])\n", "\n", "# Create DataSets\n", "train_set = QuickDraw(npy_files, train_t)\n", "valid_set = QuickDraw(npy_files, valid_t)\n", "\n", "# Create DataLoaders\n", "train_loader = DataLoader(train_set, batch_size=64, sampler=train_sampler)\n", "valid_loader = DataLoader(valid_set, batch_size=64, sampler=valid_sampler)\n", "\n", "# Plot sample images\n", "images, labels = next(iter(train_loader))\n", "print('Labels:', labels)\n", "\n", "grid = torchvision.utils.make_grid(images, normalize=True)\n", "plt.imshow(grid.numpy().transpose((1, 2, 0)))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fully-connected Network\n", "\n", "**What are the different ways to define a Network in PyTorch?**\n", "\n", "- **Sequential Class** - Add layers from **nn** module\n", "- **Create a NN Module** - Subclass of torch.nn.Module or others, ex. torch.nn.Sequential\n", "\n", "**PyTorch nn.Module** - [link to the documentation][pytorch-module]\n", "\n", "> `torch.nn.Module` - Base class for all neural network modules. 
Your models should also subclass this class.\n", "> Modules can also contain other Modules, allowing to nest them in a tree structure.\n", "> You can assign the submodules as regular attributes:\n", "> Submodules assigned in this way will be registered, and will have their parameters converted too when you call .cuda(), etc.\n", "\n", "[pytorch-module]:https://pytorch.org/docs/master/nn.html#torch.nn.Module" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create a FullyConnected \"Sequential\" Module\n", "class FullyConnected(torch.nn.Sequential):\n", " def __init__(self, n_in, n_out, h_units=[]):\n", " # Initialize module\n", " super().__init__()\n", " \n", " # Save network parameters\n", " self.n_inputs = n_in\n", " self.n_outputs = n_out\n", " self.h_units = h_units\n", " \n", " # Add hidden layers\n", " n_hidden = len(h_units)\n", " for i in range(n_hidden):\n", " # Input/output sizes\n", " hidden_in = n_in if i == 0 else h_units[i-1]\n", " hidden_out = h_units[i]\n", "\n", " # Add layer and activation\n", " self.add_module('hidden_{}'.format(i+1), torch.nn.Linear(hidden_in, hidden_out))\n", " self.add_module('relu_{}'.format(i+1), torch.nn.ReLU())\n", " \n", " # Add output layer\n", " output_in = n_in if n_hidden == 0 else hidden_out\n", " self.add_module('output', torch.nn.Linear(output_in, n_out))\n", " \n", " def forward(self, img):\n", " flat_img = img.view(-1, self.n_inputs)\n", " return super().forward(flat_img)\n", " \n", "# Test\n", "model = FullyConnected(28*28, len(classes))\n", "model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task - How to access Model Parameters?**\n", "\n", "* Plot weights from the first layer\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize 1st layer of a FC network\n", "def plot_weights(weights_fc1, axis):\n", " # Shape of weights matrix\n", " n_out, n_in = weights_fc1.shape\n", "\n", " # Create a grid\n", " n_cells = min(16, n_out)\n", " grid = torchvision.utils.make_grid(\n", " weights_fc1[:n_cells].view(n_cells, 1, 28, 28),\n", " nrow=4, normalize=True\n", " )\n", " \n", " # Plot it\n", " axis.imshow(grid.numpy().transpose((1, 2, 0)),)\n", " \n", "# TODO - Plot weights from \"model\" 1st layer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train the Model\n", "\n", "**Tasks - What is a good Network Architecture?**\n", "\n", "* 2-layer FC network - Best accuracy?\n", "* Deeper network - Can you improve results?" 
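, "\n", "\n", "As a warm-up for these tasks, the next cell sketches one possible 2-layer configuration built with the FullyConnected class above and counts its parameters - the hidden size is an arbitrary choice, not a tuned answer.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch - one hidden layer of 128 units (an arbitrary, untuned choice)\n", "sketch_model = FullyConnected(28*28, len(classes), h_units=[128])\n", "\n", "# Count the parameters of each layer (weight matrix + bias vector)\n", "for name, p in sketch_model.named_parameters():\n", "    print('{:<20} {:>10,}'.format(name, p.numel()))\n", "print('Total parameters:', sum(p.numel() for p in sketch_model.parameters()))"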
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "# Create model\n", "model = FullyConnected(28*28, len(classes))\n", "\n", "# Criterion and optimizer for \"training\"\n", "criterion = torch.nn.CrossEntropyLoss()\n", "optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1)\n", "\n", "# Backprop step\n", "def compute_loss(output, target):\n", " y_tensor = torch.LongTensor(target)\n", " y_variable = torch.autograd.Variable(y_tensor)\n", " return criterion(output, y_variable)\n", "\n", "def backpropagation(output, target):\n", " optimizer.zero_grad() # Clear the gradients\n", " loss = compute_loss(output, target) # Compute loss\n", " loss.backward() # Backpropagation\n", " optimizer.step() # Let the optimizer adjust our model\n", " return loss.data\n", "\n", "# Helper function\n", "def get_accuracy(output, y):\n", " predictions = torch.argmax(output, dim=1) # Max activation\n", " is_correct = np.equal(predictions, y)\n", " return is_correct.numpy().mean()\n", " \n", "# Create a figure to visualize the results\n", "fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))\n", " \n", "try:\n", " # Collect loss / accuracy values\n", " stats = defaultdict(list)\n", " t = 0 # Number of samples seen\n", " print_step = 200 # Refresh rate\n", " \n", " for epoch in range(1, 10**5):\n", " # Train by small batches of data\n", " for batch, (batch_X, batch_y) in enumerate(train_loader, 1):\n", " # Forward pass & backpropagation\n", " output = model(batch_X)\n", " loss = backpropagation(output, batch_y)\n", " \n", " # Log \"train\" stats\n", " stats['train_loss'].append(loss)\n", " stats['train_acc'].append(get_accuracy(output, batch_y))\n", " stats['train_t'].append(t)\n", "\n", " if t%print_step == 0:\n", " # Log \"validation\" stats\n", " loss_vals, acc_vals = [], []\n", " for X, y in valid_loader:\n", " output = model(X)\n", " loss_vals.append(compute_loss(output, y).data)\n", " acc_vals.append(get_accuracy(output, y))\n", " \n", " stats['val_loss'].append(np.mean(loss_vals))\n", " stats['val_acc'].append(np.mean(acc_vals))\n", " stats['val_t'].append(t)\n", " \n", " # Plot what the network learned\n", " ax1.cla()\n", " ax1.set_title('Epoch {}, batch {:,}'.format(epoch, batch))\n", " plot_weights(model[0].weight.data, ax1)\n", " ax2.cla()\n", " ax2.set_title('Loss, val: {:.3f}'.format(np.mean(stats['val_loss'][-10:])))\n", " ax2.plot(stats['train_t'], stats['train_loss'], label='train')\n", " ax2.plot(stats['val_t'], stats['val_loss'], label='valid')\n", " ax2.legend()\n", " ax3.cla()\n", " ax3.set_title('Accuracy, val: {:.3f}'.format(np.mean(stats['val_acc'][-10:])))\n", " ax3.plot(stats['train_t'], stats['train_acc'], label='train')\n", " ax3.plot(stats['val_t'], stats['val_acc'], label='valid')\n", " ax3.set_ylim(0, 1)\n", " ax3.legend()\n", "\n", " # Jupyter trick\n", " IPython.display.clear_output(wait=True)\n", " IPython.display.display(fig)\n", " \n", " # Update t\n", " t += train_loader.batch_size\n", "\n", "except KeyboardInterrupt:\n", " # Clear output\n", " IPython.display.clear_output()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolutional Network\n", "\n", "\"Make some assumptions about the inputs to make learning more efficient\" - [Andrej Karpathy Lecture][karpathy-lecture]\n", "\n", "\n", " \n", "\n", "\n", "**Convolutional Layers Parameters**\n", "\n", "* **Kernel Size** - Size of our \"Feature Detectors\"\n", "* **Output depth** - Number 
of kernels\n", "* **Stride** - How they move\n", "* **Padding** - Add \"borders\" to the inputs\n", "\n", "**Pooling Parameters**\n", "\n", "* **Pooling function** - Maximum, Average\n", "* **Size and Stride** - Downsampling, e.g. size=2 and stride=2\n", "\n", "Implementation - Inspired by [AlexNet PyTorch Code][pytorch-alexnet]\n", "\n", "[karpathy-lecture]:https://youtu.be/u6aEYuemt0M?t=10s\n", "[exts-convolution]:https://youtu.be/Y1ugnb0bobk\n", "[pytorch-alexnet]:https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import namedtuple\n", "\n", "# Create a Pooling Named Tuple\n", "PoolParams = namedtuple('PoolParams', ['size', 'stride'])\n", "sample_poolparams = PoolParams(size=2, stride=2)\n", "\n", "# Create a ConvParams Named Tuple\n", "ConvParams = namedtuple('ConvParams', ['size', 'n_kernels', 'stride', 'pooling'])\n", "sample_convparams = ConvParams(size=16, n_kernels=5, stride=2, pooling=None)\n", "\n", "# Create an InputShape Named Tuple\n", "InputShape = namedtuple('InputShape', ['channels', 'height', 'width'])\n", "images_shape = InputShape(channels=1, height=28, width=28)\n", "\n", "# Create a ConvNet Module\n", "class ConvNet(torch.nn.Module):\n", "    def __init__(self, layers_params, input_shape):\n", "        # Initialize module\n", "        super().__init__()\n", "\n", "        # \"Feature extraction\" part\n", "        self.features = torch.nn.Sequential()\n", "        for layer_no, (size, n_kernels, stride, pooling) in enumerate(layers_params, 1):\n", "            # Input depth\n", "            depth_in = input_shape.channels if layer_no == 1 else layers_params[layer_no-2].n_kernels\n", "\n", "            # Convolutional Layer\n", "            self.features.add_module(\n", "                'conv2d_{}'.format(layer_no),\n", "                torch.nn.Conv2d(depth_in, n_kernels, size, stride)\n", "            )\n", "            self.features.add_module('relu_{}'.format(layer_no), torch.nn.ReLU())\n", "\n", "            # Max-pooling layer\n", "            if pooling is not None:\n", "                self.features.add_module('maxpool_{}'.format(layer_no), torch.nn.MaxPool2d(pooling.size, pooling.stride))\n", "\n", "        # Compute the number of features extracted\n", "        sample_input = torch.zeros(1, input_shape.channels, input_shape.height, input_shape.width)\n", "        _, depth_out, height_out, width_out = self.features(sample_input).shape\n", "        self.n_features = depth_out*height_out*width_out\n", "\n", "        # \"Classifier\" part - one output per class\n", "        self.classifier = torch.nn.Sequential(\n", "            torch.nn.Linear(self.n_features, len(classes))\n", "        )\n", "\n", "    def forward(self, img):\n", "        # Extract features\n", "        img_features = self.features(img)\n", "        flat_features = img_features.view(-1, self.n_features)\n", "\n", "        # Classify image\n", "        return self.classifier(flat_features)\n", "\n", "# Create toy model\n", "model = ConvNet([\n", "    ConvParams(size=5, n_kernels=16, stride=2, pooling=PoolParams(size=2, stride=2)),\n", "    ConvParams(size=3, n_kernels=32, stride=1, pooling=PoolParams(size=2, stride=2)),\n", "], images_shape)\n", "model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Try forward pass\n", "print('Sample input:', images.shape)\n", "print('Sample output:', model(images).shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**How to access Model Parameters?**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize kernels from the 1st Convolutional Layer\n", "def plot_kernels(model, axis):\n", "    # 
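Submodules registered with add_module() are accessible as attributes,\n", "    # so the first convolution is model.features.conv2d_1 and its weight tensor\n", "    # has shape (n_kernels, in_depth, height, width)\n", "    # First-layer 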
Weights\n", " kernel_weights = model.features.conv2d_1.weight.data\n", " n_kernels, in_depth, height, width = kernel_weights.shape\n", "\n", " # Create a grid\n", " n_cells = min(16, n_kernels)\n", " grid = torchvision.utils.make_grid(\n", " kernel_weights[:n_cells, 0].view(n_cells, in_depth, height, width),\n", " nrow=4, normalize=True, padding=1\n", " )\n", " \n", " # Plot it\n", " axis.imshow(grid.numpy().transpose((1, 2, 0)),)\n", " \n", "fig = plt.figure(figsize=(3, 3))\n", "plot_kernels(model, fig.gca())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Tasks - Convolutional Nets**\n", "\n", "* Train model - Adapt \"training\" code from above for ConvNets\n", "* Model Architecture - Play with the different parameters, best accuracy?\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO - Train ConvNet here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Small challenges\n", "\n", "* Print the number of parameters in each layer\n", "* Pass a few images and print the outputs shapes\n", "* Plot the \"activation maps\" for a sample input" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional resources\n", "\n", "Nice visualizations\n", "\n", "* Deep Visualization Toolbox - [Presentation on YouTube][vistool-video] / [GitHub repository][vistool-github]\n", "* Feature Visualization - [distill.pub article][distill-featvis]\n", "* The Building Blocks of Interpretability - [distill.pub article][distill-bblocks]\n", "\n", "To go deeper\n", "\n", "* ImageNet Classification with Deep Convolutional Neural Networks - [Slides][alexnet-slides]\n", "\n", "[vistool-video]:https://youtu.be/AgkfIQ4IGaM\n", "[vistool-github]:https://github.com/yosinski/deep-visualization-toolbox\n", "[distill-featvis]:https://distill.pub/2017/feature-visualization/\n", "[distill-bblocks]:https://distill.pub/2018/building-blocks/\n", "[alexnet-slides]:vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }