{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classify \"Quick, draw!\" drawings\n",
"\n",
"**How to implement an image classifier in PyTorch?**\n",
"\n",
"* Data pipeline - DataSet, Data Augmentation\n",
"* Implementation - PyTorch Modules\n",
"* ConvNets - Convolution, learn Kernels, Pooling\n",
"\n",
"\"Can a neural network learn to recognize doodling?\" - [quickdraw.withgoogle.com][quickdraw]\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"[quickdraw]:https://quickdraw.withgoogle.com/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"print(\"Torch version:\", torch.__version__)\n",
"\n",
"import torchvision\n",
"print(\"Torchvision version:\", torchvision.__version__)\n",
"\n",
"import numpy as np\n",
"print(\"Numpy version:\", np.__version__)\n",
"\n",
"import matplotlib\n",
"print(\"Matplotlib version:\", matplotlib.__version__)\n",
"\n",
"import PIL\n",
"print(\"PIL version:\", PIL.__version__)\n",
"\n",
"import IPython\n",
"print(\"IPython version:\", IPython.__version__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Setup Matplotlib\n",
"%matplotlib inline\n",
"#%config InlineBackend.figure_format = 'retina' # If you have a retina screen\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A QuickDraw DataSet\n",
"\n",
"More about PyTorch - [Data Loading and Processing Tutorial][dataloading-tutorial] by Sasank Chilamkurthy\n",
"\n",
"* **How to encapsulate a data set?** - DataSet\n",
"* **How to perform Data augmentation?** - Transformation pipelines\n",
"\n",
"Download **Numpy bitmap files .npy** - [npy files from Google Cloud][quickdraw-npy] / [GitHub repository][quickdraw-github]\n",
"\n",
"Possible classes: [Airplanes][npy-planes] - [Cars][npy-cars] - [Cats][npy-cats] - [Ships][npy-ships]\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"[dataloading-tutorial]:https://pytorch.org/tutorials/beginner/data_loading_tutorial.html\n",
"[quickdraw-npy]:https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap\n",
"[quickdraw-github]:https://github.com/googlecreativelab/quickdraw-dataset\n",
"[npy-planes]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/airplane.npy\n",
"[npy-cars]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/car.npy\n",
"[npy-cats]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cat.npy\n",
"[npy-ships]:https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cruise%20ship.npy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If your are on mybinder.org, you can use this code to download the data\n",
"!mkdir -p data\n",
"!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/aircraft%20carrier.npy\" -O \"data/airplane.npy\" -q --show-progress\n",
"!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/car.npy\" -O \"data/car.npy\" -q --show-progress\n",
"!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cat.npy\" -O \"data/cat.npy\" -q --show-progress\n",
"!wget \"https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cruise%20ship.npy\" -O \"data/cruise ship.npy\" -q --show-progress"
]
},
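{
"cell_type": "markdown",
"metadata": {},
"source": [
"If `wget` is not available in your environment, the next cell is a minimal pure-Python alternative (a sketch; it fetches the same four files into `data/`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Pure-Python alternative to the wget cell above (a sketch)\n",
"import os\n",
"import urllib.parse\n",
"import urllib.request\n",
"\n",
"base_url = 'https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/'\n",
"os.makedirs('data', exist_ok=True)\n",
"for name in ['airplane', 'car', 'cat', 'cruise ship']:\n",
"    target = os.path.join('data', name + '.npy')\n",
"    if not os.path.exists(target):  # Skip files already downloaded\n",
"        urllib.request.urlretrieve(base_url + urllib.parse.quote(name) + '.npy', target)\n",
"        print('Downloaded', target)"
]
},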
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Collect files\n",
"npy_files = [\n",
" os.path.join('data', 'airplane.npy'),\n",
" os.path.join('data', 'car.npy'),\n",
" os.path.join('data', 'cat.npy'),\n",
" os.path.join('data', 'cruise ship.npy'),\n",
"]\n",
"classes = ['plane', 'car', 'cat', 'ship']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from PIL import Image\n",
"\n",
"# Create a class for our data set\n",
"class QuickDraw(torch.utils.data.Dataset):\n",
" def __init__(self, npy_files, transform=None):\n",
" # Open .npy files\n",
" self.X_list = [np.load(f, mmap_mode='r') for f in npy_files]\n",
" self.lengths = [len(X) for X in self.X_list]\n",
" \n",
" # Transformation pipeline\n",
" self.transform = transform\n",
"\n",
" def __len__(self):\n",
" return sum(self.lengths)\n",
" \n",
" def get_pixels(self, idx):\n",
" for label, (X, l) in enumerate(zip(self.X_list, self.lengths)):\n",
" if idx < l:\n",
" return X[idx], label\n",
" idx -= l\n",
"\n",
" def __getitem__(self, idx):\n",
" # Get image\n",
" img, label = self.get_pixels(idx)\n",
" pil_img = Image.fromarray(255 - img.reshape(28, 28)) # White background\n",
"\n",
" # Transform image\n",
" processed_img = self.transform(pil_img) if self.transform else pil_img\n",
" \n",
" return processed_img, label\n",
" \n",
"# Create the data set\n",
"dataset = QuickDraw(npy_files)\n",
"print('Size:', len(dataset))\n",
"dataset[0][0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import transforms\n",
"\n",
"# Data augmentation\n",
"t = transforms.Compose([\n",
" transforms.RandomAffine(degrees=25, translate=(0.1, 0.1), shear=5, fillcolor=255),\n",
" transforms.RandomHorizontalFlip() \n",
"])\n",
"\n",
"dataset = QuickDraw(npy_files, t)\n",
"print('Size:', len(dataset))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get first image\n",
"dataset[0][0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data loaders\n",
"\n",
"The next steps of the Data Pipeline in PyTorch\n",
"\n",
"* **Data Samplers** - Train, Validation samplers\n",
"* **Data Loaders** - Combine DataSet, DataSampler"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data.sampler import SubsetRandomSampler\n",
"\n",
"# Define train/validation sets\n",
"idx = np.arange(len(dataset)) # idx: 0 .. (n_images - 1)\n",
"np.random.shuffle(idx) # shuffle\n",
"\n",
"# Create train/validation samplers\n",
"valid_size = 500\n",
"train_sampler = SubsetRandomSampler(idx[:-valid_size])\n",
"valid_sampler = SubsetRandomSampler(idx[-valid_size:])\n",
"\n",
"print('Train set:', len(train_sampler))\n",
"print('Validation set:', len(valid_sampler))"
]
},
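{
"cell_type": "markdown",
"metadata": {},
"source": [
"Aside: the `Normalize` values used below (mean 0.829, std 0.326) were computed on the train set. The next cell is a minimal sketch of how such statistics can be estimated, using a subsample of the training indices for speed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: estimate normalization statistics from the training images\n",
"stats_set = QuickDraw(npy_files, transforms.ToTensor())  # No augmentation\n",
"sample_idx = idx[:-valid_size][:1000]  # Subsample of the train indices\n",
"pixels = torch.cat([stats_set[i][0].view(-1) for i in sample_idx])\n",
"print('Mean: {:.3f}, Std: {:.3f}'.format(pixels.mean().item(), pixels.std().item()))"
]
},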
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torch.utils.data import DataLoader\n",
"\n",
"# Data augmentation for the \"training\" set\n",
"train_t = transforms.Compose([\n",
" transforms.RandomHorizontalFlip(),\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.829,), (0.326,)) # Computed on the train set\n",
"])\n",
"valid_t = transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.829,), (0.326,))\n",
"])\n",
"\n",
"# Create DataSets\n",
"train_set = QuickDraw(npy_files, train_t)\n",
"valid_set = QuickDraw(npy_files, valid_t)\n",
"\n",
"# Create DataLoaders\n",
"train_loader = DataLoader(train_set, batch_size=64, sampler=train_sampler)\n",
"valid_loader = DataLoader(valid_set, batch_size=64, sampler=valid_sampler)\n",
"\n",
"# Plot sample images\n",
"images, labels = next(iter(train_loader))\n",
"print('Labels:', labels)\n",
"\n",
"grid = torchvision.utils.make_grid(images, normalize=True)\n",
"plt.imshow(grid.numpy().transpose((1, 2, 0)))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fully-connected Network\n",
"\n",
"**What are the different ways to define a Network in PyTorch?**\n",
"\n",
"- **Sequential Class** - Add layers from **nn** module\n",
"- **Create a NN Module** - Subclass of torch.nn.Module or others, ex. torch.nn.Sequential\n",
"\n",
"**PyTorch nn.Module** - [link to the documentation][pytorch-module]\n",
"\n",
"> `torch.nn.Module` - Base class for all neural network modules. Your models should also subclass this class.\n",
"> Modules can also contain other Modules, allowing to nest them in a tree structure.\n",
"> You can assign the submodules as regular attributes:\n",
"> Submodules assigned in this way will be registered, and will have their parameters converted too when you call .cuda(), etc.\n",
"\n",
"[pytorch-module]:https://pytorch.org/docs/master/nn.html#torch.nn.Module"
]
},
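{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the first option (\"Sequential Class\") looks like this in its plainest form - a minimal sketch, where the hidden size of 64 is arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plain Sequential API (a sketch; the hidden size 64 is arbitrary)\n",
"seq_model = torch.nn.Sequential(\n",
"    torch.nn.Linear(28*28, 64),\n",
"    torch.nn.ReLU(),\n",
"    torch.nn.Linear(64, len(classes)),\n",
")\n",
"seq_model"
]
},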
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a FullyConnected \"Sequential\" Module\n",
"class FullyConnected(torch.nn.Sequential):\n",
" def __init__(self, n_in, n_out, h_units=[]):\n",
" # Initialize module\n",
" super().__init__()\n",
" \n",
" # Save network parameters\n",
" self.n_inputs = n_in\n",
" self.n_outputs = n_out\n",
" self.h_units = h_units\n",
" \n",
" # Add hidden layers\n",
" n_hidden = len(h_units)\n",
" for i in range(n_hidden):\n",
" # Input/output sizes\n",
" hidden_in = n_in if i == 0 else h_units[i-1]\n",
" hidden_out = h_units[i]\n",
"\n",
" # Add layer and activation\n",
" self.add_module('hidden_{}'.format(i+1), torch.nn.Linear(hidden_in, hidden_out))\n",
" self.add_module('relu_{}'.format(i+1), torch.nn.ReLU())\n",
" \n",
" # Add output layer\n",
" output_in = n_in if n_hidden == 0 else hidden_out\n",
" self.add_module('output', torch.nn.Linear(output_in, n_out))\n",
" \n",
" def forward(self, img):\n",
" flat_img = img.view(-1, self.n_inputs)\n",
" return super().forward(flat_img)\n",
" \n",
"# Test\n",
"model = FullyConnected(28*28, len(classes))\n",
"model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Task - How to access Model Parameters?**\n",
"\n",
"* Plot weights from the first layer\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize 1st layer of a FC network\n",
"def plot_weights(weights_fc1, axis):\n",
" # Shape of weights matrix\n",
" n_out, n_in = weights_fc1.shape\n",
"\n",
" # Create a grid\n",
" n_cells = min(16, n_out)\n",
" grid = torchvision.utils.make_grid(\n",
" weights_fc1[:n_cells].view(n_cells, 1, 28, 28),\n",
" nrow=4, normalize=True\n",
" )\n",
" \n",
" # Plot it\n",
" axis.imshow(grid.numpy().transpose((1, 2, 0)),)\n",
" \n",
"# TODO - Plot weights from \"model\" 1st layer"
]
},
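{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hint (a sketch, not the full solution): every registered layer exposes its weights through the module's parameter API."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hint: list the registered parameters by name (assumes `model` from above)\n",
"for name, param in model.named_parameters():\n",
"    print(name, tuple(param.shape))"
]
},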
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the Model\n",
"\n",
"**Tasks - What is a good Network Architecture?**\n",
"\n",
"* 2-layer FC network - Best accuracy?\n",
"* Deeper network - Can you improve results?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"\n",
"# Create model\n",
"model = FullyConnected(28*28, len(classes))\n",
"\n",
"# Criterion and optimizer for \"training\"\n",
"criterion = torch.nn.CrossEntropyLoss()\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1)\n",
"\n",
"# Backprop step\n",
"def compute_loss(output, target):\n",
" y_tensor = torch.LongTensor(target)\n",
" y_variable = torch.autograd.Variable(y_tensor)\n",
" return criterion(output, y_variable)\n",
"\n",
"def backpropagation(output, target):\n",
" optimizer.zero_grad() # Clear the gradients\n",
" loss = compute_loss(output, target) # Compute loss\n",
" loss.backward() # Backpropagation\n",
" optimizer.step() # Let the optimizer adjust our model\n",
" return loss.data\n",
"\n",
"# Helper function\n",
"def get_accuracy(output, y):\n",
" predictions = torch.argmax(output, dim=1) # Max activation\n",
" is_correct = np.equal(predictions, y)\n",
" return is_correct.numpy().mean()\n",
" \n",
"# Create a figure to visualize the results\n",
"fig, (ax1, ax2, ax3) = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))\n",
" \n",
"try:\n",
" # Collect loss / accuracy values\n",
" stats = defaultdict(list)\n",
" t = 0 # Number of samples seen\n",
" print_step = 200 # Refresh rate\n",
" \n",
" for epoch in range(1, 10**5):\n",
" # Train by small batches of data\n",
" for batch, (batch_X, batch_y) in enumerate(train_loader, 1):\n",
" # Forward pass & backpropagation\n",
" output = model(batch_X)\n",
" loss = backpropagation(output, batch_y)\n",
" \n",
" # Log \"train\" stats\n",
" stats['train_loss'].append(loss)\n",
" stats['train_acc'].append(get_accuracy(output, batch_y))\n",
" stats['train_t'].append(t)\n",
"\n",
" if t%print_step == 0:\n",
" # Log \"validation\" stats\n",
" loss_vals, acc_vals = [], []\n",
" for X, y in valid_loader:\n",
" output = model(X)\n",
" loss_vals.append(compute_loss(output, y).data)\n",
" acc_vals.append(get_accuracy(output, y))\n",
" \n",
" stats['val_loss'].append(np.mean(loss_vals))\n",
" stats['val_acc'].append(np.mean(acc_vals))\n",
" stats['val_t'].append(t)\n",
" \n",
" # Plot what the network learned\n",
" ax1.cla()\n",
" ax1.set_title('Epoch {}, batch {:,}'.format(epoch, batch))\n",
" plot_weights(model[0].weight.data, ax1)\n",
" ax2.cla()\n",
" ax2.set_title('Loss, val: {:.3f}'.format(np.mean(stats['val_loss'][-10:])))\n",
" ax2.plot(stats['train_t'], stats['train_loss'], label='train')\n",
" ax2.plot(stats['val_t'], stats['val_loss'], label='valid')\n",
" ax2.legend()\n",
" ax3.cla()\n",
" ax3.set_title('Accuracy, val: {:.3f}'.format(np.mean(stats['val_acc'][-10:])))\n",
" ax3.plot(stats['train_t'], stats['train_acc'], label='train')\n",
" ax3.plot(stats['val_t'], stats['val_acc'], label='valid')\n",
" ax3.set_ylim(0, 1)\n",
" ax3.legend()\n",
"\n",
" # Jupyter trick\n",
" IPython.display.clear_output(wait=True)\n",
" IPython.display.display(fig)\n",
" \n",
" # Update t\n",
" t += train_loader.batch_size\n",
"\n",
"except KeyboardInterrupt:\n",
" # Clear output\n",
" IPython.display.clear_output()"
]
},
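{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once training looks good, you may want to keep the weights around - a minimal sketch (the file name `fc_model.pt` is just an example):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Checkpoint sketch (the file name is an example)\n",
"torch.save(model.state_dict(), 'fc_model.pt')\n",
"model.load_state_dict(torch.load('fc_model.pt'))"
]
},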
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Network\n",
"\n",
"\"Make some assumptions about the inputs to make learning more efficient\" - [Andrej Karpathy Lecture][karpathy-lecture]\n",
"\n",
"\n",
"
\n",
"\n",
"\n",
"**Convolutional Layers Parameters**\n",
"\n",
"* **Kernel Size** - Size of our \"Feature Detectors\"\n",
"* **Output depth** - Number of kernels\n",
"* **Stride** - How they move\n",
"* **Padding** - Add \"borders\" to the inputs\n",
"\n",
"**Pooling Parameters**\n",
"\n",
"* **Pooling function** - Maximum, Average\n",
"* **Size and Stride** - Downsampling ex. size=2 and stride=2\n",
"\n",
"Implementation - Inspired by [AlexNet PyTorch Code][pytorch-alexnet]\n",
"\n",
"[karpathy-lecture]:https://youtu.be/u6aEYuemt0M?t=10s\n",
"[exts-convolution]:https://youtu.be/Y1ugnb0bobk\n",
"[pytorch-alexnet]:https://github.com/pytorch/vision/blob/master/torchvision/models/alexnet.py"
]
},
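{
"cell_type": "markdown",
"metadata": {},
"source": [
"Quick sanity check (a sketch, not part of the original flow): without padding, a convolution maps a height H to floor((H - size) / stride) + 1, and pooling downsamples the result the same way. The next cell verifies this on a dummy 28x28 input."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check (a sketch): conv/pool output sizes on a dummy input\n",
"conv = torch.nn.Conv2d(1, 16, kernel_size=5, stride=2)  # No padding\n",
"pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)\n",
"dummy = torch.zeros(1, 1, 28, 28)\n",
"print('After conv:', tuple(conv(dummy).shape))       # (28 - 5)//2 + 1 = 12\n",
"print('After pool:', tuple(pool(conv(dummy)).shape))  # 12//2 = 6"
]
},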
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"\n",
"# Create a Pooling Named Tuple\n",
"PoolParams = namedtuple('PoolParams', ['size', 'stride'])\n",
"sample_poolparams = PoolParams(size=2, stride=2)\n",
"\n",
"# Create a ConvParams Named Tuple\n",
"ConvParams = namedtuple('ConvParams', ['size', 'n_kernels', 'stride', 'pooling'])\n",
"sample_convparams = ConvParams(size=16, n_kernels=5, stride=2, pooling=None)\n",
"\n",
"# Create an InputShape Named Tuple\n",
"InputShape = namedtuple('InputShape', ['channels', 'height', 'width'])\n",
"images_shape = InputShape(channels=1, height=28, width=28)\n",
"\n",
"# Create a ConvNet Module\n",
"class ConvNet(torch.nn.Module):\n",
" def __init__(self, layers_params, input_shape):\n",
" # Initialize module\n",
" super().__init__()\n",
" \n",
" # \"Feature extraction\" part\n",
" self.features = torch.nn.Sequential()\n",
" for layer_no, (size, n_kernels, stride, pooling) in enumerate(layers_params, 1):\n",
" # Input depth\n",
" depth_in = input_shape.channels if layer_no == 1 else layers_params[layer_no-2].n_kernels\n",
" \n",
" # Convolutional Layer\n",
" self.features.add_module(\n",
" 'conv2d_{}'.format(layer_no),\n",
" torch.nn.Conv2d(depth_in, n_kernels, size, stride)\n",
" )\n",
" self.features.add_module('relu_{}'.format(layer_no), torch.nn.ReLU())\n",
" \n",
" # Max-pooling layer\n",
" if pooling is not None:\n",
" self.features.add_module('maxpool_{}'.format(layer_no), torch.nn.MaxPool2d(pooling.size, pooling.stride))\n",
" \n",
" # Compute the number of features extracted\n",
" sample_input = torch.zeros(1, input_shape.channels, input_shape.height, input_shape.width)\n",
" _, depth_out, height_out, width_out = self.features(sample_input).shape\n",
" self.n_features = depth_out*height_out*width_out\n",
"\n",
" # \"Classifier\" part\n",
" self.classifier = torch.nn.Sequential(\n",
" torch.nn.Linear(self.n_features, 10)\n",
" )\n",
" \n",
" def forward(self, img):\n",
" # Extract features\n",
" img_features = self.features(img)\n",
" flat_features = img_features.view(-1, self.n_features)\n",
" \n",
" # Classify image\n",
" return self.classifier(flat_features)\n",
" \n",
"# Create toy model\n",
"model = ConvNet([\n",
" ConvParams(size=5, n_kernels=16, stride=2, pooling=PoolParams(size=2, stride=2)),\n",
" ConvParams(size=3, n_kernels=32, stride=1, pooling=PoolParams(size=2, stride=2)),\n",
"], images_shape)\n",
"model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Try forward pass\n",
"print('Sample input:', images.shape)\n",
"print('Sample output:', model(images).shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**How to access Model Parameters?**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Visualize kernels from the 1st Convolutional Layer\n",
"def plot_kernels(model, axis):\n",
" # Weights\n",
" kernel_weights = model.features.conv2d_1.weight.data\n",
" n_kernels, in_depth, height, width = kernel_weights.shape\n",
"\n",
" # Create a grid\n",
" n_cells = min(16, n_kernels)\n",
" grid = torchvision.utils.make_grid(\n",
" kernel_weights[:n_cells, 0].view(n_cells, in_depth, height, width),\n",
" nrow=4, normalize=True, padding=1\n",
" )\n",
" \n",
" # Plot it\n",
" axis.imshow(grid.numpy().transpose((1, 2, 0)),)\n",
" \n",
"fig = plt.figure(figsize=(3, 3))\n",
"plot_kernels(model, fig.gca())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Tasks - Convolutional Nets**\n",
"\n",
"* Train model - Adapt \"training\" code from above for ConvNets\n",
"* Model Architecture - Play with the different parameters, best accuracy?\n",
"\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# TODO - Train ConvNet here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Small challenges\n",
"\n",
"* Print the number of parameters in each layer\n",
"* Pass a few images and print the outputs shapes\n",
"* Plot the \"activation maps\" for a sample input"
]
},
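{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you get stuck, the next cell sketches the relevant PyTorch APIs (a hint, not a full solution)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hints (sketches of the relevant APIs, assuming the ConvNet `model` above)\n",
"# Challenge 1 - numel() gives the size of each parameter tensor\n",
"for name, param in model.named_parameters():\n",
"    print(name, param.numel())\n",
"\n",
"# Challenge 3 - forward hooks can capture intermediate activation maps\n",
"activations = {}\n",
"def save_activation(module, inputs, output):\n",
"    activations['conv2d_1'] = output.detach()\n",
"model.features.conv2d_1.register_forward_hook(save_activation)"
]
},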
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional resources\n",
"\n",
"Nice visualizations\n",
"\n",
"* Deep Visualization Toolbox - [Presentation on YouTube][vistool-video] / [GitHub repository][vistool-github]\n",
"* Feature Visualization - [distill.pub article][distill-featvis]\n",
"* The Building Blocks of Interpretability - [distill.pub article][distill-bblocks]\n",
"\n",
"To go deeper\n",
"\n",
"* ImageNet Classification with Deep Convolutional Neural Networks - [Slides][alexnet-slides]\n",
"\n",
"[vistool-video]:https://youtu.be/AgkfIQ4IGaM\n",
"[vistool-github]:https://github.com/yosinski/deep-visualization-toolbox\n",
"[distill-featvis]:https://distill.pub/2017/feature-visualization/\n",
"[distill-bblocks]:https://distill.pub/2018/building-blocks/\n",
"[alexnet-slides]:vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}