{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Convolutional Autoencoder\n",
"\n",
"Sticking with the MNIST dataset, let's improve our autoencoder's performance using convolutional layers. We'll build a convolutional autoencoder to compress the MNIST dataset. \n",
"\n",
">The encoder portion will be made of convolutional and pooling layers and the decoder will be made of **transpose convolutional layers** that learn to \"upsample\" a compressed representation.\n",
"\n",
"\n",
"\n",
"### Compressed Representation\n",
"\n",
"A compressed representation can be great for saving and sharing any kind of data in a way that is more efficient than storing raw data. In practice, the compressed representation often holds key information about an input image and we can use it for denoising images or other kinds of reconstruction and transformation!\n",
"\n",
"\n",
"\n",
"Let's get started by importing our libraries and getting the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import torch\n",
"import numpy as np\n",
"from torchvision import datasets\n",
"import torchvision.transforms as transforms\n",
"\n",
"# convert data to torch.FloatTensor\n",
"transform = transforms.ToTensor()\n",
"\n",
"# load the training and test datasets\n",
"train_data = datasets.MNIST(root='data', train=True,\n",
" download=True, transform=transform)\n",
"test_data = datasets.MNIST(root='data', train=False,\n",
" download=True, transform=transform)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Create training and test dataloaders\n",
"\n",
"num_workers = 0\n",
"# how many samples per batch to load\n",
"batch_size = 20\n",
"\n",
"# prepare data loaders\n",
"train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers)\n",
"test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize the Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
" \n",
"# obtain one batch of training images\n",
"dataiter = iter(train_loader)\n",
"images, labels = dataiter.next()\n",
"images = images.numpy()\n",
"\n",
"# get one image from the batch\n",
"img = np.squeeze(images[0])\n",
"\n",
"fig = plt.figure(figsize = (5,5)) \n",
"ax = fig.add_subplot(111)\n",
"ax.imshow(img, cmap='gray')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Convolutional Autoencoder\n",
"\n",
"#### Encoder\n",
"The encoder part of the network will be a typical convolutional pyramid. Each convolutional layer will be followed by a max-pooling layer to reduce the dimensions of the layers. \n",
"\n",
"#### Decoder\n",
"\n",
"The decoder though might be something new to you. The decoder needs to convert from a narrow representation to a wide, reconstructed image. For example, the representation could be a 7x7x4 max-pool layer. This is the output of the encoder, but also the input to the decoder. We want to get a 28x28x1 image out from the decoder so we need to work our way back up from the compressed representation. A schematic of the network is shown below.\n",
"\n",
"\n",
"\n",
"Here our final encoder layer has size 7x7x4 = 196. The original images have size 28x28 = 784, so the encoded vector is 25% the size of the original image. These are just suggested sizes for each of the layers. Feel free to change the depths and sizes, in fact, you're encouraged to add additional layers to make this representation even smaller! Remember our goal here is to find a small representation of the input data.\n",
"\n",
"### Transpose Convolutions, Decoder\n",
"\n",
"This decoder uses **transposed convolutional** layers to increase the width and height of the input layers. They work almost exactly the same as convolutional layers, but in reverse. A stride in the input layer results in a larger stride in the transposed convolution layer. For example, if you have a 3x3 kernel, a 3x3 patch in the input layer will be reduced to one unit in a convolutional layer. Comparatively, one unit in the input layer will be expanded to a 3x3 path in a transposed convolution layer. PyTorch provides us with an easy way to create the layers, [`nn.ConvTranspose2d`](https://pytorch.org/docs/stable/nn.html#convtranspose2d). \n",
"\n",
"It is important to note that transpose convolution layers can lead to artifacts in the final images, such as checkerboard patterns. This is due to overlap in the kernels which can be avoided by setting the stride and kernel size equal. In [this Distill article](http://distill.pub/2016/deconv-checkerboard/) from Augustus Odena, *et al*, the authors show that these checkerboard artifacts can be avoided by resizing the layers using nearest neighbor or bilinear interpolation (upsampling) followed by a convolutional layer. \n",
"\n",
"> We'll show this approach in another notebook, so you can experiment with it and see the difference.\n",
"\n",
"\n",
"#### TODO: Build the network shown above. \n",
"> Build the encoder out of a series of convolutional and pooling layers. \n",
"> When building the decoder, recall that transpose convolutional layers can upsample an input by a factor of 2 using a stride and kernel_size of 2. "
]
},
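  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before filling in the model, here is a quick sanity check (a minimal sketch, not part of the exercise itself): it passes a dummy tensor shaped like the suggested 7x7x4 compressed representation through `nn.ConvTranspose2d` with a kernel size and stride of 2, to confirm that the spatial dimensions double from 7x7 to 14x14. The tensor shape and layer depths here are just illustrative choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "\n",
    "# illustrative check: kernel_size=2 and stride=2 double the spatial dims\n",
    "t_conv = nn.ConvTranspose2d(in_channels=4, out_channels=16, kernel_size=2, stride=2)\n",
    "\n",
    "# dummy input shaped like the suggested compressed representation: (batch, 4, 7, 7)\n",
    "x = torch.randn(1, 4, 7, 7)\n",
    "out = t_conv(x)\n",
    "\n",
    "print(x.shape)    # torch.Size([1, 4, 7, 7])\n",
    "print(out.shape)  # torch.Size([1, 16, 14, 14])"
   ]
  },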
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"\n",
"# define the NN architecture\n",
"class ConvAutoencoder(nn.Module):\n",
" def __init__(self):\n",
" super(ConvAutoencoder, self).__init__()\n",
" ## encoder layers ##\n",
" \n",
" \n",
" ## decoder layers ##\n",
" ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2\n",
" self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)\n",
"\n",
"\n",
" def forward(self, x):\n",
" ## encode ##\n",
" \n",
" ## decode ##\n",
" ## apply ReLu to all hidden layers *except for the output layer\n",
" ## apply a sigmoid to the output layer\n",
" \n",
" \n",
" return x\n",
"\n",
"# initialize the NN\n",
"model = ConvAutoencoder()\n",
"print(model)"
]
},
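  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to check your work, below is **one possible** implementation of the architecture described above; it is only a sketch, not the single correct answer. It uses the suggested depths: two convolutional layers (1 to 16 to 4 channels), each followed by 2x2 max-pooling, then two transpose convolutional layers back up (4 to 16 to 1 channels). The class name `ConvAutoencoderSolution` is chosen here only so it doesn't clobber your own definition above; assign an instance of it to `model` if you want to train this version instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# one possible solution sketch -- the depths and layer counts are just suggestions\n",
    "\n",
    "class ConvAutoencoderSolution(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(ConvAutoencoderSolution, self).__init__()\n",
    "        ## encoder layers ##\n",
    "        # conv layer (depth from 1 --> 16), 3x3 kernels, padding=1 keeps 28x28\n",
    "        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)\n",
    "        # conv layer (depth from 16 --> 4), 3x3 kernels\n",
    "        self.conv2 = nn.Conv2d(16, 4, 3, padding=1)\n",
    "        # max-pooling layer; a kernel and stride of 2 halve the x-y dims\n",
    "        self.pool = nn.MaxPool2d(2, 2)\n",
    "\n",
    "        ## decoder layers ##\n",
    "        ## a kernel of 2 and a stride of 2 will increase the spatial dims by 2\n",
    "        self.t_conv1 = nn.ConvTranspose2d(4, 16, 2, stride=2)\n",
    "        self.t_conv2 = nn.ConvTranspose2d(16, 1, 2, stride=2)\n",
    "\n",
    "    def forward(self, x):\n",
    "        ## encode ##\n",
    "        x = F.relu(self.conv1(x))    # 28x28x16\n",
    "        x = self.pool(x)             # 14x14x16\n",
    "        x = F.relu(self.conv2(x))    # 14x14x4\n",
    "        x = self.pool(x)             # 7x7x4, the compressed representation\n",
    "        ## decode ##\n",
    "        x = F.relu(self.t_conv1(x))  # 14x14x16\n",
    "        # sigmoid on the output layer to get pixel values in [0, 1]\n",
    "        x = torch.sigmoid(self.t_conv2(x))  # 28x28x1\n",
    "        return x\n",
    "\n",
    "# instantiate and print to check the architecture\n",
    "solution_model = ConvAutoencoderSolution()\n",
    "print(solution_model)"
   ]
  },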
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Training\n",
"\n",
"Here I'll write a bit of code to train the network. I'm not too interested in validation here, so I'll just monitor the training loss and the test loss afterwards. \n",
"\n",
"We are not concerned with labels in this case, just images, which we can get from the `train_loader`. Because we're comparing pixel values in input and output images, it will be best to use a loss that is meant for a regression task. Regression is all about comparing quantities rather than probabilistic values. So, in this case, I'll use `MSELoss`. And compare output images and input images as follows:\n",
"```\n",
"loss = criterion(outputs, images)\n",
"```\n",
"\n",
"Otherwise, this is pretty straightfoward training with PyTorch. Since this is a convlutional autoencoder, our images _do not_ need to be flattened before being passed in an input to our model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# specify loss function\n",
"criterion = nn.MSELoss()\n",
"\n",
"# specify loss function\n",
"optimizer = torch.optim.Adam(model.parameters(), lr=0.001)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# number of epochs to train the model\n",
"n_epochs = 30\n",
"\n",
"for epoch in range(1, n_epochs+1):\n",
" # monitor training loss\n",
" train_loss = 0.0\n",
" \n",
" ###################\n",
" # train the model #\n",
" ###################\n",
" for data in train_loader:\n",
" # _ stands in for labels, here\n",
" # no need to flatten images\n",
" images, _ = data\n",
" # clear the gradients of all optimized variables\n",
" optimizer.zero_grad()\n",
" # forward pass: compute predicted outputs by passing inputs to the model\n",
" outputs = model(images)\n",
" # calculate the loss\n",
" loss = criterion(outputs, images)\n",
" # backward pass: compute gradient of the loss with respect to model parameters\n",
" loss.backward()\n",
" # perform a single optimization step (parameter update)\n",
" optimizer.step()\n",
" # update running training loss\n",
" train_loss += loss.item()*images.size(0)\n",
" \n",
" # print avg training statistics \n",
" train_loss = train_loss/len(train_loader)\n",
" print('Epoch: {} \\tTraining Loss: {:.6f}'.format(\n",
" epoch, \n",
" train_loss\n",
" ))"
]
},
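  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since we said we'd look at the test loss after training, here is a small sketch (not part of the original notebook) that computes the average per-sample `MSELoss` over `test_loader`, reusing the `model` and `criterion` defined above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# sketch: average reconstruction loss on the test set\n",
    "test_loss = 0.0\n",
    "\n",
    "model.eval()\n",
    "with torch.no_grad():\n",
    "    for images, _ in test_loader:\n",
    "        outputs = model(images)\n",
    "        loss = criterion(outputs, images)\n",
    "        # accumulate, weighted by batch size\n",
    "        test_loss += loss.item()*images.size(0)\n",
    "\n",
    "test_loss = test_loss/len(test_loader.dataset)\n",
    "print('Test Loss: {:.6f}'.format(test_loss))"
   ]
  },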
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Checking out the results\n",
"\n",
"Below I've plotted some of the test images along with their reconstructions. These look a little rough around the edges, likely due to the checkerboard effect we mentioned above that tends to happen with transpose layers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# obtain one batch of test images\n",
"dataiter = iter(test_loader)\n",
"images, labels = dataiter.next()\n",
"\n",
"# get sample outputs\n",
"output = model(images)\n",
"# prep images for display\n",
"images = images.numpy()\n",
"\n",
"# output is resized into a batch of iages\n",
"output = output.view(batch_size, 1, 28, 28)\n",
"# use detach when it's an output that requires_grad\n",
"output = output.detach().numpy()\n",
"\n",
"# plot the first ten input images and then reconstructed images\n",
"fig, axes = plt.subplots(nrows=2, ncols=10, sharex=True, sharey=True, figsize=(25,4))\n",
"\n",
"# input images on top row, reconstructions on bottom\n",
"for images, row in zip([images, output], axes):\n",
" for img, ax in zip(images, row):\n",
" ax.imshow(np.squeeze(img), cmap='gray')\n",
" ax.get_xaxis().set_visible(False)\n",
" ax.get_yaxis().set_visible(False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}