{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.\n", "- Author: Sebastian Raschka\n", "- GitHub Repository: https://github.com/rasbt/deeplearning-models" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sebastian Raschka \n", "\n", "CPython 3.7.1\n", "IPython 7.2.0\n", "\n", "torch 1.0.0\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -a 'Sebastian Raschka' -v -p torch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Zoo -- Using PyTorch Dataset Loading Utilities for Custom Datasets (Cropped Street View House Numbers, SVHN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows how to load an image dataset stored as individual PNG files using PyTorch's data loading utilities. For a more in-depth discussion, please see the official\n", "\n", "- [Data Loading and Processing Tutorial](http://pytorch.org/tutorials/beginner/data_loading_tutorial.html)\n", "- [torch.utils.data](http://pytorch.org/docs/master/data.html) API documentation\n", "\n", "In this example, we are using the cropped version of the **Street View House Numbers (SVHN) Dataset**, which is available at http://ufldl.stanford.edu/housenumbers/.\n", "\n", "To execute the following examples, you need to download the following two \".mat\" files:\n", "\n", "- [train_32x32.mat](http://ufldl.stanford.edu/housenumbers/train_32x32.mat) (ca. 182 MB, 73,257 images)\n", "- [test_32x32.mat](http://ufldl.stanford.edu/housenumbers/test_32x32.mat) (ca. 
65 MB, 26,032 images)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import os\n", "\n", "import torch\n", "from torch.utils.data import Dataset\n", "from torch.utils.data import DataLoader\n", "from torchvision import transforms\n", "\n", "import matplotlib.pyplot as plt\n", "from PIL import Image\n", "import scipy.io as sio\n", "import imageio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function converts the images from the \".mat\" files into individual \".png\" files. In addition, it creates a CSV file for each split that contains the image file names and the associated class labels." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def make_pngs(main_dir, mat_file, label):\n", "    # create main_dir/label (and main_dir itself) if they don't exist yet\n", "    sub_dir = os.path.join(main_dir, label)\n", "    os.makedirs(sub_dir, exist_ok=True)\n", "\n", "    data = sio.loadmat(mat_file)\n", "\n", "    # 'X' is stored as (height, width, channels, num_images);\n", "    # move the image axis to the front: (num_images, height, width, channels)\n", "    X = np.transpose(data['X'], (3, 0, 1, 2))\n", "    y = data['y'].flatten()\n", "\n", "    with open(os.path.join(main_dir, '%s_labels.csv' % label), 'w') as out_f:\n", "        for i, img in enumerate(X):\n", "            file_path = os.path.join(sub_dir, str(i) + '.png')\n", "            imageio.imwrite(file_path, img)\n", "\n", "            # one row per image: file name, class label\n", "            out_f.write(\"%d.png,%d\\n\" % (i, y[i]))\n", "\n", "\n", "make_pngs(main_dir='svhn_cropped',\n", "          mat_file='train_32x32.mat',\n", "          label='train')\n", "\n", "make_pngs(main_dir='svhn_cropped',\n", "          mat_file='test_32x32.mat',\n", "          label='test')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 1\n", "0 \n", "0.png 1\n", "1.png 9\n", "2.png 2\n", "3.png 3\n", "4.png 2" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('svhn_cropped/train_labels.csv', header=None, index_col=0)\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 1\n", "0 \n", "0.png 5\n", "1.png 2\n", "2.png 1\n", "3.png 10\n", "4.png 6" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('svhn_cropped/test_labels.csv', header=None, index_col=0)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing a Custom Dataset Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we implement a custom `Dataset` for reading the images. The `__getitem__` method will\n", "\n", "1. read a single image from disk based on an `index` (more on batching later)\n", "2. perform a custom image transformation (if a `transform` argument is provided in the `__init__` constructor)\n", "3. return a single image and its corresponding label" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "class SVHNDataset(Dataset):\n", "    \"\"\"Custom Dataset for loading cropped SVHN images\"\"\"\n", "\n", "    def __init__(self, csv_path, img_dir, transform=None):\n", "\n", "        # column 0 (file names) becomes the index, column 1 holds the labels\n", "        df = pd.read_csv(csv_path, index_col=0, header=None)\n", "        self.img_dir = img_dir\n", "        self.csv_path = csv_path\n", "        self.img_names = df.index.values\n", "        self.y = df[1].values\n", "        self.transform = transform\n", "\n", "    def __getitem__(self, index):\n", "        img = Image.open(os.path.join(self.img_dir,\n", "                                      self.img_names[index]))\n", "\n", "        if self.transform is not None:\n", "            img = self.transform(img)\n", "\n", "        label = self.y[index]\n", "        return img, label\n", "\n", "    def __len__(self):\n", "        return self.y.shape[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have created our custom Dataset class, we add some custom transformations via the `transforms` utilities from `torchvision`. Specifically, we\n", "\n", "1. normalize the images (here: dividing by 255)\n", "2. 
convert the image arrays into PyTorch tensors\n", "\n", "Then, we initialize Dataset instances for the training and test images using the 'svhn_cropped/train_labels.csv' and 'svhn_cropped/test_labels.csv' label files.\n", "\n", "Finally, we initialize a `DataLoader` for each dataset, which allows us to read mini-batches of images and labels from it." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Note that transforms.ToTensor() converts PIL images to float tensors\n", "# and already divides the pixel values by 255 internally\n", "\n", "custom_transform = transforms.Compose([#transforms.Grayscale(),\n", "                                       #transforms.Lambda(lambda x: x/255.),\n", "                                       transforms.ToTensor()])\n", "\n", "train_dataset = SVHNDataset(csv_path='svhn_cropped/train_labels.csv',\n", "                            img_dir='svhn_cropped/train',\n", "                            transform=custom_transform)\n", "\n", "test_dataset = SVHNDataset(csv_path='svhn_cropped/test_labels.csv',\n", "                           img_dir='svhn_cropped/test',\n", "                           transform=custom_transform)\n", "\n", "BATCH_SIZE = 128\n", "\n", "\n", "train_loader = DataLoader(dataset=train_dataset,\n", "                          batch_size=BATCH_SIZE,\n", "                          shuffle=True,\n", "                          num_workers=4)\n", "\n", "test_loader = DataLoader(dataset=test_dataset,\n", "                         batch_size=BATCH_SIZE,\n", "                         shuffle=False,\n", "                         num_workers=4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's it! Now we can iterate over the `train_loader` in each epoch and use the features and labels from the training dataset for model training:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterating Through the Custom Dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 1 | Batch index: 0 | Batch size: 128\n", "Epoch: 2 | Batch index: 0 | Batch size: 128\n" ] } ], "source": [ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "torch.manual_seed(0)\n", "\n", "num_epochs = 2\n", "for epoch in range(num_epochs):\n", "\n", "    for 
batch_idx, (x, y) in enumerate(train_loader):\n", "\n", "        print('Epoch:', epoch+1, end='')\n", "        print(' | Batch index:', batch_idx, end='')\n", "        print(' | Batch size:', y.size()[0])\n", "\n", "        x = x.to(device)\n", "        y = y.to(device)\n", "        break  # only load the first batch per epoch for this demo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to make sure that the batches are being loaded correctly, let's print out the dimensions of the last batch that was loaded:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([128, 3, 32, 32])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, each batch consists of 128 images, just as specified. However, keep in mind that PyTorch uses a different image layout (which is more efficient when working with CUDA); here, the image axes are \"num_images x channels x height x width\" (NCHW) instead of \"num_images x height x width x channels\" (NHWC)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To visually check that the images coming out of the data loader are intact, let's pick a single image, reorder its axes from CHW to HWC (the layout that `imshow` expects), and convert it from a PyTorch tensor to a NumPy array so that we can visualize it via `imshow`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "torch.Size([32, 32, 3])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "one_image = x[99].permute(1, 2, 0)\n", "one_image.shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# note that imshow also works fine with float\n", "# images scaled to the [0, 1] range\n", "plt.imshow(one_image.cpu().numpy());" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch 1.0.0\n", "pandas 0.23.4\n", "imageio 2.4.1\n", "numpy 1.15.4\n", "torchvision 0.2.1\n", "scipy 1.1.0\n", "PIL.Image 5.3.0\n", "matplotlib 3.0.2\n", "\n" ] } ], "source": [ "%watermark -iv" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }
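The `df.head()` output for the test labels shows a class label of 10 (for `3.png`). In the cropped SVHN ".mat" files, the digit 0 is encoded as class label 10, while loss functions such as `torch.nn.CrossEntropyLoss` expect class indices from 0 to 9 for a 10-class problem. A minimal sketch, assuming we want label i to correspond to digit i, is to remap 10 to 0 before training. The helper name `remap_svhn_labels` is hypothetical and not part of the original notebook.

```python
import numpy as np


def remap_svhn_labels(y):
    """Hypothetical helper: map SVHN's class label 10 (digit '0') to 0.

    Assumption: label i should correspond to digit i, so that the
    labels span 0-9. Returns a new array; the input is not modified.
    """
    y = np.asarray(y).copy()  # copy to avoid changing the caller's array
    y[y == 10] = 0
    return y


# labels as they appear in the first rows of svhn_cropped/test_labels.csv
test_head_labels = np.array([5, 2, 1, 10, 6])
print(remap_svhn_labels(test_head_labels))  # -> [5 2 1 0 6]
```

Under the same assumption, the helper could be applied once in the `SVHNDataset` constructor, e.g., `self.y = remap_svhn_labels(df[1].values)`, so that every batch returned by the `DataLoader` already carries labels in the 0-9 range.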