{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# AlexNet Training\n", "\n", "In this notebook we train the AlexNet class implemented in the alexnet.py file, using the CIFAR-10 dataset. Because the input dimensions and the amount of data in CIFAR-10 differ from those of ImageNet, some modifications have been made to the training process." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/mohit/virtualenvs/tensorflow/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n", " from ._conv import register_converters as _register_converters\n" ] } ], "source": [ "import tensorflow as tf\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CIFAR-10\n", "\n", "The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.\n", "\n", "The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "\"\"\" Get Data \"\"\"\n", "\n", "import pickle\n", "\n", "# File Path\n", "CIFAR_DIR = 'Data/cifar-10-batches-py/'\n", "\n", "# Load the Data\n", "def unpickle(file):\n", "    # encoding='bytes' means the dictionary keys are bytes (e.g. b'data')\n", "    with open(file, 'rb') as fo:\n", "        cifar_dict = pickle.load(fo, encoding='bytes')\n", "    return cifar_dict\n", "\n", "dirs = ['batches.meta','data_batch_1','data_batch_2','data_batch_3','data_batch_4','data_batch_5','test_batch']\n", "# One dictionary per file, in the same order as dirs\n", "all_data = [unpickle(CIFAR_DIR + direc) for direc in dirs]\n", "\n", "batch_meta = all_data[0]\n", "data_batch1 = all_data[1]\n", "data_batch2 = all_data[2]\n", "data_batch3 = all_data[3]\n", "data_batch4 = all_data[4]\n", "data_batch5 = all_data[5]\n", "test_batch = all_data[6]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{b'label_names': [b'airplane',\n", " b'automobile',\n", " b'bird',\n", " b'cat',\n", " b'deer',\n", " b'dog',\n", " b'frog',\n", " b'horse',\n", " b'ship',\n", " b'truck'],\n", " b'num_cases_per_batch': 10000,\n", " b'num_vis': 3072}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "batch_meta" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys([b'labels', b'data', b'batch_label', b'filenames'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_batch1.keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Loaded in this way, each of the batch files contains a dictionary with the following elements:\n", "\n", "data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. 
The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.\n", "\n", "labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.\n", "\n", "The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:\n", "\n", "label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == \"airplane\", label_names[1] == \"automobile\", etc." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(-0.5, 31.5, 31.5, -0.5)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display the Data Images\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "X = data_batch1[b\"data\"]\n", "# Each row holds 3072 values in channel-first order: reshape to (N, 3, 32, 32),\n", "# then move the channel axis last so that imshow receives (32, 32, 3) images.\n", "X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype(\"uint8\")\n", "\n", "fig = plt.figure(figsize=(15,10))\n", "fig.add_subplot(1,3,1)\n", "plt.imshow(X[6])\n", "plt.axis('off')\n", "fig.add_subplot(1,3,2)\n", "plt.imshow(X[120])\n", "plt.axis('off')\n", "fig.add_subplot(1,3,3)\n", "plt.imshow(X[360])\n", "plt.axis('off')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Helper Functions for Dealing With Data" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def one_hot_encode(vec, vals=10):\n", "    '''\n", "    One-hot encode a vector of integer labels (10 possible classes by default).\n", "    '''\n", "    n = len(vec)\n", "    out = np.zeros((n, vals))\n", "    out[range(n), vec] = 1\n", "    return out" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "class CifarHelper():\n", "\n", "    def __init__(self):\n", "        self.i = 0\n", "\n", "        # Grabs a list of all the data batches for training\n", "        self.all_train_batches = [data_batch1,data_batch2,data_batch3,data_batch4,data_batch5]\n", "        # Grabs a list of all the test batches (really just one batch)\n", "        self.test_batch = [test_batch]\n", "\n", "        # Initialize some empty variables for later on\n", "        self.training_images = None\n", "        self.training_labels = None\n", "\n", "        self.test_images = None\n", "        self.test_labels = None\n", "\n", "    def set_up_images(self):\n", "\n", "        print(\"Setting Up Training Images and Labels\")\n", "\n", "        # Vertically stacks the training images\n", "        self.training_images = np.vstack([d[b\"data\"] for d in self.all_train_batches])\n", "        train_len = len(self.training_images)\n", "\n", "        # Reshapes to (N, 32, 32, 3) and scales pixel values to [0, 1]\n", "        self.training_images = self.training_images.reshape(train_len,3,32,32).transpose(0,2,3,1)/255\n", "        # One-hot encodes the training labels (e.g. [0,0,0,1,0,0,0,0,0,0])\n", "        self.training_labels = one_hot_encode(np.hstack([d[b\"labels\"] for d in self.all_train_batches]), 10)\n", "\n", "        print(\"Setting Up Test Images and Labels\")\n", "\n", "        # Vertically stacks the test images\n", "        self.test_images = np.vstack([d[b\"data\"] for d in self.test_batch])\n", "        test_len = len(self.test_images)\n", "\n", "        # Reshapes to (N, 32, 32, 3) and scales pixel values to [0, 1]\n", "        self.test_images = self.test_images.reshape(test_len,3,32,32).transpose(0,2,3,1)/255\n", "        # One-hot encodes the test labels (e.g. 
[0,0,0,1,0,0,0,0,0,0])\n", "        self.test_labels = one_hot_encode(np.hstack([d[b\"labels\"] for d in self.test_batch]), 10)\n", "\n", "\n", "    def next_batch(self, batch_size):\n", "        # The images are already stored as (N, 32, 32, 3), so a plain slice works for any batch size\n", "        x = self.training_images[self.i:self.i+batch_size]\n", "        y = self.training_labels[self.i:self.i+batch_size]\n", "        self.i = (self.i + batch_size) % len(self.training_images)\n", "        return x, y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the AlexNet Model" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from alexnet import AlexNet" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Placeholders for the input images, the one-hot labels, and the dropout keep probability\n", "x = tf.placeholder(tf.float32, shape = [None, 32, 32, 3])\n", "y_true = tf.placeholder(tf.float32, shape = [None, 10])\n", "keep_prob = tf.placeholder(tf.float32)\n", "\n", "# Create the AlexNet model\n", "model = AlexNet(x = x, keep_prob = keep_prob, num_classes = 10)\n", "\n", "# Use the output of the last layer (fc8) as the class scores\n", "score = model.fc8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loss Function" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels = y_true, logits = score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optimizer" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# This implementation uses Adam rather than the\n", "# SGD with momentum used in the paper.\n", "optimizer = tf.train.AdamOptimizer(learning_rate = 0.0001)\n", "train = optimizer.minimize(cross_entropy)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Initialize all global variables\n", "init = 
tf.global_variables_initializer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Graph Session" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting Up Training Images and Labels\n", "Setting Up Test Images and Labels\n", "EPOCH: 0.0\n", "ACCURACY \n", "0.1\n", "\n", "\n", "EPOCH: 1.0\n", "ACCURACY \n", "0.2389\n", "\n", "\n", "EPOCH: 2.0\n", "ACCURACY \n", "0.3206\n", "\n", "\n", "EPOCH: 3.0\n", "ACCURACY \n", "0.3345\n", "\n", "\n", "EPOCH: 4.0\n", "ACCURACY \n", "0.3412\n", "\n", "\n", "EPOCH: 5.0\n", "ACCURACY \n", "0.3572\n", "\n", "\n", "EPOCH: 6.0\n", "ACCURACY \n", "0.3953\n", "\n", "\n", "EPOCH: 7.0\n", "ACCURACY \n", "0.4241\n", "\n", "\n", "EPOCH: 8.0\n", "ACCURACY \n", "0.4486\n", "\n", "\n", "EPOCH: 9.0\n", "ACCURACY \n", "0.4566\n", "\n", "\n", "EPOCH: 10.0\n", "ACCURACY \n", "0.4748\n", "\n", "\n", "EPOCH: 11.0\n", "ACCURACY \n", "0.4869\n", "\n", "\n", "EPOCH: 12.0\n", "ACCURACY \n", "0.4965\n", "\n", "\n", "EPOCH: 13.0\n", "ACCURACY \n", "0.5043\n", "\n", "\n", "EPOCH: 14.0\n", "ACCURACY \n", "0.5127\n", "\n", "\n", "EPOCH: 15.0\n", "ACCURACY \n", "0.5177\n", "\n", "\n", "EPOCH: 16.0\n", "ACCURACY \n", "0.5243\n", "\n", "\n", "EPOCH: 17.0\n", "ACCURACY \n", "0.5345\n", "\n", "\n", "EPOCH: 18.0\n", "ACCURACY \n", "0.5394\n", "\n", "\n", "EPOCH: 19.0\n", "ACCURACY \n", "0.5436\n", "\n", "\n", "EPOCH: 20.0\n", "ACCURACY \n", "0.5468\n", "\n", "\n" ] } ], "source": [ "# 10,000 steps with a batch size of 100 cover 20 epochs of the\n", "# 50,000 training images: (10,000 * 100) / 50,000 = 20.\n", "# One extra step is run so that accuracy is also reported at step 10,000 (epoch 20).\n", "steps = 10001\n", "\n", "ch = CifarHelper()\n", "# Pre-process the data.\n", "ch.set_up_images()\n", "\n", "with tf.Session() as sess:\n", "\n", "    sess.run(init)\n", "\n", "    for i in range(steps):\n", "\n", "        # Get the next batch of data.\n", "        batch = ch.next_batch(100)\n", "        # Run one training step on this batch (dropout keep probability 0.5).\n", "        sess.run(train, feed_dict = {x : 
batch[0], y_true : batch[1], keep_prob : 0.5})\n", "\n", "        # Print accuracy after every epoch.\n", "        # 500 * 100 = 50,000 images, which is one full pass over the training set.\n", "        if i%500 == 0:\n", "\n", "            print(\"EPOCH: {}\".format(i / 500))\n", "            print(\"ACCURACY \")\n", "\n", "            matches = tf.equal(tf.argmax(score, 1), tf.argmax(y_true, 1))\n", "            acc = tf.reduce_mean(tf.cast(matches, tf.float32))\n", "\n", "            # Evaluate on the test set with dropout disabled (keep_prob = 1.0).\n", "            print(sess.run(acc, feed_dict = {x : ch.test_images, y_true : ch.test_labels, keep_prob : 1.0}))\n", "            print('\\n')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }