{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2 - Building and Training Models\n",
    "\n",
    "In this notebook we will cover the following topics:\n",
    "\n",
    "* Creating models from scratch with Keras\n",
    "* Training the model\n",
    "* Looking at the results of the training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import numpy as np\n",
    "warnings.filterwarnings('ignore')  # Hide np.floating warning\n",
    "import holoviews as hv\n",
    "hv.extension('bokeh')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prevent TensorFlow from grabbing all the GPU memory\n",
    "import tensorflow as tf\n",
    "config = tf.ConfigProto()\n",
    "config.gpu_options.allow_growth=True\n",
    "sess = tf.Session(config=config)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data\n",
    "\n",
    "Using the pattern we saw in the last notebook, we can load and transform the CIFAR10 data for deep learning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras.datasets import cifar10\n",
    "import keras.utils\n",
    "\n",
    "(x_train, y_train), (x_test, y_test) = cifar10.load_data()\n",
    "\n",
    "# Save an unmodified copy of y_test for later, flattened to one column\n",
    "y_test_true = y_test[:,0].copy()\n",
    "\n",
    "x_train = x_train.astype('float32')\n",
    "x_test = x_test.astype('float32')\n",
    "x_train /= 255\n",
    "x_test /= 255\n",
    "\n",
    "num_classes = 10\n",
    "y_train = keras.utils.to_categorical(y_train, num_classes)\n",
    "y_test = keras.utils.to_categorical(y_test, num_classes)\n",
    "\n",
    "# The data only has numeric categories so we also have the string labels below \n",
    "cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer', \n",
    "                           'dog', 'frog', 'horse', 'ship', 'truck'])"
   ]
  },
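  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the one-hot layout produced by `to_categorical` concrete, here is a minimal NumPy sketch that builds the same kind of array.  The function name `one_hot` and the toy labels are made up for this illustration; this is not how Keras implements it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def one_hot(labels, num_classes):\n",
    "    # Hypothetical helper: one row per label, with a single 1 in the label's column\n",
    "    labels = np.asarray(labels).ravel()  # accept shape (n, 1), like the CIFAR10 labels\n",
    "    encoded = np.zeros((labels.size, num_classes), dtype='float32')\n",
    "    encoded[np.arange(labels.size), labels] = 1.0\n",
    "    return encoded\n",
    "\n",
    "toy_labels = np.array([[3], [0], [9]])  # made-up labels with the same shape as y_train\n",
    "toy_one_hot = one_hot(toy_labels, 10)\n",
    "print(toy_one_hot)"
   ]
  },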
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a Model\n",
    "\n",
    "The simplest way to define a deep learning model in Keras is using the [Sequential class](https://keras.io/getting-started/sequential-model-guide/), which holds a stack of layers that are executed in sequence.\n",
    "\n",
    "Keras has an [extensive catalog of layers](https://keras.io/layers/about-keras-layers/), making it very easy to recreate almost any network you find in the literature.  The VGG16-like networks we will use in this tutorial have the following kinds of layers:\n",
    "* Conv2D - 2D convolutions, useful for image networks\n",
    "* MaxPooling2D - Pooling of adjacent values using the `max()` function in 2 dimensions, also useful in image networks\n",
    "* Flatten - Turn an input of any shape into a flat, 1D output.  Often used to transition to dense layers\n",
    "* Dense - The traditional neural network layer, where each output is a weighted sum of the layer's inputs plus an offset (bias), passed through an activation function.\n",
    "\n",
    "Keras also has a large list of supported [activation functions](https://keras.io/activations/).  For all of these examples, we will use the `relu` function, as it is cheap to compute and generally trains well in deep networks.\n",
    "\n",
    "We begin by importing the necessary classes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from keras import Sequential\n",
    "from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense"
   ]
  },
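  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the Dense layer described above, the following NumPy sketch computes a weighted sum of the inputs plus an offset and applies `relu` to the result.  The helper names and the toy numbers are invented for this example; Keras does the equivalent work internally on whole batches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def relu(x):\n",
    "    # Element-wise max(x, 0)\n",
    "    return np.maximum(x, 0.0)\n",
    "\n",
    "def dense_forward(inputs, weights, bias, activation):\n",
    "    # Weighted sum of the inputs plus an offset, then the activation function\n",
    "    return activation(inputs @ weights + bias)\n",
    "\n",
    "toy_inputs = np.array([[1.0, -2.0, 0.5]])   # one sample with 3 features\n",
    "toy_weights = np.array([[0.5, -1.0],\n",
    "                        [0.25, 0.5],\n",
    "                        [1.0, 2.0]])        # 3 inputs -> 2 outputs\n",
    "toy_bias = np.array([0.5, 0.0])\n",
    "\n",
    "toy_output = dense_forward(toy_inputs, toy_weights, toy_bias, relu)\n",
    "print(toy_output)"
   ]
  },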
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Creating a Keras model has the following structure:\n",
    "\n",
    "* Create empty model\n",
    "* Add layers in order, setting the input_shape for the first layer\n",
    "* Finish by compiling the model with a loss function, an optimizer, and a list of metrics to compute during fitting\n",
    "\n",
    "The choice of [loss function](https://keras.io/losses/) depends on the kind of model we are training.  Since we are doing classification with more than two categories, `categorical_crossentropy` is the usual choice.\n",
    "\n",
    "The choice of [optimizer](https://keras.io/optimizers/) is less straightforward.  We're using `Adadelta` because it adapts its learning rate during training, so it needs little manual tuning, and it works pretty well on this problem.\n",
    "\n",
    "Metrics are functions that score your model, but are not used to optimize it.  The most common metric is accuracy, so we include it here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "\n",
    "### Convolution and max pool layers\n",
    "\n",
    "# Group 1: Convolution\n",
    "model.add(Conv2D(32, kernel_size=(3, 3),\n",
    "                 activation='relu',\n",
    "                 input_shape=x_train.shape[1:]))\n",
    "model.add(Conv2D(32, (3, 3), activation='relu'))\n",
    "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
    "\n",
    "# Group 2: Convolution\n",
    "model.add(Conv2D(64, kernel_size=(3, 3),\n",
    "                 activation='relu'))  # input_shape is only needed on the first layer\n",
    "model.add(Conv2D(64, (3, 3), activation='relu'))\n",
    "model.add(MaxPooling2D(pool_size=(2, 2)))\n",
    "\n",
    "# Group 3: Dense layers\n",
    "model.add(Flatten())\n",
    "model.add(Dense(128, activation='relu'))\n",
    "model.add(Dense(num_classes, activation='softmax'))\n",
    "\n",
    "model.compile(loss=keras.losses.categorical_crossentropy,\n",
    "              optimizer=keras.optimizers.Adadelta(),\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can inspect various properties of the model, such as the number of free parameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we see that the majority of free parameters are introduced at the point where we switch from the convolutional layers to the dense layers.  If we want to reduce the size of this model, we will either need to reduce the size of the dense layer or reduce the number of convolution kernels."
   ]
  },
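  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parameter counts can also be worked out by hand.  The sketch below assumes the standard CIFAR10 input shape of 32x32x3 and the layer settings used above (3x3 kernels, stride 1, no padding, 2x2 pooling); the helper names are ours, not part of Keras.  Under these assumptions the first dense layer accounts for roughly three quarters of all parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def conv_params(kernel, in_channels, filters):\n",
    "    # kernel * kernel * in_channels weights plus one bias, for every filter\n",
    "    return (kernel * kernel * in_channels + 1) * filters\n",
    "\n",
    "def dense_params(inputs, units):\n",
    "    # One weight per input plus one bias, for every unit\n",
    "    return (inputs + 1) * units\n",
    "\n",
    "# Spatial size: 32 -> 30 -> 28 (two convs), 14 (pool), 12 -> 10 (two convs), 5 (pool)\n",
    "flat_size = 5 * 5 * 64\n",
    "\n",
    "param_counts = {\n",
    "    'conv 1a': conv_params(3, 3, 32),\n",
    "    'conv 1b': conv_params(3, 32, 32),\n",
    "    'conv 2a': conv_params(3, 32, 64),\n",
    "    'conv 2b': conv_params(3, 64, 64),\n",
    "    'dense 128': dense_params(flat_size, 128),\n",
    "    'dense 10': dense_params(128, 10),\n",
    "}\n",
    "total_params = sum(param_counts.values())\n",
    "for name, count in param_counts.items():\n",
    "    print('%-10s %7d  (%4.1f%%)' % (name, count, 100.0 * count / total_params))\n",
    "print('%-10s %7d' % ('total', total_params))"
   ]
  },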
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training a Model\n",
    "\n",
    "To train a model, we use the `fit()` method on a compiled model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "history = model.fit(x_train, y_train,\n",
    "          batch_size=256,\n",
    "          epochs=5,\n",
    "          verbose=1,\n",
    "          validation_data=(x_test, y_test))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `epochs` value controls how many passes through the training data are taken.  The `batch_size` determines how many training examples are processed together in one step.  After each batch, the gradients computed by backpropagation are used to update the model parameters according to the optimizer's strategy.  Batch size affects both training speed and model quality, as we'll discuss later.\n",
    "\n",
    "The validation data is not used by the optimizer for training, but it is scored at the end of each epoch to give an independent assessment of the model quality.  The results for the validation data are what you should keep an eye on to understand how well the model is generalizing.  In the next notebook, we'll look more closely at how to interpret differences in accuracy between training and validation data.\n",
    "\n",
    "Note that the model object retains its state after training.  If we wanted additional rounds of training, we could call `fit()` again, and it would pick up where the last fit left off:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.fit(x_train, y_train,\n",
    "          batch_size=256,\n",
    "          epochs=2,\n",
    "          verbose=1,\n",
    "          validation_data=(x_test, y_test))"
   ]
  },
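  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for how `batch_size` relates to the number of parameter updates, here is a small calculation.  It assumes the 50,000-image CIFAR10 training set and that the final, smaller batch of each epoch is still used for an update; the function name is made up for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "def updates_per_epoch(n_samples, batch_size):\n",
    "    # One parameter update per batch; a partial last batch still counts\n",
    "    return math.ceil(n_samples / batch_size)\n",
    "\n",
    "n_train_images = 50000  # size of the CIFAR10 training set\n",
    "for size in (32, 256, 1024):\n",
    "    print('batch_size=%4d -> %4d updates per epoch' % (size, updates_per_epoch(n_train_images, size)))"
   ]
  },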
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the more powerful features of the `fit()` method is the `callbacks` argument.  We can use [prebuilt classes](https://keras.io/callbacks/), or create our own, that are called after every batch and epoch to update status or cause the fit to terminate.  For example, we can use the [EarlyStopping](https://keras.io/callbacks/#earlystopping) callback to end the fit if the validation accuracy has not improved by more than 0.05 (5 percentage points) for 2 consecutive epochs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "early_stop = keras.callbacks.EarlyStopping(monitor='val_acc', min_delta=0.05, patience=2, verbose=1)\n",
    "model.fit(x_train, y_train,\n",
    "          batch_size=256,\n",
    "          epochs=10,\n",
    "          verbose=1,\n",
    "          validation_data=(x_test, y_test),\n",
    "          callbacks=[early_stop])"
   ]
  },
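  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The stopping rule itself can be sketched in a few lines of plain Python.  This is a simplified illustration of the `patience` / `min_delta` idea for a metric that should increase, not the actual Keras implementation; the function name and the accuracy values are invented for the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def early_stop_epoch(values, min_delta=0.05, patience=2):\n",
    "    # Return the index of the epoch at which training would stop, or None\n",
    "    best = float('-inf')\n",
    "    wait = 0\n",
    "    for epoch, value in enumerate(values):\n",
    "        if value - min_delta > best:\n",
    "            best = value  # large enough to count as an improvement\n",
    "            wait = 0\n",
    "        else:\n",
    "            wait += 1\n",
    "            if wait >= patience:\n",
    "                return epoch\n",
    "    return None\n",
    "\n",
    "toy_val_acc = [0.40, 0.52, 0.55, 0.56, 0.57]  # made-up validation accuracies\n",
    "print('stopped at epoch index:', early_stop_epoch(toy_val_acc))"
   ]
  },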
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inspecting the Fit"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the model is trained, we can use the model object in various ways.  First, we can look at the training history object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(history.epoch)\n",
    "history.history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `history.history` dictionary has tracked several different values through the training process:\n",
    "\n",
    "* `acc`: Accuracy of the model on the training set, averaged over the batches\n",
    "* `val_acc`: Accuracy of the model on the validation set\n",
    "* `loss`: Value of the loss function on the training set, averaged over the batches\n",
    "* `val_loss`: Value of the loss function on the validation set\n",
    "\n",
    "Note that the loss function on the training data is the only thing the optimizer is trying to minimize.  The other metrics hopefully improve at the same time, but do not always.\n",
    "\n",
    "We can plot the accuracy on the training and validation data with HoloViews:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_acc = hv.Curve((history.epoch, history.history['acc']), 'epoch', 'accuracy', label='training')\n",
    "val_acc = hv.Curve((history.epoch, history.history['val_acc']), 'epoch', 'accuracy', label='validation')\n",
    "\n",
    "overlay = train_acc * val_acc\n",
    "\n",
    "overlay.opts(\n",
    "    hv.opts.Curve(width=400, height=300, line_width=3),\n",
    "    hv.opts.Overlay(legend_position='top_left')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also look at individual predictions.  Let's run the final trained model over the validation set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predict = model.predict(x_test)\n",
    "y_predict[:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output has the same layout as the one-hot encoding: each input image produces 10 columns (for categories 0-9), here holding the softmax scores the model assigns to each category.  Normally, we would take the column with the largest output as the predicted category.  We could do this with some NumPy magic, but Keras also includes a convenience method `predict_classes()`, which does this automatically:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predict = model.predict_classes(x_test)\n",
    "y_predict[:5]"
   ]
  },
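  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The NumPy alternative mentioned above is a single `np.argmax` call along the category axis.  The sketch uses a small made-up score array rather than the real `model.predict` output, so it runs on its own.  It may also be a useful fallback if your Keras version does not provide `predict_classes()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Made-up scores for 3 images and 3 categories (each row sums to 1)\n",
    "toy_scores = np.array([[0.1, 0.7, 0.2],\n",
    "                       [0.8, 0.1, 0.1],\n",
    "                       [0.3, 0.3, 0.4]])\n",
    "\n",
    "# Index of the largest score in each row is the predicted category\n",
    "toy_classes = np.argmax(toy_scores, axis=1)\n",
    "print(toy_classes)"
   ]
  },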
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And then we can use our label array and NumPy fancy indexing to see these as strings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_predict_labels = cifar10_labels[y_predict]\n",
    "y_true_labels = cifar10_labels[y_test_true]\n",
    "print(y_predict_labels[:5])\n",
    "print(y_true_labels[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HoloViews makes it easy to look at the first few predictions, with each image labeled as true(predicted):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "images = [hv.RGB(x_test[i], label='%s(%s)' % (y_true_labels[i], y_predict_labels[i]) ) for i in range(12)]\n",
    "\n",
    "hv.output(\n",
    "    hv.Layout(images).cols(4).opts(\n",
    "        hv.opts.RGB(xaxis=None, yaxis=None)\n",
    "    ),\n",
    "    size=64\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, let's select the failed predictions with a NumPy boolean mask:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "failed = y_predict != y_test_true\n",
    "print('Number failed:', np.count_nonzero(failed))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the boolean mask once rather than on every loop iteration\n",
    "x_failed = x_test[failed]\n",
    "true_failed, predict_failed = y_true_labels[failed], y_predict_labels[failed]\n",
    "images = [hv.RGB(x_failed[i], label='%d: %s(%s)' % (i, true_failed[i], predict_failed[i]))\n",
    "          for i in range(12)]\n",
    "\n",
    "hv.output(\n",
    "    hv.Layout(images).cols(4).opts(\n",
    "        hv.opts.RGB(xaxis=None, yaxis=None),\n",
    "    ),\n",
    "    size=64\n",
    ")"
   ]
  },
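  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A boolean mask of failures also gives the accuracy directly, since the mean of a boolean array is the fraction of `True` entries.  Here is a toy example with made-up labels, kept separate from the real predictions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "toy_true = np.array([3, 8, 8, 0, 6])  # made-up true categories\n",
    "toy_pred = np.array([3, 8, 1, 0, 4])  # made-up predicted categories\n",
    "\n",
    "toy_failed = toy_pred != toy_true      # boolean mask, True where the prediction is wrong\n",
    "toy_accuracy = 1.0 - toy_failed.mean()\n",
    "print('failed indices:', np.flatnonzero(toy_failed))\n",
    "print('accuracy:', toy_accuracy)"
   ]
  },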
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll learn more about evaluating the model in the next notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Experiments to Try\n",
    "\n",
    "* Try changing some of the model parameters (number of dense nodes, number of convolution kernels) and see how training changes.\n",
    "* Try changing the batch size during training to see how the speed of training is affected (and the final accuracy).\n",
    "* Try changing `relu` to `sigmoid`.\n",
    "\n",
    "If you screw everything up, you can use File / Revert to Checkpoint to go back to the first version of the notebook and restart the Jupyter kernel with Kernel / Restart."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}