{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MNIST classification using keras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this exercise is to train a simple MLP on the MNIST digit classification dataset, the \"Hello World!\" of machine learning. MNIST was created by Yann LeCun to benchmark supervised learning algorithms. State-of the art is at 99.7% accuracy on the test set (using convolutional deep networks). See this link to see the different approaches : \n", "\n", "In MNIST, each input is a 28x28 grayscale image representing digits between 0 and 9. The training set has 60.000 examples, the test set 10.000.\n", "\n", "Instead of programming explicitly the MLP like in the previous exercise, we will now use **Keras** , a high-level API to **tensorflow** .\n", "\n", "You need first to install `tensorflow` 2.x if not done already. On Colab, tensorflow is already installed. Even if you are using Anaconda, it is recommended to install tensorflow with pip:\n", "\n", "```bash\n", "pip install tensorflow\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf\n", "\n", "print(tf.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Read the documentation of keras at to get an overview of its structure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are provided with a basic poor-performing keras model to get you started. The goal is to extend this model in order to obtain a satisfying accuracy on the test set. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data preprocessing\n", "\n", "The first step is to download the MNIST dataset. You could download the raw data from and process it, but that would take a while.\n", "\n", "Fortunately, keras comes with a utility to automatically download MNIST, split it into training and test set and create nice numpy arrays:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(X_train, t_train), (X_test, t_test) = tf.keras.datasets.mnist.load_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Have a look at the doc of `tf.keras.datasets` to see what other datasets you can simply use.\n", "\n", "**Q:** Print the shape of the four numpy arrays `(X_train, t_train), (X_test, t_test)` and visualize some training examples to better understand what you are going to work on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Training data:\", X_train.shape, t_train.shape)\n", "print(\"Test data:\", X_test.shape, t_test.shape)\n", "\n", "idx = 682 # for example\n", "x = X_train[idx, :]\n", "t = t_train[idx]\n", "print(\"x:\", x)\n", "print(\"x (shape):\", x.shape)\n", "print(\"t:\", t)\n", "\n", "plt.figure(figsize=(5, 5))\n", "plt.imshow(x, cmap=\"gray\")\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise, we are going to use a regular MLP (with fully-connected layers). Convolutional layers will be seen next time.\n", "\n", "We therefore need to transform the 28x28 input matrix into a 784 vector. Additionally, pixel values are integers between 0 and 255. We have to rescale them to floating values in [0, 1]." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train = X_train.reshape(X_train.shape[0], 784).astype('float32') / 255.\n", "X_test = X_test.reshape(X_test.shape[0], 784).astype('float32') / 255." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We saw in the last exercise that **mean removal** is crucial when training a neural network. The following cell removes the mean image of the training set from all examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_mean = np.mean(X_train, axis=0)\n", "X_train -= X_mean\n", "X_test -= X_mean\n", "\n", "\n", "plt.figure(figsize=(5, 5))\n", "plt.imshow(X_mean.reshape((28, 28))*255, cmap=\"gray\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last preprocessing step is to perform **one-hot encoding** of the output labels. We want for example the digit 4 (index 5 in the outputs t) to be represented by the vector:\n", "\n", "[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]\n", "\n", "`keras` offers the utility `utils.to_categorical` to do that on the whole data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "T_train = tf.keras.utils.to_categorical(t_train, 10)\n", "T_test = tf.keras.utils.to_categorical(t_test, 10)\n", "\n", "print(T_train[idx])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All set! The data is ready to be learned by a neural network. You should normally not have to re-run those cells again. If you do, do not forget to run all of them sequentially: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model definition\n", "\n", "Let's now define a simple MLP with keras. When using a notebook, you can recreate models by simply re-running the cell, but this does not delete the previous networks which may end up filling your RAM. It is therefore good practice to start by telling tensorflow to delete all previous models (if this is what you want):\n", "\n", "```python\n", "tf.keras.backend.clear_session()\n", "```\n", "\n", "One way to define a neural network in keras is by stacking layers in a `Sequential()` model (you can later have a look at the doc of `Model()` for directed acyclic graphs).\n", "\n", "```python\n", "model = tf.keras.models.Sequential()\n", "```\n", "\n", "The input layer has 784 neurons, one per pixel in the input image. We only need to define a placeholder of the correct size to represent inputs and `add()` it to the model as its first layer:\n", "\n", "```python\n", "model.add(tf.keras.layers.Input(shape=(784,)))\n", "```\n", "\n", "The input layer goes into a hidden, fully-connected, layer of 100 neurons using the logistic (or sigmoid) transfer function This can be specified by adding to the model a `Dense` layer (in the sense \"fully-connected\") with 100 units (another name for neuron), followed by an `Activation` layer using the 'sigmoid' function:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(units=100))\n", "model.add(tf.keras.layers.Activation('sigmoid')) \n", "```\n", "\n", "The weight matrix and the biases are intialized automatically using the Glorot uniform scheme (seen in the last exercise) for the weights and zeros for the biases. 
Check the doc of the `Dense` layer to see how to change this.\n", "\n", "We then add a softmax layer as output layer (classification problem), with 10 units (one per digit):\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(units=10))\n", "model.add(tf.keras.layers.Activation('softmax')) \n", "```\n", "\n", "Weights and biases are initialized in the same manner. That's all: keras now knows how to transform the input vector into class probabilities using randomly initialized weights!\n", "\n", "For training, we need to choose an optimizer (learning rule). Several optimizers are available. We simply pick **Stochastic Gradient Descent** with a learning rate of 0.01:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n", "```\n", "\n", "The last step is to **compile** the network, so that keras computes how to implement the backpropagation algorithm. You need to specify:\n", "\n", "1. which loss function you want to minimize. Here, we will use the cross-entropy loss function, as we have a classification problem with softmax outputs.\n", "2. which optimizer you want to use.\n", "3. which metrics (accuracy, error, etc.) you want to track during learning.\n", "\n", "After the call to `compile()`, the neural network is instantiated and ready to learn." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete all previous models to free memory\n", "tf.keras.backend.clear_session()\n", "\n", "# Sequential model\n", "model = tf.keras.models.Sequential()\n", "\n", "# Input layer representing the 784 pixels\n", "model.add(tf.keras.layers.Input(shape=(784,)))\n", "\n", "# Hidden layer with 100 logistic neurons\n", "model.add(tf.keras.layers.Dense(units=100))\n", "model.add(tf.keras.layers.Activation('sigmoid')) \n", "\n", "# Softmax output layer over 10 classes\n", "model.add(tf.keras.layers.Dense(units=10))\n", "model.add(tf.keras.layers.Activation('softmax')) \n", "\n", "# Learning rule\n", "optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n", "\n", "# Compile the model: loss, optimizer and metrics\n", "model.compile(\n", "    loss='categorical_crossentropy', # loss function\n", "    optimizer=optimizer, # learning rule\n", "    metrics=['accuracy'] # show accuracy\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "A good practice after creating the model is to call `model.summary()` to see how many layers you have created and how many parameters each layer has.\n", "\n", "**Q:** Explain why you obtain these numbers of parameters in each layer." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(model.summary())" ] }
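, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to verify your reasoning, recall that a fully-connected layer with `n` inputs and `m` units has an `n * m` weight matrix plus `m` biases. The following cell is a quick sanity check (not part of the model itself) that recomputes the counts reported by `model.summary()`:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sanity check: recompute the parameter counts reported by model.summary()\n", "# A Dense layer with n inputs and m units has n*m weights plus m biases.\n", "n_inputs, n_hidden, n_outputs = 784, 100, 10\n", "\n", "params_hidden = n_inputs * n_hidden + n_hidden # weights + biases of the hidden layer\n", "params_output = n_hidden * n_outputs + n_outputs # weights + biases of the output layer\n", "\n", "print(\"Hidden layer:\", params_hidden)\n", "print(\"Output layer:\", params_output)\n", "print(\"Total:\", params_hidden + params_output)" ] }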
, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model training\n", "\n", "Now it is time to train the network on MNIST. The following cell creates a `History()` object that will record the progress of your network.\n", "\n", "It then calls the `model.fit()` method, which tells the network to learn the MNIST dataset defined by the `(X_train, T_train)` arrays. You have to specify:\n", "\n", "1. the batch size, i.e. the number of training examples in each minibatch used by SGD.\n", "2. the maximal number of epochs for training.\n", "3. the size of the validation set, taken from the training set to track the progress (this is not the test set!). Here we reserve 10% of the training data for validation. If you do not have much data, you could set it to 0.\n", "4. a callback, which will be called at the end of each epoch. Here it will save the metrics defined in `model.compile()` in the `History()` object.\n", "\n", "The training process can take a while depending on how big your network is and how much data you have. You can interrupt the kernel using the menu if you want to stop the computation in the cell." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# History tracks the evolution of the metrics during learning\n", "history = tf.keras.callbacks.History()\n", "\n", "# Training procedure\n", "model.fit(\n", "    X_train, T_train, # training data\n", "    batch_size=128, # batch size\n", "    epochs=20, # maximum number of epochs\n", "    validation_split=0.1, # percentage of training data used for validation\n", "    callbacks=[history] # track the metrics at the end of each epoch\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The training has now run for 20 epochs on the training set. You see the evolution of the loss function and accuracy for both the training and validation sets.\n", "\n", "To test your trained model on the test set, you can call `model.evaluate()`:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score = model.evaluate(X_test, T_test, verbose=0)\n", "\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "You can also use the `History()` object to visualize the evolution of the training and validation metrics during learning." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(15, 6))\n", "\n", "plt.subplot(121)\n", "plt.plot(history.history['loss'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_loss'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "\n", "plt.subplot(122)\n", "plt.plot(history.history['accuracy'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_accuracy'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "\n", "plt.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Did overfitting occur during learning? Why? Looking at the curves, does it make sense to continue learning for many more epochs?" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The following cell makes predictions on the test set (`model.predict(X_test)`), computes the predicted classes by looking at the maximum probability for each example and displays some misclassified examples. The title of each subplot shows the predicted class and the ground truth.\n", "\n", "**Q:** Are some mistakes understandable?" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Y_test = model.predict(X_test)\n", "c_test = np.argmax(Y_test, axis=-1)\n", "\n", "misclassification = (c_test != t_test).nonzero()[0]\n", "\n", "plt.figure(figsize=(10, 8))\n", "for i in range(12):\n", "    plt.subplot(3, 4, i+1)\n", "    plt.imshow((X_test[misclassification[i], :] + X_mean).reshape((28, 28)), cmap=plt.cm.gray, interpolation='nearest')\n", "    plt.title('P = ' + str(c_test[misclassification[i]]) + ' ; T = ' + str(t_test[misclassification[i]]))\n", "    plt.xticks([]); plt.yticks([])\n", "plt.show()" ] }
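, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that `model.predict()` expects a *batch* of inputs: to classify a single image, you have to keep the batch dimension. The following cell is a minimal sketch (using the preprocessed `X_test` and the mean image `X_mean` defined above) that displays one test image together with its predicted class probabilities:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Predict the class probabilities of a single test image.\n", "# predict() expects a batch, so we pass a (1, 784) array, not a (784,) vector.\n", "idx = 0 # any test example\n", "\n", "probas = model.predict(X_test[idx:idx+1], verbose=0)[0]\n", "\n", "plt.figure(figsize=(10, 4))\n", "plt.subplot(121)\n", "plt.imshow((X_test[idx] + X_mean).reshape((28, 28)), cmap=\"gray\")\n", "plt.title(\"True label: \" + str(t_test[idx]))\n", "plt.subplot(122)\n", "plt.bar(range(10), probas)\n", "plt.xlabel(\"Digit\")\n", "plt.ylabel(\"Probability\")\n", "plt.title(\"Predicted: \" + str(probas.argmax()))\n", "plt.show()" ] }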
, { "cell_type": "markdown", "metadata": {}, "source": [ "## Questions\n", "\n", "With the provided model, you probably obtained a final accuracy on the test set around 90%. That is lame. The state-of-the-art performance is 99.7%.\n", "\n", "The goal of this exercise is now to modify the network in order to obtain an accuracy of **98%** (or more) in 20 epochs only.\n", "\n", "You are free to use any improvement on the basic model, using the doc of Keras. Here are some suggestions (a sketch combining several of them is given after this list):\n", "\n", "* Change the learning rate of SGD.\n", "\n", "* Change the number of neurons in the hidden layer.\n", "\n", "* Change the number of hidden layers (just stack another `Dense` layer in the model). **Beware:** you do not have three weeks in front of you, so keep the complexity of your model in a reasonable range.\n", "\n", "* Change the transfer function of the hidden neurons. See the keras documentation for the different possibilities. Check in particular the Rectified Linear Unit (ReLU).\n", "\n", "* Change the learning rule. Instead of the regular SGD, use for example the Nesterov momentum method:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True)\n", "```\n", "\n", "or the Adam learning rule:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n", "```\n", "\n", "* Change the batch size. What impact does it have on training time?\n", "\n", "* Apply **L2- or L1-regularization** to the weight updates to avoid overfitting:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(50, kernel_regularizer=tf.keras.regularizers.l2(0.0001)))\n", "```\n", "\n", "* Apply **dropout** regularization after each layer. Find a good level of dropout.\n", "\n", "```python\n", "model.add(tf.keras.layers.Dropout(0.5))\n", "```\n", "\n", "* Add **batch normalization** between the fully-connected layer and the transfer function:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100)) # Weights\n", "model.add(tf.keras.layers.BatchNormalization()) # Batch normalization\n", "model.add(tf.keras.layers.Activation('relu')) # Transfer function\n", "```" ] }
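, { "cell_type": "markdown", "metadata": {}, "source": [ "For reference, here is a minimal sketch combining several of these suggestions (ReLU, batch normalization, dropout and the Adam optimizer). It is only one possible starting point, not *the* solution: the hyperparameters (number of units, dropout level, learning rate) are guesses that you should tune yourself." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# One possible improved model (a sketch: the hyperparameters are guesses to be tuned)\n", "tf.keras.backend.clear_session()\n", "\n", "model = tf.keras.models.Sequential()\n", "model.add(tf.keras.layers.Input(shape=(784,)))\n", "\n", "# Two hidden layers with batch normalization, ReLU and dropout\n", "for units in [300, 150]:\n", "    model.add(tf.keras.layers.Dense(units=units))\n", "    model.add(tf.keras.layers.BatchNormalization())\n", "    model.add(tf.keras.layers.Activation('relu'))\n", "    model.add(tf.keras.layers.Dropout(0.3))\n", "\n", "# Softmax output layer\n", "model.add(tf.keras.layers.Dense(units=10))\n", "model.add(tf.keras.layers.Activation('softmax'))\n", "\n", "model.compile(\n", "    loss='categorical_crossentropy',\n", "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),\n", "    metrics=['accuracy']\n", ")\n", "\n", "history = tf.keras.callbacks.History()\n", "model.fit(X_train, T_train, batch_size=128, epochs=20,\n", "          validation_split=0.1, callbacks=[history])\n", "\n", "score = model.evaluate(X_test, T_test, verbose=0)\n", "print('Test accuracy:', score[1])" ] }
], "metadata": { "kernelspec": { "display_name": "Python 3.9.13 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "vscode": { "interpreter": { "hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d" } } }, "nbformat": 4, "nbformat_minor": 4 }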