{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Keras tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this tutorial is to very quickly present keras, the high-level API of tensorflow, as it has already been seen in the Neurocomputing exercises. We will train a small fully-connected network on MNIST and observe what happens when the inputs or outputs are correlated, by training successively on the 0 digits, then the 1, etc. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keras\n", "\n", "The first step is to install tensorflow. The easiest way is to use pip:\n", " \n", "```bash\n", "pip install tensorflow\n", "```\n", "\n", "`keras` is now available as a submodule of tensorflow (you can also install it as a separate package):\n", "\n", "```python\n", "import tensorflow as tf\n", "```\n", "\n", "Keras provides a lot of ready-made layer types, activation functions, optimizers and so on. Do not hesitate to read its documentation on ." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most important object in keras is `Sequential`. It is a container where you sequentially add layers of neurons (fully-connected, convolutional, recurrent, etc) and other stuff. It represents your model, i.e. the neural network itself.\n", "\n", "```python\n", "model = tf.keras.models.Sequential()\n", "```\n", "\n", "You can then `add()` layers to the model. A fully-connected layer is called `Dense` in keras. \n", "\n", "Let's create a MLP with 10 input neurons, two hidden layers with 100 hidden neurons each and 3 output neurons. \n", "\n", "The input layer is represented by the `Input` layer:\n", "\n", "```python\n", "model.add(tf.keras.layers.Input((10,)))\n", "```\n", "\n", "The first hidden layer can be added to the model with:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", "```\n", "\n", "The layer has 100 neurons and uses the ReLU activation function. One could optionally define the activation function as an additional \"layer\", but it is usually not needed:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100))\n", "model.add(tf.keras.layers.Activation('relu'))\n", "```\n", "\n", "Adding more layers is straightforward:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", "```\n", "\n", "Finally, we can add the output layer. The activation function depends on the problem:\n", "\n", "* For regression problems, a linear activation function should be used when the targets can take any value (e.g. 
Q-values):\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"linear\"))\n", "```\n", "\n", "If the targets are bounded between 0 and 1, a logistic/sigmoid function can be used:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"sigmoid\"))\n", "```\n", "\n", "* For multi-class classification problems, a softmax activation function should be used:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"softmax\"))\n", "```\n", "\n", "This defines fully the structure of your desired neural network.\n", "\n", "**Q:** Implement a neural network for classification with 10 input neurons, two hidden layers with 100 neurons each (using ReLU) and 3 output neurons.\n", "\n", "*Hint:* `print(model.summary())` gives you a summary of the architecture of your model. Note in particular the number of trainable parameters (weights and biases)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to choose an **optimizer** for the neural network, i.e. a variant of gradient descent that will be used to iteratively modify the parameters.\n", "\n", "`keras` provides an extensive list of optimizers: . The most useful in practice are:\n", "\n", "* `SGD`, the vanilla stochastic gradient descent.\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)\n", "```\n", "\n", "* `RMSprop`, using second moments:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)\n", "```\n", "\n", "* `Adam`:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n", "```\n", "\n", "Choosing a optimizer is a matter of taste and trial-and-error. In deep RL, a good choice is Adam: the default values for its other parameters are usually good, it converges well, so your only job is to find the right learning rate.\n", "\n", "Finally, the model must be **compiled** by defining:\n", "\n", "* A loss function. For multi-class classification, it should be `'categorical_crossentropy'`. For regression, it can be `'mse'`. See the list of built-in loss functions here: but know that you can also simply define your own.\n", "\n", "* The chosen optimizer.\n", "\n", "* The metrics, i.e. what you want tensorflow to print during training. By default it only prints the current value of the loss function. For classification tasks, it usually makes more sense to also print the `accuracy`.\n", "\n", "```python\n", "model.compile(\n", " loss='categorical_crossentropy', \n", " optimizer=optimizer,\n", " metrics=['accuracy']\n", ")\n", "```\n", "\n", "**Q:** Compile the model for classification, using the Adam optimizer and a learning rate of 0.01." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now train the model on some dummy data. To show the power of deep neural networks, we will try to learn noise by heart.\n", "\n", "The following cell creates an input tensor `X` with 1000 random vectors of 10 elements, with values sampled between -1 and 1. The targets (desired outputs) `t` are class indices (0, 1 or 2), also randomly selected. \n", "\n", "However, neural networks expect **one-hot encoded vectors** for the target, i.e. (1, 0, 0), (0, 1, 0), (0, 0, 1) instead of 0, 1, 2. The method `tf.keras.utils.to_categorical` allows you to do that." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = np.random.uniform(-1.0, 1.0, (1000, 10))\n", "t = np.random.randint(0, 3, (1000, ))\n", "T = tf.keras.utils.to_categorical(t, 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's learn it. The `Sequential` model has a method called `fit()` where you simply pass the training data `(X, T)` and some other parameters, such as:\n", "\n", "* the batch size,\n", "* the total number of epochs,\n", "* the proportion of training examples to keep in order to compute the validation loss/accuracy (optional but recommmended).\n", "\n", "```python\n", "# Training\n", "history = tf.keras.callbacks.History()\n", "\n", "model.fit(\n", " X, T,\n", " batch_size=100, \n", " epochs=50,\n", " validation_split=0.1,\n", " callbacks=[history]\n", ")\n", "```\n", "\n", "**Q:** Train the model on the data, using a batch size of 100 for 50 epochs. Explain why you obtained this result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training a MLP on MNIST\n", "\n", "Let's now try to learn something a bit more serious, the MNIST dataset. The following cell load the MNIST data (training set 60000 28x28 monochrome images, test set of 10000 images), normalizes it (values betwen 0 and 1 for each pixel), removes the mean image from the training set and transforms the targets to one-hot encoded vectors for the 10 classes. See the neurocomputing exercise for more details." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the MNIST dataset\n", "(X_train, t_train), (X_test, t_test) = tf.keras.datasets.mnist.load_data()\n", "print(\"Training data:\", X_train.shape, t_train.shape)\n", "print(\"Test data:\", X_test.shape, t_test.shape)\n", "\n", "# Reshape the images to vectors and normalize\n", "X_train = X_train.reshape(X_train.shape[0], 784).astype('float32') / 255.\n", "X_test = X_test.reshape(X_test.shape[0], 784).astype('float32') / 255.\n", "\n", "# Mean removal\n", "X_mean = np.mean(X_train, axis=0)\n", "X_train -= X_mean\n", "X_test -= X_mean\n", "\n", "# One-hot encoded outputs\n", "T_train = tf.keras.utils.to_categorical(t_train, 10)\n", "T_test = tf.keras.utils.to_categorical(t_test, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Create a fully connected neural network with 784 input neurons (one per pixel), 10 softmax output neurons and whatever you want in the middle, so that it can reach around 98% validation accuracy after **20 epochs**.\n", "\n", "* Put the network creation (including `compile()`) in a method `create_model()`, so that you can create a model multiple times.\n", "* Choose a good value for the learning rate.\n", "* Do not exagerate with the number of layers and neurons. Two or there hidden layers with 100 to 300 neurons are more than enough.\n", "* You will quickly observe that the network overfits: the training accuracy is higher than the validation accuracy. The training accuracy actually goes to 100% if your network is too big. 
In that case, feel free to add a dropout layer after each fully-connected layer:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dropout(0.5))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training, one should evaluate the model on the test set. `keras` provides an `evaluate()` method that computes the different metrics (in our case the loss) on the data:\n", "\n", "```python\n", "score = model.evaluate(X_test, T_test)\n", "```\n", "\n", "Another solution would be to `predict()` labels on the test set and manually compare them to the ground truth:\n", "\n", "```python\n", "Y = model.predict(X_test)\n", "loss = - np.mean(T_test * np.log(Y))\n", "predicted_classes = np.argmax(Y, axis=1)\n", "accuracy = 1.0 - np.sum(predicted_classes != t_test)/t_test.shape[0]\n", "```\n", "\n", "Another important thing to visualize after training is how the training and validation loss (or accuracy) evolved during training. The `fit()` method updates a `History` object which contains the history of your metrics (loss and accuracy) after each epoch of training. These are simple numpy arrays, accessible with:\n", "\n", "```python\n", "history.history['loss']\n", "history.history['val_loss']\n", "history.history['accuracy']\n", "history.history['val_accuracy']\n", "```\n", "\n", "**Q:** Compute the test loss and accuracy of your model. Plot the history of the training and validation loss/accuracy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlated inputs\n", "\n", "Now that we have a basic NN working on MNIST, let's investigate why deep NN hate sequentially correlated inputs (which is the main justification for the experience replay memory in DQN). Is that really true, or is just some mathematical assumption that does not matter in practice?\n", "\n", "The idea of this part is the following: we will train the same network as before for 20 epochs, but each epoch will train the network on all the 0s first, then all the 1s, etc. Each epoch will contain the same number of training examples as before, but the order of presentation will simply be different.\n", "\n", "To get all examples of the training set which have the target 3 (for example), you just have to slice the matrices accordingly:\n", "\n", "```python\n", "X = X_train[t_train==3, :]\n", "T = T_train[t_train==3]\n", "```\n", "\n", "**Q:** Train the same network as before (but reinitialize it!) for 20 epochs, with each epoch sequentially iterating over the classes 0, 1, 2, 3, etc. Plot the loss and accurary during training. What do you observe?\n", "\n", "*Hint:* you will have two for loops to write: one over the epochs, one over the digits.\n", "\n", "```python\n", "for e in range(20):\n", " for c in range(10):\n", " model.fit(...)\n", "```\n", "\n", "You should only do one epoch for each call to `fit()`. Set `verbose=0` in `fit()` to avoid printing too much info." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Evaluate the model after training on the whole test set. What happens?" 
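, "\n", "\n", "As a reminder, `evaluate()` returns the loss followed by the compiled metrics, so a minimal sketch (reusing the variables defined above) could be:\n", "\n", "```python\n", "# Test loss and accuracy on the complete test set\n", "loss, accuracy = model.evaluate(X_test, T_test, verbose=0)\n", "print(\"Test loss:\", loss, \"- Test accuracy:\", accuracy)\n", "```"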
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** To better understand what happened, compute the test accuracy of the network on each class of the test set individually: all the 0s of the test set, then all the 1s, etc. What happens? " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Increase and decrease the learning rate of the optimizer. What do you observe? Is there a solution to this problem? " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }