{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Keras tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this tutorial is to quickly present keras, the high-level API of tensorflow, which has already been covered in the Neurocomputing exercises. We will train a small fully-connected network on MNIST and observe what happens when the inputs or outputs are correlated, by training successively on the 0 digits, then on the 1 digits, etc. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Keras\n", "\n", "The first step is to install tensorflow. The easiest way is to use pip:\n", " \n", "```bash\n", "pip install tensorflow\n", "```\n", "\n", "`keras` is now available as a submodule of tensorflow (you can also install it as a separate package):\n", "\n", "```python\n", "import tensorflow as tf\n", "```\n", "\n", "Keras provides a lot of ready-made layer types, activation functions, optimizers and so on. Do not hesitate to read its documentation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import tensorflow as tf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most important object in keras is `Sequential`. It is a container to which you sequentially add layers of neurons (fully-connected, convolutional, recurrent, etc) and other operations. It represents your model, i.e. the neural network itself.\n", "\n", "```python\n", "model = tf.keras.models.Sequential()\n", "```\n", "\n", "You can then `add()` layers to the model. A fully-connected layer is called `Dense` in keras. \n", "\n", "Let's create an MLP with 10 input neurons, two hidden layers with 100 hidden neurons each and 3 output neurons. 
\n", "\n", "The input layer is represented by the `Input` layer:\n", "\n", "```python\n", "model.add(tf.keras.layers.Input((10,)))\n", "```\n", "\n", "The first hidden layer can be added to the model with:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", "```\n", "\n", "The layer has 100 neurons and uses the ReLU activation function. One could optionally define the activation function as an additional \"layer\", but it is usually not needed:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100))\n", "model.add(tf.keras.layers.Activation('relu'))\n", "```\n", "\n", "Adding more layers is straightforward:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", "```\n", "\n", "Finally, we can add the output layer. The activation function depends on the problem:\n", "\n", "* For regression problems, a linear activation function should be used when the targets can take any value (e.g. Q-values):\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"linear\"))\n", "```\n", "\n", "If the targets are bounded between 0 and 1, a logistic/sigmoid function can be used:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"sigmoid\"))\n", "```\n", "\n", "* For multi-class classification problems, a softmax activation function should be used:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dense(3, activation=\"softmax\"))\n", "```\n", "\n", "This fully defines the structure of your desired neural network.\n", "\n", "**Q:** Implement a neural network for classification with 10 input neurons, two hidden layers with 100 neurons each (using ReLU) and 3 output neurons.\n", "\n", "*Hint:* `print(model.summary())` gives you a summary of the architecture of your model. Note in particular the number of trainable parameters (weights and biases)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Metal device set to: Apple M1 Pro\n", "\n", "systemMemory: 16.00 GB\n", "maxCacheSize: 5.33 GB\n", "\n", "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " dense (Dense) (None, 100) 1100 \n", " \n", " dense_1 (Dense) (None, 100) 10100 \n", " \n", " dense_2 (Dense) (None, 3) 303 \n", " \n", "=================================================================\n", "Total params: 11,503\n", "Trainable params: 11,503\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "None\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:31:35.132826: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.\n", "2022-11-24 18:31:35.133149: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )\n" ] } ], "source": [ "model = tf.keras.models.Sequential()\n", "model.add(tf.keras.layers.Input((10,)))\n", "model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", "model.add(tf.keras.layers.Dense(100, activation='relu'))\n", "model.add(tf.keras.layers.Dense(3, activation='softmax'))\n", "print(model.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is to choose an **optimizer** for the neural network, i.e. a variant of gradient descent that will be used to iteratively modify the parameters.\n", "\n", "`keras` provides an extensive list of optimizers. 
The most useful in practice are:\n", "\n", "* `SGD`, stochastic gradient descent, here with Nesterov momentum:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9, nesterov=True)\n", "```\n", "\n", "* `RMSprop`, which scales the learning rate using a running average of the squared gradients (second moment):\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)\n", "```\n", "\n", "* `Adam`, which combines momentum with the second-moment scaling of RMSprop:\n", "\n", "```python\n", "optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n", "```\n", "\n", "Choosing an optimizer is a matter of taste and trial-and-error. In deep RL, a good choice is Adam: the default values for its other parameters are usually good, it converges well, so your only job is to find the right learning rate.\n", "\n", "Finally, the model must be **compiled** by defining:\n", "\n", "* A loss function. For multi-class classification, it should be `'categorical_crossentropy'`. For regression, it can be `'mse'`. See the list of built-in loss functions in the keras documentation, but know that you can also simply define your own.\n", "\n", "* The chosen optimizer.\n", "\n", "* The metrics, i.e. what you want tensorflow to print during training. By default it only prints the current value of the loss function. For classification tasks, it usually makes more sense to also print the `accuracy`.\n", "\n", "```python\n", "model.compile(\n", " loss='categorical_crossentropy', \n", " optimizer=optimizer,\n", " metrics=['accuracy']\n", ")\n", "```\n", "\n", "**Q:** Compile the model for classification, using the Adam optimizer and a learning rate of 0.01." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n", "\n", "model.compile(\n", " loss='categorical_crossentropy', \n", " optimizer=optimizer,\n", " metrics=['accuracy']\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now train the model on some dummy data. 
To show the power of deep neural networks, we will try to learn noise by heart.\n", "\n", "The following cell creates an input tensor `X` with 1000 random vectors of 10 elements, with values uniformly sampled between -1 and 1. The targets (desired outputs) `t` are class indices (0, 1 or 2), also randomly selected. \n", "\n", "However, the `categorical_crossentropy` loss expects **one-hot encoded vectors** for the targets, i.e. (1, 0, 0), (0, 1, 0), (0, 0, 1) instead of 0, 1, 2. The function `tf.keras.utils.to_categorical` allows you to do that." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "X = np.random.uniform(-1.0, 1.0, (1000, 10))\n", "t = np.random.randint(0, 3, (1000, ))\n", "T = tf.keras.utils.to_categorical(t, 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's learn it. The `Sequential` model has a method called `fit()` where you simply pass the training data `(X, T)` and some other parameters, such as:\n", "\n", "* the batch size,\n", "* the total number of epochs,\n", "* the proportion of training examples to keep in order to compute the validation loss/accuracy (optional but recommended).\n", "\n", "```python\n", "# Training\n", "history = tf.keras.callbacks.History()\n", "\n", "model.fit(\n", " X, T,\n", " batch_size=100, \n", " epochs=50,\n", " validation_split=0.1,\n", " callbacks=[history]\n", ")\n", "```\n", "\n", "**Q:** Train the model on the data, using a batch size of 100 for 50 epochs. Explain why you obtained this result." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/50\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:31:37.930309: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz\n", "2022-11-24 18:31:38.157447: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "9/9 - 1s - loss: 1.1183 - accuracy: 0.3244 - val_loss: 1.1404 - val_accuracy: 0.3300 - 1s/epoch - 162ms/step\n", "Epoch 2/50\n", "9/9 - 0s - loss: 1.0802 - accuracy: 0.3878 - val_loss: 1.1295 - val_accuracy: 0.2700 - 96ms/epoch - 11ms/step\n", "Epoch 3/50\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:31:39.375487: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "9/9 - 0s - loss: 1.0632 - accuracy: 0.4667 - val_loss: 1.1320 - val_accuracy: 0.2500 - 93ms/epoch - 10ms/step\n", "Epoch 4/50\n", "9/9 - 0s - loss: 1.0375 - accuracy: 0.4767 - val_loss: 1.1671 - val_accuracy: 0.2500 - 89ms/epoch - 10ms/step\n", "Epoch 5/50\n", "9/9 - 0s - loss: 1.0111 - accuracy: 0.5111 - val_loss: 1.1752 - val_accuracy: 0.3100 - 89ms/epoch - 10ms/step\n", "Epoch 6/50\n", "9/9 - 0s - loss: 0.9863 - accuracy: 0.5033 - val_loss: 1.2184 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step\n", "Epoch 7/50\n", "9/9 - 0s - loss: 0.9553 - accuracy: 0.5433 - val_loss: 1.2167 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step\n", "Epoch 8/50\n", "9/9 - 0s - loss: 0.9197 - accuracy: 0.5678 - val_loss: 1.3535 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step\n", "Epoch 9/50\n", "9/9 - 0s - loss: 0.8824 - accuracy: 0.5900 - val_loss: 1.3571 - val_accuracy: 0.3200 - 88ms/epoch - 10ms/step\n", 
"Epoch 10/50\n", "9/9 - 0s - loss: 0.8560 - accuracy: 0.6267 - val_loss: 1.3682 - val_accuracy: 0.2400 - 90ms/epoch - 10ms/step\n", "Epoch 11/50\n", "9/9 - 0s - loss: 0.8125 - accuracy: 0.6467 - val_loss: 1.4521 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step\n", "Epoch 12/50\n", "9/9 - 0s - loss: 0.7604 - accuracy: 0.6667 - val_loss: 1.4538 - val_accuracy: 0.2700 - 88ms/epoch - 10ms/step\n", "Epoch 13/50\n", "9/9 - 0s - loss: 0.6953 - accuracy: 0.7289 - val_loss: 1.5755 - val_accuracy: 0.2900 - 87ms/epoch - 10ms/step\n", "Epoch 14/50\n", "9/9 - 0s - loss: 0.6484 - accuracy: 0.7500 - val_loss: 1.6463 - val_accuracy: 0.3000 - 88ms/epoch - 10ms/step\n", "Epoch 15/50\n", "9/9 - 0s - loss: 0.6264 - accuracy: 0.7433 - val_loss: 1.7369 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step\n", "Epoch 16/50\n", "9/9 - 0s - loss: 0.5558 - accuracy: 0.8022 - val_loss: 1.8089 - val_accuracy: 0.2600 - 88ms/epoch - 10ms/step\n", "Epoch 17/50\n", "9/9 - 0s - loss: 0.4886 - accuracy: 0.8300 - val_loss: 1.9365 - val_accuracy: 0.2700 - 93ms/epoch - 10ms/step\n", "Epoch 18/50\n", "9/9 - 0s - loss: 0.4341 - accuracy: 0.8644 - val_loss: 2.0011 - val_accuracy: 0.3100 - 101ms/epoch - 11ms/step\n", "Epoch 19/50\n", "9/9 - 0s - loss: 0.3999 - accuracy: 0.8700 - val_loss: 2.1024 - val_accuracy: 0.2900 - 106ms/epoch - 12ms/step\n", "Epoch 20/50\n", "9/9 - 0s - loss: 0.3671 - accuracy: 0.8833 - val_loss: 2.3319 - val_accuracy: 0.3400 - 99ms/epoch - 11ms/step\n", "Epoch 21/50\n", "9/9 - 0s - loss: 0.3761 - accuracy: 0.8667 - val_loss: 2.4605 - val_accuracy: 0.2700 - 100ms/epoch - 11ms/step\n", "Epoch 22/50\n", "9/9 - 0s - loss: 0.3324 - accuracy: 0.8822 - val_loss: 2.4886 - val_accuracy: 0.3200 - 97ms/epoch - 11ms/step\n", "Epoch 23/50\n", "9/9 - 0s - loss: 0.2755 - accuracy: 0.9100 - val_loss: 2.5855 - val_accuracy: 0.3200 - 98ms/epoch - 11ms/step\n", "Epoch 24/50\n", "9/9 - 0s - loss: 0.2489 - accuracy: 0.9256 - val_loss: 2.7775 - val_accuracy: 0.3400 - 98ms/epoch - 11ms/step\n", "Epoch 25/50\n", 
"9/9 - 0s - loss: 0.2084 - accuracy: 0.9489 - val_loss: 2.7613 - val_accuracy: 0.2900 - 93ms/epoch - 10ms/step\n", "Epoch 26/50\n", "9/9 - 0s - loss: 0.1561 - accuracy: 0.9744 - val_loss: 3.0682 - val_accuracy: 0.3000 - 97ms/epoch - 11ms/step\n", "Epoch 27/50\n", "9/9 - 0s - loss: 0.1364 - accuracy: 0.9756 - val_loss: 3.0417 - val_accuracy: 0.2900 - 91ms/epoch - 10ms/step\n", "Epoch 28/50\n", "9/9 - 0s - loss: 0.1177 - accuracy: 0.9833 - val_loss: 3.2456 - val_accuracy: 0.2900 - 92ms/epoch - 10ms/step\n", "Epoch 29/50\n", "9/9 - 0s - loss: 0.0996 - accuracy: 0.9933 - val_loss: 3.3559 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step\n", "Epoch 30/50\n", "9/9 - 0s - loss: 0.0882 - accuracy: 0.9911 - val_loss: 3.4259 - val_accuracy: 0.3000 - 90ms/epoch - 10ms/step\n", "Epoch 31/50\n", "9/9 - 0s - loss: 0.0788 - accuracy: 0.9944 - val_loss: 3.6843 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step\n", "Epoch 32/50\n", "9/9 - 0s - loss: 0.0636 - accuracy: 0.9978 - val_loss: 3.6871 - val_accuracy: 0.3000 - 89ms/epoch - 10ms/step\n", "Epoch 33/50\n", "9/9 - 0s - loss: 0.0502 - accuracy: 1.0000 - val_loss: 3.8740 - val_accuracy: 0.3000 - 87ms/epoch - 10ms/step\n", "Epoch 34/50\n", "9/9 - 0s - loss: 0.0451 - accuracy: 0.9989 - val_loss: 3.7788 - val_accuracy: 0.3100 - 87ms/epoch - 10ms/step\n", "Epoch 35/50\n", "9/9 - 0s - loss: 0.0377 - accuracy: 1.0000 - val_loss: 4.1327 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step\n", "Epoch 36/50\n", "9/9 - 0s - loss: 0.0320 - accuracy: 1.0000 - val_loss: 3.9984 - val_accuracy: 0.2900 - 91ms/epoch - 10ms/step\n", "Epoch 37/50\n", "9/9 - 0s - loss: 0.0261 - accuracy: 1.0000 - val_loss: 4.1676 - val_accuracy: 0.2900 - 88ms/epoch - 10ms/step\n", "Epoch 38/50\n", "9/9 - 0s - loss: 0.0223 - accuracy: 1.0000 - val_loss: 4.2871 - val_accuracy: 0.2800 - 91ms/epoch - 10ms/step\n", "Epoch 39/50\n", "9/9 - 0s - loss: 0.0198 - accuracy: 1.0000 - val_loss: 4.3003 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step\n", "Epoch 40/50\n", "9/9 - 0s - loss: 
0.0175 - accuracy: 1.0000 - val_loss: 4.4379 - val_accuracy: 0.2800 - 90ms/epoch - 10ms/step\n", "Epoch 41/50\n", "9/9 - 0s - loss: 0.0164 - accuracy: 1.0000 - val_loss: 4.4096 - val_accuracy: 0.2800 - 89ms/epoch - 10ms/step\n", "Epoch 42/50\n", "9/9 - 0s - loss: 0.0145 - accuracy: 1.0000 - val_loss: 4.5412 - val_accuracy: 0.2900 - 90ms/epoch - 10ms/step\n", "Epoch 43/50\n", "9/9 - 0s - loss: 0.0134 - accuracy: 1.0000 - val_loss: 4.5190 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step\n", "Epoch 44/50\n", "9/9 - 0s - loss: 0.0124 - accuracy: 1.0000 - val_loss: 4.6218 - val_accuracy: 0.2900 - 89ms/epoch - 10ms/step\n", "Epoch 45/50\n", "9/9 - 0s - loss: 0.0115 - accuracy: 1.0000 - val_loss: 4.6663 - val_accuracy: 0.2800 - 87ms/epoch - 10ms/step\n", "Epoch 46/50\n", "9/9 - 0s - loss: 0.0108 - accuracy: 1.0000 - val_loss: 4.6790 - val_accuracy: 0.3000 - 91ms/epoch - 10ms/step\n", "Epoch 47/50\n", "9/9 - 0s - loss: 0.0101 - accuracy: 1.0000 - val_loss: 4.7649 - val_accuracy: 0.2900 - 87ms/epoch - 10ms/step\n", "Epoch 48/50\n", "9/9 - 0s - loss: 0.0096 - accuracy: 1.0000 - val_loss: 4.8301 - val_accuracy: 0.2900 - 95ms/epoch - 11ms/step\n", "Epoch 49/50\n", "9/9 - 0s - loss: 0.0089 - accuracy: 1.0000 - val_loss: 4.8164 - val_accuracy: 0.2900 - 90ms/epoch - 10ms/step\n", "Epoch 50/50\n", "9/9 - 0s - loss: 0.0084 - accuracy: 1.0000 - val_loss: 4.9097 - val_accuracy: 0.2900 - 88ms/epoch - 10ms/step\n" ] } ], "source": [ "history = model.fit(\n", " X, T,\n", " batch_size=100, \n", " epochs=50,\n", " validation_split=0.1,\n", " verbose=2\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A:** The final training accuracy is 100%, while the validation accuracy stays around 30%, i.e. chance level for three classes (may vary depending on initialization). The network has learned the training examples by heart, although they are totally random, but totally fails to generalize.\n", "\n", "The main reason is that we have only 1000 training examples (900 after the validation split), while the network has around 11500 free parameters, which gives it a very high capacity (VC dimension). 
With that much capacity, the model can learn this training set perfectly, although it is totally random. Its VC dimension is however way too high to generalize anything. It is even worse here: as the data is random, there is nothing to generalize. A nice example to understand why NN overfit..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training a MLP on MNIST\n", "\n", "Let's now try to learn something a bit more serious, the MNIST dataset. The following cell loads the MNIST data (training set of 60000 28x28 monochrome images, test set of 10000 images), normalizes it (values between 0 and 1 for each pixel), subtracts the mean image of the training set from both sets and transforms the targets to one-hot encoded vectors for the 10 classes. See the neurocomputing exercise for more details." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training data: (60000, 28, 28) (60000,)\n", "Test data: (10000, 28, 28) (10000,)\n" ] } ], "source": [ "# Load the MNIST dataset\n", "(X_train, t_train), (X_test, t_test) = tf.keras.datasets.mnist.load_data()\n", "print(\"Training data:\", X_train.shape, t_train.shape)\n", "print(\"Test data:\", X_test.shape, t_test.shape)\n", "\n", "# Reshape the images to vectors and normalize\n", "X_train = X_train.reshape(X_train.shape[0], 784).astype('float32') / 255.\n", "X_test = X_test.reshape(X_test.shape[0], 784).astype('float32') / 255.\n", "\n", "# Mean removal\n", "X_mean = np.mean(X_train, axis=0)\n", "X_train -= X_mean\n", "X_test -= X_mean\n", "\n", "# One-hot encoded outputs\n", "T_train = tf.keras.utils.to_categorical(t_train, 10)\n", "T_test = tf.keras.utils.to_categorical(t_test, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Create a fully connected neural network with 784 input neurons (one per pixel), 10 softmax output neurons and whatever you want in the middle, so that it can reach around 98% validation accuracy after **20 
epochs**.\n", "\n", "* Put the network creation (including `compile()`) in a function `create_model()`, so that you can create a model multiple times.\n", "* Choose a good value for the learning rate.\n", "* Do not exaggerate with the number of layers and neurons. Two or three hidden layers with 100 to 300 neurons are more than enough.\n", "* You will quickly observe that the network overfits: the training accuracy is higher than the validation accuracy. The training accuracy actually goes to 100% if your network is too big. In that case, feel free to add a dropout layer after each fully-connected layer:\n", "\n", "```python\n", "model.add(tf.keras.layers.Dropout(0.5))\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def create_model():\n", " # Create the model\n", " model = tf.keras.models.Sequential()\n", " \n", " # Input layer with 784 pixels\n", " model.add(tf.keras.layers.Input((784,)))\n", "\n", " # Hidden layer with 150 neurons\n", " model.add(tf.keras.layers.Dense(150, activation=\"relu\"))\n", " model.add(tf.keras.layers.Dropout(0.5))\n", "\n", " # Second hidden layer with 100 neurons\n", " model.add(tf.keras.layers.Dense(100, activation=\"relu\"))\n", " model.add(tf.keras.layers.Dropout(0.5))\n", "\n", " # Softmax output layer with 10 neurons\n", " model.add(tf.keras.layers.Dense(10, activation=\"softmax\"))\n", "\n", " # Learning rule\n", " optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n", "\n", " # Loss function\n", " model.compile(\n", " loss='categorical_crossentropy', # loss\n", " optimizer=optimizer, # learning rule\n", " metrics=['accuracy'] # show accuracy\n", " )\n", " print(model.summary())\n", " \n", " return model" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_1\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # 
\n", "=================================================================\n", " dense_3 (Dense) (None, 150) 117750 \n", " \n", " dropout (Dropout) (None, 150) 0 \n", " \n", " dense_4 (Dense) (None, 100) 15100 \n", " \n", " dropout_1 (Dropout) (None, 100) 0 \n", " \n", " dense_5 (Dense) (None, 10) 1010 \n", " \n", "=================================================================\n", "Total params: 133,860\n", "Trainable params: 133,860\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "None\n", "Epoch 1/20\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:31:53.203907: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "538/540 [============================>.] - ETA: 0s - loss: 0.5194 - accuracy: 0.8409" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:31:58.951897: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "540/540 [==============================] - 6s 11ms/step - loss: 0.5185 - accuracy: 0.8411 - val_loss: 0.1482 - val_accuracy: 0.9577\n", "Epoch 2/20\n", "540/540 [==============================] - 6s 10ms/step - loss: 0.2497 - accuracy: 0.9261 - val_loss: 0.1195 - val_accuracy: 0.9637\n", "Epoch 3/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.2007 - accuracy: 0.9414 - val_loss: 0.0993 - val_accuracy: 0.9705\n", "Epoch 4/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1732 - accuracy: 0.9485 - val_loss: 0.0932 - val_accuracy: 0.9718\n", "Epoch 5/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1580 - accuracy: 0.9530 - val_loss: 0.0823 - val_accuracy: 0.9747\n", "Epoch 6/20\n", "540/540 
[==============================] - 5s 10ms/step - loss: 0.1453 - accuracy: 0.9562 - val_loss: 0.0819 - val_accuracy: 0.9752\n", "Epoch 7/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1319 - accuracy: 0.9602 - val_loss: 0.0800 - val_accuracy: 0.9747\n", "Epoch 8/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1238 - accuracy: 0.9616 - val_loss: 0.0740 - val_accuracy: 0.9780\n", "Epoch 9/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1178 - accuracy: 0.9639 - val_loss: 0.0726 - val_accuracy: 0.9772\n", "Epoch 10/20\n", "540/540 [==============================] - 6s 10ms/step - loss: 0.1109 - accuracy: 0.9657 - val_loss: 0.0719 - val_accuracy: 0.9785\n", "Epoch 11/20\n", "540/540 [==============================] - 6s 10ms/step - loss: 0.1055 - accuracy: 0.9670 - val_loss: 0.0729 - val_accuracy: 0.9772\n", "Epoch 12/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.1029 - accuracy: 0.9681 - val_loss: 0.0718 - val_accuracy: 0.9777\n", "Epoch 13/20\n", "540/540 [==============================] - 6s 10ms/step - loss: 0.1007 - accuracy: 0.9683 - val_loss: 0.0664 - val_accuracy: 0.9803\n", "Epoch 14/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0939 - accuracy: 0.9701 - val_loss: 0.0688 - val_accuracy: 0.9778\n", "Epoch 15/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0916 - accuracy: 0.9716 - val_loss: 0.0684 - val_accuracy: 0.9793\n", "Epoch 16/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0889 - accuracy: 0.9723 - val_loss: 0.0661 - val_accuracy: 0.9822\n", "Epoch 17/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0845 - accuracy: 0.9740 - val_loss: 0.0687 - val_accuracy: 0.9807\n", "Epoch 18/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0840 - accuracy: 0.9733 - val_loss: 0.0685 - val_accuracy: 0.9812\n", "Epoch 19/20\n", "540/540 
[==============================] - 6s 11ms/step - loss: 0.0791 - accuracy: 0.9755 - val_loss: 0.0717 - val_accuracy: 0.9808\n", "Epoch 20/20\n", "540/540 [==============================] - 5s 10ms/step - loss: 0.0817 - accuracy: 0.9740 - val_loss: 0.0669 - val_accuracy: 0.9815\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = create_model()\n", "\n", "# Training\n", "history = tf.keras.callbacks.History()\n", "\n", "model.fit(\n", " X_train, T_train,\n", " batch_size=100, \n", " epochs=20,\n", " validation_split=0.1,\n", " callbacks=[history]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After training, one should evaluate the model on the test set. `keras` provides an `evaluate()` method that computes the loss and the compiled metrics (in our case the accuracy) on the data:\n", "\n", "```python\n", "score = model.evaluate(X_test, T_test)\n", "```\n", "\n", "Another solution would be to `predict()` the class probabilities on the test set and manually compare them to the ground truth:\n", "\n", "```python\n", "Y = model.predict(X_test)\n", "# Cross-entropy: sum over the classes, mean over the examples (small constant avoids log(0))\n", "loss = - np.mean(np.sum(T_test * np.log(Y + 1e-7), axis=1))\n", "predicted_classes = np.argmax(Y, axis=1)\n", "accuracy = 1.0 - np.sum(predicted_classes != t_test)/t_test.shape[0]\n", "```\n", "\n", "Another important thing to visualize after training is how the training and validation loss (or accuracy) evolved during training. The `fit()` method updates a `History` object which contains the history of your metrics (loss and accuracy) after each epoch of training. These are simple Python lists, accessible with:\n", "\n", "```python\n", "history.history['loss']\n", "history.history['val_loss']\n", "history.history['accuracy']\n", "history.history['val_accuracy']\n", "```\n", "\n", "**Q:** Compute the test loss and accuracy of your model. Plot the history of the training and validation loss/accuracy." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 14/313 [>.............................] - ETA: 2s - loss: 0.0518 - accuracy: 0.9866" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:33:42.354208: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "313/313 [==============================] - 3s 9ms/step - loss: 0.0763 - accuracy: 0.9782\n", "Test loss: 0.07633379101753235\n", "Test accuracy: 0.9782000184059143\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Testing\n", "score = model.evaluate(X_test, T_test)\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])\n", "\n", "plt.figure(figsize=(15, 6))\n", "plt.subplot(121)\n", "plt.plot(history.history['loss'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_loss'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "plt.subplot(122)\n", "plt.plot(history.history['accuracy'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_accuracy'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Correlated inputs\n", "\n", "Now that we have a basic NN working on MNIST, let's investigate why deep NN hate sequentially correlated inputs (which is the main justification for the experience replay memory in DQN). Is that really true, or is it just some mathematical assumption that does not matter in practice?\n", "\n", "The idea of this part is the following: we will train the same network as before for 20 epochs, but each epoch will train the network on all the 0s first, then all the 1s, etc. Each epoch will contain the same number of training examples as before, but the order of presentation will simply be different.\n", "\n", "To get all examples of the training set which have the target 3 (for example), you just have to slice the matrices accordingly:\n", "\n", "```python\n", "X = X_train[t_train==3, :]\n", "T = T_train[t_train==3]\n", "```\n", "\n", "**Q:** Train the same network as before (but reinitialize it!) for 20 epochs, with each epoch sequentially iterating over the classes 0, 1, 2, 3, etc. Plot the loss and accuracy during training. 
What do you observe?\n", "\n", "*Hint:* you will have two for loops to write: one over the epochs, one over the digits.\n", "\n", "```python\n", "for e in range(20):\n", " for c in range(10):\n", " model.fit(...)\n", "```\n", "\n", "You should only do one epoch for each call to `fit()`. Set `verbose=0` in `fit()` to avoid printing too much info." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential_2\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " dense_6 (Dense) (None, 150) 117750 \n", " \n", " dropout_2 (Dropout) (None, 150) 0 \n", " \n", " dense_7 (Dense) (None, 100) 15100 \n", " \n", " dropout_3 (Dropout) (None, 100) 0 \n", " \n", " dense_8 (Dense) (None, 10) 1010 \n", " \n", "=================================================================\n", "Total params: 133,860\n", "Trainable params: 133,860\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "None\n", "Epoch: 1\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2022-11-24 18:33:52.698762: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n", "2022-11-24 18:33:53.547791: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " Training loss: 2.1789441108703613\n", " Training accuracy: 0.4577885866165161\n", " Validation loss: 0.06307889521121979\n", " Validation accuracy: 0.9831932783126831\n", "Epoch: 2\n", " Training loss: 1.1442071199417114\n", " Training accuracy: 0.7080687284469604\n", " Validation loss: 0.027842853218317032\n", " Validation accuracy: 0.9899159669876099\n", "Epoch: 
3\n", " Training loss: 0.8554081320762634\n", " Training accuracy: 0.7674636244773865\n", " Validation loss: 0.06736434251070023\n", " Validation accuracy: 0.9831932783126831\n", "Epoch: 4\n", " Training loss: 0.6917049288749695\n", " Training accuracy: 0.7822189331054688\n", " Validation loss: 0.052796363830566406\n", " Validation accuracy: 0.9865546226501465\n", "Epoch: 5\n", " Training loss: 0.5919428467750549\n", " Training accuracy: 0.8147180080413818\n", " Validation loss: 0.04681151360273361\n", " Validation accuracy: 0.9831932783126831\n", "Epoch: 6\n", " Training loss: 0.4891780614852905\n", " Training accuracy: 0.8408666849136353\n", " Validation loss: 0.043771397322416306\n", " Validation accuracy: 0.9882352948188782\n", "Epoch: 7\n", " Training loss: 0.44778183102607727\n", " Training accuracy: 0.8599178194999695\n", " Validation loss: 0.03981785476207733\n", " Validation accuracy: 0.9882352948188782\n", "Epoch: 8\n", " Training loss: 0.48525184392929077\n", " Training accuracy: 0.8421741127967834\n", " Validation loss: 0.038229942321777344\n", " Validation accuracy: 0.9882352948188782\n", "Epoch: 9\n", " Training loss: 0.4880279004573822\n", " Training accuracy: 0.8502054810523987\n", " Validation loss: 0.04261568561196327\n", " Validation accuracy: 0.9899159669876099\n", "Epoch: 10\n", " Training loss: 0.41330939531326294\n", " Training accuracy: 0.86720210313797\n", " Validation loss: 0.043336544185876846\n", " Validation accuracy: 0.9865546226501465\n", "Epoch: 11\n", " Training loss: 0.42511239647865295\n", " Training accuracy: 0.8647740483283997\n", " Validation loss: 0.042776186019182205\n", " Validation accuracy: 0.9865546226501465\n", "Epoch: 12\n", " Training loss: 0.43565788865089417\n", " Training accuracy: 0.8683227896690369\n", " Validation loss: 0.037856556475162506\n", " Validation accuracy: 0.9865546226501465\n", "Epoch: 13\n", " Training loss: 0.4058941900730133\n", " Training accuracy: 0.8754202723503113\n", " Validation loss: 
0.033589139580726624\n", " Validation accuracy: 0.9882352948188782\n", "Epoch: 14\n", " Training loss: 0.4229739308357239\n", " Training accuracy: 0.8685095310211182\n", " Validation loss: 0.04102521017193794\n", " Validation accuracy: 0.9848739504814148\n", "Epoch: 15\n", " Training loss: 0.40655088424682617\n", " Training accuracy: 0.8804632425308228\n", " Validation loss: 0.03230886533856392\n", " Validation accuracy: 0.9899159669876099\n", "Epoch: 16\n", " Training loss: 0.40451598167419434\n", " Training accuracy: 0.8726186156272888\n", " Validation loss: 0.035236991941928864\n", " Validation accuracy: 0.9848739504814148\n", "Epoch: 17\n", " Training loss: 0.3949117362499237\n", " Training accuracy: 0.8808367848396301\n", " Validation loss: 0.03679860755801201\n", " Validation accuracy: 0.9882352948188782\n", "Epoch: 18\n", " Training loss: 0.377407431602478\n", " Training accuracy: 0.8825177550315857\n", " Validation loss: 0.04177376627922058\n", " Validation accuracy: 0.9848739504814148\n", "Epoch: 19\n", " Training loss: 0.35344383120536804\n", " Training accuracy: 0.8911094665527344\n", " Validation loss: 0.03944191709160805\n", " Validation accuracy: 0.9865546226501465\n", "Epoch: 20\n", " Training loss: 0.3501055836677551\n", " Training accuracy: 0.8866268396377563\n", " Validation loss: 0.03972172364592552\n", " Validation accuracy: 0.9848739504814148\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = create_model()\n", "\n", "history = tf.keras.callbacks.History()\n", "\n", "for e in range(20):\n", " print(\"Epoch: \", e+1)\n", " for c in range(10):\n", " # Training\n", " model.fit(\n", " X_train[t_train==c, :], T_train[t_train==c, :],\n", " batch_size=100, \n", " epochs=1,\n", " validation_split=0.1,\n", " verbose = 0,\n", " callbacks=[history]\n", " )\n", " print(\" Training loss:\", history.history[\"loss\"][-1])\n", " print(\" Training accuracy:\", history.history[\"accuracy\"][-1])\n", " print(\" Validation loss:\", history.history[\"val_loss\"][-1])\n", " print(\" Validation accuracy:\", history.history[\"val_accuracy\"][-1])\n", " \n", "plt.figure(figsize=(15, 6))\n", "plt.subplot(121)\n", "plt.plot(history.history['loss'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_loss'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "plt.subplot(122)\n", "plt.plot(history.history['accuracy'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_accuracy'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A:** The training accuracy slowly increases (with some oscillations, some numbers are harder to learn than others), but the validation accuracy is suspiciously high right from the start..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Evaluate the model after training on the whole test set. What happens?" 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test loss: 0.7852121591567993\n", "Test accuracy: 0.778700053691864\n" ] } ], "source": [ "# Testing\n", "score = model.evaluate(X_test, T_test, verbose=0)\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A:** Horror! The test accuracy is now awful, although the training and validation accuracies were fine after 20 epochs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** To better understand what happened, compute the test accuracy of the network on each class of the test set individually: all the 0s of the test set, then all the 1s, etc. What happens? " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Class 0\n", "Test loss: 0.41480857133865356\n", "Test accuracy: 0.9061225056648254\n", "Class 1\n", "Test loss: 0.5983759164810181\n", "Test accuracy: 0.882819414138794\n", "Class 2\n", "Test loss: 0.9812650084495544\n", "Test accuracy: 0.7490310072898865\n", "Class 3\n", "Test loss: 1.302044153213501\n", "Test accuracy: 0.5524752736091614\n", "Class 4\n", "Test loss: 1.409988284111023\n", "Test accuracy: 0.5346232652664185\n", "Class 5\n", "Test loss: 1.21005380153656\n", "Test accuracy: 0.6838565468788147\n", "Class 6\n", "Test loss: 0.16447730362415314\n", "Test accuracy: 0.9530271887779236\n", "Class 7\n", "Test loss: 1.548042893409729\n", "Test accuracy: 0.5661478638648987\n", "Class 8\n", "Test loss: 0.17124219238758087\n", "Test accuracy: 0.9691992402076721\n", "Class 9\n", "Test loss: 0.05847443267703056\n", "Test accuracy: 0.9831516742706299\n" ] } ], "source": [ "for c in range(10):\n", " score = model.evaluate(X_test[t_test==c, :], T_test[t_test==c, :], verbose=0)\n", " print(\"Class\", c)\n", " print('Test loss:', score[0])\n", " 
print('Test accuracy:', score[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A:** The last digits to be seen during training are the 9s: they have a good test accuracy. The 8s were seen shortly before, so they are also OK. But most of the other digits have been forgotten: the 3s, 4s and 7s only reach 53 to 57% test accuracy. The memory has been erased. This explains why you cannot train a deep network on-policy: the last episode would be remembered, but all the previous ones would be erased (catastrophic forgetting).\n", "\n", "Notable exceptions are the 6s, which look like 9s, and the 0s, which look like 8s: they share features with the digits which are well recognized, so they perform OK." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Increase and decrease the learning rate of the optimizer. What do you observe? Is there a solution to this problem? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A:** Increasing the learning rate worsens the problem. Decreasing it does help, but then learning is very slow. This is a classical example of catastrophic forgetting: learning a new task erases the previous ones. There is no solution to this problem for now, apart from taking **i.i.d.** samples in each minibatch." ] } ], "metadata": { "kernelspec": { "display_name": "deeprl", "language": "python", "name": "deeprl" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }