{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convolutional neural networks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this exercise is to train a convolutional neural network on MNIST and better understand what is happening during training.\n", "\n", "## Training a CNN on MNIST" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "import tensorflow as tf\n", "\n", "print(tf.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Tip:** CNNs are much slower to train on CPU than the DNN of the last exercise. It is feasible to do this exercise on a normal computer, but if you have a Google account, we suggest to use `colab` to run this notebook on a GPU **for free** (training time should be divided by a factor 5 or so). \n", "\n", "Go then in the menu, \"Runtime\" and \"Change Runtime type\". You can then change the \"Hardware accelerator\" to GPU. Do not choose TPU, it will be as slow as CPU for the small networks we are using." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We import and normalize the MNIST data like last time, except we do not reshape the images: they stay with the shape (28, 28, 1):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Fetch the MNIST data\n", "(X_train, t_train), (X_test, t_test) = tf.keras.datasets.mnist.load_data()\n", "print(\"Training data:\", X_train.shape, t_train.shape)\n", "print(\"Test data:\", X_test.shape, t_test.shape)\n", "\n", "# Normalize the values\n", "X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.\n", "X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.\n", "\n", "# Mean removal\n", "X_mean = np.mean(X_train, axis=0)\n", "X_train -= X_mean\n", "X_test -= X_mean\n", "\n", "# One-hot encoding\n", "T_train = tf.keras.utils.to_categorical(t_train, 10)\n", "T_test = tf.keras.utils.to_categorical(t_test, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now define the CNN defined in the first image:\n", "\n", "* a convolutional layer with 16 3x3 filters, using valid padding and ReLU transfer functions,\n", "* a max-pooling layer over 2x2 regions,\n", "* a fully-connected layer with 100 ReLU neurons,\n", "* a softmax layer with 10 neurons.\n", "\n", "The CNN will be trained on MNIST using SGD with momentum.\n", "\n", "The following code defines this basic network in keras:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Delete all previous models to free memory\n", "tf.keras.backend.clear_session()\n", "\n", "# Sequential model\n", "model = tf.keras.models.Sequential()\n", "\n", "# Input layer representing the (28, 28) image\n", "model.add(tf.keras.layers.Input(shape=(28, 28, 1)))\n", "\n", "# Convolutional layer with 16 feature maps using 3x3 filters\n", "model.add(tf.keras.layers.Conv2D(16, (3, 3), padding='valid'))\n", "model.add(tf.keras.layers.Activation('relu')) \n", "\n", "# Max-pooling layerover 2x2 regions\n", "model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))\n", "\n", "# Flatten the feature maps into a vector\n", "model.add(tf.keras.layers.Flatten())\n", "\n", "# Fully-connected layer\n", "model.add(tf.keras.layers.Dense(units=100))\n", "model.add(tf.keras.layers.Activation('relu')) \n", "\n", "# Softmax output layer over 10 classes\n", 
"model.add(tf.keras.layers.Dense(units=10))\n", "model.add(tf.keras.layers.Activation('softmax')) \n", "\n", "# Learning rule\n", "optimizer = tf.keras.optimizers.SGD(lr=0.1, momentum=0.9, nesterov=True)\n", "\n", "# Loss function\n", "model.compile(\n", " loss='categorical_crossentropy', # loss function\n", " optimizer=optimizer, # learning rule\n", " metrics=['accuracy'] # show accuracy\n", ")\n", "\n", "print(model.summary())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the use of `Flatten()` to transform the 13x13x16 tensor representing the max-pooling layer into a vector of 2704 elements.\n", "\n", "Note also the use of `padding='valid'` and its effect on the size of the tensor corresponding to the convolutional layer. Change it to `padding='same'` and conclude on its effect.\n", "\n", "**Q:** Which layer has the most parameters? Why? Compare with the fully-connected MLPs you obtained during exercise 5." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now train this network on MNIST for 10 epochs, using minibatches of 64 images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# History tracks the evolution of the metrics during learning\n", "history = tf.keras.callbacks.History()\n", "\n", "# Training procedure\n", "model.fit(\n", " X_train, T_train, # training data\n", " batch_size=64, # batch size\n", " epochs=10, # Maximum number of epochs\n", " validation_split=0.1, # Perceptage of training data used for validation\n", " callbacks=[history] # Track the metrics at the end of each epoch\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the previous exercise, the next cells compute the test loss and accuracy and display the evolution of the training and validation accuracies:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "score = model.evaluate(X_test, T_test, verbose=0)\n", "print('Test loss:', score[0])\n", "print('Test accuracy:', score[1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(15, 6))\n", "\n", "plt.subplot(121)\n", "plt.plot(history.history['loss'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_loss'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "\n", "plt.subplot(122)\n", "plt.plot(history.history['accuracy'], '-r', label=\"Training\")\n", "plt.plot(history.history['val_accuracy'], '-b', label=\"Validation\")\n", "plt.xlabel('Epoch #')\n", "plt.ylabel('Accuracy')\n", "plt.legend()\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What do you think of 1) the final accuracy and 2) the training time, compared to the MLP of last time?\n", "\n", "**Q:** When does your network start to overfit? How to recognize it?\n", "\n", "**Q:** Try different values for the batch size (16, 32, 64, 128..). What is its influence?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Improve the CNN to avoid overfitting. 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysing the CNN\n", "\n", "Once a network has been trained, let's see what has happened internally.\n", "\n", "### Accessing trained weights\n", "\n", "Each layer of the network can be addressed individually. For example, `model.layers[0]` represents the first layer of your network (the first convolutional one, as the input layer does not count). The index of the other layers can be found by looking at the output of `model.summary()`.\n", "\n", "You can obtain the parameters of each layer (if any) with:\n", "\n", "```python\n", "W = model.layers[0].get_weights()[0]\n", "```\n", "\n", "**Q:** Print the shape of these weights and relate them to the network." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Visualize with `imshow()` each of the 16 filters of the first convolutional layer. Interpret what kind of operation they perform on the image.\n", "\n", "*Hint:* `subplot()` is going to be useful here. If you have 16 images `img[i]`, you can visualize them in a 4x4 grid with:\n", "\n", "```python\n", "for i in range(16):\n", " plt.subplot(4, 4, i+1)\n", " plt.imshow(img[i], cmap=plt.cm.gray)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing the feature maps\n", "\n", "Let's take a random image from the training set and visualize it: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "idx = 31727 # or any other digit\n", "x = X_train[idx, :, :, :].reshape(1, 28, 28, 1)\n", "t = t_train[idx]\n", "\n", "print(t)\n", "\n", "plt.figure(figsize=(6, 6))\n", "plt.imshow(x[0, :, :, 0] + X_mean[:, :, 0], cmap=plt.cm.gray)\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example could be a 1 or a 7. That is why you will never get 100% accuracy on MNIST: some examples are hard even for humans...\n", "\n", "**Q:** Print what the model predicts for it, together with its true label, and visualize the probabilities in the softmax output layer (look at the documentation of `model.predict()`):" ] },
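{ "cell_type": "markdown", "metadata": {}, "source": [ "*Hint (a possible sketch, one way among others):* `model.predict()` returns an array of shape (1, 10) containing the probability of each class for this image, which can be compared to the true label and displayed as a bar plot:\n", "\n", "```python\n", "# Probabilities of the softmax output layer for this single image\n", "probabilities = model.predict(x)[0]\n", "\n", "print(\"True label:\", t)\n", "print(\"Predicted class:\", probabilities.argmax())\n", "\n", "# Bar plot of the 10 class probabilities\n", "plt.bar(range(10), probabilities)\n", "plt.xlabel('Digit')\n", "plt.ylabel('Probability')\n", "plt.show()\n", "```" ] },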
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Depending on how your network converged, you may or may not obtain the correct prediction.\n", "\n", "**Q:** Visualize the output of the network for different examples. Do these ambiguities happen often?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look inside the network. We will first visualize the 16 feature maps of the first convolutional layer.\n", "\n", "This is very simple in TensorFlow 2.x: one only needs to create a new model (of class `tf.keras.models.Model`, not `Sequential`) taking the same inputs as the original model, but returning the output of the first layer (`model.layers[0]` is the first convolutional layer of the model, as the input layer does not count):\n", "\n", "```python\n", "model_conv = tf.keras.models.Model(inputs=model.inputs, outputs=model.layers[0].output)\n", "```\n", "\n", "To get the tensor corresponding to the first convolutional layer, one simply needs to call `predict()` on the new model:\n", "\n", "```python\n", "feature_maps = model_conv.predict([x])\n", "```\n", "\n", "**Q:** Visualize the 16 feature maps using `subplot()`. Relate these activations to the filters you visualized previously." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Do the same with the output of the first max-pooling layer.\n", "\n", "*Hint:* you need to find the index of that layer in `model.summary()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Bonus question:** if your network has several convolutional layers, visualize their feature maps too. What do you think of the specificity of some features?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13 | packaged by conda-forge | (main, May 27 2022, 17:00:33) \n[Clang 13.0.1 ]" }, "vscode": { "interpreter": { "hash": "3d24234067c217f49dc985cbc60012ce72928059d528f330ba9cb23ce737906d" } } }, "nbformat": 4, "nbformat_minor": 4 }