{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-layer Perceptron" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this exercise is to implement a shallow multi-layer perceptron to perform non-linear classification. Let's start with the usual imports and the definition of the logistic function." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import matplotlib.pyplot as plt\n", "from IPython.display import clear_output\n", "%matplotlib inline\n", "\n", "from sklearn.model_selection import train_test_split\n", "\n", "rng = np.random.default_rng()\n", "\n", "def logistic(x):\n", "    return 1.0/(1.0 + np.exp(-x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structure of the MLP\n", "\n", "In this exercise, we will consider an MLP for non-linear binary classification composed of 2 input neurons $\\mathbf{x}$, one output neuron $y$ and $K$ hidden neurons in a single hidden layer ($\\mathbf{h}$).\n", "\n", "*(Figure: MLP)*\n", "\n", "The output neuron is represented by a vector $\\mathbf{y}$ with one element; it sums its inputs using $K$ weights $W^2$ and a bias $\\mathbf{b}^2$.\n", "\n", "$$\\mathbf{y} = \\sigma( W^2 \\times \\mathbf{h} + \\mathbf{b}^2)$$\n", "\n", "It uses the logistic transfer function:\n", "\n", "$$\\sigma(x) = \\dfrac{1}{1 + \\exp(-x)}$$\n", "\n", "As in logistic regression for linear classification, we will interpret $y$ as the probability that the input $\\mathbf{x}$ belongs to the positive class.\n", "\n", "$W^2$ is a $1 \\times K$ matrix (we could interpret it as a vector, but this will make the computations easier) and $\\mathbf{b}^2$ is a vector with one element.\n", "\n", "Each of the $K$ hidden neurons receives 2 weights from the input layer, which gives a $K \\times 2$ weight matrix $W^1$, and $K$ biases in the vector $\\mathbf{b}^1$. 
They will also use the logistic activation function at first:\n", "\n", "$$\\mathbf{h} = \\sigma(W^1 \\times \\mathbf{x} + \\mathbf{b}^1)$$\n", "\n", "The goal is to implement the backpropagation algorithm by comparing the desired output $t$ with the prediction $y$:\n", "\n", "* The output error is a vector with one element:\n", "\n", "$$\\delta = (\\mathbf{t} - \\mathbf{y})$$\n", "\n", "* The backpropagated error is a vector with $K$ elements:\n", "\n", "$$\\delta_\\text{hidden} = \\sigma'(W^1 \\times \\mathbf{x} + \\mathbf{b}^1) \\, (W^2)^T \\times \\delta$$\n", "\n", "($W^2$ is a $1 \\times K$ matrix, so $(W^2)^T \\times \\delta$ is a $K \\times 1$ vector. The vector $\\sigma'(W^1 \\times \\mathbf{x} + \\mathbf{b}^1)$ is multiplied element-wise.)\n", "\n", "* Parameter updates follow the delta learning rule:\n", "\n", "$$\\Delta W^1 = \\eta \\, \\delta_\\text{hidden} \\times \\mathbf{x}^T$$\n", "\n", "$$\\Delta \\mathbf{b}^1 = \\eta \\, \\delta_\\text{hidden} $$\n", "\n", "$$\\Delta W^2 = \\eta \\, \\delta \\, \\mathbf{h}^T$$\n", "\n", "$$\\Delta \\mathbf{b}^2 = \\eta \\, \\delta$$\n", "\n", "Notice the transpose operators used to obtain the correct shapes. Recall that the derivative of the logistic function is given by:\n", "\n", "$$\\sigma'(x)= \\sigma(x) \\, (1- \\sigma(x))$$\n", "\n", "**Q:** Why do we not use the derivative of the transfer function of the output neuron when computing the output error $\\delta$? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MLP will be trained on a non-linear dataset with samples of each class forming a circle. Each sample has two input dimensions. In the cell below, blue points represent the positive class (t=1), orange ones the negative class (t=0)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from sklearn.datasets import make_circles\n", "\n", "N = 100\n", "d = 2\n", "X, t = make_circles(n_samples=N, noise=0.03, random_state=42)\n", "\n", "plt.figure(figsize=(10, 6))\n", "plt.scatter(X[t==1, 0], X[t==1, 1])\n", "plt.scatter(X[t==0, 0], X[t==0, 1])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Split the data into a training set and a test set (80/20). Make sure to call them `X_train, X_test, t_train, t_test`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Class definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The neural network is entirely defined by its parameters, i.e. the weight matrices and bias vectors, as well as the transfer function of the hidden neurons. In order to make your code more reusable, the MLP will be implemented as a Python class. The following cell defines the class, but we will explain it step by step afterwards." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MLP:\n", "\n", "    def __init__(self, d, K, activation_function, max_val, eta):\n", "\n", "        self.d = d\n", "        self.K = K\n", "        self.activation_function = activation_function\n", "        self.eta = eta\n", "\n", "        self.W1 = rng.uniform(-max_val, max_val, (K, d))\n", "        self.b1 = rng.uniform(-max_val, max_val, (K, 1))\n", "\n", "        self.W2 = rng.uniform(-max_val, max_val, (1, K))\n", "        self.b2 = rng.uniform(-max_val, max_val, (1, 1))\n", "\n", "    def feedforward(self, x):\n", "\n", "        # Make sure x has 2 rows\n", "        x = np.array(x).reshape((self.d, -1))\n", "\n", "        # Hidden layer\n", "        self.h = self.activation_function(np.dot(self.W1, x) + self.b1)\n", "\n", "        # Output layer\n", "        self.y = logistic(np.dot(self.W2, self.h) + self.b2)\n", "\n", "\n", "    def train(self, X_train, t_train, nb_epochs, visualize=True):\n", "        errors = []\n", "\n", "        for epoch in range(nb_epochs):\n", "\n", "            nb_errors = 0\n", "\n", "            # Epoch\n", "            for i in range(X_train.shape[0]):\n", "\n", "                # Feedforward pass: sets self.h and self.y\n", "                self.feedforward(X_train[i, :])\n", "\n", "                # Backpropagation\n", "                self.backprop(X_train[i, :], t_train[i])\n", "\n", "                # Predict the class:\n", "                if self.y[0, 0] > 0.5:\n", "                    c = 1\n", "                else:\n", "                    c = 0\n", "\n", "                # Count the number of misclassifications\n", "                if t_train[i] != c:\n", "                    nb_errors += 1\n", "\n", "            # Compute the error rate\n", "            errors.append(nb_errors/X_train.shape[0])\n", "\n", "            # Plot the decision function every 10 epochs\n", "            if epoch % 10 == 0 and visualize:\n", "                self.plot_classification()\n", "\n", "            # Stop when the error rate is 0\n", "            if nb_errors == 0:\n", "                if visualize:\n", "                    self.plot_classification()\n", "                break\n", "\n", "        return errors, epoch+1\n", "\n", "    def backprop(self, x, t):\n", "\n", "        # Make sure x has 2 rows\n", "        x = np.array(x).reshape((self.d, -1))\n", "\n", "        # TODO: implement backpropagation\n", 
"\n", "    def test(self, X_test, t_test):\n", "\n", "        nb_errors = 0\n", "        for i in range(X_test.shape[0]):\n", "\n", "            # Feedforward pass\n", "            self.feedforward(X_test[i, :])\n", "\n", "            # Predict the class:\n", "            if self.y[0, 0] > 0.5:\n", "                c = 1\n", "            else:\n", "                c = 0\n", "\n", "            # Count the number of misclassifications\n", "            if t_test[i] != c:\n", "                nb_errors += 1\n", "\n", "        return nb_errors/X_test.shape[0]\n", "\n", "    def plot_classification(self):\n", "\n", "        # Allow redrawing\n", "        clear_output(wait=True)\n", "\n", "        x_min, x_max = X_train[:, 0].min(), X_train[:, 0].max()\n", "        y_min, y_max = X_train[:, 1].min(), X_train[:, 1].max()\n", "        xx, yy = np.meshgrid(np.arange(x_min, x_max, .02), np.arange(y_min, y_max, .02))\n", "\n", "        x1 = xx.ravel()\n", "        x2 = yy.ravel()\n", "        x = np.array([[x1[i], x2[i]] for i in range(x1.shape[0])])\n", "\n", "        self.feedforward(x.T)\n", "        Z = self.y.copy()\n", "        Z[Z>0.5] = 1\n", "        Z[Z<=0.5] = 0\n", "\n", "        from matplotlib.colors import ListedColormap\n", "        cm_bright = ListedColormap(['#FF0000', '#0000FF'])\n", "\n", "        fig = plt.figure(figsize=(10, 6))\n", "        plt.contourf(xx, yy, Z.reshape(xx.shape), cmap=cm_bright, alpha=.4)\n", "        plt.scatter(X_train[:, 0], X_train[:, 1], c=t_train, cmap=cm_bright, edgecolors='k')\n", "        plt.scatter(X_test[:, 0], X_test[:, 1], c=t_test, cmap=cm_bright, alpha=0.4, edgecolors='k')\n", "        plt.xlim(xx.min(), xx.max())\n", "        plt.ylim(yy.min(), yy.max())\n", "        plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The constructor `__init__` of the class accepts several arguments:\n", "\n", "* `d` is the number of inputs, here 2.\n", "* `K` is the number of hidden neurons.\n", "* `activation_function` is the function to use for the hidden neurons, for example the `logistic` function defined at the beginning of the notebook. 
Note that a function name can be stored in a variable like any other value.\n", "* `max_val` is the maximum value used to initialize the weight matrices.\n", "* `eta` is the learning rate.\n", "\n", "The constructor starts by saving these arguments as attributes, so that they can be used in other methods (for example as `self.K`):\n", "\n", "```python\n", "def __init__(self, d, K, activation_function, max_val, eta):\n", "\n", "    self.d = d\n", "    self.K = K\n", "    self.activation_function = activation_function\n", "    self.eta = eta\n", "```\n", "\n", "The constructor then randomly initializes the weight matrices and bias vectors, uniformly between `-max_val` and `max_val`.\n", "\n", "```python\n", "self.W1 = rng.uniform(-max_val, max_val, (K, d))\n", "self.b1 = rng.uniform(-max_val, max_val, (K, 1))\n", "\n", "self.W2 = rng.uniform(-max_val, max_val, (1, K))\n", "self.b2 = rng.uniform(-max_val, max_val, (1, 1))\n", "```\n", "\n", "You can then already create the `MLP` object and observe how the parameters are initialized:\n", "\n", "```python\n", "mlp = MLP(d=2, K=15, activation_function=logistic, max_val=1.0, eta=0.05)\n", "```\n", "\n", "**Q:** Create the object and print the weight matrices and bias vectors." 
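, "\n", "\n", "*Hint (a sketch that only restates the constructor above for `d=2` and `K=15`):* the printed arrays should have the following shapes, for a total of $15 \\times 2 + 15 + 15 + 1 = 61$ parameters:\n", "\n", "$$W^1 \\in \\mathbb{R}^{15 \\times 2}, \\quad \\mathbf{b}^1 \\in \\mathbb{R}^{15 \\times 1}, \\quad W^2 \\in \\mathbb{R}^{1 \\times 15}, \\quad \\mathbf{b}^2 \\in \\mathbb{R}^{1 \\times 1}$$\n", "\n", "With `max_val=1.0`, every entry should lie in $[-1, 1)$, since `rng.uniform` samples from a half-open interval."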
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `feedforward` method takes a vector `x` as input, reshapes it to make sure it has two rows, and computes the hidden activation $\\mathbf{h}$ and the prediction $\\mathbf{y}$.\n", "\n", "```python\n", "def feedforward(self, x):\n", "\n", "    # Make sure x has 2 rows\n", "    x = np.array(x).reshape((self.d, -1))\n", "\n", "    # Hidden layer\n", "    self.h = self.activation_function(np.dot(self.W1, x) + self.b1)\n", "\n", "    # Output layer\n", "    self.y = logistic(np.dot(self.W2, self.h) + self.b2)\n", "```\n", "\n", "Notice the use of `self.` to access attributes, as well as the use of `np.dot()` to multiply vectors and matrices.\n", "\n", "**Q:** Using the randomly initialized weights, apply the `feedforward()` method to an input vector (for example $[0.5, 0.5]$) and print `h` and `y`. What is the predicted class of the example?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The class also provides a visualization method. It is not important to understand its code for the exercise, so you can safely skip it. It displays the training data as plain points, the test data as semi-transparent points, and the decision function as a background color (all points in the blue region will be classified as positive examples, all points in the red region as negative examples).\n", "\n", "**Q:** Plot the initial classification on the dataset with random weights. Is there a need for learning? Reinitialize the weights and biases multiple times. What do you observe?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backpropagation\n", "\n", "The `train()` method implements the training loop you have already implemented several times: several epochs over the training set, making a prediction for each input and modifying the parameters according to the prediction error:\n", "\n", "```python\n", "def train(self, X_train, t_train, nb_epochs, visualize=True):\n", "    errors = []\n", "\n", "    for epoch in range(nb_epochs):\n", "\n", "        nb_errors = 0\n", "\n", "        # Epoch\n", "        for i in range(X_train.shape[0]):\n", "\n", "            # Feedforward pass: sets self.h and self.y\n", "            self.feedforward(X_train[i, :])\n", "\n", "            # Backpropagation\n", "            self.backprop(X_train[i, :], t_train[i])\n", "\n", "            # Predict the class:\n", "            if self.y[0, 0] > 0.5:\n", "                c = 1\n", "            else:\n", "                c = 0\n", "\n", "            # Count the number of misclassifications\n", "            if t_train[i] != c:\n", "                nb_errors += 1\n", "\n", "        # Compute the error rate\n", "        errors.append(nb_errors/X_train.shape[0])\n", "\n", "        # Plot the decision function every 10 epochs\n", "        if epoch % 10 == 0 and visualize:\n", "            self.plot_classification()\n", "\n", "        # Stop when the error rate is 0\n", "        if nb_errors == 0:\n", "            if visualize:\n", "                self.plot_classification()\n", "            break\n", "\n", "    return errors, epoch+1\n", "```\n", "\n", "The training method stops after `nb_epochs` epochs or when no error is made during the last epoch. The decision function is visualized every 10 epochs to better understand what is happening. 
The method returns a list containing the error rate after each epoch, as well as the number of epochs needed to reach an error rate of 0.\n", "\n", "The only thing missing is the `backprop(x, t)` method, which currently does nothing:\n", "\n", "```python\n", "def backprop(self, x, t):\n", "\n", "    # Make sure x has 2 rows\n", "    x = np.array(x).reshape((self.d, -1))\n", "\n", "    # TODO: implement backpropagation\n", "```\n", "\n", "**Q:** Implement the *online* backpropagation algorithm.\n", "\n", "All you have to do is backpropagate the output error and adapt the parameters using the delta learning rule:\n", "\n", "1. compute the output error `delta`.\n", "2. compute the backpropagated error `delta_hidden`.\n", "3. increment the parameters `self.W1, self.b1, self.W2, self.b2` accordingly.\n", "\n", "The only difficulty is to take care of the shape of each matrix (before multiplying two matrices or vectors, check their shapes).\n", "\n", "*Note:* you can either directly edit the cell containing the definition of the class, or create a new class `TrainableMLP` inheriting from the class `MLP` and simply redefine the `backprop()` method. The solution will use the second option to be more readable, but it does not matter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Train the MLP for 1000 epochs on the data using a learning rate of 0.05, 15 hidden neurons and weights initialized between -1 and 1. Plot the evolution of the training error." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Use the `test()` method to compute the error on the test set. What is the test accuracy of your network after training? Compare it to the training accuracy." 
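, "\n", "\n", "*Note:* `test()` returns an error rate, not an accuracy. Assuming the usual definition of accuracy as the fraction of correctly classified samples (the symbols $N_\\text{errors}$ and $N_\\text{test}$ are introduced here for illustration only), the two are related by:\n", "\n", "$$\\text{accuracy} = 1 - \\dfrac{N_\\text{errors}}{N_\\text{test}}$$\n", "\n", "For example, an 80/20 split of the $N = 100$ samples leaves 20 test samples, so a single misclassification corresponds to an error rate of $1/20 = 0.05$ and an accuracy of $0.95$."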
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiments\n", "\n", "### Influence of the number of hidden neurons\n", "\n", "**Q:** Try different values for the number of hidden neurons $K$ (e.g. 2, 5, 10, 15, 20, 25, 50...) and observe how the accuracy and speed of convergence evolve." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of the learning rate\n", "\n", "**Q:** Vary the learning rate between extreme values. How does the performance evolve?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of weight initialization\n", "\n", "**Q:** The weights are initialized randomly between -1 and 1. Try to initialize them to 0. Does it work? Why?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** For a fixed number of hidden neurons (e.g. $K=15$) and a suitable value of `eta`, train the network 10 times with different initial weights and superimpose the evolution of the training error on the same plot. Conclude." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of the transfer function\n", "\n", "**Q:** Modify the `backprop()` method so that it applies backpropagation correctly for any of the four activation functions:\n", "\n", "* linear\n", "* logistic\n", "* tanh\n", "* relu" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Linear transfer function\n", "def linear(x):\n", "    return x\n", "\n", "# tanh transfer function\n", "def tanh(x):\n", "    return np.tanh(x)\n", "\n", "# ReLU transfer function\n", "def relu(x):\n", "    x = x.copy()\n", "    x[x < 0.] = 0.\n", "    return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember that the derivatives of these activation functions are easy to compute:\n", "\n", "* linear: $f'(x) = 1$\n", "* logistic: $f'(x) = f(x) \\, (1 - f(x))$\n", "* tanh: $f'(x) = 1 - f^2(x)$\n", "* relu: $f'(x) = \\begin{cases}1 \\; \\text{if} \\; x>0\\\\ 0 \\; \\text{if} \\; x \\leq 0\\\\ \\end{cases}$\n", "\n", "*Hint:* `activation_function` is a variable like any other, although it holds a function. You can apply comparisons to it:\n", "\n", "```python\n", "if self.activation_function == linear:\n", "    diff = something\n", "elif self.activation_function == logistic:\n", "    diff = something\n", "elif self.activation_function == tanh:\n", "    diff = something\n", "elif self.activation_function == relu:\n", "    diff = something\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Use a linear transfer function for the hidden neurons. How does performance evolve? Is the non-linearity of the transfer function important for learning?" 
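, "\n", "\n", "*Hint (a sketch using the notation of the first section, not a full answer):* with a linear transfer function, $\\mathbf{h} = W^1 \\times \\mathbf{x} + \\mathbf{b}^1$, so the output can be rewritten as:\n", "\n", "$$\\mathbf{y} = \\sigma(W^2 \\times (W^1 \\times \\mathbf{x} + \\mathbf{b}^1) + \\mathbf{b}^2) = \\sigma((W^2 \\times W^1) \\times \\mathbf{x} + (W^2 \\times \\mathbf{b}^1 + \\mathbf{b}^2))$$\n", "\n", "Here $W^2 \\times W^1$ is a $1 \\times 2$ matrix and $W^2 \\times \\mathbf{b}^1 + \\mathbf{b}^2$ is a single bias, which is the same functional form as logistic regression applied directly to $\\mathbf{x}$."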
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** This time, use the hyperbolic tangent function as the transfer function for the hidden neurons. Does it improve learning? " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Use the Rectified Linear Unit (ReLU) transfer function. What does it change? Conclude on the importance of the transfer function for the hidden neurons. Select the best one from now on." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of data normalization\n", "\n", "The input data returned by `make_circles` is nicely centered around 0, with values between -1 and 1. What happens if this is not the case with your data?\n", "\n", "**Q:** Scale and shift the input data `X` using the formula:\n", "\n", "$$X_\\text{shifted} = 8 \\, X + 2$$\n", "\n", "regenerate the training and test sets and train the MLP on them. What do you observe?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Normalize the shifted data so that it has a mean of 0 and a variance of 1 in each dimension, using the formula:\n", "\n", "$$X_\\text{normalized} = \\dfrac{X_\\text{shifted} - \\text{mean}(X_\\text{shifted})}{\\text{std}(X_\\text{shifted})}$$\n", "\n", "and retrain the network. Conclude." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of randomization\n", "\n", "The training loop we used until now iterated over the training samples in the exact same order at every epoch. 
The samples are therefore not i.i.d. (independent and identically distributed) as they follow the same sequence.\n", "\n", "**Q:** Modify the `train()` method so that the indices of the training samples are randomized between two epochs. Check the documentation of `rng.permutation()` for help." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Influence of weight initialization - part 2\n", "\n", "According to the empirical analysis by Glorot and Bengio in “Understanding the difficulty of training deep feedforward neural networks”, the optimal initial values for the weights between two layers of an MLP are uniformly taken in the range:\n", "\n", "$$\n", " \\mathcal{U}( - \\sqrt{\\frac{6}{N_{\\text{in}}+N_{\\text{out}}}} , \\sqrt{\\frac{6}{N_{\\text{in}}+N_{\\text{out}}}} )\n", "$$\n", "\n", "where $N_{\\text{in}}$ is the number of neurons in the first layer and $N_{\\text{out}}$ the number of neurons in the second layer.\n", "\n", "**Q:** Modify the constructor of your class to initialize both hidden and output weights with this new range. The biases should be initialized to 0. What is the effect?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary\n", "\n", "**Q:** Now that we have optimized the MLP, it is time to cross-validate the number of hidden neurons and the learning rate again. As the networks always reach a training error rate of 0 and the test set is not very relevant, the main criterion will be the number of epochs needed on average to converge. Find the best MLP for the dataset (there is not a single solution), for example by iterating over multiple values of `K` and `eta`. What do you think of the change in performance between the first naive implementation and the final one? What were the most critical changes?" 
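, "\n", "\n", "*Note:* when iterating over `K`, keep in mind that the Glorot and Bengio range from the previous section depends on `K`. As a worked example under the assumption $d = 2$ and $K = 15$, the bounds for the hidden weights ($N_{\\text{in}} = 2$, $N_{\\text{out}} = 15$) and the output weights ($N_{\\text{in}} = 15$, $N_{\\text{out}} = 1$) are:\n", "\n", "$$\\sqrt{\\frac{6}{2 + 15}} \\approx 0.594, \\qquad \\sqrt{\\frac{6}{15 + 1}} \\approx 0.612$$\n", "\n", "Both are noticeably smaller than the initial choice `max_val=1.0`, and they shrink further as $K$ grows."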
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }