{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MATH60629A\n", "# Week \\#5 - Neural Networks - Exercises\n", "\n", "This tutorial explores neural networks." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2021-10-01 15:19:56-- https://raw.githubusercontent.com/lcharlin/80-629/master/week5-NeuralNetworks/utils.py\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 13166 (13K) [text/plain]\n", "Saving to: ‘utilities.py’\n", "\n", "utilities.py 100%[===================>] 12.86K --.-KB/s in 0.002s \n", "\n", "2021-10-01 15:19:56 (7.38 MB/s) - ‘utilities.py’ saved [13166/13166]\n", "\n" ] } ], "source": [ "import numpy as np\n", "\n", "# Download utils.py and save it locally as utilities.py\n", "!wget https://raw.githubusercontent.com/lcharlin/80-629/master/week5-NeuralNetworks/utils.py -O utilities.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A tiny neural network classifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To classify the examples, we will use the following simple neural network:\n", "\n", "\n", "\n", "where $\\sigma$ is the sigmoid function defined as:\n", "\n", "$$\n", " \\sigma(x) = \\frac{1}{1+ e^{-x}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 1\n", "\n", "Assume that the parameters of the neural network are as follows: \n", "\n", "\\begin{aligned}\n", "& w_1 = -5 & w_2 = 10 && w_3 = 5 \\\\\n", "& w_4 = -10 & w_5 = 20 && w_6 = 20 \\\\\n", "& b_1 = 25 & b_2 = 40 && b_3 = -30 \n", "\\end{aligned}\n", "\n", "What would be the network output $o$ and the predicted label for each of the following data points?\n", "\n", " | x1 | x2 | o | label |\n", " 
|-------|-------|-----|-------|\n", " | 4 | -4 | | |\n", " | -4 | 4 | | |\n", " | -4 | -4 | | |\n", " | 4 | 4 | | |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use the following piece of code to evaluate the output of the network:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def sigmoid(x): \n", "    return 1 / (1 + np.exp(-x))\n", "\n", "def nn1(x1, x2, w1, w2, w3, w4, w5, w6, b1, b2, b3):\n", "    # Hidden layer: two sigmoid units\n", "    h1 = sigmoid(w1*x1 + w3*x2 + b1)\n", "    h2 = sigmoid(w2*x1 + w4*x2 + b2)\n", "    # Output layer: one sigmoid unit\n", "    o = sigmoid(w5*h1 + w6*h2 + b3)\n", "    return o" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding good parameters for our network\n", "\n", "Let's move to a slightly more realistic example. Here we focus on the task of (binary) classification. As always, we first load the data that we want to classify:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [], "source": [ "from utilities import load_data, plot_boundaries, plot_data # we wrote some helper functions\n", "X_train, y_train, X_test, y_test = load_data() # to help with data loading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can plot the data using the helper function `plot_data`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_data(X_train, y_train, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, this data is not linearly separable. In other words, the positive and negative examples cannot be separated using a linear classifier. Our goal for the rest of this notebook session is to learn the parameters of a neural network model that can separate the positive examples from the negative ones.\n", "\n", "What do we mean by *learning the parameters*? Remember that our neural network has 9 parameters, including three biases ($w_1, \\ldots, w_6, b_1, b_2, b_3$). Each assignment of values to these parameters leads to a different classifier. We want to find the one that best matches our data.\n", "\n", "Let's see how different choices of parameters change the classifier. For a given set of parameters, the function `plot_boundaries` shows the regions of positive prediction (coloured blue) and negative prediction (coloured red):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAQAAAAD8CAYAAACYVXqwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAOm0lEQVR4nO3dX4xkZZnH8e9jI7vZcWaHBAgJPWQgi7gsYmBbds1k/wkaXMl4sxeYaIhedNasZEwkCEy836wblajJpgN44yTsBnExhlWHKCZ7wSzDyIgwaAgh0ioBEnEIiJOZefaiq01XdVVXddWpc07V+/1cTXW/deqBqfrNe55+zunITCSV6W1NFyCpOQaAVDADQCqYASAVzACQCmYASAWrJAAiYndEPBARz0bEiYh4XxXHlTRd51R0nLuB72bmP0XEucCfVHRcSVMUkw4CRcQu4DhwWTpVJM2UKnYAlwGvAF+PiPcATwAHMvONjYsiYhlYBtixY8dfvuvyy7uPsrDQ9fDMmc0v1Pu1UdacPbt5Te/X+sXWKGtGibuq1ozC+NW63/zmBd5449UYtq6KHcAS8BiwLzOPRMTdwMnM/Pyg5yxdc00effTRrq/lrj/tenzy5ObnvfYaQ9f89rfdj3/3u81r3nyz+/Fbb21ec+rU1o8BTp/uftwvbHrX9FvX73nDnjPumqbNQo3z4CtfWWJ19ejQAKiiCbgKrGbmkc7jB4BrKziupCmbOAAy8yXgxYi4ovOl64FnJj2upOmr6qcAtwKHOj8BeB74REXHlTRFlQRAZj4JLI38hDNnNp3Q956s7OrpCcyCfuf75/T5P9xvXa/ec+W39dmrjbOmabNQY0mcBJQKZgBIBTMApIJV1QTcnjNn+v8Qf4N+P8Cc177AOD0B2Hw+Pe6aps1CjfPKHYBUMANAKpgBIBXMAJAK1lwTsPeqnRHMw7AQbG76OSzUbRZqnBfuAKSCGQBSwQwAqWDN9ADOnu1/p45tcliom8NC2i53AFLBDACpYAaAVDADQCpYc03A3lvzVsRhoW4OC2kr7gCkghkAUsEMAKlgzfQAMvv/Op4pcFioW1XDQoPWNclhoe1zByAVzACQCmYASAUzAKSCNTcI1O/3bdfEYaFu4w4Ctb3pNguNy6ZVtgOIiIWI+HFEfKeqY0qaripPAQ4AJyo8nqQpqyQAImIR+DBwTxXHk1SPqnoAXwZuB3aOtDqz0R5AL4eFuo07CDQL59xt71vUbeIdQETcBLycmU8MWbccEUcj4ugrr78+6ctKqkAVpwD7gP0R8QJwP/D+iPhG76LMXMnMpcxcumDnaBsFSdM1cQBk5p2ZuZiZe4GbgR9k5scmrkzS1DkIJBWs0kGgzHwUeHSEhaN1oxrksFC3eb3l+Cw0LqfJHYBUMANAKpgBIBWsmYuBYPOJ1oz1BGA2+wJVDQuBdxyeB+4ApIIZAFLBDACpYAaAVLDmbgs+bEKl5U1BcFio17wOC8Fs1DgOdwBSwQwAqWAGgFQwA0AqWHsnAcftRDWopGnBQet6OS3Ybu4ApIIZAFLBDACpYM0NAg07YarysrUGldQXcFio+3Eba+zlDkAqmAEgFcwAkApmAEgFa24QqNcoHZM5GBYCryLs5bBQc9wBSAUzAKSCGQBSwdpzMdA4UxQOC7WKw0LDta1GdwBSwSYOgIjYExE/jIgTEfF0RByoojBJ01fFKcBp4LOZeSwidgJPRMThzHymgmNLmqKJdwCZ+evMPNb58+vACeDiSY8rafoqbQJGxF7gGuBIn+8tA8sAl5x33uYnVzXp4bBQqzgstLWma6ysCRgR7wC+CXwmM0/2fj8zVzJzKTOXLtixo6qXlTSBSgIgIt7O2of/UGY+WMUxJU1fFT8FCOBe4ERmfnHykiTVpYoewD7g48BTEfFk52t3ZebDA5/R745A05r0cFioVRwWGq7OGicOgMz8X/q/PyW1nJOAUsEMAKlgBoB
UsPbeEWiakx4OC7WKw0Jbm2aN7gCkghkAUsEMAKlg7ekB9Kpz0sNhoVZp27DQoHVNqmpYyB2AVDADQCqYASAVzACQCtbeJmA/Dgttm8NC3cZ9e7T9KsJ+/x0jPa/aMiTNEgNAKpgBIBVstnoAvRwW2jaHhbqNOwg0C8NCo3AHIBXMAJAKZgBIBTMApILNdhOwH4eFts1hoW7zfMvxXu4ApIIZAFLBDACpYPPXA+jVtmGhQetaxGGhzeb2jsNNFyCpOVX9evAbI+JnEfFcRNxRxTElTV8Vvx58Afga8CHgSuCjEXHlpMeVNH1V7ACuA57LzOcz8xRwP/CRCo4racqqaAJeDLy44fEq8Fe9iyJiGVgGuGT37gpedgJNDguBVxE2yGGhblXsAPq9N3LTFzJXMnMpM5cu2LGjgpeVNKkqAmAV2LPh8SLwqwqOK2nKqgiAx4HLI+LSiDgXuBn4dgXHlTRlE/cAMvN0RHwa+B6wANyXmU9PXJmkqatkEjAzHwYeruJYjai7o+NVhK1R5fDmLE4LOgkoFcwAkApmAEgFa+5qwFFOhprksNC2zeuwEEzvluNNDwu17FMnqU4GgFQwA0AqmAEgFaw9twSbtaYgjNfRGbSul8NCrTKtqwibHhZq2adMUp0MAKlgBoBUsPb0AHqNej7dpHFP1qZ1y/EZ7AnAbPYF5mVYqGWfKEl1MgCkghkAUsEMAKlg7W0C9jMPw0L91jks1GUWm4LQrmGh6Ndt7aNlnyBJdTIApIIZAFLBZqsH0GsWh4VgepMeDgu1StPDQqNo2adFUp0MAKlgBoBUMANAKthsNwH7afuwEDR7y3GHhRpV57DQKCb6dETEFyLi2Yj4SUR8KyJ2T3I8SfWa9J/Hw8BVmXk18HPgzslLklSXiQIgM7+fmesblseAxclLklSXKnsAnwT+c9A3I2IZWAa4ZHeNZwoOC3VzWKhVpjksNIqhARARjwAX9fnWwcx8qLPmIHAaODToOJm5AqwALC0u5ljVSqrU0ADIzBu2+n5E3ALcBFyfmX6wpRky0SlARNwIfA74u8x8s5qSJNVl0pPhrwI7gcMR8WRE/EcFNUmqyUQ7gMz8s6oKqZXDQt0cFmqVKv46vCOQpKEMAKlgBoBUsPm7GGgcDgt1G2U6ZdC6FnFYaLiWvcsl1ckAkApmAEgFMwCkgtkEHMRhoW5zehXhLDYFYfiwkINAkoYyAKSCGQBSwQwAqWA2AUfltOBmc3AV4TxPC46iZe9gSXUyAKSCGQBSwewBTMJhoW5zOiwEs9cXcBBI0lAGgFQwA0AqmAEgFcwmYJVKGhYatK7XHAwLwexdRWgTUNJQBoBUMANAKpg9gGmbxb7AqOf707rl+Az2BKBdfYFR32KVvBMj4raIyIg4v4rjSarHxAEQEXuADwC/mLwcSXWqYgfwJeB2ICs4lqQaTRQAEbEf+GVmHq+oHkk1GtoEjIhHgIv6fOsgcBfwwVFeKCKWgWWAS3bv3kaJc6jtVxGOOwjksFCXJpuCow4CDQ2AzLyh/wvEu4FLgeOx9mqLwLGIuC4zX+pznBVgBWBpcdHTBakFxv4xYGY+BVy4/jgiXgCWMvPVCuqSVIOW7T0l1amyQaDM3FvVsYozi8NCML07DjssNLFaB4EkzSYDQCqYASAVzACQCubVgG3V9mEhaPaW4w4LbckmoKShDACpYAaAVDB7ALPCYaFuDgttaWFhtHUtewdJqpMBIBXMAJAKZgBIBbMJOMscFurmsNAf2ASUNJQBIBXMAJAKZg9gnjgs1K3gYSF7AJKGMgCkghkAUsEMAKlgNgHnncNC3QoZFrIJKGkoA0AqmAEgFcweQGkcFuo2yrDQoHUtsmlY6MyZkZ7Xsr95SXWaOAAi4taI+FlEPB0R/1ZFUZLqMdEpQET8A/AR4OrM/H1EXFhNWZLqMOkO4FPAv2bm7wEy8+XJS5JUl8jM8Z8c8STwEHAj8BZwW2Y+PmDtMrDceXgV8NOxX7h
65wOvNl3EBm2rB9pXk/Vs7YrM3Dls0dBTgIh4BLioz7cOdp5/HvDXwHuB/4qIy7JPqmTmCrDSOebRzFwa9tp1sZ7h2laT9WwtIo6Osm5oAGTmDVu8yKeABzsf+P+LiLOsJeEroxYqqTmT9gD+G3g/QES8EziXdm2DJG1h0kGg+4D7IuKnwCngln7b/z5WJnzdqlnPcG2ryXq2NlI9EzUBJc02JwGlghkAUsEaDYA2jhFHxG0RkRFxfsN1fCEino2In0TEtyJid0N13Nj5O3ouIu5oooYNteyJiB9GxInOe+ZAk/Wsi4iFiPhxRHyn6VoAImJ3RDzQef+ciIj3DVrbWAD0jBH/BfDvTdWyLiL2AB8AftF0LcBh4KrMvBr4OXBn3QVExALwNeBDwJXARyPiyrrr2OA08NnM/HPWZk/+peF61h0ATjRdxAZ3A9/NzHcB72GL2prcAbRxjPhLwO1A453RzPx+Zq5fg/oYsNhAGdcBz2Xm85l5CriftdBuRGb+OjOPdf78Omtv7IubqgcgIhaBDwP3NFnHuojYBfwtcC9AZp7KzNcGrW8yAN4J/E1EHImIH0XEexushYjYD/wyM483WccAnwT+p4HXvRh4ccPjVRr+wK2LiL3ANcCRZivhy6z9ozHCzQdqcRlrg3hf75yW3BMROwYtnuoNQaoaI66pnruAD07rtbdbT2Y+1FlzkLWt76E6a+vo90tpGt8dRcQ7gG8Cn8nMkw3WcRPwcmY+ERF/31QdPc4BrgVuzcwjEXE3cAfw+UGLp6ZtY8SD6omIdwOXAscjAta228ci4rrMfKnuejbUdQtwE3D9NINxC6vAng2PF4FfNVDHH0TE21n78B/KzAebrAXYB+yPiH8E/hjYFRHfyMyPNVjTKrCames7owdYC4C+mjwFaM0YcWY+lZkXZubezNzL2v/Ea6f54R8mIm4EPgfsz8w3GyrjceDyiLg0Is4Fbga+3VAtxFo63wucyMwvNlXHusy8MzMXO++Zm4EfNPzhp/OefTEiruh86XrgmUHrm7wn4LhjxKX4KvBHwOHOruSxzPznOgvIzNMR8Wnge8ACcF9mPl1nDT32AR8Hnupcig5wV2Y+3GBNbXQrcKgT2s8Dnxi00FFgqWBOAkoFMwCkghkAUsEMAKlgBoBUMANAKpgBIBXs/wHwBxmqi9eiQgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "w1 = 1; w2 = 1; w3 = 1; w4 = 1; w5 = 1; w6 = 1\n", "b1 = 0; b2 = 0; b3 = -1\n", "plot_boundaries(w1, w2, w3, w4, w5, w6, b1, b2, b3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's overlay the data on these decision boundaries:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_boundaries(w1, w2, w3, w4, w5, w6, b1, b2, b3)\n", "plot_data(X_train, y_train, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It appears that the classifier obtained using the above set of parameters does not match our data. (Of course, this is to be expected: a classifier whose weights are fixed a priori, rather than learned from the data, has high bias and low variance.)\n", "\n", "### Question 2\n", "Try the alternatives below and see which one best matches our data:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "w1 = -1; w2 = -1; w3 = -1; w4 = -1; w5 = 4; w6 = -3\n", "b1 = -4; b2 = 4; b3 = 1" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "w1 = 1; w2 = -1; w3 = -1; w4 = -1; w5 = -4; w6 = 3\n", "b1 = 4; b2 = -4; b3 = 2" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "w1 = -1; w2 = 2; w3 = 1; w4 = -2; w5 = 4; w6 = 4\n", "b1 = 5; b2 = 8; b3 = -6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously, we need a better way than trial and error to find the best parameters. The way that we do this is by *minimizing a loss function*. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loss function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A *loss function* measures how much the predictions of our classifier differ from the actual labels. The loss function that we will use for our network is the *binary cross-entropy* loss. Let's represent our training data by the set $\\{(X_1, y_1), \\ldots, (X_n , y_n)\\}$ and our neural network function by $f$. 
Then the binary cross-entropy loss function is defined as:\n", "\n", "\\begin{equation}\n", " \\ell = \\sum_{i=1}^n -y_i \\log f(X_i) - (1-y_i) \\log(1-f(X_i))\n", "\\end{equation}\n", "\n", "The binary cross-entropy relates to the Bernoulli distribution (maximizing the Bernoulli likelihood is equivalent to minimizing the binary cross-entropy). **It is the standard loss function for binary classification problems. It can be generalized to multiclass classification problems, see [cross entropy](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html).**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 3\n", "Let's see what this loss function means using a tiny example. Assume that our training data consists of only four examples, and the values of $X, f(X), y$ of those four examples are as follows:\n", "\n", "|X|f(X)|y|\n", "|:---|:---|:---|\n", "|(5.4, 1.6)|1|1|\n", "|(1.4, -0.5)|0.3679|1|\n", "|(3.5, -3)|0.8647|0|\n", "|(-3.5, 1.1)|0|0|\n", "\n", "Calculate the loss function using the equation above. You can compute the natural logarithm with `np.log`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "-0.6931471805599453" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.log(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is important to remember that the loss function $\\ell$ is a function of the network parameters, since it is defined in terms of the network output. We can write the loss function as:\n", "\n", "\\begin{equation}\n", " \\ell(\\mathbf{w}, \\mathbf{b}) = \\sum_{i=1}^n -y_i \\log f(X_i, \\mathbf{w}, \\mathbf{b}) - (1-y_i) \\log(1-f(X_i, \\mathbf{w}, \\mathbf{b}))\n", "\\end{equation}\n", "\n", "In principle, we want to find the set of parameters $\\mathbf{w}, \\mathbf{b}$ for which $\\ell(\\mathbf{w}, \\mathbf{b})$ has the smallest value. 
We will use *gradient descent* to find these values. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Minimization by gradient descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plot below shows the function $f(x_1, x_2) = x_1^2 + x_2^2$:\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question 4\n", "Point A on the plot has coordinates $(1, 1, 2)$. The blue vector AB shows the direction $(-1, -1)$, and the green vector AC shows the direction $(0, -1)$. \n", "Assume that we are at the initial point $(1, 1)$ and we want to move in a direction that minimizes the function $f$. Which of these two directions moves us faster towards the minimum: $(-1, -1)$ or $(0, -1)$?\n", "\n", "### Question 5\n", "Calculate the gradient of the function $f$ at the point $(1, 1)$. How is this gradient related to the fastest path to the minimum (i.e. the steepest descent)?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training the neural network\n", "\n", "We now understand the theory of training neural networks. But how do we do this in practice? We will now develop our practical skills using the *scikit-learn* library to train our tiny network. Let's first define the network:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from sklearn.neural_network import MLPClassifier\n", "clf = MLPClassifier(hidden_layer_sizes=(2,), \n", "                    activation='logistic', \n", "                    solver='lbfgs',\n", "                    random_state=0,\n", "                    max_iter=500,\n", "                    tol=1e-7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The argument `hidden_layer_sizes=(2,)` specifies a single hidden layer with two neurons, and the argument `activation='logistic'` selects the sigmoid activation function (let's ignore the other arguments for now). 
\n", "\n", "We will now train the network using our training data:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "MLPClassifier(activation='logistic', alpha=0.0001, batch_size='auto',\n", " beta_1=0.9, beta_2=0.999, early_stopping=False, epsilon=1e-08,\n", " hidden_layer_sizes=(2,), learning_rate='constant',\n", " learning_rate_init=0.001, max_fun=15000, max_iter=500,\n", " momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,\n", " power_t=0.5, random_state=0, shuffle=True, solver='lbfgs',\n", " tol=1e-07, validation_fraction=0.1, verbose=False,\n", " warm_start=False)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "clf.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the network is trained, use the helper function `tiny_net_parameters` to get the parameters of the trained network (`tiny_net_parameters` is a wrapper around `clf.coefs_` and `clf.intercepts_`):" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from utilities import tiny_net_parameters\n", "w1, w2, w3, w4, w5, w6, b1, b2, b3 = tiny_net_parameters(clf)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_boundaries(w1, w2, w3, w4, w5, w6, b1, b2, b3)\n", "plot_data(X_train, y_train, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The learned classifier does a good job at predicting labels both for the training examples and unseen examples (test data). \n", "\n", "--- \n", "\n", "In addition to the decision boundaries in the original data space, we can also visualize how the data are transformed through the neural network. Since we use a hidden layer with two neurons, we can visualize its \"output\" in two dimensions. \n", "\n", "(For better visibility, we changed the color of the yellow class to blue.)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from utilities import plot_data_transformations\n", "plot_data_transformations(X_train, y_train, w1, w2, w3, w4, w5, w6, b1, b2, b3, \\\n", " language='English')" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from utilities import plot_data_transformations\n", "plot_data_transformations(X_test, y_test, w1, w2, w3, w4, w5, w6, b1, b2, b3, \\\n", " language='English')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TensorFlow Playground questions\n", "\n", "We will now investigate a few properties of neural networks using [TensorFlow Playground](https://playground.tensorflow.org/). Take a few minutes to familiarize yourself with the playground:\n", "\n", "- Change the number of hidden layers to one\n", "- Change the data distribution to *exclusive OR*\n", "- Push the *run* button and see how the network is trained\n", "- Stop training after epoch 500 (each epoch is one pass of gradient descent over the complete training set)\n", "- Hover over the neurons in the hidden layer and see the visualization of their outputs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning rate\n", "\n", "Open [this](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=gauss&regDataset=reg-plane&learningRate=3&regularizationRate=0&noise=35&networkShape=1&seed=0.68448&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false) example on TensorFlow Playground. \n", "\n", "- Push the *run* button and see the learning process for 500 epochs. What do you observe?\n", "- Stop training and press the *reset* button. Change the learning rate from 3 to 0.1, and press the *run* button again. 
What is different from the previous run?\n", "- Try these steps using three learning rates: 0.3, 0.03, and 0.003:\n", "    + Press the *reset* button\n", "    + Change the learning rate\n", "    + Press the *step* button (located to the right of the *run* button) a few times, and observe how the training/test loss changes in each step. \n", " \n", "Which of those three rates would you use? " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regularization\n", "\n", "Open [this](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=gauss&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=50&networkShape=4,4&seed=0.64895&showTestData=false&discretize=false&percTrainData=10&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false) example on the playground. \n", "\n", "Let's first observe a few things about this example. Check the box titled *Show test data*. Uncheck the box again. As you can see, the data is noisy and the number of training examples is small. This is a situation prone to overfitting. \n", "- Press the *run* button and let the training proceed for 500 epochs, then pause the training. \n", "- What do you think about the decision boundary of the classifier? \n", "- What causes the difference between the training error and test error? (Check the *Show test data* box again)\n", "- Write down the test error\n", "\n", "We will now see how we can avoid overfitting using $L_2$ regularization. \n", "- Press the *reset* button\n", "- Change *regularization* from *None* to *L2*\n", "- Change *Regularization rate* from 0 to 0.3\n", "- Press the *run* button and run the model for 500 epochs\n", "- What is different from the previous setting? \n", "- Write down the test error\n", "\n", "Just as with the learning rate, different regularization rates affect the classifier's performance. 
Try these steps with regularization rates 0.03 and 0.003:\n", "- Press the *reset* button\n", "- Change *Regularization rate*\n", "- Press the *run* button and run the model for 500 epochs\n", "- Write down the test error\n", "\n", "Which of these regularization rates would you use?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 4 }