{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "7zSVW-HayxII", "slideshow": { "slide_type": "slide" } }, "source": [ "# Deep Feedforward Networks - a general description" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "SPkFcch2zKiP", "slideshow": { "slide_type": "slide" } }, "source": [ "**Deep feedforward networks**, also called feedforward neural networks or **multilayer perceptrons (MLPs)**, are the quintessential deep learning models. The goal of a feedforward network is to approximate some function $f^*$. For example, for a classifier, $y = f^*(x)$ maps an input $x$ to a category $y$. A feedforward network defines a mapping $y = f(x; θ)$ and learns the value of the parameters $θ$ that result in the best function approximation.\n", " \n", "These models are called feedforward because information flows through the function being evaluated from $x$, through the intermediate computations used to define $f$, and finally to the output $y$. \n", "\n", "Feedforward neural networks are called networks because they are typically represented by composing together many different functions. The model is associated with a directed acyclic graph describing how the functions are composed together. For example, we might have three functions $f^{(1)}$, $f^{(2)}$, and $f^{(3)}$ connected in a chain, to form $f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$. These chain structures are the most commonly used structures of neural networks. In this case, $f^{(1)}$ is called the first layer of the network, $f^{(2)}$ is called the second layer, and so on. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The overall **length of the chain** gives the **depth** of the model. It is from this terminology that the name **“deep learning”** arises. The final layer of a feedforward network is called the **output layer**. 
During neural network training, we drive $f(x)$ to match $f^*(x)$. The training data provides us with noisy, approximate examples of $f^*(x)$ evaluated at different training points. Each example $x$ is accompanied by a label $y ≈ f^*(x)$. The training examples specify directly what the output layer must do at each point $x$: it must produce a value that is close to $y$. The behavior of the other layers is not directly specified by the training data. The learning algorithm must decide how to use those layers to best implement an approximation of $f^*$, because the training data does not say what each individual layer should do. Since the training data does not show the desired output for each of these layers, they are called **hidden layers**.\n", "\n", "Finally, these networks are called **neural** because they are loosely inspired by **neuroscience**. Each hidden layer of the network is typically **vector-valued**. The dimensionality of these hidden layers determines the **width** of the model. Each element of the vector may be interpreted as playing a role analogous to a neuron.\n", "\n", "![alt text](https://i.imgur.com/38lpenv.png)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rUOfJNmw5FFX", "slideshow": { "slide_type": "slide" } }, "source": [ "# Learning the XOR function " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "3uXdVYV08xVS", "slideshow": { "slide_type": "slide" } }, "source": [ "To make the idea of a feedforward network more concrete, we begin with an example of a fully functioning feedforward network on a very simple task: learning the **XOR function**.\n", "\n", "The **XOR function (“exclusive or”)** is an operation on **two binary values, $x_1$ and $x_2$**. When **exactly one** of these binary values is **equal to 1**, the XOR function **returns** **1**. Otherwise, it returns 0. 
The XOR function provides the target function $y = f^*(x)$ that we want to learn. Our model provides a function $y = f(x;θ)$ and our learning algorithm will adapt the parameters $θ$ to make $f$ as similar as possible to $f^*$.\n", "\n", "In this simple example, we will not be concerned with statistical generalization. We want our network to perform correctly on the four points \n", "$\\mathbb{X}= \\{{[0,0]^T, [0,1]^T, [1,0]^T,[1,1]^T}\\}$. We will train the network on all four of these points. The only challenge is to fit the training set.\n", "\n", "We can treat this as a regression problem and use the mean squared error (MSE) loss function.\n", "\n", "Evaluated on our whole training set, the MSE loss function is\n", "\n", "$J(θ)= \\dfrac{1}{4} \\sum_{x \\in \\mathbb{X}}(f^*(x) - f(x;θ))^2$ .\n", "\n", "Now we must choose the form of our model, $f(x; θ)$. Suppose that we choose a linear model, with $θ$ consisting of $w$ and $b$. Our model is defined to be \n", "\n", "$f(x; w, b) = x^Tw + b$.\n", "\n", "We can minimize $J(θ)$ in closed form with respect to $w$ and $b$ using the normal\n", "equations.\n", "\n", "After solving the normal equations, we obtain a closed form solution, $w = 0$ and $b = 0.5$. This means that the linear model simply outputs 0.5 everywhere.\n", "\n", "**But why couldn't a linear model represent this function?**" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "47LeVbfZDXyJ", "slideshow": { "slide_type": "slide" } }, "source": [ "Let's now model this with a **DNN**. \n", "Specifically, we will introduce a very simple feedforward network with one\n", "hidden layer containing two hidden units. This feedforward network has a vector of hidden units $h$ that are\n", "computed by a function $f^{(1)}(x; W, c)$. The values of these hidden units are then\n", "used as the input for a second layer. The second layer is the output layer of the\n", "network. 
The output layer is still just a linear regression model, but now it is\n", "applied to $h$ rather than to $x$. The network now contains two functions chained\n", "together: $h = f^{(1)}(x; W, c)$ and $y = f^{(2)}(h; w, b)$, with the complete model being\n", "$f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x))$.\n", "\n", "For a better illustration of the network, let's look at this figure.\n", "\n", "![alt text](https://imgur.com/BQH5IG1.png)\n", "\n", "Here $[x_1, x_2]$ is the input vector, and $y$ is the output. Now we need a non-linear transformation from the input feature space to the hidden feature space of $h$. This is achieved through the first layer. The first layer can be represented as $h=f^{(1)}(x; W, c)$ where $f^{(1)}$ is a non-linear transformation in itself. Thus $h = g(W^Tx+c)$ where $W^Tx+c$ is an affine transform and $g$ is a non-linear function. This function $g$ is called the activation function. In this case (and in most cases we will encounter), let's use a simple function known as **ReLU**. ReLU (Rectified Linear Unit), not as complex as it sounds, is the simple function $g(z) = \\max\\{0,z\\}$. The output layer is just a linear function $w^Th+b$. Overall, the neural network represents the function\n", "\n", "$f(x;W,c,w,b)=w^T\\max\\{0,W^Tx+c\\} +b$\n", "\n", "\n", "**With this setup let's guess a solution**\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "3Un46qtVn3Tp", "slideshow": { "slide_type": "slide" } }, "source": [ "# Gradient-Based Learning\n", "**Can we keep guessing solutions like this?**\n", "\n", "State-of-the-art neural networks have millions of parameters to be tuned. To optimize these millions of parameters, we need an objective towards which we would like to drive our model. This objective is minimizing a value defined by a **cost function**. 
There are many cost functions, and which one we use depends on the purpose." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "l0Ed4U8xpizs", "slideshow": { "slide_type": "slide" } }, "source": [ "## Cost functions\n", "The cost functions used in neural networks are largely the same as those used in simpler parametric models such as the linear model. In most cases the model defines a parametrized distribution, and the cost function measures how well that distribution fits the training data.\n", "\n", "You would have seen some cost functions yesterday. \n", "\n", "To name a few,\n", "\n", "* Mean Squared Error Loss\n", "* Cross Entropy Loss\n", "* L1 Loss\n", "\n", "Sometimes we use a neural network to model our loss function as well. You'll be learning that later.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "7HhHKnE2rJvt", "slideshow": { "slide_type": "slide" } }, "source": [ "## Gradient Descent\n", "Our task is now to minimize the cost function.\n", "\n", "**How do we do that?**\n", " In the case of the linear model, the loss function was convex w.r.t. the parameters, as seen in the image. This is a plot of a loss function $J$ w.r.t. a parameter $w$.\n", " ![alt text](https://i.imgur.com/LAJ8Uag.png)\n", " \n", " From the figure, it is clear that for convex functions we can reach the minimum by descending along the gradient. \n", " Essentially, we can represent this using the update statement in the next image. We see that the gradient is positive when $w$ has gone past the optimal point and negative when $w$ has not yet reached it. Thus the negative of the gradient gives the direction in which we have to drive our parameter $w$.\n", " \n", " ![alt text](https://i.imgur.com/oBIKHky.png)\n", " \n", " Here, $\\alpha$ is known as the learning rate because it decides how much to alter the parameter in each step.\n", " \n", "For convex cost functions and a suitably small learning rate, there are global convergence guarantees.\n", " \n", "This step represents one iteration of gradient descent. 
We do this iteratively until we reach the optimal point. \n", " \n", " While this is true for convex loss functions, what about neural networks?\n", "\n", " The cost functions for neural networks need not be convex. Thus, there are no such guarantees.\n", " \n", "So what do we do?\n", "\n", "We continue using gradient descent with some precautions. **We initialise the weights to be some random numbers close to 0**.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "O-dIQqj00umt", "slideshow": { "slide_type": "slide" } }, "source": [ "### Back Propagation\n", "While in linear models gradients are easy to compute, what about neural networks?\n", "\n", "This can be done by applying the chain rule layer by layer, from the output back to the input. \n", "\n", "In \"**Deep Learning**\", we call it backpropagation." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0i2GSwye4Q_L", "slideshow": { "slide_type": "slide" } }, "source": [ "# Circuit Intuition for Back Prop\n", "\n", "![alt text](https://i.imgur.com/MjZjeSP.png)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "_5Le7tSs1ZBD", "slideshow": { "slide_type": "slide" } }, "source": [ "## Optimization\n", "\n", "We will now discuss 3 optimizers.\n", "\n", "* **Batch Gradient Descent**\n", "* **Stochastic Gradient Descent**\n", "* **Mini Batch Gradient Descent**\n", "\n", "**Batch Gradient Descent Optimization** involves computing the loss for the entire dataset at once, then backpropagating and updating the parameters once per pass.\n", "\n", "In **Stochastic Gradient Descent Optimization**, we compute the loss for a single training example at a time, then backpropagate and update.\n", "\n", "In **Mini Batch Gradient Descent**, the entire training data is divided into batches; we compute the loss for one batch at a time, then backpropagate and update.\n", "\n", "Mini Batch Gradient Descent is the most commonly used of the three: each update is cheaper than a full-batch update, and the gradient estimate is less noisy than one from a single example.\n", "\n", "There are various complex optimization algorithms 
such as momentum, RMSprop, Adam etc. We won't go into details on them here.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yPS1lbGE8ueK", "slideshow": { "slide_type": "slide" } }, "source": [ "## Hyperparameters\n", "\n", "Hyperparameters are settings that are fixed before training rather than learned from the data. \n", "For example, the learning rate, number of epochs to train for, batch size, number of layers in the network etc. are all hyperparameters.\n", "\n", "There is no simple way to decide these. In practice, we try several settings and keep the one that performs best on held-out validation data." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0ec4pRkVu46U", "slideshow": { "slide_type": "slide" } }, "source": [ "# A Fast Pythonic Implementation of a Feed Forward Neural Network (Vectorised)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "79vGJ99fE17s", "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Setup\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", "plt.rcParams['image.interpolation'] = 'nearest'\n", "plt.rcParams['image.cmap'] = 'gray'\n", "\n", "# for auto-reloading external modules\n", "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 504 }, "colab_type": "code", "executionInfo": { "elapsed": 1228, "status": "ok", "timestamp": 1531822868765, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "TStYjt7Fu-B2", "outputId": 
"381faa3c-156e-42a1-b94b-ed16e63d8238", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "(-1, 1)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# generate random data -- not linearly separable\n", "np.random.seed(0)\n", "N = 100 # number of points per class\n", "D = 2 # dimensionality\n", "K = 3 # number of classes\n", "X = np.zeros((N*K,D))\n", "num_train_examples = X.shape[0]\n", "y = np.zeros(N*K, dtype='uint8')\n", "for j in range(K):\n", "    ix = range(N*j,N*(j+1))\n", "    r = np.linspace(0.0,1,N) # radius\n", "    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta\n", "    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]\n", "    y[ix] = j\n", "fig = plt.figure()\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)\n", "plt.xlim([-1,1])\n", "plt.ylim([-1,1])" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "73yCicW1vW3v", "slideshow": { "slide_type": "slide" } }, "source": [ "The sigmoid function \"squashes\" inputs to lie between 0 and 1. Unfortunately, this means that for inputs whose sigmoid output is close to 0 or 1, the gradient with respect to those inputs is close to zero. This leads to the phenomenon of vanishing gradients, where gradients drop close to zero, and the net does not learn well.\n", "\n", "On the other hand, the ReLU function ($\\max(0, x)$) does not saturate for large positive inputs. Plot these functions to gain intuition." 
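, "\n", "\n", "As a small worked equation (the symbol $z$ for the pre-activation is notation introduced here for illustration), the sigmoid derivative makes the saturation explicit:\n", "\n", "$$\\sigma(z) = \\dfrac{1}{1+e^{-z}}, \\qquad \\sigma'(z) = \\sigma(z)(1-\\sigma(z)) \\le \\dfrac{1}{4}$$\n", "\n", "The bound is reached at $z = 0$, and $\\sigma'(z)$ tends to $0$ as $\\sigma(z)$ approaches $0$ or $1$. For ReLU the derivative is $1$ for positive $z$ and $0$ for negative $z$, so active units pass the gradient through unscaled. Setting the weight matrices aside, each sigmoid layer that the gradient passes through on the way back scales it by a factor of at most $1/4$."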
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "ZqIIKzcLvEeX", "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def sigmoid(x):\n", "    x = 1/(1+np.exp(-x))\n", "    return x\n", "\n", "def sigmoid_grad(x):\n", "    # x is the sigmoid output sigmoid(z), not the pre-activation z\n", "    return (x)*(1-x)\n", "\n", "def relu(x):\n", "    return np.maximum(0,x)\n", "\n", "def tanh(x):\n", "    return (2*sigmoid(2*x) - 1)\n", "\n", "def tanh_grad(x):\n", "    # x is the tanh output tanh(z), matching how sigmoid_grad is called\n", "    return (1 - x**2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "pUrzH3KqvHQx", "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# function to train a three layer neural net with ReLU, sigmoid or tanh nonlinearity via vanilla grad descent\n", "\n", "def three_layer_net(NONLINEARITY,X,y, model, step_size, reg):\n", "    # parameter initialization\n", "\n", "    h= model['h']\n", "    h2= model['h2']\n", "    W1= model['W1']\n", "    W2= model['W2']\n", "    W3= model['W3']\n", "    b1= model['b1']\n", "    b2= model['b2']\n", "    b3= model['b3']\n", "\n", "    # gradient descent loop\n", "    num_examples = X.shape[0]\n", "    plot_array_1=[]\n", "    plot_array_2=[]\n", "    for i in range(50000):\n", "\n", "        # FORWARD PROP\n", "\n", "        if NONLINEARITY== 'RELU':\n", "            hidden_layer = relu(np.dot(X, W1) + b1)\n", "            hidden_layer2 = relu(np.dot(hidden_layer, W2) + b2)\n", "            scores = np.dot(hidden_layer2, W3) + b3\n", "\n", "        elif NONLINEARITY == 'SIGM':\n", "            hidden_layer = sigmoid(np.dot(X, W1) + b1)\n", "            hidden_layer2 = sigmoid(np.dot(hidden_layer, W2) + b2)\n", "            scores = np.dot(hidden_layer2, W3) + b3\n", "\n", "        elif NONLINEARITY == 'TANH':\n", "            hidden_layer = tanh(np.dot(X, W1) + b1)\n", "            hidden_layer2 = tanh(np.dot(hidden_layer, W2) + b2)\n", "            scores = np.dot(hidden_layer2, W3) + b3\n", "\n", "        # softmax over class scores\n", "        exp_scores = np.exp(scores)\n", "        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]\n", "\n", "        # compute the loss: average cross-entropy loss and regularization\n", "        correct_logprobs = -np.log(probs[range(num_examples),y])\n", "        data_loss = np.sum(correct_logprobs)/num_examples\n", "        reg_loss = 0.5*reg*np.sum(W1*W1) + 0.5*reg*np.sum(W2*W2)+ 0.5*reg*np.sum(W3*W3)\n", "        loss = data_loss + reg_loss\n", "        if i % 1000 == 0:\n", "            print(\"iteration : \"+ str(i) + \" loss : \" + str(loss) )\n", "\n", "\n", "        # compute the gradient on scores\n", "        dscores = probs\n", "        dscores[range(num_examples),y] -= 1\n", "        dscores /= num_examples\n", "\n", "\n", "        # BACKPROP HERE\n", "        dW3 = (hidden_layer2.T).dot(dscores)\n", "        db3 = np.sum(dscores, axis=0, keepdims=True)\n", "\n", "\n", "        if NONLINEARITY == 'RELU':\n", "\n", "            # backprop ReLU nonlinearity here\n", "            dhidden2 = np.dot(dscores, W3.T)\n", "            dhidden2[hidden_layer2 <= 0] = 0\n", "            dW2 = np.dot( hidden_layer.T, dhidden2)\n", "            plot_array_2.append(np.sum(np.abs(dW2))/np.sum(np.abs(dW2.shape)))\n", "            db2 = np.sum(dhidden2, axis=0)\n", "            dhidden = np.dot(dhidden2, W2.T)\n", "            dhidden[hidden_layer <= 0] = 0\n", "\n", "        elif NONLINEARITY == 'SIGM':\n", "\n", "            # backprop sigmoid nonlinearity here\n", "            dhidden2 = dscores.dot(W3.T)*sigmoid_grad(hidden_layer2)\n", "            dW2 = (hidden_layer.T).dot(dhidden2)\n", "            plot_array_2.append(np.sum(np.abs(dW2))/np.sum(np.abs(dW2.shape)))\n", "            db2 = np.sum(dhidden2, axis=0)\n", "            dhidden = dhidden2.dot(W2.T)*sigmoid_grad(hidden_layer)\n", "\n", "        elif NONLINEARITY == 'TANH':\n", "\n", "            # backprop tanh nonlinearity here\n", "            dhidden2 = dscores.dot(W3.T)*tanh_grad(hidden_layer2)\n", "            dW2 = (hidden_layer.T).dot(dhidden2)\n", "            plot_array_2.append(np.sum(np.abs(dW2))/np.sum(np.abs(dW2.shape)))\n", "            db2 = np.sum(dhidden2, axis=0)\n", "            dhidden = dhidden2.dot(W2.T)*tanh_grad(hidden_layer)\n", "\n", "\n", "        dW1 = np.dot(X.T, dhidden)\n", "        plot_array_1.append(np.sum(np.abs(dW1))/np.sum(np.abs(dW1.shape)))\n", "        db1 = np.sum(dhidden, axis=0)\n", "\n", "        # add regularization\n", "        dW3+= reg * W3\n", "        dW2 += reg * W2\n", "        dW1 += reg * W1\n", "\n", "        #option to return loss, grads -- uncomment next comment\n", "        grads={}\n", "        grads['W1']=dW1\n", "        grads['W2']=dW2\n", "        grads['W3']=dW3\n", "        grads['b1']=db1\n", "        grads['b2']=db2\n", "        grads['b3']=db3\n", "        #return loss, grads\n", "\n", "\n", "        # update\n", "        W1 += -step_size * dW1\n", "        b1 += -step_size * db1\n", "        W2 += -step_size * dW2\n", "        b2 += -step_size * db2\n", "        W3 += -step_size * dW3\n", "        b3 += -step_size * db3\n", "    # evaluate training set accuracy with the final parameters\n", "    if NONLINEARITY == 'RELU':\n", "        hidden_layer = relu(np.dot(X, W1) + b1)\n", "        hidden_layer2 = relu(np.dot(hidden_layer, W2) + b2)\n", "    elif NONLINEARITY == 'SIGM':\n", "        hidden_layer = sigmoid(np.dot(X, W1) + b1)\n", "        hidden_layer2 = sigmoid(np.dot(hidden_layer, W2) + b2)\n", "    elif NONLINEARITY == 'TANH':\n", "        hidden_layer = tanh(np.dot(X, W1) + b1)\n", "        hidden_layer2 = tanh(np.dot(hidden_layer, W2) + b2)\n", "    scores = np.dot(hidden_layer2, W3) + b3\n", "    predicted_class = np.argmax(scores, axis=1)\n", "    print(\"training accuracy:\" + str(np.mean(predicted_class == y)))\n", "    #return cost, grads\n", "    return plot_array_1, plot_array_2, W1, W2, W3, b1, b2, b3\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xBECHHL3veTs", "slideshow": { "slide_type": "slide" } }, "source": [ "## Train net with sigmoid nonlinearity first" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 884 }, "colab_type": "code", "executionInfo": { "elapsed": 123616, "status": "ok", "timestamp": 1531822994727, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "22BtZy8-vJ1i", "outputId": "5748be4e-89e5-40ae-b24c-7552b64b2b48", "slideshow": { "slide_type": "slide" } }, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iteration : 0 loss : 1.1564048888640268\n", "iteration : 1000 loss : 1.1007369422747169\n", "iteration : 2000 loss : 0.9996976474584222\n", "iteration : 3000 loss : 0.8554946768068612\n", "iteration : 4000 loss : 0.8194272926985829\n", "iteration : 5000 loss : 0.8148248201455692\n", "iteration : 6000 loss : 0.8105259085596594\n", "iteration : 7000 loss : 0.8059427888467368\n", "iteration : 8000 loss : 0.8006883214893732\n", "iteration : 9000 loss : 0.7939760223457257\n", "iteration : 10000 loss : 0.7832005415212004\n", "iteration : 11000 loss : 0.759908785971414\n", "iteration : 12000 loss : 0.7197916117626575\n", "iteration : 13000 loss : 0.6831938875797994\n", "iteration : 14000 loss : 0.6558471217342812\n", "iteration : 15000 loss : 0.6349964251436383\n", "iteration : 16000 loss : 0.6185271218753057\n", "iteration : 17000 loss : 0.6022458750439876\n", "iteration : 18000 loss : 0.5797104701761298\n", "iteration : 19000 loss : 0.5462637334154149\n", "iteration : 20000 loss : 0.5128309779987612\n", "iteration : 21000 loss : 0.4924029190124008\n", "iteration : 22000 loss : 0.481853679426717\n", "iteration : 23000 loss : 0.4759230744953976\n", "iteration : 24000 loss : 0.4720309172464675\n", "iteration : 25000 loss : 0.4690859008806442\n", "iteration : 26000 loss : 0.4666108795626802\n", "iteration : 27000 loss : 0.46438591591476186\n", "iteration : 28000 loss : 0.4623058909741167\n", "iteration : 29000 loss : 0.46031896492095215\n", "iteration : 30000 loss : 0.4583980659317958\n", "iteration : 31000 loss : 0.4565275149552446\n", "iteration : 32000 loss : 0.4546971513008592\n", "iteration : 33000 loss : 0.4529003005451173\n", "iteration : 34000 loss : 0.4511336807434287\n", "iteration : 35000 loss : 0.4493981141115461\n", "iteration : 36000 loss : 0.4476992220805742\n", "iteration : 37000 loss : 0.4460474551202782\n", "iteration : 38000 loss : 0.44445704769582944\n", "iteration : 39000 loss : 
0.44294387221970544\n", "iteration : 40000 loss : 0.4415226348528942\n", "iteration : 41000 loss : 0.44020421535901677\n", "iteration : 42000 loss : 0.43899396979334165\n", "iteration : 43000 loss : 0.43789142435001516\n", "iteration : 44000 loss : 0.43689121337531434\n", "iteration : 45000 loss : 0.43598470278212487\n", "iteration : 46000 loss : 0.4351616778814694\n", "iteration : 47000 loss : 0.43441168312159384\n", "iteration : 48000 loss : 0.4337248769098653\n", "iteration : 49000 loss : 0.4330924580113893\n", "training accuracy:0.97\n" ] } ], "source": [ "#Initialize toy model, train sigmoid net\n", "\n", "N = 100 # number of points per class\n", "D = 2 # dimensionality\n", "K = 3 # number of classes\n", "h=50\n", "h2=50\n", "num_train_examples = X.shape[0]\n", "\n", "model={}\n", "model['h'] = h # size of hidden layer 1\n", "model['h2']= h2# size of hidden layer 2\n", "model['W1']= 0.1 * np.random.randn(D,h)\n", "model['b1'] = np.zeros((1,h))\n", "model['W2'] = 0.1 * np.random.randn(h,h2)\n", "model['b2']= np.zeros((1,h2))\n", "model['W3'] = 0.1 * np.random.randn(h2,K)\n", "model['b3'] = np.zeros((1,K))\n", "\n", "(sigm_array_1, sigm_array_2, s_W1, s_W2,s_W3, s_b1, s_b2,s_b3) = three_layer_net('SIGM', X,y,model, step_size=1e-1, reg=1e-3)\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 884 }, "colab_type": "code", "executionInfo": { "elapsed": 79867, "status": "ok", "timestamp": 1531823074622, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "OkOedIPBvNXO", "outputId": "62e8b87c-b250-4ffa-e21e-3c1b004618ff", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iteration : 0 loss : 1.1161878505332794\n", "iteration : 1000 loss : 
0.2750473719183876\n", "iteration : 2000 loss : 0.15229739108568793\n", "iteration : 3000 loss : 0.1363700498508273\n", "iteration : 4000 loss : 0.13085339604761723\n", "iteration : 5000 loss : 0.1278783087185082\n", "iteration : 6000 loss : 0.12595052350220223\n", "iteration : 7000 loss : 0.12459912899799815\n", "iteration : 8000 loss : 0.12350220538037718\n", "iteration : 9000 loss : 0.12259421916677461\n", "iteration : 10000 loss : 0.12183316159784804\n", "iteration : 11000 loss : 0.12120238213822207\n", "iteration : 12000 loss : 0.12064999237838808\n", "iteration : 13000 loss : 0.1201646007717303\n", "iteration : 14000 loss : 0.11973390288156793\n", "iteration : 15000 loss : 0.11934477378825434\n", "iteration : 16000 loss : 0.11900040882732316\n", "iteration : 17000 loss : 0.11869574387761538\n", "iteration : 18000 loss : 0.11842259785330742\n", "iteration : 19000 loss : 0.11816648588754516\n", "iteration : 20000 loss : 0.11793219180877035\n", "iteration : 21000 loss : 0.11771802461981937\n", "iteration : 22000 loss : 0.11752094533738827\n", "iteration : 23000 loss : 0.11733713100171589\n", "iteration : 24000 loss : 0.11716774683599088\n", "iteration : 25000 loss : 0.11701149261769367\n", "iteration : 26000 loss : 0.1168630591346129\n", "iteration : 27000 loss : 0.11672088022617585\n", "iteration : 28000 loss : 0.11657374759428504\n", "iteration : 29000 loss : 0.11642691110006997\n", "iteration : 30000 loss : 0.11629276946275235\n", "iteration : 31000 loss : 0.1161640711615598\n", "iteration : 32000 loss : 0.11603227969555499\n", "iteration : 33000 loss : 0.1159054915790851\n", "iteration : 34000 loss : 0.11578298884723828\n", "iteration : 35000 loss : 0.11566921776371392\n", "iteration : 36000 loss : 0.11555980390120035\n", "iteration : 37000 loss : 0.11545403322867147\n", "iteration : 38000 loss : 0.11535612294801542\n", "iteration : 39000 loss : 0.11526375605256224\n", "iteration : 40000 loss : 0.11517710451443897\n", "iteration : 41000 loss : 
0.11509379999158459\n", "iteration : 42000 loss : 0.11501422326618407\n", "iteration : 43000 loss : 0.11493689583958515\n", "iteration : 44000 loss : 0.11486124742108847\n", "iteration : 45000 loss : 0.11478712171992289\n", "iteration : 46000 loss : 0.11471638412990753\n", "iteration : 47000 loss : 0.11464813357123718\n", "iteration : 48000 loss : 0.11458281113135187\n", "iteration : 49000 loss : 0.11452164702670961\n", "training accuracy:0.9933333333333333\n" ] } ], "source": [ "#Re-initialize model, train relu net\n", "\n", "model={}\n", "model['h'] = h # size of hidden layer 1\n", "model['h2']= h2# size of hidden layer 2\n", "model['W1']= 0.1 * np.random.randn(D,h)\n", "model['b1'] = np.zeros((1,h))\n", "model['W2'] = 0.1 * np.random.randn(h,h2)\n", "model['b2']= np.zeros((1,h2))\n", "model['W3'] = 0.1 * np.random.randn(h2,K)\n", "model['b3'] = np.zeros((1,K))\n", "\n", "(relu_array_1, relu_array_2, r_W1, r_W2,r_W3, r_b1, r_b2,r_b3) = three_layer_net('RELU', X,y,model, step_size=1e-1, reg=1e-3)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 816 }, "colab_type": "code", "id": "bB9GjQWnvRCD", "outputId": "7b098450-c924-4978-f966-df5c4b8d7bde", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iteration : 0 loss : 1.1068763405022943\n", "iteration : 1000 loss : 0.765900305099139\n", "iteration : 2000 loss : 0.43433340575057927\n", "iteration : 3000 loss : 0.22102602425514709\n", "iteration : 4000 loss : 0.17630149048886168\n", "iteration : 5000 loss : 0.16120110789075087\n", "iteration : 6000 loss : 0.15403330978940422\n", "iteration : 7000 loss : 0.14995389923671038\n", "iteration : 8000 loss : 0.14756987592868553\n", "iteration : 9000 loss : 0.1460033554067789\n", "iteration : 10000 loss : 0.14484074848949718\n", "iteration : 11000 loss : 0.143917150556344\n", "iteration : 
12000 loss : 0.1431524860703442\n", "iteration : 13000 loss : 0.1425024253504807\n", "iteration : 14000 loss : 0.1419396983654305\n", "iteration : 15000 loss : 0.14144531574346852\n", "iteration : 16000 loss : 0.14100513992998642\n", "iteration : 17000 loss : 0.1406086147052447\n", "iteration : 18000 loss : 0.14024783674661548\n", "iteration : 19000 loss : 0.1399167397799024\n", "iteration : 20000 loss : 0.1396105838903235\n", "iteration : 21000 loss : 0.13932578167142134\n", "iteration : 22000 loss : 0.13905997472316964\n", "iteration : 23000 loss : 0.13881219388086213\n", "iteration : 24000 loss : 0.13858276838754063\n", "iteration : 25000 loss : 0.1383725565194345\n", "iteration : 26000 loss : 0.13818155086583173\n", "iteration : 27000 loss : 0.13800777673779102\n", "iteration : 28000 loss : 0.13784743936906896\n", "iteration : 29000 loss : 0.1376962619276408\n", "iteration : 30000 loss : 0.1375508542335352\n", "iteration : 31000 loss : 0.13740910541048212\n", "iteration : 32000 loss : 0.137269810668697\n", "iteration : 33000 loss : 0.13713220837167583\n", "iteration : 34000 loss : 0.13699572940488333\n", "iteration : 35000 loss : 0.1368599317864888\n", "iteration : 36000 loss : 0.1367245167525608\n", "iteration : 37000 loss : 0.1365893505563261\n", "iteration : 38000 loss : 0.13645445725368094\n", "iteration : 39000 loss : 0.13631997913899033\n", "iteration : 40000 loss : 0.13618612025959603\n", "iteration : 41000 loss : 0.13605309543868602\n", "iteration : 42000 loss : 0.13592110291294712\n", "iteration : 43000 loss : 0.1357903243472332\n", "iteration : 44000 loss : 0.13566093965311873\n", "iteration : 45000 loss : 0.13553313752362012\n", "iteration : 46000 loss : 0.13540710938798145\n", "iteration : 47000 loss : 0.135283026892158\n", "iteration : 48000 loss : 0.13516101211596993\n", "iteration : 49000 loss : 0.13504111252381806\n", "training accuracy:0.9933333333333333\n" ] } ], "source": [ "#Re-initialize model, train tanh net\n", "\n", "model={}\n", 
"model['h'] = h # size of hidden layer 1\n", "model['h2']= h2# size of hidden layer 2\n", "model['W1']= 0.1 * np.random.randn(D,h)\n", "model['b1'] = np.zeros((1,h))\n", "model['W2'] = 0.1 * np.random.randn(h,h2)\n", "model['b2']= np.zeros((1,h2))\n", "model['W3'] = 0.1 * np.random.randn(h2,K)\n", "model['b3'] = np.zeros((1,K))\n", "\n", "(tanh_array_1, tanh_array_2, t_W1, t_W2,t_W3, t_b1, t_b2,t_b3) = three_layer_net('TANH', X,y,model, step_size=1e-1, reg=1e-3)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "pWJ8yas8vj7W", "slideshow": { "slide_type": "slide" } }, "source": [ "## The Vanishing Gradient Issue" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "DH7Wcv-Hvlqk", "slideshow": { "slide_type": "slide" } }, "source": [ "We can use the sum of the magnitudes of the gradients for the weights between hidden layers as a cheap heuristic to measure the speed of learning (you can also use the magnitude of the gradients for each neuron in the hidden layer here). Intuitively, when the magnitude of the gradients of the weight vectors or of each neuron is large, the net is learning faster. (NOTE: For our net, each hidden layer has the same number of neurons. If you want to play around with this, make sure to adjust the heuristic to account for the number of neurons in the layer)." 
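, "\n", "\n", "Read off the training function above, the quantity recorded at every iteration can be written as follows (the symbols $m$, $r$ and $c$ are notation introduced here for illustration). For a weight matrix $W$ with $r$ rows and $c$ columns,\n", "\n", "$$m(W) = \\dfrac{\\sum_{i,j} \\left| \\partial J_{\\text{data}} / \\partial W_{ij} \\right|}{r + c}$$\n", "\n", "where $J_{\\text{data}}$ is the cross-entropy term of the loss, since the value is stored before the regularization gradient is added. The divisor is $r + c$ rather than the number of weights $r \\cdot c$, so this is a scaled sum and not a per-weight mean, which may be worth keeping in mind when comparing layers of different shapes."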
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 214 }, "colab_type": "code", "executionInfo": { "elapsed": 2938, "status": "error", "timestamp": 1531822681048, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "jnXq-vKPvkWy", "outputId": "62c28bb4-001a-48fe-ebd6-3d387d7810c9", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(np.array(sigm_array_1))\n", "plt.plot(np.array(sigm_array_2))\n", "plt.title('Sum of magnitudes of gradients -- SIGM weights')\n", "plt.legend((\"sigm first layer\", \"sigm second layer\"))\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "ptKnXtCdvr3q", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(np.array(relu_array_1))\n", "plt.plot(np.array(relu_array_2))\n", "plt.title('Sum of magnitudes of gradients -- ReLU weights')\n", "plt.legend((\"relu first layer\", \"relu second layer\"))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "WLHDF9lOvwif", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(np.array(tanh_array_1))\n", "plt.plot(np.array(tanh_array_2))\n", "plt.title('Sum of magnitudes of gradients -- TANH weights')\n", "plt.legend((\"tanh first layer\", \"tanh second layer\"))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "QQQmG_DBvwj_", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Overlaying the two plots to compare\n", "plt.plot(np.array(relu_array_1))\n", "plt.plot(np.array(relu_array_2))\n", "plt.plot(np.array(sigm_array_1))\n", "plt.plot(np.array(sigm_array_2))\n", "plt.plot(np.array(tanh_array_1))\n", "plt.plot(np.array(tanh_array_2))\n", "plt.title('Sum of magnitudes of gradients -- hidden layer neurons')\n", "plt.legend((\"relu first layer\", \"relu second layer\",\"sigm first layer\", \"sigm second layer\",\"tanh first layer\",\"tanh second layer\"))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "S0fw2FG5v55n", "slideshow": { "slide_type": "slide" } }, "source": [ "We can see how well each classifier does in terms of distinguishing the toy data classes. As expected, since the ReLU net trains faster, for a set number of epochs it performs better compared to the sigmoid net" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "1HVsYKhNvwlk", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "(-1.8712034092398278, 1.8687965907601756)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "\n", "# plot the classifiers- SIGMOID\n", "h = 0.02\n", "x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n", "y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n", "xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n", " np.arange(y_min, y_max, h))\n", "Z = np.dot(sigmoid(np.dot(sigmoid(np.dot(np.c_[xx.ravel(), yy.ravel()], s_W1) + s_b1), s_W2) + s_b2), s_W3) + s_b3\n", "Z = np.argmax(Z, axis=1)\n", "Z = Z.reshape(xx.shape)\n", "fig = plt.figure()\n", "plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)\n", "plt.xlim(xx.min(), xx.max())\n", "plt.ylim(yy.min(), yy.max())\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "5bAFKBFMvwp6", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "(-1.8712034092398278, 1.8687965907601756)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot the classifiers-- RELU\n", "h = 0.02\n", "x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n", "y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n", "xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n", " np.arange(y_min, y_max, h))\n", "Z = np.dot(relu(np.dot(relu(np.dot(np.c_[xx.ravel(), yy.ravel()], r_W1) + r_b1), r_W2) + r_b2), r_W3) + r_b3\n", "Z = np.argmax(Z, axis=1)\n", "Z = Z.reshape(xx.shape)\n", "fig = plt.figure()\n", "plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)\n", "plt.xlim(xx.min(), xx.max())\n", "plt.ylim(yy.min(), yy.max())\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "XmnhXwPjwCQE", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "(-1.8712034092398278, 1.8687965907601756)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "h = 0.02\n", "x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n", "y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n", "xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n", " np.arange(y_min, y_max, h))\n", "Z = np.dot(tanh(np.dot(tanh(np.dot(np.c_[xx.ravel(), yy.ravel()], t_W1) + t_b1), t_W2) + t_b2), t_W3) + t_b3\n", "Z = np.argmax(Z, axis=1)\n", "Z = Z.reshape(xx.shape)\n", "fig = plt.figure()\n", "plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)\n", "plt.xlim(xx.min(), xx.max())\n", "plt.ylim(yy.min(), yy.max())\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "qvWJhpih1rZr" }, "source": [ "# Overfitting and Underfitting" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "HfAwGQ3x11hJ" }, "source": [ "![overfitting](https://cdn-images-1.medium.com/max/1125/1*_7OPgojau8hkiPUiHoGK_w.png)\n", "\n", "![overfittign](https://www.apixio.com/wp-content/uploads/2017/10/classification-with-overfitting-2.png)\n", "\n", "![over](https://raw.githubusercontent.com/alexeygrigorev/wiki-figures/master/ufrt/kddm/overfitting-logreg-ex.png)\n", "\n", "![fitting_curves](http://bioinfo.iric.ca/wpbioinfo/wp-content/uploads/2017/10/error_curves.png)\n", "\n", "![overfitting](http://srdas.github.io/DLBook/DL_images/UnderfittingOverfitting.png)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MIUR7PBY3cAo" }, "source": [ "# Overfitting\n", "\n", "Overfitting refers to a model that models the training data too well.\n", "\n", "- Overfitting happens when a model learns the **detail and noise** in the training data to the extent that it **negatively** impacts the performance of the model on new data.\n", "- **Noise or random fluctuations** in the training data is picked up and **learned** as concepts by the model\n", "\n", "- Occurs when the **Representation 
Power** of the model is far greater than the **actual complexity** needed to solve the problem." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "OLpeC9a74l3O" }, "source": [ "# Underfitting\n", "\n", "Underfitting refers to a model that can neither model the training data nor generalize to new data.\n", "\n", "- An underfit machine learning model is not a suitable model, and this is obvious because it performs poorly even on the training data.\n", "\n", "- Underfitting is easy to detect: given a proper metric, the training performance will be low. So it's obviously not suitable for deployment.\n", "\n", "- To fix it, increase the model's representation power. For parametric models, this means increasing the number of parameters to optimize.\n", "  - In neural nets, increase the number of hidden layers and the number of neurons per hidden layer. This increases the model's representation capability.\n", " " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CUpvJ-6i9PBk" }, "source": [ "# How to avoid overfitting? \n", "\n", "> Regularization" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "sdxXCCiQ9dt0" }, "source": [ "## Parameter Penalties\n", "\n", "- We add a parameter norm penalty $\\Omega(\\theta)$ to the loss function.\n", "- $\\Omega(\\theta)$ can be any function of $\\theta$; we will look at specific choices in detail below.\n", "\n", "\\begin{equation}\n", "\\vec J(\\theta; X, y) = J(\\theta; X, y) + \\alpha\\ \\Omega(\\theta) \\\\\n", "\\alpha \\in [0, \\infty)\n", "\\end{equation}\n", "\n", "The term $\\alpha$ decides how strongly the regularization term contributes to the loss.\n", "\n", "| $\\alpha$ | Regularization |\n", "|---|---|\n", "| 0 | No regularization whatsoever |\n", "| $\\downarrow$ | $\\downarrow$ |\n", "| $\\uparrow$ | $\\uparrow$ |\n", "| $\\infty$ | Infinite penalty, $\\theta$ collapses to 0 |\n", "\n", "- In neural nets, only the weights $W$ are subject to regularization; the bias vectors ($b$) are not. 
This is because:\n", "  - Each weight $W_{ij}$ specifies how 2 variables interact. Fitting a weight well requires observing both variables under a variety of conditions.\n", "  - Each bias controls only a single variable.\n", "  - We might induce underfitting by including regularization on the bias values." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Hgn4SAppDyy1" }, "source": [ "### $L^2$ Parameter Regularization\n", "\n", "> Commonly known as **weight decay**.\n", "\n", "- This regularization strategy drives the weights closer to the **origin**.\n", "\n", "- $\\Omega(\\theta) = \\frac{1}{2}||w||_2^2$\n", "\n", "- Also known as **ridge regression** or **Tikhonov regularization**\n", "\n", "Let's assume A and B are highly correlated features.\n", "\n", "> Correlation means A and B are coupled in some sense. The correlation can be positive or negative.\n", "\n", "They are so strongly correlated that we can assume A $\\approx$ B. \n", "\n", "With these 2 as the features of the model, the weights multiply them and we get\n", "\n", "$\n", " Y = W_aA + W_bB\n", "$\n", "\n", "Let's assume $W_a = 4, W_b = -2$. Since A and B are almost equal,\n", "\n", "\\begin{equation}\n", "Y = 4A - 2B \\approx 2A\n", "\\end{equation}\n", "\n", "But,\n", "\n", "\\begin{equation}\n", "Y = 10A - 8B \\approx 2A\n", "\\end{equation}\n", "\n", "and again,\n", "\n", "\\begin{equation}\n", "Y = 1000002A - 1000000B \\approx 2A\n", "\\end{equation}\n", "\n", "So you can see the difficulty in *optimization*: very different weight pairs give almost the same output. 
This regularization basically says: if such a situation arises, choose the smallest (closest to the origin) $W_a, W_b$ that satisfy the condition.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "CXvP8w7-H9f6" }, "source": [ "### $L^1$ Regularization\n", "\n", "This is similar to $L^2$ regularization, but the penalty function is different.\n", "\n", "\\begin{equation}\n", "\\Omega(\\theta) = ||w||_1 = \\sum_i|w_i|\n", "\\end{equation}\n", "\n", "Here the optimal value of some parameters will be exactly 0. This means $L^1$ regularization favours **sparse** solutions. \n", "\n", "It can be used as a **feature selection** mechanism. If the weights of some features reduce to 0, we can safely drop those features from our model.\n", " - Remember that the weight of a feature reflects its importance to the prediction output. If the weight for a particular feature is 0, the feature is not important.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "TBIzPubsLCoe" }, "source": [ "### Norm Regularization as Constrained Optimization\n", "\n", "Recall,\n", "\n", "\\begin{equation}\n", "\\vec J(\\theta; X, y) = J(\\theta; X, y) + \\alpha\\ \\Omega(\\theta)\n", "\\end{equation}\n", "\n", "Also recall Lagrange multipliers:\n", "\n", "![lagrange_multipliers](https://i.stack.imgur.com/9NIoJ.png)\n", "\n", "![lagrange_multipliers](http://math.etsu.edu/multicalc/prealpha/chap2/chap2-9/10-8-20.gif)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "B5-kcQSU-waE" }, "source": [ "# We'll now see an example of overfitting and another where we try to combat that using regularization" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "Gcv-ZlCp--DZ" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], 
"source": [ "from keras.datasets import mnist\n", "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "dthGvXbN_IG3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training data shape : (60000, 28, 28) (60000,)\n", "Testing data shape : (10000, 28, 28) (10000,)\n", "Total number of outputs : 10\n", "Output classes : [0 1 2 3 4 5 6 7 8 9]\n" ] }, { "data": { "text/plain": [ "Text(0.5,1,'Ground Truth : 7')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from keras.utils import to_categorical\n", " \n", "print('Training data shape : ', train_images.shape, train_labels.shape)\n", " \n", "print('Testing data shape : ', test_images.shape, test_labels.shape)\n", " \n", "# Find the unique numbers from the train labels\n", "classes = np.unique(train_labels)\n", "nClasses = len(classes)\n", "print('Total number of outputs : ', nClasses)\n", "print('Output classes : ', classes)\n", " \n", "plt.figure(figsize=[10,5])\n", " \n", "# Display the first image in training data\n", "plt.subplot(121)\n", "plt.imshow(train_images[0,:,:], cmap='gray')\n", "plt.title(\"Ground Truth : {}\".format(train_labels[0]))\n", " \n", "# Display the first image in testing data\n", "plt.subplot(122)\n", "plt.imshow(test_images[0,:,:], cmap='gray')\n", "plt.title(\"Ground Truth : {}\".format(test_labels[0]))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "KRENCQ3-_Kt0" }, "outputs": [], "source": [ "# Change from matrix to array of dimension 28x28 to array of dimention 784\n", "dimData = np.prod(train_images.shape[1:])\n", "train_data = train_images.reshape(train_images.shape[0], dimData)\n", "test_data = test_images.reshape(test_images.shape[0], dimData)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "0cXV8R_q_M-K" }, "outputs": [], "source": [ "# Change to float datatype\n", "train_data = train_data.astype('float32')\n", "test_data = test_data.astype('float32')\n", " \n", "# Scale the data to lie between 0 to 1\n", "train_data /= 255\n", "test_data /= 255" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "46HIlRDn_PSo" }, "outputs": [], 
"source": [ "# Change to float datatype\n", "train_data = train_data.astype('float32')\n", "test_data = test_data.astype('float32')\n", " \n", "# Scale the data to lie between 0 to 1\n", "train_data /= 255\n", "test_data /= 255" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original label 0 : 5\n", "After conversion to categorical ( one-hot ) : [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]\n" ] } ], "source": [ "# Change the labels from integer to categorical data\n", "train_labels_one_hot = to_categorical(train_labels)\n", "test_labels_one_hot = to_categorical(test_labels)\n", " \n", "# Display the change for category label using one-hot encoding\n", "print('Original label 0 : ', train_labels[0])\n", "print('After conversion to categorical ( one-hot ) : ', train_labels_one_hot[0])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "9qsxxDFN_RIf" }, "outputs": [], "source": [ "from keras.models import Sequential\n", "from keras.layers import Dense\n", " \n", "model = Sequential()\n", "model.add(Dense(512, activation='relu', input_shape=(dimData,)))\n", "model.add(Dense(512, activation='relu'))\n", "model.add(Dense(nClasses, activation='softmax'))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "mRKtJUtO_TPC" }, "outputs": [], "source": [ "model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "PJYgU0MS_Uke" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 60000 samples, validate on 10000 samples\n", "Epoch 
1/20\n", "60000/60000 [==============================] - 10s 163us/step - loss: 1.0841 - acc: 0.6743 - val_loss: 0.5491 - val_acc: 0.8353\n", "Epoch 2/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.4411 - acc: 0.8705 - val_loss: 0.3729 - val_acc: 0.8888\n", "Epoch 3/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.3483 - acc: 0.8982 - val_loss: 0.3074 - val_acc: 0.9093\n", "Epoch 4/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.3021 - acc: 0.9117 - val_loss: 0.3059 - val_acc: 0.9067\n", "Epoch 5/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.2653 - acc: 0.9222 - val_loss: 0.2584 - val_acc: 0.9240\n", "Epoch 6/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.2335 - acc: 0.9309 - val_loss: 0.2581 - val_acc: 0.9220\n", "Epoch 7/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.2056 - acc: 0.9389 - val_loss: 0.2149 - val_acc: 0.9345\n", "Epoch 8/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.1821 - acc: 0.9463 - val_loss: 0.1679 - val_acc: 0.9490\n", "Epoch 9/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.1619 - acc: 0.9514 - val_loss: 0.1533 - val_acc: 0.9536\n", "Epoch 10/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.1445 - acc: 0.9564 - val_loss: 0.1392 - val_acc: 0.9581\n", "Epoch 11/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.1307 - acc: 0.9611 - val_loss: 0.1280 - val_acc: 0.9609\n", "Epoch 12/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.1187 - acc: 0.9647 - val_loss: 0.1193 - val_acc: 0.9646\n", "Epoch 13/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.1074 - acc: 0.9679 - val_loss: 0.1148 - val_acc: 0.9652\n", "Epoch 14/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 
0.0986 - acc: 0.9700 - val_loss: 0.1116 - val_acc: 0.9655\n", "Epoch 15/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0896 - acc: 0.9730 - val_loss: 0.1040 - val_acc: 0.9676\n", "Epoch 16/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0829 - acc: 0.9751 - val_loss: 0.0958 - val_acc: 0.9698\n", "Epoch 17/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.0763 - acc: 0.9767 - val_loss: 0.0908 - val_acc: 0.9703\n", "Epoch 18/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.0706 - acc: 0.9785 - val_loss: 0.0941 - val_acc: 0.9709\n", "Epoch 19/20\n", "60000/60000 [==============================] - 1s 12us/step - loss: 0.0653 - acc: 0.9809 - val_loss: 0.0897 - val_acc: 0.9722\n", "Epoch 20/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0601 - acc: 0.9819 - val_loss: 0.0791 - val_acc: 0.9747\n" ] } ], "source": [ "history = model.fit(train_data, train_labels_one_hot, batch_size=256, epochs=20, verbose=1, \n", " validation_data=(test_data, test_labels_one_hot))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "lgs9evVP_Vno" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10000/10000 [==============================] - 0s 20us/step\n", "Evaluation result on Test Data : Loss = 0.07914772396665067, accuracy = 0.9747\n" ] } ], "source": [ "[test_loss, test_acc] = model.evaluate(test_data, test_labels_one_hot)\n", "print(\"Evaluation result on Test Data : Loss = {}, accuracy = {}\".format(test_loss, test_acc))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "ZLMKzPOd_Z5C" }, "outputs": [ { "data": { "text/plain": [ "Text(0.5,1,'Accuracy Curves')" ] }, "execution_count": 31, "metadata": 
{}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Plot the Loss Curves\n", "plt.figure(figsize=[8,6])\n", "plt.plot(history.history['loss'],'r',linewidth=3.0)\n", "plt.plot(history.history['val_loss'],'b',linewidth=3.0)\n", "plt.legend(['Training loss', 'Validation Loss'],fontsize=18)\n", "plt.xlabel('Epochs ',fontsize=16)\n", "plt.ylabel('Loss',fontsize=16)\n", "plt.title('Loss Curves',fontsize=16)\n", " \n", "#Plot the Accuracy Curves\n", "plt.figure(figsize=[8,6])\n", "plt.plot(history.history['acc'],'r',linewidth=3.0)\n", "plt.plot(history.history['val_acc'],'b',linewidth=3.0)\n", "plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)\n", "plt.xlabel('Epochs ',fontsize=16)\n", "plt.ylabel('Accuracy',fontsize=16)\n", "plt.title('Accuracy Curves',fontsize=16)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rsYJYkcX_eHA" }, "source": [ "## There is a clear sign of OverFitting. Why do you think so?\n", "\n", "Carefully see the Validation loss and Training loss curve. Validation loss decreases and then it gradually increases. This means that model is memorising the dataset, though in this case accuracy is much higher. \n", "\n", "** How to combat that?? **\n", "# Use Regularization !" 
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "evRF4jI6_fww" }, "outputs": [], "source": [ "from keras.layers import Dropout\n", " \n", "model_reg = Sequential()\n", "model_reg.add(Dense(512, activation='relu', input_shape=(dimData,)))\n", "model_reg.add(Dropout(0.5))\n", "model_reg.add(Dense(512, activation='relu'))\n", "model_reg.add(Dropout(0.5))\n", "model_reg.add(Dense(nClasses, activation='softmax'))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "YbMVvqfb_jaX" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/20\n", "60000/60000 [==============================] - 1s 17us/step - loss: 0.1050 - acc: 0.9688 - val_loss: 0.0919 - val_acc: 0.9736\n", "Epoch 2/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0999 - acc: 0.9699 - val_loss: 0.0972 - val_acc: 0.9729\n", "Epoch 3/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.1003 - acc: 0.9706 - val_loss: 0.0906 - val_acc: 0.9745\n", "Epoch 4/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0946 - acc: 0.9721 - val_loss: 0.0884 - val_acc: 0.9739\n", "Epoch 5/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0933 - acc: 0.9727 - val_loss: 0.0869 - val_acc: 0.9754\n", "Epoch 6/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0895 - acc: 0.9729 - val_loss: 0.0877 - val_acc: 0.9743\n", "Epoch 7/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0883 - acc: 0.9739 - val_loss: 0.0887 - val_acc: 0.9738\n", "Epoch 8/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0863 - acc: 0.9741 - val_loss: 0.0866 - val_acc: 
0.9751\n", "Epoch 9/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0859 - acc: 0.9746 - val_loss: 0.0836 - val_acc: 0.9770\n", "Epoch 10/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0845 - acc: 0.9749 - val_loss: 0.0857 - val_acc: 0.9759\n", "Epoch 11/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0809 - acc: 0.9759 - val_loss: 0.0830 - val_acc: 0.9774\n", "Epoch 12/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0810 - acc: 0.9768 - val_loss: 0.0887 - val_acc: 0.9759\n", "Epoch 13/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0787 - acc: 0.9770 - val_loss: 0.0856 - val_acc: 0.9760\n", "Epoch 14/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0764 - acc: 0.9776 - val_loss: 0.0827 - val_acc: 0.9768\n", "Epoch 15/20\n", "60000/60000 [==============================] - 1s 13us/step - loss: 0.0763 - acc: 0.9777 - val_loss: 0.0804 - val_acc: 0.9779\n", "Epoch 16/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0742 - acc: 0.9784 - val_loss: 0.0811 - val_acc: 0.9782\n", "Epoch 17/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0728 - acc: 0.9781 - val_loss: 0.0785 - val_acc: 0.9775\n", "Epoch 18/20\n", "60000/60000 [==============================] - 1s 14us/step - loss: 0.0721 - acc: 0.9786 - val_loss: 0.0819 - val_acc: 0.9774\n", "Epoch 19/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0701 - acc: 0.9792 - val_loss: 0.0795 - val_acc: 0.9799\n", "Epoch 20/20\n", "60000/60000 [==============================] - 1s 15us/step - loss: 0.0722 - acc: 0.9796 - val_loss: 0.0753 - val_acc: 0.9798\n" ] }, { "data": { "text/plain": [ "Text(0.5,1,'Accuracy Curves')" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model_reg.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])\n", "history_reg = model_reg.fit(train_data, train_labels_one_hot, batch_size=256, epochs=20, verbose=1, \n", " validation_data=(test_data, test_labels_one_hot))\n", " \n", "#Plot the Loss Curves\n", "plt.figure(figsize=[8,6])\n", "plt.plot(history_reg.history['loss'],'r',linewidth=3.0)\n", "plt.plot(history_reg.history['val_loss'],'b',linewidth=3.0)\n", "plt.legend(['Training loss', 'Validation Loss'],fontsize=18)\n", "plt.xlabel('Epochs ',fontsize=16)\n", "plt.ylabel('Loss',fontsize=16)\n", "plt.title('Loss Curves',fontsize=16)\n", " \n", "#Plot the Accuracy Curves\n", "plt.figure(figsize=[8,6])\n", "plt.plot(history_reg.history['acc'],'r',linewidth=3.0)\n", "plt.plot(history_reg.history['val_acc'],'b',linewidth=3.0)\n", "plt.legend(['Training Accuracy', 'Validation Accuracy'],fontsize=18)\n", "plt.xlabel('Epochs ',fontsize=16)\n", "plt.ylabel('Accuracy',fontsize=16)\n", "plt.title('Accuracy Curves',fontsize=16)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "OCP-IqMD_n9D" }, "source": [ "## What we note??\n", "\n", "* Validation loss is not increasing as it did before.\n", "* Difference between the validation and training accuracy is not that much\n", "\n", "This implies better generalisation and can work will on unseen data samples.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "3tkYXx4et74o" }, "source": [ "# Comparision of Various Optimizers: Stochastic Gradient Descent, RMSprop, Adam, Adagrad" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 8530, "status": "ok", "timestamp": 1531761101645, "user": { "displayName": "Rishhanth Maanav", "photoUrl": 
"https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "J1ISUIQZm6TB", "outputId": "2a2c3e8b-ce85-49b4-ac10-234bc0208c17" }, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "from keras.utils import np_utils\n", "import keras.callbacks as cb\n", "from keras.models import Sequential\n", "from keras.layers.core import Dense, Dropout, Activation\n", "from keras.optimizers import *\n", "from keras.datasets import mnist" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "RmdLi4CGm_eK" }, "outputs": [], "source": [ "class LossHistory(cb.Callback):\n", " def on_train_begin(self, logs={}):\n", " self.losses = []\n", "\n", " def on_batch_end(self, batch, logs={}):\n", " batch_loss = logs.get('loss')\n", " self.losses.append(batch_loss)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "KTDro5wJnClX" }, "outputs": [], "source": [ "def load_data():\n", " print ('Loading data...')\n", " (X_train, y_train), (X_test, y_test) = mnist.load_data()\n", "\n", " X_train = X_train.astype('float32')\n", " X_test = X_test.astype('float32')\n", "\n", " X_train /= 255\n", " X_test /= 255\n", "\n", " y_train = np_utils.to_categorical(y_train, 10)\n", " y_test = np_utils.to_categorical(y_test, 10)\n", "\n", " X_train = np.reshape(X_train, (60000, 784))\n", " X_test = np.reshape(X_test, (10000, 784))\n", "\n", " print ('Data loaded.')\n", " return [X_train, X_test, y_train, y_test]\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "XGSzKUf6nbLX" }, "outputs": [], "source": [ "def init_model(Optimizer,lrnr):\n", " start_time = time.time()\n", 
" print ('Compiling Model ... ')\n", " model = Sequential()\n", " model.add(Dense(500, input_dim=784))\n", " model.add(Activation('relu'))\n", " model.add(Dropout(0.4))\n", " model.add(Dense(300))\n", " model.add(Activation('relu'))\n", " model.add(Dropout(0.4))\n", " model.add(Dense(10))\n", " model.add(Activation('softmax'))\n", " \n", " optim_dict = {}\n", " optim_dict['RMSprop'] = RMSprop(lr = lrnr)\n", " optim_dict['SGD'] = SGD(lr = lrnr)\n", " optim_dict['Adam'] = Adam(lr = lrnr)\n", " optim_dict['Adagrad'] = Adagrad(lr = lrnr)\n", " optim = optim_dict[Optimizer]\n", " model.compile(loss='categorical_crossentropy', optimizer=optim, metrics=['accuracy'])\n", " print ('Model compield in {0} seconds'.format(time.time() - start_time))\n", " return model" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "dZpfXbzYn-8z" }, "outputs": [], "source": [ "def run_network(data=None, model=None, epochs=20, batch=256, Optimizer = 'RMSprop',lrnr = 1e-2):\n", " try:\n", " start_time = time.time()\n", " if data is None:\n", " X_train, X_test, y_train, y_test = load_data()\n", " else:\n", " X_train, X_test, y_train, y_test = data\n", "\n", " if model is None:\n", " model = init_model(Optimizer,lrnr)\n", "\n", " history = LossHistory()\n", " print(\"####### Optimizer being Used: \" + Optimizer)\n", " print ('Training model...')\n", " model.fit(X_train, y_train, epochs=epochs, batch_size=batch,\n", " callbacks=[history],\n", " validation_data=(X_test, y_test), verbose=2)\n", "\n", " print (\"Training duration : {0}\".format(time.time() - start_time))\n", " score = model.evaluate(X_test, y_test, batch_size=16)\n", "\n", " print (\"Network's test score [loss, accuracy]: {0}\".format(score))\n", " return model, history.losses\n", " except KeyboardInterrupt:\n", " print (' KeyboardInterrupt')\n", " return model, history.losses" ] }, { "cell_type": "code", "execution_count": 
40, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "77npFi_boEQG" }, "outputs": [], "source": [ "def plot_losses(losses):\n", " fig = plt.figure()\n", " ax = fig.add_subplot(111)\n", " ax.plot(losses)\n", " ax.set_title('Loss per batch')\n", " fig.show()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 3471 }, "colab_type": "code", "executionInfo": { "elapsed": 140286, "status": "ok", "timestamp": 1531761247397, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "KVfSCHDNoQS6", "outputId": "153a975f-ddf7-41b0-fdaa-5bf847494971" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Loading data...\n", "Data loaded.\n", "Compiling Model ... \n", "Model compield in 0.08845806121826172 seconds\n", "####### Optimizer being Used: SGD\n", "Training model...\n", "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/20\n", " - 1s - loss: 1.7773 - acc: 0.4428 - val_loss: 1.0247 - val_acc: 0.8108\n", "Epoch 2/20\n", " - 1s - loss: 0.9804 - acc: 0.7167 - val_loss: 0.5855 - val_acc: 0.8594\n", "Epoch 3/20\n", " - 1s - loss: 0.7236 - acc: 0.7817 - val_loss: 0.4597 - val_acc: 0.8796\n", "Epoch 4/20\n", " - 1s - loss: 0.6116 - acc: 0.8163 - val_loss: 0.4007 - val_acc: 0.8935\n", "Epoch 5/20\n", " - 1s - loss: 0.5440 - acc: 0.8373 - val_loss: 0.3659 - val_acc: 0.9000\n", "Epoch 6/20\n", " - 1s - loss: 0.4992 - acc: 0.8503 - val_loss: 0.3403 - val_acc: 0.9060\n", "Epoch 7/20\n", " - 1s - loss: 0.4716 - acc: 0.8589 - val_loss: 0.3216 - val_acc: 0.9113\n", "Epoch 8/20\n", " - 1s - loss: 0.4442 - acc: 0.8681 - val_loss: 0.3063 - val_acc: 0.9146\n", "Epoch 9/20\n", " - 1s - loss: 0.4234 - acc: 0.8746 - val_loss: 0.2935 - val_acc: 0.9179\n", "Epoch 
10/20\n", " - 1s - loss: 0.4053 - acc: 0.8797 - val_loss: 0.2828 - val_acc: 0.9189\n", "Epoch 11/20\n", " - 1s - loss: 0.3892 - acc: 0.8843 - val_loss: 0.2728 - val_acc: 0.9226\n", "Epoch 12/20\n", " - 1s - loss: 0.3750 - acc: 0.8885 - val_loss: 0.2635 - val_acc: 0.9254\n", "Epoch 13/20\n", " - 1s - loss: 0.3651 - acc: 0.8904 - val_loss: 0.2560 - val_acc: 0.9266\n", "Epoch 14/20\n", " - 1s - loss: 0.3501 - acc: 0.8970 - val_loss: 0.2487 - val_acc: 0.9304\n", "Epoch 15/20\n", " - 1s - loss: 0.3421 - acc: 0.8992 - val_loss: 0.2412 - val_acc: 0.9326\n", "Epoch 16/20\n", " - 1s - loss: 0.3311 - acc: 0.9023 - val_loss: 0.2351 - val_acc: 0.9327\n", "Epoch 17/20\n", " - 1s - loss: 0.3225 - acc: 0.9056 - val_loss: 0.2292 - val_acc: 0.9346\n", "Epoch 18/20\n", " - 1s - loss: 0.3135 - acc: 0.9074 - val_loss: 0.2233 - val_acc: 0.9354\n", "Epoch 19/20\n", " - 1s - loss: 0.3072 - acc: 0.9094 - val_loss: 0.2176 - val_acc: 0.9369\n", "Epoch 20/20\n", " - 1s - loss: 0.3006 - acc: 0.9123 - val_loss: 0.2128 - val_acc: 0.9379\n", "Training duration : 13.381985187530518\n", "10000/10000 [==============================] - 0s 35us/step\n", "Network's test score [loss, accuracy]: [0.21281864611394705, 0.9379]\n", "Loading data...\n", "Data loaded.\n", "Compiling Model ... 
\n", "Model compield in 0.09253859519958496 seconds\n", "####### Optimizer being Used: RMSprop\n", "Training model...\n", "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/20\n", " - 1s - loss: 0.8625 - acc: 0.8409 - val_loss: 0.2801 - val_acc: 0.9166\n", "Epoch 2/20\n", " - 1s - loss: 0.3014 - acc: 0.9236 - val_loss: 0.2317 - val_acc: 0.9443\n", "Epoch 3/20\n", " - 1s - loss: 0.2773 - acc: 0.9351 - val_loss: 0.1938 - val_acc: 0.9558\n", "Epoch 4/20\n", " - 1s - loss: 0.2864 - acc: 0.9372 - val_loss: 0.1707 - val_acc: 0.9631\n", "Epoch 5/20\n", " - 1s - loss: 0.2799 - acc: 0.9416 - val_loss: 0.1910 - val_acc: 0.9601\n", "Epoch 6/20\n", " - 1s - loss: 0.2876 - acc: 0.9445 - val_loss: 0.1602 - val_acc: 0.9652\n", "Epoch 7/20\n", " - 1s - loss: 0.3024 - acc: 0.9456 - val_loss: 0.1608 - val_acc: 0.9645\n", "Epoch 8/20\n", " - 1s - loss: 0.2887 - acc: 0.9469 - val_loss: 0.1865 - val_acc: 0.9650\n", "Epoch 9/20\n", " - 1s - loss: 0.3007 - acc: 0.9468 - val_loss: 0.2039 - val_acc: 0.9579\n", "Epoch 10/20\n", " - 1s - loss: 0.3076 - acc: 0.9470 - val_loss: 0.1843 - val_acc: 0.9643\n", "Epoch 11/20\n", " - 1s - loss: 0.3095 - acc: 0.9480 - val_loss: 0.2046 - val_acc: 0.9654\n", "Epoch 12/20\n", " - 1s - loss: 0.2962 - acc: 0.9508 - val_loss: 0.1813 - val_acc: 0.9725\n", "Epoch 13/20\n", " - 1s - loss: 0.3197 - acc: 0.9508 - val_loss: 0.2212 - val_acc: 0.9671\n", "Epoch 14/20\n", " - 1s - loss: 0.3212 - acc: 0.9512 - val_loss: 0.1949 - val_acc: 0.9709\n", "Epoch 15/20\n", " - 1s - loss: 0.3231 - acc: 0.9514 - val_loss: 0.2288 - val_acc: 0.9700\n", "Epoch 16/20\n", " - 1s - loss: 0.3479 - acc: 0.9486 - val_loss: 0.2139 - val_acc: 0.9690\n", "Epoch 17/20\n", " - 1s - loss: 0.3371 - acc: 0.9518 - val_loss: 0.1936 - val_acc: 0.9668\n", "Epoch 18/20\n", " - 1s - loss: 0.3541 - acc: 0.9504 - val_loss: 0.2487 - val_acc: 0.9664\n", "Epoch 19/20\n", " - 1s - loss: 0.3515 - acc: 0.9527 - val_loss: 0.2099 - val_acc: 0.9712\n", "Epoch 20/20\n", " - 1s - loss: 0.3672 - acc: 
0.9517 - val_loss: 0.2479 - val_acc: 0.9709\n", "Training duration : 15.204601287841797\n", "10000/10000 [==============================] - 0s 38us/step\n", "Network's test score [loss, accuracy]: [0.24785369726756343, 0.9709]\n", "Loading data...\n", "Data loaded.\n", "Compiling Model ... \n", "Model compield in 0.08783793449401855 seconds\n", "####### Optimizer being Used: Adam\n", "Training model...\n", "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/20\n", " - 1s - loss: 0.3655 - acc: 0.8904 - val_loss: 0.1667 - val_acc: 0.9529\n", "Epoch 2/20\n", " - 1s - loss: 0.2550 - acc: 0.9281 - val_loss: 0.1373 - val_acc: 0.9599\n", "Epoch 3/20\n", " - 1s - loss: 0.2459 - acc: 0.9331 - val_loss: 0.1261 - val_acc: 0.9656\n", "Epoch 4/20\n", " - 1s - loss: 0.2257 - acc: 0.9408 - val_loss: 0.1384 - val_acc: 0.9637\n", "Epoch 5/20\n", " - 1s - loss: 0.2182 - acc: 0.9424 - val_loss: 0.1244 - val_acc: 0.9663\n", "Epoch 6/20\n", " - 1s - loss: 0.2291 - acc: 0.9413 - val_loss: 0.1303 - val_acc: 0.9658\n", "Epoch 7/20\n", " - 1s - loss: 0.2233 - acc: 0.9447 - val_loss: 0.1279 - val_acc: 0.9667\n", "Epoch 8/20\n", " - 1s - loss: 0.2233 - acc: 0.9445 - val_loss: 0.1321 - val_acc: 0.9683\n", "Epoch 9/20\n", " - 1s - loss: 0.2304 - acc: 0.9436 - val_loss: 0.1325 - val_acc: 0.9675\n", "Epoch 10/20\n", " - 1s - loss: 0.2096 - acc: 0.9484 - val_loss: 0.1218 - val_acc: 0.9679\n", "Epoch 11/20\n", " - 1s - loss: 0.1990 - acc: 0.9495 - val_loss: 0.1336 - val_acc: 0.9689\n", "Epoch 12/20\n", " - 1s - loss: 0.2045 - acc: 0.9508 - val_loss: 0.1349 - val_acc: 0.9668\n", "Epoch 13/20\n", " - 1s - loss: 0.2046 - acc: 0.9520 - val_loss: 0.1448 - val_acc: 0.9646\n", "Epoch 14/20\n", " - 1s - loss: 0.1984 - acc: 0.9539 - val_loss: 0.1350 - val_acc: 0.9707\n", "Epoch 15/20\n", " - 1s - loss: 0.2026 - acc: 0.9536 - val_loss: 0.1379 - val_acc: 0.9689\n", "Epoch 16/20\n", " - 1s - loss: 0.2211 - acc: 0.9507 - val_loss: 0.1409 - val_acc: 0.9671\n", "Epoch 17/20\n", " - 1s - loss: 0.2139 
- acc: 0.9507 - val_loss: 0.1375 - val_acc: 0.9663\n", "Epoch 18/20\n", " - 1s - loss: 0.2072 - acc: 0.9538 - val_loss: 0.1456 - val_acc: 0.9685\n", "Epoch 19/20\n", " - 1s - loss: 0.1957 - acc: 0.9551 - val_loss: 0.1284 - val_acc: 0.9727\n", "Epoch 20/20\n", " - 1s - loss: 0.1956 - acc: 0.9563 - val_loss: 0.1484 - val_acc: 0.9650\n", "Training duration : 17.11251473426819\n", "10000/10000 [==============================] - 0s 40us/step\n", "Network's test score [loss, accuracy]: [0.14839486477555325, 0.965]\n", "Loading data...\n", "Data loaded.\n", "Compiling Model ... \n", "Model compield in 0.09581565856933594 seconds\n", "####### Optimizer being Used: Adagrad\n", "Training model...\n", "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/20\n", " - 1s - loss: 0.3521 - acc: 0.8922 - val_loss: 0.1426 - val_acc: 0.9553\n", "Epoch 2/20\n", " - 1s - loss: 0.1613 - acc: 0.9527 - val_loss: 0.1168 - val_acc: 0.9636\n", "Epoch 3/20\n", " - 1s - loss: 0.1265 - acc: 0.9625 - val_loss: 0.0945 - val_acc: 0.9716\n", "Epoch 4/20\n", " - 1s - loss: 0.1059 - acc: 0.9679 - val_loss: 0.0830 - val_acc: 0.9745\n", "Epoch 5/20\n", " - 1s - loss: 0.0955 - acc: 0.9711 - val_loss: 0.0766 - val_acc: 0.9762\n", "Epoch 6/20\n", " - 1s - loss: 0.0844 - acc: 0.9741 - val_loss: 0.0719 - val_acc: 0.9779\n", "Epoch 7/20\n", " - 1s - loss: 0.0748 - acc: 0.9772 - val_loss: 0.0702 - val_acc: 0.9775\n", "Epoch 8/20\n", " - 1s - loss: 0.0686 - acc: 0.9791 - val_loss: 0.0669 - val_acc: 0.9788\n", "Epoch 9/20\n", " - 1s - loss: 0.0633 - acc: 0.9804 - val_loss: 0.0649 - val_acc: 0.9793\n", "Epoch 10/20\n", " - 1s - loss: 0.0594 - acc: 0.9821 - val_loss: 0.0659 - val_acc: 0.9792\n", "Epoch 11/20\n", " - 1s - loss: 0.0554 - acc: 0.9829 - val_loss: 0.0610 - val_acc: 0.9807\n", "Epoch 12/20\n", " - 1s - loss: 0.0521 - acc: 0.9839 - val_loss: 0.0613 - val_acc: 0.9809\n", "Epoch 13/20\n", " - 1s - loss: 0.0486 - acc: 0.9851 - val_loss: 0.0622 - val_acc: 0.9808\n", "Epoch 14/20\n", " - 1s - loss: 
0.0473 - acc: 0.9851 - val_loss: 0.0601 - val_acc: 0.9818\n", "Epoch 15/20\n", " - 1s - loss: 0.0438 - acc: 0.9864 - val_loss: 0.0585 - val_acc: 0.9813\n", "Epoch 16/20\n", " - 1s - loss: 0.0416 - acc: 0.9874 - val_loss: 0.0579 - val_acc: 0.9826\n", "Epoch 17/20\n", " - 1s - loss: 0.0396 - acc: 0.9879 - val_loss: 0.0587 - val_acc: 0.9819\n", "Epoch 18/20\n", " - 1s - loss: 0.0385 - acc: 0.9880 - val_loss: 0.0576 - val_acc: 0.9823\n", "Epoch 19/20\n", " - 1s - loss: 0.0349 - acc: 0.9895 - val_loss: 0.0567 - val_acc: 0.9820\n", "Epoch 20/20\n", " - 1s - loss: 0.0349 - acc: 0.9893 - val_loss: 0.0570 - val_acc: 0.9825\n", "Training duration : 15.652199983596802\n", "10000/10000 [==============================] - 0s 39us/step\n", "Network's test score [loss, accuracy]: [0.05698278585342341, 0.9825]\n" ] } ], "source": [ "model1,losses1 = run_network(Optimizer='SGD')\n", "model2,losses2 = run_network(Optimizer='RMSprop')\n", "model3,losses3 = run_network(Optimizer='Adam')\n", "model4,losses4 = run_network(Optimizer='Adagrad')" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 362 }, "colab_type": "code", "executionInfo": { "elapsed": 1112, "status": "ok", "timestamp": 1531761248543, "user": { "displayName": "Rishhanth Maanav", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "116055284600069515854" }, "user_tz": -330 }, "id": "9uv-RZBHooTA", "outputId": "2509eb4a-17aa-4d91-95ba-b64c49066642" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.title(\"Comparison of Various Optimizers' Performance\")\n", "plt.plot(losses1)\n", "plt.plot(losses2)\n", "plt.plot(losses3)\n", "plt.plot(losses4)\n", "plt.legend(['SGD','RMSProp','Adam','Adagrad'])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "B3vv0ICWEe5N" }, "source": [ "# Time to train your own neural network !!!" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "clk5SDoU_sw4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: matplotlib in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (2.2.2)\n", "Requirement already satisfied: numpy>=1.7.1 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (1.14.2)\n", "Requirement already satisfied: cycler>=0.10 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (0.10.0)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (2.2.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (2.7.2)\n", "Requirement already satisfied: pytz in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (2018.3)\n", "Requirement already satisfied: six>=1.10 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (1.11.0)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from matplotlib) (1.0.1)\n", "Requirement already satisfied: setuptools in /opt/anaconda3/envs/py35/lib/python3.5/site-packages (from kiwisolver>=1.0.1->matplotlib) (39.0.1)\n", "Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz\n", 
"170500096/170498071 [==============================] - 174s 1us/step\n" ] } ], "source": [ "!pip install matplotlib\n", "\n", "from keras.datasets import cifar10\n", "import numpy as np\n", "(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "Gmer73id_vpo" }, "outputs": [], "source": [ "from keras.utils import to_categorical\n", " \n", "print('Training data shape : ', train_images.shape, train_labels.shape)\n", " \n", "print('Testing data shape : ', test_images.shape, test_labels.shape)\n", " \n", "# Find the unique numbers from the train labels\n", "classes = np.unique(train_labels)\n", "nClasses = len(classes)\n", "print('Total number of outputs : ', nClasses)\n", "print('Output classes : ', classes)\n", " \n", "plt.figure(figsize=[5,2])\n", " \n", "# Display the first image in training data\n", "plt.subplot(121)\n", "plt.imshow(train_images[0,:,:], cmap='gray')\n", "plt.title(\"Ground Truth : {}\".format(train_labels[0]))\n", " \n", "# Display the first image in testing data\n", "plt.subplot(122)\n", "plt.imshow(test_images[0,:,:], cmap='gray')\n", "plt.title(\"Ground Truth : {}\".format(test_labels[0]))\n" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "1w-ldsNx_4b4" }, "outputs": [], "source": [ "\n", "# Change from matrix to array of dimension 28x28 to array of dimention 784\n", "dimData = np.prod(train_images.shape[1:])\n", "train_data = train_images.reshape(train_images.shape[0], dimData)\n", "test_data = test_images.reshape(test_images.shape[0], dimData)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "iYqqvCQ8_5Qt" }, "outputs": [], "source": [ "# 
Change to float datatype\n", "train_data = train_data.astype('float32')\n", "test_data = test_data.astype('float32')\n", " \n", "# Scale the data to lie between 0 and 1\n", "train_data /= 255\n", "test_data /= 255" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "nNPTvC8S_7Gx" }, "outputs": [], "source": [ "# Change the labels from integer to categorical data\n", "train_labels_one_hot = to_categorical(train_labels)\n", "test_labels_one_hot = to_categorical(test_labels)\n", " \n", "# Display the change for category label using one-hot encoding\n", "print('Original label 0 : ', train_labels[0])\n", "print('After conversion to categorical ( one-hot ) : ', train_labels_one_hot[0])" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "fpVcf7kSADBU" }, "outputs": [], "source": [ "from keras.models import Sequential\n", "from keras.layers import Dense\n", " \n", "model = Sequential()\n", "\n", "# Only the first layer needs input_shape; later layers infer it from the previous layer\n", "model.add(Dense(720, activation='sigmoid', input_shape=(dimData,)))\n", "model.add(Dense(720, activation='relu'))\n", "model.add(Dense(720, activation='sigmoid'))\n", "model.add(Dense(720, activation='relu'))\n", "model.add(Dense(720, activation='sigmoid'))\n", "model.add(Dense(720, activation='relu'))\n", "model.add(Dense(720, activation='sigmoid'))\n", "model.add(Dense(720, activation='relu'))\n", "model.add(Dense(720, activation='sigmoid'))\n", "model.add(Dense(720, activation='relu'))\n", "model.add(Dense(720, activation='relu'))\n", "# Softmax output with one unit per class\n", "model.add(Dense(nClasses, activation='softmax'))" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { 
"colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "1SglB_EnAFSQ" }, "outputs": [], "source": [ "model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "jEqD8W6KAGHa" }, "outputs": [], "source": [ "history = model.fit(train_data, train_labels_one_hot, batch_size=256, epochs=20, verbose=1, \n", " validation_data=(test_data, test_labels_one_hot))" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "GqRpjKusAKUA" }, "outputs": [], "source": [ "[test_loss, test_acc] = model.evaluate(test_data, test_labels_one_hot)\n", "print(\"Evaluation result on Test Data : Loss = {}, accuracy = {}\".format(test_loss, test_acc))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mKJ78Hi4AOjq" }, "source": [ "## Try to maximize the test accuracy !!! Take it as a challenge. (Tuning the parameters is basically an ART :p)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "OMtvA5tzwCxo" }, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "celltoolbar": "Slideshow", "colab": { "collapsed_sections": [], "default_view": {}, "name": "Session2.ipynb", "provenance": [], "toc_visible": true, "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.5" } }, "nbformat": 4, "nbformat_minor": 1 }