{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Artificial Neural Networks \n", "by Shawn Rhoads, Georgetown University (NSCI 526)\n", "
(adapted from [Shamdasani](https://dev.to/shamdasani/build-a-flexible-neural-network-with-backpropagation-in-python))\n", "***\n", "Recall from class that artificial neural networks are typically organized into three main layers: the input layer, the hidden layer, and the output layer. There are several inputs (also called features) that produce one or more outputs (also called labels).\n", "\n", "In a feedforward network, information always moves in one direction, without cycles or loops; it never goes backwards ([Wikipedia](https://en.wikipedia.org/wiki/Feedforward_neural_network)):\n", "![Feedforward Neural Net](https://thedatamage.com/wp-content/uploads/2018/08/0_0mia7BQKjUAuXeqZ.jpeg)\n", "\n", "Above, the circles represent \"neurons\" while the lines represent \"synapses\". The role of a synapse is to multiply the inputs by the weights. You can think of weights as the \"strength\" of the connection between neurons. Weights primarily define the output of a neural network, but they are adjustable: training changes them. An activation function is then applied to return an output.\n", "\n", "## Here's a brief overview of how a simple feedforward neural network works:\n", "1. Takes inputs as a matrix (2D array of numbers)\n", "2. Multiplies the input by a set of weights (performs a dot product, i.e., matrix multiplication)\n", "3. Applies an activation function\n", "4. Returns an output\n", "5. Error is calculated by taking the difference between the desired output from the data and the predicted output. This error gives us the gradient that gradient descent uses to alter the weights.\n", "6. The weights are then altered slightly according to the error.\n", "7. To train, this process is repeated many (e.g., 1,000+) times. In general, the longer we train, the more closely our outputs match the training data.\n", "\n", "> \"They just perform a dot product with the input and weights and apply an activation function. 
When weights are adjusted via the gradient of loss function, the network adapts to the changes to produce more accurate outputs.\"\n", ">

(via [Shamdasani](https://dev.to/shamdasani/build-a-flexible-neural-network-with-backpropagation-in-python))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What does this mean?\n", "\n", "Let's model a network with two inputs, a single hidden layer of three neurons, and one output. In the network, we will be predicting our exam score from two inputs: how many hours we studied and how many hours we slept the day before. Our test score is the output. Here's the sample data we'll be training our neural network on: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will want to predict the test score of someone who studied for four hours and slept for eight hours based on their prior performance.\n", "\n", "Our inputs (`X`) are in hours. Our output (`y`) is a test score from 0-100. To put everything on a comparable 0-1 scale, we divide each input variable by its maximum value and the score by the maximum possible score (100)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Hours studied, Hours slept]\n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 0.66666667]]\n", "\n", "[Scores on test]\n", "[0.92 0.86 0.89]\n" ] } ], "source": [ "# X = (hours studying, hours sleeping), y = score on test\n", "X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)\n", "y = np.array((92, 86, 89), dtype=float)\n", "\n", "# scale units\n", "X_max = np.amax(X, axis=0) # maximum of each column of X\n", "X = X/X_max # divide each column by its maximum\n", "y = y/100 # max test score is 100\n", "\n", "# print\n", "print('[Hours studied, Hours slept]')\n", "print(X.view())\n", "\n", "print('\\n[Scores on test]')\n", "print(y.view())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Synapses perform a dot product of the inputs and weights. 
For our first calculation, we will use a fixed set of example weights between 0 and 1, standing in for randomly generated ones.\n", "\n", "Our input data, `X`, is a 3x2 matrix. Our output data, `y`, holds 3 scores (one per row of `X`). Each element in matrix `X` needs to be multiplied by a corresponding weight and then added together with all the other results for each neuron in the hidden layer. \n", "\n", "First, the products of the weights (.2, .6, .1, .8, .3, .7) on each synapse and the corresponding inputs are summed to arrive at the first values of the hidden layer. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "W1 = np.array(([.2, .6, .1], [.8, .3, .7]), dtype=float)\n", "W2 = np.array([.4, .5, .9], dtype=float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's how the first input data element (2 hours studying and 9 hours sleeping) would calculate an output in the network:\n", "\n", "![Example algorithm](./images/feedforward_net_small.png)\n", "\n", "Here is our first calculation for the hidden layer above:\n", "\n", "$\\begin{align*} \\mathbf{X_{1}} \\cdot \\mathbf{W1} &= \\begin{bmatrix} x_{11} & x_{12} \\end{bmatrix} \\cdot \\begin{bmatrix} w_{11} & w_{12} & w_{13} \\\\ w_{21} & w_{22} & w_{23} \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} x_{11}w_{11} + x_{12}w_{21} & x_{11}w_{12} + x_{12}w_{22} & x_{11}w_{13} + x_{12}w_{23} \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} .67 & 1 \\end{bmatrix} \\cdot \\begin{bmatrix} .2 & .6 & .1 \\\\ .8 & .3 & .7 \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} (.67*.2 + 1*.8) & (.67*.6 + 1*.3) & (.67*.1 + 1*.7) \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} 0.93 & 0.70 & 0.77 \\end{bmatrix} \\end{align*}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain the final values for the hidden layer, we need to apply an **activation function**, which will introduce nonlinearity. 
One advantage of the function we will use is that its output is mapped to a range between 0 and 1, making it easier to alter weights later.\n", "\n", "**Enter the sigmoid function:**\n", "\n", "$\\begin{align*} f(x) = \\frac{1}{1+e^{-\\beta x}} \\end{align*}$\n", "\n", "Here $\\beta$ controls the steepness of the curve; we use $\\beta = 1$ throughout.\n", "\n", "![Sigmoid](https://miro.medium.com/max/730/1*Sek4P_MzBAipJJpwA8iS7Q.png)\n", "\n", "Applying it to the three hidden layer sums above:\n", "```\n", "1 / (1 + np.exp(-0.93)) = 0.72\n", "1 / (1 + np.exp(-0.70)) = 0.67\n", "1 / (1 + np.exp(-0.77)) = 0.68\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rinse and repeat for the output layer. In the equation below, $\\beta_{i1}$ denotes the second set of weights (`W2`), not the sigmoid steepness, and $\\mathbf{F}$ holds the hidden layer activations:\n", "\n", "$\\begin{align*} \\beta \\cdot \\mathbf{F} &= \\begin{bmatrix} \\beta_{11} & \\beta_{21} & \\beta_{31} \\end{bmatrix} \\cdot \\begin{bmatrix} f_{11} \\\\ f_{12} \\\\ f_{13} \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} (\\beta_{11}*f_{11}) + (\\beta_{21}*f_{12}) + (\\beta_{31}*f_{13}) \\end{bmatrix} \\\\ \\\\ &= \\begin{bmatrix} .4 & .5 & .9 \\end{bmatrix} \\cdot \\begin{bmatrix} .72 \\\\ .67 \\\\ .68 \\end{bmatrix} = \\begin{bmatrix} (.4*.72) + (.5*.67) + (.9*.68) \\end{bmatrix} \\\\ \\\\ &= 1.24 \\end{align*}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pass the result through the activation (sigmoid) function:\n", "\n", "```\n", "1 / (1 + np.exp(-1.24)) = 0.77\n", "```\n", "\n", "So with these weights, our neural network predicts a test score of `.77`, while our target was `.92`. That is not as good as one could hope!\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feedforward Implementation\n", "\n", "Now, we are ready to write a forward propagation function. Let's pass in our input, `X`. 
We can use the variables `z`, `z2`, and `z3` to hold the intermediate activity between the input and output layers.\n", "\n", "Remember, we will need to take a dot product of the inputs and weights, apply the activation function, take another dot product of the hidden layer and second set of weights, and lastly apply a final activation function to receive our output:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# FORWARD PROPAGATION\n", "def forward(X,W1,W2):\n", "    z = np.dot(X, W1) # dot product of X (input) and first set of 2x3 weights\n", "    print(\"z=\")\n", "    print(z.view())\n", "    print()\n", "\n", "    z2 = sigmoid(z) # activation function\n", "    print(\"z2=\")\n", "    print(z2.view())\n", "    print()\n", "\n", "    z3 = np.dot(z2, W2) # dot product of hidden layer (z2) and second set of 3x1 weights\n", "    print(\"z3=\")\n", "    print(z3.view())\n", "    print()\n", "\n", "    o = sigmoid(z3) # final activation function\n", "    print(\"o=\")\n", "    print(o.view())\n", "    print()\n", "\n", "    return z,z2,z3,o" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def sigmoid(s):\n", "    # activation function\n", "    return 1/(1+np.exp(-s))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "z=\n", "[[0.93333333 0.7 0.76666667]\n", " [0.51111111 0.36666667 0.42222222]\n", " [0.73333333 0.8 0.56666667]]\n", "\n", "z2=\n", "[[0.71775106 0.66818777 0.68279939]\n", " [0.62506691 0.59065328 0.60401489]\n", " [0.67553632 0.68997448 0.63799367]]\n", "\n", "z3=\n", "[1.23571376 1.0889668 1.18939607]\n", "\n", "o=\n", "[0.77481705 0.74818711 0.76663304]\n", "\n", "Predicted Output: \n", "[0.77481705 0.74818711 0.76663304]\n", "Actual Output: \n", "[0.92 0.86 0.89]\n" ] } ], "source": [ "z,z2,z3,o = forward(X,W1,W2)\n", "\n", "print(\"Predicted Output: \\n\" + str(o))\n", "print(\"Actual Output: \\n\" + str(y))" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Notice how poorly this performs! With our simple feedforward network and its untrained weights, we aren't able to predict our test scores very well. Why? Our network needs to learn!\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enter Backpropagation\n", "\n", "Since we initialize with a random set of weights, we need to alter them so that our predictions match the corresponding outputs from our data set. This is done through **backpropagation**, which works by using a **loss function** to calculate how far the network was from the target output. \n", "\n", "Like the activation function, there is no one-size-fits-all loss function. Two common loss functions are: \n", "\n", "__mean absolute error (L1 Loss):__ the average of the absolute differences between predictions and actual observations; more robust to outliers since it does not square the differences\n", "\n", "$$\n", "MAE = \\frac {\\sum_{i=1}^n |y_i - \\hat{y_i}|}{n}\n", "$$\n", "\n", "__mean square error (L2 Loss):__ the average of the squared differences between predictions and actual observations; due to squaring, predictions far from the actual values are penalized much more heavily than predictions that are only slightly off\n", "\n", "$$\n", "MSE = \\frac {\\sum_{i=1}^n (y_i - \\hat{y_i})^2}{n}\n", "$$\n", "\n", "Using our example: `o` is our predicted output, and `y` is our actual output. Our goal is to get our loss as close to `0` as we can, meaning our predictions are as close as possible to the actual outputs. \n", "\n", "Training = minimizing the loss. 
 \n", "\n", "$$\n", "Loss = \\frac {\\sum (o - y)^2}{2}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Enter Gradient Descent\n", "To figure out in which direction to alter our weights, we need to find the rate of change of our loss with respect to our weights (i.e., we need to use the derivative of the loss function to understand how the weights affect the loss):\n", "\n", "![gradient descent](https://raw.githubusercontent.com/bfortuner/ml-cheatsheet/master/docs/images/gradient_descent_demystified.png)\n", "\n", "*via https://github.com/bfortuner/ml-cheatsheet/blob/master/docs/gradient_descent.rst*\n", "\n", "### Here's how we will calculate the incremental change to our weights:\n", "1. Find the margin of error of the output layer `o` by taking the difference between the predicted output and the actual output `y`.\n", "2. Apply the derivative of our sigmoid activation function to the output layer error. We call this result the **delta output sum**.\n", "3. Use the delta output sum of the output layer error to figure out how much our `z2` (hidden) layer contributed to the output error by performing a dot product with our second weight matrix. We can call this the `z2` error.\n", "4. Calculate the delta output sum for the `z2` layer by applying the derivative of our sigmoid activation function (just like step 2).\n", "5. Adjust the weights for the first layer by performing a dot product of the input layer with the **hidden delta output sum**. For the second set of weights, perform a dot product of the hidden (`z2`) layer and the **output (`o`) delta output sum**.\n", "\n", "Calculating the delta output sum and then applying the derivative of the sigmoid function are very important to backpropagation. 
The derivative of the sigmoid, also known as **sigmoid prime**, will give us the rate of change, or slope, of the activation function at the output sum.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backprop Implementation\n", "Let's continue building our network by adding a `sigmoidPrime` (derivative of sigmoid) function and a `backward` propagation function that performs the five steps above. \n", "\n", "Later, in the `Neural_Network` class, we will compute our output by running forward propagation and then run the backward function, both from a `train` function: " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# BACKPROPAGATION with Mean Square Error Minimization\n", "def backward(X,W1,W2,y,z,z2,z3,o):\n", "    # backward propagate through the network\n", "    o_error = np.square(y - o) # squared error in output (note: squaring drops the sign of y - o)\n", "    print(\"Error in output (o_error):\")\n", "    print(o_error.view())\n", "    print()\n", "\n", "    o_delta = o_error*sigmoidPrime(o) # scale the error by the slope of the sigmoid at o\n", "    print(\"Applied gradient of sigmoid to error (o_delta):\")\n", "    print(o_delta.view())\n", "    print()\n", "\n", "    z2_error = o_delta.dot(W2.T) # W2 is 1-D here, so this dot product is a single number\n", "    print(\"How much our hidden layer weights contributed to output error (z2 error):\")\n", "    print(z2_error.view())\n", "    print()\n", "\n", "    z2_delta = z2_error*sigmoidPrime(z2)\n", "    print(\"Applied gradient of sigmoid to z2 error (z2_delta):\")\n", "    print(z2_error.view()) # note: this prints z2_error, not z2_delta\n", "    print()\n", "\n", "    W1_b = W1 + X.T.dot(z2_delta)\n", "    print(\"Adjusting first set (input --> hidden) weights:\")\n", "    print(\"W1=\")\n", "    print(W1_b.view())\n", "    print()\n", "\n", "    W2_b = W2 + z2.T.dot(o_delta)\n", "    print(\"Adjusting second set (hidden --> output) weights:\")\n", "    print(\"W2=\")\n", "    print(W2_b.view())\n", "    print()\n", "\n", "    return W1_b, W2_b" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def sigmoidPrime(s):\n", "    # derivative of sigmoid (i.e., its gradient), written in terms of the sigmoid output s\n", "    return 
s * (1 - s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try running one iteration of backpropagation:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FORWARD PROPOGATION #1\n", "\n", "z=\n", "[[0.93333333 0.7 0.76666667]\n", " [0.51111111 0.36666667 0.42222222]\n", " [0.73333333 0.8 0.56666667]]\n", "\n", "z2=\n", "[[0.71775106 0.66818777 0.68279939]\n", " [0.62506691 0.59065328 0.60401489]\n", " [0.67553632 0.68997448 0.63799367]]\n", "\n", "z3=\n", "[1.23571376 1.0889668 1.18939607]\n", "\n", "o=\n", "[0.77481705 0.74818711 0.76663304]\n", "\n", "----------------------\n", "\n", "BACK PROPOGATION #1\n", "\n", "Error in output (o_error):\n", "[0.02107809 0.01250212 0.01521941]\n", "\n", "Applied gradient of sigmoid to error (o_delta):\n", "[0.00367761 0.00235544 0.00272286]\n", "\n", "How much our hidden layer weights contributed to output error (z2 error):\n", "0.005099334735433654\n", "\n", "Applied gradient of sigmoid to z2 error (z2_delta):\n", "0.005099334735433654\n", "\n", "Adjusting first set (input --> hidden) weights:\n", "W1=\n", "[[0.20220476 0.6022555 0.10232058]\n", " [0.80244211 0.30254275 0.70256718]]\n", "\n", "Adjusting second set (hidden --> output) weights:\n", "W2=\n", "[0.40595131 0.50572728 0.90567096]\n", "\n", "----------------------\n", "\n", "FORWARD PROPOGATION #2\n", "\n", "z=\n", "[[0.93724529 0.70404641 0.7707809 ]\n", " [0.51320276 0.36883114 0.42442196]\n", " [0.73716617 0.80395066 0.5706987 ]]\n", "\n", "z2=\n", "[[0.71854288 0.6690843 0.68368979]\n", " [0.62555698 0.59117651 0.6045409 ]\n", " [0.67637587 0.69081893 0.63892438]]\n", "\n", "z3=\n", "[1.2492656 1.1004349 1.2025969]\n", "\n", "o=\n", "[0.77717271 0.75034158 0.76898644]\n", "\n" ] } ], "source": [ "print(\"FORWARD PROPOGATION #1\\n\")\n", "z,z2,z3,o = forward(X,W1,W2)\n", "print(\"----------------------\\n\")\n", "print(\"BACK PROPOGATION #1\\n\")\n", "W1_b, 
W2_b = backward(X,W1,W2,y,z,z2,z3,o)\n", "print(\"----------------------\\n\")\n", "print(\"FORWARD PROPOGATION #2\\n\")\n", "z,z2,z3,o = forward(X,W1_b,W2_b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the outputs move slightly closer to the targets after a single round of backprop (e.g., the first prediction goes from 0.7748 to 0.7772). \n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's train our network!\n", "\n", "We will define a Python `class` and insert our functions from above. We will add an `__init__` function where we'll specify our parameters, such as the sizes of the input, hidden, and output layers." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Hours studied, Hours slept]\n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 0.66666667]]\n", "\n", "[Scores on test]\n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n" ] } ], "source": [ "# X = (hours studying, hours sleeping), y = score on test\n", "X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)\n", "y = np.array(([92], [86], [89]), dtype=float)\n", "\n", "# scale units\n", "X_max = np.amax(X, axis=0) # maximum of each column of X\n", "X = X/X_max # divide each column by its maximum\n", "y = y/100 # max test score is 100\n", "\n", "# print\n", "print('[Hours studied, Hours slept]')\n", "print(X.view())\n", "\n", "print('\\n[Scores on test]')\n", "print(y.view())" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "class Neural_Network(object):\n", "\n", "    def __init__(self):\n", "        # parameters\n", "        self.inputNum = 2 # two inputs\n", "        self.outputNum = 1 # 1 output\n", "        self.hiddenNum = 3 # 3 nodes in our hidden layer\n", "\n", "        # weights\n", "        np.random.seed(2019) # set random seed for reproducibility\n", "        self.W1 = np.random.randn(self.inputNum, self.hiddenNum) # (2x3) weight matrix from input to hidden layer\n", "        self.W2 = np.random.randn(self.hiddenNum, self.outputNum) # (3x1) weight matrix from hidden to output layer\n", "\n", "    def forward(self, X):\n", "        # forward propagation through our network\n", "        self.z = np.dot(X, self.W1) # dot product of X (input) and first set of 2x3 weights\n", "        self.z2 = self.sigmoid(self.z) # activation function\n", "        self.z3 = np.dot(self.z2, self.W2) # dot product of hidden layer (z2) and second set of 3x1 weights\n", "        o = self.sigmoid(self.z3) # final activation function\n", "        return o\n", "\n", "    def sigmoid(self, s):\n", "        # activation function\n", "        return 1/(1+np.exp(-s))\n", "\n", "    def sigmoidPrime(self, s):\n", "        # derivative of sigmoid, written in terms of the sigmoid output s\n", "        return s * (1 - s)\n", "\n", "    def backward(self, X, y, o):\n", "        # backward propagate through the network\n", "        self.o_error = np.square(y - o) # squared error in output (note: squaring drops the sign of y - o)\n", "        self.o_delta = self.o_error*self.sigmoidPrime(o) # applying derivative of sigmoid to error\n", "\n", "        self.z2_error = self.o_delta.dot(self.W2.T) # z2 error: how much our hidden layer weights contributed to output error\n", "        self.z2_delta = self.z2_error*self.sigmoidPrime(self.z2) # applying derivative of sigmoid to z2 error\n", "\n", "        self.W1 += X.T.dot(self.z2_delta) # adjusting first set (input --> hidden) weights\n", "        self.W2 += self.z2.T.dot(self.o_delta) # adjusting second set (hidden --> output) weights\n", "\n", "    def train(self, X, y):\n", "        # one training step: forward pass, then backward pass\n", "        o = self.forward(X)\n", "        self.backward(X, y, o)\n", "\n", "    def test(self, X):\n", "        # test using new data\n", "        return self.forward(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To run the network, all we have to do is to run the `train` function. \n", "\n", "We will want to do this many (e.g., hundreds of) times. So, we'll use a for loop:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interation: #0\n", "Input: \n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 
0.66666667]]\n", "Actual Output: \n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n", "Predicted Output: \n", "[[0.59829797]\n", " [0.58952764]\n", " [0.58856997]]\n", "Mean square error: \n", "0.089169184931786\n", "\n", "\n", "Interation: #50\n", "Input: \n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 0.66666667]]\n", "Actual Output: \n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n", "Predicted Output: \n", "[[0.81578845]\n", " [0.78576724]\n", " [0.81093399]]\n", "Mean square error: \n", "0.0075406612338980985\n", "\n", "\n", "Interation: #100\n", "Input: \n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 0.66666667]]\n", "Actual Output: \n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n", "Predicted Output: \n", "[[0.84301206]\n", " [0.81213973]\n", " [0.83894441]]\n", "Mean square error: \n", "0.0036081405451177857\n", "\n", "\n", "Interation: #150\n", "Input: \n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 0.66666667]]\n", "Actual Output: \n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n", "Predicted Output: \n", "[[0.85560731]\n", " [0.82459002]\n", " [0.85190143]]\n", "Mean square error: \n", "0.002283928756475065\n", "\n", "\n", "Interation: #199\n", "Input: \n", "[[0.66666667 1. ]\n", " [0.33333333 0.55555556]\n", " [1. 
0.66666667]]\n", "Actual Output: \n", "[[0.92]\n", " [0.86]\n", " [0.89]]\n", "Predicted Output: \n", "[[0.86312449]\n", " [0.83210664]\n", " [0.85963112]]\n", "Mean square error: \n", "0.0016450440135330274\n", "\n", "\n" ] } ], "source": [ "NN = Neural_Network()\n", "mse = np.array([]) # let's track how the mean squared error goes down as the network learns\n", "\n", "n_iter = 200\n", "for i in range(n_iter): # trains the network n_iter times\n", "    mse = np.append(mse,np.mean(np.square(y - NN.forward(X)))) # store error before this training step\n", "    # let's print the output every (n_iter/4) iterations and on the last one:\n", "    if i % (n_iter/4) == 0 or i == (n_iter-1):\n", "        print(\"Interation: #%i\" % (i))\n", "        print(\"Input: \\n\" + str(X) )\n", "        print(\"Actual Output: \\n\" + str(y))\n", "        print(\"Predicted Output: \\n\" + str(NN.forward(X)) )\n", "        print(\"Mean square error: \\n\" + str(mse[i])) # mean squared error\n", "        print(\"\\n\")\n", "    NN.train(X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observe how the mean squared error goes down as training proceeds. That's the network learning via gradient descent! \n", "\n", "We can also plot the error across iterations below (notice how quickly it drops at first):" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "fig, ax = plt.subplots()\n", "ax.plot(range(n_iter), mse)\n", "\n", "ax.set(xlabel='iteration', \n", "       ylabel='MSE')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing our network with new data (validation):\n", "How well does our model perform on unseen data? To find out, let's pretend we split our hours studying/sleeping data into training and testing samples. The example above uses only a few training samples (N=3) to train our network. Let's use a \"testing sample\" to validate our neural network model." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "predicted scores=\n", "[0.86, 0.83, 0.84, 0.87, 0.85, 0.86]\n", "actual scores=\n", "[0.9, 0.88, 0.91, 0.95, 0.9, 0.92]\n" ] } ], "source": [ "X_test = np.array(([2, 8], [1, 4], [2, 4], [3, 8], [1,8], [2,9]), dtype=float)\n", "X_test = X_test/X_max # scale with the same maxima used for the training data\n", "y_actual = np.array(([.9], [.88], [.91], [.95], [.90], [.92]), dtype=float)\n", "\n", "# Test \n", "y_pred = NN.test(X_test)\n", "\n", "print(\"predicted scores=\")\n", "print([round(y_pred[i,0],2) for i in range(len(y_pred))])\n", "\n", "print(\"actual scores=\")\n", "print([y_actual[j,0] for j in range(len(y_actual))])" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "accuracy = 79.2%\n" ] } ], "source": [ "from scipy.stats import pearsonr\n", "\n", "# note: this 'accuracy' is the Pearson correlation between predicted and actual scores\n", "accuracy = pearsonr(y_pred,y_actual)\n", "print(str('accuracy = %03.1f%%' % (accuracy[0][0]*100)))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "ax.scatter(y_actual, y_pred) # x = actual, y = predicted, matching the axis labels\n", "ax.set(xlabel='actual score',\n", "       ylabel='predicted score')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }