{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Networks Demystified\n", "## Part 2: Forward Propagation\n", "\n", "\n", "#### @stephencwelch
\n", "\n", "|Code Symbol | Math Symbol | Definition | Dimensions |\n", "| :-: | :-: | :-: | :-: |\n", "|X | $$X$$ | Input data, each row is an example | (numExamples, inputLayerSize) |\n", "|y | $$y$$ | Target data | (numExamples, outputLayerSize) |\n", "|W1 | $$W^{(1)}$$ | Layer 1 weights | (inputLayerSize, hiddenLayerSize) |\n", "|W2 | $$W^{(2)}$$ | Layer 2 weights | (hiddenLayerSize, outputLayerSize) |\n", "|z2 | $$z^{(2)}$$ | Layer 2 activity | (numExamples, hiddenLayerSize) |\n", "|a2 | $$a^{(2)}$$ | Layer 2 activation | (numExamples, hiddenLayerSize) |\n", "|z3 | $$z^{(3)}$$ | Layer 3 activity | (numExamples, outputLayerSize) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last time, we set up our neural network on paper. This time, we’ll implement it in Python. We’ll build our network as a Python class whose `__init__` method instantiates important constants and variables. Prefixing each variable name with `self.` makes these values accessible to every method of the class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our network has 2 inputs, 3 hidden units, and 1 output. These sizes are examples of hyperparameters: constants that establish the structure and behavior of a neural network but are not updated as we train it. Our learning algorithm cannot, for example, decide that it needs another hidden unit; that is something WE must decide before training. What a neural network does learn are parameters, specifically the weights on the synapses." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We’ll take care of moving data through our network in a method called `forward`. Rather than pass inputs through the network one at a time, we’re going to use matrices to pass multiple inputs through at once. Doing this allows for big computational speedups, especially when using tools like MATLAB or NumPy. 
Our input data matrix, X, has dimensions 3 by 2 because we have three 2-dimensional examples. Our corresponding output data, y, has dimensions 3 by 1." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "#Import code from last time\n", "%pylab inline\n", "from partOne import *" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3, 2) (3, 1)\n" ] } ], "source": [ "print(X.shape, y.shape)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "class Neural_Network(object):\n", "    def __init__(self):\n", "        #Define Hyperparameters\n", "        self.inputLayerSize = 2\n", "        self.outputLayerSize = 1\n", "        self.hiddenLayerSize = 3\n", "\n", "    def forward(self, X):\n", "        #Propagate inputs through network\n", "        pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each input value, or element of matrix X, must be multiplied by a corresponding weight, and the weighted values feeding each hidden neuron must be added together. This sounds complex, but if we treat the three hidden-unit values we're looking for as a single row of a matrix and place all our individual weights into a weight matrix, we get exactly the behavior we need by multiplying our input data matrix by our weight matrix. Matrix multiplication also lets us pass multiple inputs through at once by simply adding rows to X. From here on, we'll refer to these matrices as $X$, $W^{(1)}$, and $z^{(2)}$, where $z^{(2)}$ is the activity of our second layer. Notice that each entry of $z^{(2)}$ is the sum of the weighted inputs to one hidden neuron for one example. $z^{(2)}$ is of size 3 by 3: one row for each example and one column for each hidden unit. 
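To make this concrete, here is a minimal sketch, assuming NumPy and made-up illustrative values: the numbers in `X` and the randomly drawn `W1` are placeholders, not the notebook's data or trained weights, and `z2_loop` is a hypothetical name used only for comparison. It computes $z^{(2)}$ once with a single matrix product and once with explicit loops that multiply each input by its weight and add the results for each hidden neuron, then checks that the two agree:

```python
import numpy as np

# Sizes matching the hyperparameters above: 3 examples, 2 inputs, 3 hidden units.
numExamples, inputLayerSize, hiddenLayerSize = 3, 2, 3

# Assumption: made-up inputs and randomly drawn weights, used only to
# illustrate the shapes and the arithmetic (not the notebook's data).
rng = np.random.default_rng(0)
X = np.array([[0.3, 1.0], [0.5, 0.2], [1.0, 0.4]])
W1 = rng.standard_normal((inputLayerSize, hiddenLayerSize))

# Vectorized form: one matrix product computes every weighted sum at once.
z2 = np.dot(X, W1)

# Explicit form: for each example i and hidden unit j, multiply each
# input k by its weight and add the results together.
z2_loop = np.zeros((numExamples, hiddenLayerSize))
for i in range(numExamples):
    for j in range(hiddenLayerSize):
        for k in range(inputLayerSize):
            z2_loop[i, j] += X[i, k] * W1[k, j]

print(z2.shape)                  # (3, 3): one row per example, one column per hidden unit
print(np.allclose(z2, z2_loop))  # True: both forms give the same z2
```

Adding a fourth row to `X` would simply add a fourth row to `z2`, with `W1` unchanged. The single `np.dot` call is also typically far faster than the nested Python loops, because NumPy performs the multiply-and-add in compiled code.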
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have our first official formula, $z^{(2)} = XW^{(1)}$. Matrix notation is really nice here because it lets us express the complex underlying process in a single line!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "z^{(2)} = XW^{(1)} \\tag{1}\\\\\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the activities for our second layer, $z^{(2)}$, we need to apply the activation function. We'll apply it independently to each entry of $z^{(2)}$ using a Python function called `sigmoid`, since we’re using a sigmoid as our activation function. NumPy is really nice here: we can pass in a scalar, vector, or matrix, and NumPy will apply the activation function element-wise and return a result with the same dimensions as its input." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def sigmoid(z):\n", "    #Apply sigmoid activation function to scalar, vector, or matrix\n", "    return 1/(1+np.exp(-z))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "