# Introduction to Neural Prediction: Forward Propagation

In this chapter, we will:
- Create a simple network that makes a prediction.
- Understand what a neural network is and what it does.
- Make a prediction with multiple inputs.
- Make a prediction with multiple outputs.
- Make a prediction with multiple inputs and outputs.
- Use predictions to make other predictions.

> [Warren Ellis] I try not to get involved in the business of prediction. It's a quick way to look like an idiot.

## 1. Predict 

Previously, we learned about the "Predict, Compare, Learn" paradigm. In this chapter, we'll dive deep into the first step: **Predict**. For our first neural network, we will predict one data point at a time (single input, single output), like so:

<div style="text-align:center;"><img style="width:400px;" src="static/imgs/03/one-to-one.png" /></div>

The "shape" of the input has a significant impact on what a network looks like (its architecture). 

We usually start by feeding in all of the information that corresponds to the input entity (for example, all pixel values of a cat image). If we are short on computational resources, we limit the inputs to the ones we think will be most helpful for the prediction.

As a result, we can create a network only after we understand the **shape** of the input and output data sets.

We are going to build a network with a single knob mapping from the input point to the output. In representation learning, these knobs are actually called "weights".

Here's our first network, with a single "weight" mapping from the input "# toes" to the output "win?":

<div style="text-align:center;"><img style="width:400px;" src="static/imgs/03/first-network.png" /></div>

## A Simple Neural Network Making a Prediction

Let's start with the simplest neural network possible:

In [2]:
# an empty network.
weight = .1
def neural_network(x, w):
    prediction = x * w
    return prediction

In [3]:
# inserting one input datapoint.
number_of_toes = [8.5, 9.5, 10, 9]
x0 = number_of_toes[0]
pred = neural_network(x0, weight)
print(pred)

0.8500000000000001


From the previous example, we can say that a neural network is a collection of one or more weights that we multiply by the input data to make a prediction. The input data is a numerical value that we measure in the real world, and a prediction is what the neural network tells us, given the input data.

The prediction, however, is not always right. Sometimes the neural network makes mistakes, but what is important is that it learns from them. A neural network learns by following these steps:
1. It tries to make a prediction.
2. It sees whether the prediction was too high or too low.
3. It changes the weight (up or down) to predict more accurately the next time it sees the same input.
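
The three steps above can be sketched in a few lines. This is only an illustration: the target value `goal` and the fixed `step` size are assumptions made for this sketch, not values from the chapter, and the real learning procedure is left for later.

```python
# A rough sketch of predict -> compare -> adjust for a single weight.
# `goal` and `step` are illustrative assumptions, not values from the chapter.
def neural_network(x, w):
    return x * w

x, goal = 8.5, 1.0        # one input and the (assumed) correct answer
weight, step = 0.1, 0.01  # starting weight and an arbitrary nudge size

for _ in range(3):
    pred = neural_network(x, weight)       # 1. predict
    too_low = pred < goal                  # 2. compare
    weight += step if too_low else -step   # 3. move the weight up or down
    print(round(pred, 3), round(weight, 3))
```

The weight rises while the prediction is below the goal and falls again once the prediction overshoots it.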

## Neural Network Internals
### It Multiplies the Input by a Weight. It "Scales" the Input by a Certain Amount

A neural network, in its simplest form, uses the power of multiplication. Some weight values make parts of the input bigger, while others make other parts smaller.

A neural network considers the input value as "given information" and its weight values as **knowledge**. By using both, it outputs a prediction. A neural network uses the knowledge stored in its weights to interpret the information in the input. Weights can be interpreted as a measure of **sensitivity** between the input and the prediction (think of a weight as a volume knob).
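
To make the "volume knob" picture concrete, here is a small sketch that feeds the same input through several weight settings. The list name `knobs` and the specific weight values are chosen for illustration only.

```python
# The same input scaled by different weights ("volume knob" settings).
def neural_network(x, w):
    return x * w

x = 8.5
knobs = [0.0, 0.1, 0.5, -0.1]  # illustrative weight values

# A larger |w| makes the prediction more sensitive to the input,
# w = 0 ignores the input, and a negative w flips its effect.
predictions = [neural_network(x, w) for w in knobs]

for w, pred in zip(knobs, predictions):
    print(w, pred)
```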

As demonstrated in the previous example, the NN had access to only one input instance at a time. This implies that if we were to feed in `number_of_toes[1]`, the NN wouldn't remember the prediction it made in the last timestep. All in all, a neural network knows only what you feed it as input; it forgets everything else.
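
One way to see this lack of memory is to feed the same inputs in two different orders and check that every prediction is unchanged. The variable names `in_order` and `reverse_order` are introduced only for this sketch.

```python
weight = 0.1

def neural_network(x, w):
    return x * w

number_of_toes = [8.5, 9.5, 10, 9]

# Each call depends only on its own input and the weight.
in_order = [neural_network(x, weight) for x in number_of_toes]
reverse_order = [neural_network(x, weight) for x in reversed(number_of_toes)]

# Changing the order of the inputs does not change any individual prediction.
print(in_order == list(reversed(reverse_order)))
```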

Later, we will learn how to give a neural network a "short-term memory" by feeding in multiple inputs at once.

## Making a Prediction with Multiple Inputs

Now we want to use multiple inputs or "features" that describe the same input instance.


<div style="text-align:center;"><img style="width:400px;" src="static/imgs/03/many-to-one.png" /></div>

In [4]:
# Our implementation.
def w_sum(W, X):
    """Calculates the weighted sum (dot product) of W and X."""
    assert(len(W) == len(X))
    muls = list()
    for i in range(len(W)):
        muls.append(W[i] * X[i])
    return sum(muls)

In [6]:
def w_sum(a,b):
    """Book's implementation of W*X"""
    assert(len(a) == len(b))
    output = 0
    for i in range(len(a)):
        output += (a[i] * b[i])
    return output

In [8]:
# An empty network with multiple inputs.
weights = [.1, .2, 0]
def neural_network(X, weights):
    pred = w_sum(X, weights)
    return pred

This dataset provides the following information at the beginning of each game for the first four games in a season:
- *toes*: current average number of toes per player.
- *wlrec*: current games won (in percent).
- *nfans*: fan count (in millions).

In [9]:
toes = [8.5, 9.5, 9.9, 9]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1]
x0 = [toes[0], wlrec[0], nfans[0]]

In [10]:
neural_network(x0, weights)

0.9800000000000001

## Multiple Inputs: What Does this Network Do?

The network simply multiplies the 3 inputs by its 3 internal knobs (weights) and then sums the results. This is what we call a **weighted sum**.

Because the input has 3 values, the network also has 3 knobs, and we multiply each input by its own weight. Because we have multiple inputs, we have to sum their respective local predictions. This weighted sum is also called the **dot product**.
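
The two stages can be made visible by keeping the per-input products around before summing them. This sketch reuses the weights and the first game's inputs from above; the name `local_predictions` is introduced here for illustration.

```python
weights = [.1, .2, 0]
x0 = [8.5, .65, 1.2]  # toes, wlrec, nfans for the first game

# Multiply each input by its own weight ...
local_predictions = [x * w for x, w in zip(x0, weights)]
# ... then add the pieces together to get the weighted sum.
prediction = sum(local_predictions)

print(local_predictions)
print(prediction)
```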

#### Challenge: Vector Math
Being able to manipulate vectors is a cornerstone technique for representation learning. Let's implement simple functions that support the following operations:

In [13]:
def elementwise_multiplication(vec_a, vec_b):
    """Element-wise multiplication of two vectors of the same size."""
    assert(len(vec_a) == len(vec_b))
    muls = list()
    for i in range(len(vec_a)):
        muls.append(vec_a[i] * vec_b[i])
    return muls

In [14]:
def elementwise_addition(vec_a, vec_b):
    """Element-wise addition of two vectors of the same size."""
    assert(len(vec_a) == len(vec_b))
    adds = list()
    for i in range(len(vec_b)):
        adds.append(vec_a[i] + vec_b[i])
    return adds

In [15]:
def vector_sum(vec_a):
    """Sums the values of a vector."""
    assert(type(vec_a) == list)
    return sum(vec_a)

In [16]:
def vector_avg(vec_a):
    """Averages the values in a vector."""
    assert(type(vec_a) == list)
    return sum(vec_a) / len(vec_a)

In [17]:
a, b = [1,2,3], [4,5,6]
vector_sum(elementwise_multiplication(a, b))

32

The intuition behind the dot product operation is one of the most important parts of truly understanding how neural networks make predictions. Because we are summing up element-wise multiplications, the dot product gives us a measure of **similarity between two vectors**. Let's consider the following example:

```
a = [ 0, 1, 0, 1]         w_sum(a,b) = 0
b = [ 1, 0, 1, 0]         w_sum(b,c) = 1
c = [ 0, 1, 1, 0]         w_sum(b,d) = 1
d = [.5, 0,.5, 0]         w_sum(c,c) = 2
e = [ 0, 1,-1, 0]         w_sum(c,e) = 0
```
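
The scores in the table above can be reproduced with a compact version of `w_sum`. The dictionary `scores` and its keys are just a convenient way to print the results in this sketch.

```python
def w_sum(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

a = [0, 1, 0, 1]
b = [1, 0, 1, 0]
c = [0, 1, 1, 0]
d = [.5, 0, .5, 0]
e = [0, 1, -1, 0]

scores = {
    "a.b": w_sum(a, b),  # no overlapping positions
    "b.c": w_sum(b, c),  # one overlapping position
    "b.d": w_sum(b, d),  # two overlaps, each at half strength
    "c.c": w_sum(c, c),  # a vector compared with itself
    "c.e": w_sum(c, e),  # a positive match cancelled by a negative one
}

for pair, score in scores.items():
    print(pair, score)
```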

We can equate the properties of the dot product to the logical `AND`. In this analogy, negative weights tend to imply a logical `NOT` operator, and as we can observe, positive weights paired with negative weights cause the overall similarity score to go down.

Neural networks are also able to model partial `AND`ing. But after the multiplication comes the summation (`OR`). In the case of `OR`, if any feature results in a high product, it will affect the overall score.

Amusingly, this gives us a kind of crude language for reading weights. The following examples assume we're performing the dot product. When an `if` statement evaluates to `True`, the network gives back a high score:

```
W = [ 1, 0, 1] => if input[0] OR input[2]
W = [ 0, 0, 1] => if input[2]
W = [ 1, 0,-1] => if input[0] OR NOT input[2]
W = [-1, 0,-1] => if NOT input[0] OR NOT input[2]
W = [.5, 0, 1] => if BIG input[0] or input[2]
```

Takeaways:
- The neural network gives a high score to the input most similar to the weights.
- `nfans` is completely ignored in the prediction since its corresponding weight is **0**.
- Shuffling the weights completely changes the way the network makes predictions.

This analogy will help us significantly in the future, especially when putting networks together in increasingly complex ways.
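
Two of the takeaways are easy to check numerically. In the sketch below, the alternative input `more_fans` and the reordered weights `shuffled` are made-up values used only to illustrate the point.

```python
def w_sum(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

weights = [.1, .2, 0]   # toes, wlrec, nfans
x0 = [8.5, .65, 1.2]

# nfans has weight 0, so changing it leaves the prediction untouched.
more_fans = [8.5, .65, 5.0]
print(w_sum(x0, weights) == w_sum(more_fans, weights))

# Moving the same weight values to different inputs changes the prediction.
shuffled = [0, .1, .2]
print(w_sum(x0, weights), w_sum(x0, shuffled))
```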

## Multiple Inputs: Complete Runnable Code

Now's the time to use `NumPy` ("numerical Python"), a Python library, to simplify and speed up our neural network implementation:

In [20]:
import numpy as np

In [23]:
weights = np.array([.1, .2, 0])
toes = [8.5, 9.5, 9.9, 9]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1]

def neural_network(W, X):
    assert(W.shape[0] == X.shape[0])
    return np.dot(W, X)

In [24]:
neural_network(weights, np.array([toes[0], wlrec[0], nfans[0]]))

0.9800000000000001

## Making a Prediction with Multiple Outputs

Neural networks can also make multiple predictions using only a single input. Prediction occurs the same as if there were 3 disconnected single-weight neural networks:

<div style="text-align:center;"><img style="width:400px;" src="static/imgs/03/NN-1-to-many.png" /></div>

In [25]:
def neural_network(W, X):
    """Note: NumPy broadcasts the single-element array across the other one (element-wise product, not a dot product)."""
    return X * W

In [27]:
neural_network(np.array([.2, .7, 0]), np.array([3]))

array([0.6, 2.1, 0. ])

In [28]:
weights = [.3, .2, .9]

def neural_network(W, X):
    """Book's Implementation."""
    pred = ele_mul(X, W)
    return pred

In [29]:
def ele_mul(c, l):
    """Element-wise scalar multiplication."""
    assert isinstance(c, (int, float))  # the input may be a float such as wlrec[0]
    result = list()
    for i in range(len(l)):
        result.append(c * l[i])
    return result

We should note that the three predictions are completely separate. Unlike neural networks with multiple inputs & a single output, where we sum the products, this network truly behaves as three independent components, each receiving the same input data.
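
A quick way to convince ourselves of this independence is to compare the multi-output network against three separate single-weight networks. The sketch assumes the input is `wlrec[0]` from the dataset above; `single_weight_network` is a helper name introduced here.

```python
def single_weight_network(x, w):
    return x * w

def ele_mul(number, vector):
    """Multiplies every element of vector by number."""
    return [number * v for v in vector]

weights = [.3, .2, .9]
x = .65  # a single input, assumed here to be wlrec[0]

together = ele_mul(x, weights)                              # one network, three outputs
separate = [single_weight_network(x, w) for w in weights]   # three tiny networks

print(together == separate)
```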

## Predicting with Multiple Inputs & Outputs

Finally, neural networks can also predict multiple outputs given multiple inputs:

<div style="text-align:center;"><img style="width:200px;" src="static/imgs/03/NN-many-to-many.png" /></div>

In [31]:
# We set the weights (knowledge) values
         #toes #%win #fans
weights = [[.1, .1, -.3],  # 1st Neuron: Hurt ?
           [.1, .2, .0],   # win ?
           [.0, 1.3, .1]]  # Sad ?

In [32]:
def neural_network(X, W):
    pred = vect_mat_mult(X, W)
    return pred

In [36]:
# Input [R^{1x3}] ; Weights [R^{3x3}]
def vect_mat_mult(vect, matrix):
    """Dots vect with each row of matrix (one weighted sum per output)."""
    assert(len(vect) == len(matrix))
    output = [0] * len(matrix)
    for i in range(len(matrix)):
        output[i] = w_sum(vect, matrix[i])
    return output

In [37]:
# inputs.
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1.0]

In [38]:
# one column.
x0 = [toes[0], wlrec[0], nfans[0]]

In [39]:
pred = neural_network(x0, weights); pred

[0.555, 0.9800000000000001, 0.9650000000000001]

## Multiple Inputs & Outputs: How Does It Work?

In the case of multiple inputs & outputs, the network performs three independent weighted sums of the input to make three predictions:

<div style="text-align:center;"><img style="width:300px;" src="static/imgs/03/many-to-many-NN.png" /></div>

Each row of the weight matrix holds the weights for one output, so the three dot products together amount to a single vector-matrix multiplication.
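
As a sketch of that equivalence, the snippet below computes the three weighted sums one row at a time and then in a single call, using `NumPy` (which the chapter already relies on) and the weights and first-game inputs from above. The names `row_by_row` and `all_at_once` are introduced for illustration.

```python
import numpy as np

weights = np.array([[.1, .1, -.3],   # hurt?
                    [.1, .2, .0],    # win?
                    [.0, 1.3, .1]])  # sad?
x0 = np.array([8.5, .65, 1.2])       # toes, wlrec, nfans

# One dot product per row of the weight matrix ...
row_by_row = np.array([np.dot(row, x0) for row in weights])
# ... gives the same numbers as a single matrix-vector product.
all_at_once = weights.dot(x0)

print(row_by_row)
print(np.allclose(row_by_row, all_at_once))
```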

## Predicting on Predictions 

Neural networks can also be **stacked**:

<div style="text-align:center;"><img style="width:300px;" src="static/imgs/03/deep-NN.png" /></div>

We can also take the output of one network and feed it as input to another network. This will result in two consecutive vector-matrix multiplications. Let's try an example:

In [40]:
# A Network with multiple inputs and outputs
ih_wgt = [[0.1, 0.2, -0.1],
          [-0.1,0.1, 0.9],
          [0.1, 0.4, 0.1]]
hp_wgt = [[0.3, 1.1, -0.3],
          [0.1, 0.2, 0.0],
          [0.0, 1.3, 0.1]]

In [41]:
weights = [ih_wgt, hp_wgt]

In [42]:
def neural_network(X, W):
    hid = vect_mat_mult(X, W[0])
    pred = vect_mat_mult(hid, W[1])
    return pred

In [43]:
pred = neural_network(x0, weights); pred

[0.21350000000000002, 0.14500000000000002, 0.5065]

The following example shows how we can perform the same computations using a convenient Python library called `NumPy`. Using libraries like `NumPy` makes our code faster and easier to read and write.

In [44]:
import numpy as np 

In [45]:
ih_wgt = np.array(ih_wgt).transpose()
hp_wgt = np.array(hp_wgt).transpose()
weights = [ih_wgt, hp_wgt]

In [46]:
def neural_network(X, W):
    out = X.dot(W[0])
    pred = out.dot(W[1])
    return pred

In [47]:
toes = np.array(toes)
wlrec = np.array(wlrec)
nfans = np.array(nfans)

In [48]:
x0 = np.array([toes[0], wlrec[0], nfans[0]])

In [49]:
pred = neural_network(x0, weights); pred

array([0.2135, 0.145 , 0.5065])
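
The transposes above are needed because the weight lists store one row per output, while `X.dot(W)` expects one column per output. A sketch of the two equivalent layouts follows; `pred_rows` and `pred_cols` are names made up for this comparison.

```python
import numpy as np

ih_wgt = np.array([[0.1, 0.2, -0.1],
                   [-0.1, 0.1, 0.9],
                   [0.1, 0.4, 0.1]])
hp_wgt = np.array([[0.3, 1.1, -0.3],
                   [0.1, 0.2, 0.0],
                   [0.0, 1.3, 0.1]])
x0 = np.array([8.5, .65, 1.2])

# Row-per-output layout: keep the matrices as they are and put them on the left ...
pred_rows = hp_wgt.dot(ih_wgt.dot(x0))
# ... or transpose them and keep the input on the left.
pred_cols = x0.dot(ih_wgt.T).dot(hp_wgt.T)

print(np.allclose(pred_rows, pred_cols))
```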

## A Quick Primer on NumPy

`NumPy` adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. We note the following before diving into examples:

- We will keep using native Python functions to be sure we fully understand what is going on inside them.
- We call a matrix with only one row a vector.
- We create matrices by listing (rows, columns): **rows come first, columns come second**.

Let's check some `numpy` examples:

In [50]:
a = np.array([0,1,2,3])  # a vector.
b = np.array([4,5,6,7])  # another vector.
c = np.array([[0,1,2,3], [4,5,6,7]])  # A Matrix.
d = np.zeros((2,4))  # 2x4 matrix of zeros.
e = np.random.rand(2,5)  # 2x5 matrix of random numbers between 0 & 1.

In [52]:
print(a,b,c,d,e, sep='\n')

[0 1 2 3]
[4 5 6 7]
[[0 1 2 3]
 [4 5 6 7]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0.1645878  0.29578112 0.18441271 0.14036276 0.07897252]
 [0.27901318 0.69096197 0.60646666 0.01898684 0.88302135]]


In [53]:
# element-wise multiplication.
print(a*.2)

[0.  0.2 0.4 0.6]


In [54]:
# element-wise multiplication.
print(c*.1)

[[0.  0.1 0.2 0.3]
 [0.4 0.5 0.6 0.7]]


In [55]:
# multiply two vectors (element wise).
print(a*b)

[ 0  5 12 21]


In [56]:
# chained element-wise multiplications.
print(a*b*.3)

[0.  1.5 3.6 6.3]


In [57]:
# element-wise row multiplications (because of compatible shapes).
print(a*c)

[[ 0  1  4  9]
 [ 0  5 12 21]]


In [58]:
# error in case of incompatible shapes.
print(a*e)

ValueError: operands could not be broadcast together with shapes (4,) (2,5) 

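When we multiply two variables using the `*` operator, `numpy` automatically detects what kind of variables we are working with and tries to figure out what kind of operation we want. For element-wise operators (`+`, `-`, `*`, `/`), the rule of thumb is that either the two variables have the same number of columns or one of them has `1` in that position of its `shape`. More precisely, `numpy` compares the shapes from the last dimension backwards, and each pair of dimensions must be equal or contain a `1`.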
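
The shape rule can be tried out directly. In this sketch the column vector `col` and the flag `broadcast_failed` are names added for illustration; the other arrays mirror the ones used earlier.

```python
import numpy as np

a = np.array([0, 1, 2, 3])        # shape (4,)
c = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])      # shape (2, 4)
col = np.array([[10], [20]])      # shape (2, 1)
e = np.zeros((2, 5))              # shape (2, 5)

print((a + c).shape)    # last dimensions match (4 and 4)
print((c * col).shape)  # the size-1 dimension is stretched to 4

broadcast_failed = False
try:
    a + e               # last dimensions 4 and 5 are incompatible
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```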

When we are performing an operation using `numpy`, we should always keep the shapes of the inputs in mind. 

In [59]:
a = np.zeros((1,4))
b = np.zeros((4,3))
c = a.dot(b)
print(c.shape)

(1, 3)


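When we use the `.dot` method, we should pay attention to the order of the operands, because the dimensions that sit next to each other must match: the number of columns of the left operand has to equal the number of rows of the right operand. The result has the rows of the left operand and the columns of the right one, so `(1,4)` dotted with `(4,3)` gives `(1,3)`.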

Let's check more examples that demonstrate the concept of `shape`:

In [60]:
import numpy as np

In [61]:
a = np.zeros((2,4))
b = np.zeros((4,3))
c = a.dot(b)
print(c.shape)

(2, 3)


In [62]:
e = np.zeros((2,1))
f = np.zeros((1,3))
g = e.dot(f)
print(g.shape)

(2, 3)


In [63]:
h = np.zeros((5,4)).T
i = np.zeros((5,6))
j = h.dot(i)
print(j.shape)

(4, 6)


In [64]:
import numpy as np

h = np.zeros((5,4))
i = np.zeros((5,6))
j = h.dot(i)
print(j.shape)

ValueError: shapes (5,4) and (5,6) not aligned: 4 (dim 1) != 5 (dim 0)
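
The pattern in these examples can be summarized by a small helper that works out the result shape without doing any arithmetic. `dot_shape` is a hypothetical function written for this sketch, not part of `NumPy`, and it only handles two-dimensional shapes.

```python
import numpy as np

def dot_shape(left_shape, right_shape):
    """Shape of left.dot(right) for two 2-D arrays, or None if misaligned."""
    rows, inner_left = left_shape
    inner_right, cols = right_shape
    if inner_left != inner_right:
        return None  # the neighbouring dimensions do not match
    return (rows, cols)

print(dot_shape((2, 4), (4, 3)))
print(dot_shape((4, 5), (5, 6)))
print(dot_shape((5, 4), (5, 6)))

# Cross-check one case against NumPy itself.
assert dot_shape((2, 4), (4, 3)) == np.zeros((2, 4)).dot(np.zeros((4, 3))).shape
```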

## Summary
### To Predict, Neural Networks Perform Repeated Weighted Sums of the Input

To predict, neural networks perform repeated weighted sums of the input. The network's "intelligence" depends on the weight values we give it. 

Everything we have done in this chapter is a form of what is called **forward propagation**.
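
As a closing sketch, the repeated weighted sums can be written as one loop over any number of weight matrices. The function name `forward` is introduced here for illustration; the weights and input are the ones used in the stacked example earlier.

```python
def w_sum(a, b):
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

def forward(x, layers):
    """Feeds x through every weight matrix in `layers`, one after another."""
    for matrix in layers:
        # One weighted sum per row; the result becomes the next layer's input.
        x = [w_sum(x, row) for row in matrix]
    return x

ih_wgt = [[0.1, 0.2, -0.1],
          [-0.1, 0.1, 0.9],
          [0.1, 0.4, 0.1]]
hp_wgt = [[0.3, 1.1, -0.3],
          [0.1, 0.2, 0.0],
          [0.0, 1.3, 0.1]]
x0 = [8.5, .65, 1.2]

print(forward(x0, [ih_wgt]))          # hidden values only
print(forward(x0, [ih_wgt, hp_wgt]))  # full two-layer prediction
```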

# Sketches

<div style="text-align:center;"><img style="width:400px" src="static/imgs/03/1-to-1-NN-Note.jpg" /><img style="width:333px" src="static/imgs/03/Reminder-1-to-1-NN-Note.jpg" /></div>

---