{ "cells": [ { "cell_type": "markdown", "metadata": { "_cell_guid": "77a61416-9c30-43a7-ba02-c6fee8b793f7", "_uuid": "077b3c5b5b58d8e4cd82970a2c46f98617462cc4" }, "source": [ "# Machine Learning For Finance - Chapter 1\n", "\n", "This Kernel contains the code samples for chapter 1 of my [Machine Learning For Finance book](https://www.amazon.com/Machine-Learning-Finance-algorithms-financial/dp/1789136369/ref=sr_1_4?ie=UTF8&qid=1523854999&sr=8-4&keywords=machine+learning+for+finance). Note that the original text features far more text, explanations and figures. This notebook only features the code and some related comments. \n", "If you enjoy this content, take a look at the [book](https://www.amazon.com/Machine-Learning-Finance-algorithms-financial/dp/1789136369/ref=sr_1_4?ie=UTF8&qid=1523854999&sr=8-4&keywords=machine+learning+for+finance)" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "4f0579df-cbad-4fd4-874c-dc2e339b19e7", "_uuid": "4f9ac543afafe5b6717938467b55074c7b001397" }, "source": [ "# A logistic regressor\n", "The simplest neural network is a logistic regressor. Logistic regression takes in values of any range but outputs only values between zero and one. There are many applications for logistic regressors. One example use case is to predict the likelihood that a homeowner will default on a mortgage. We might take all kinds of values into account to predict the likelihood of default: the debtor’s salary, whether she has a car, the security of her job, and so on. But the likelihood will always be a value between zero and one. Even the worst debtor ever cannot have a default likelihood above 100% and the best cannot go below 0%.\n", "\n", "We will use a library called numpy, which enables easy and fast matrix operations in Python. To ensure we get the same result in all of our experiments, we have to set a random seed." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "66ccd4b5-aaa6-4721-8092-d8296bc2d3a1", "_uuid": "2035ac8bbe860f610bb552da37e36633c96acb48", "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(1)" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "fd5c9cfc-a719-42b5-a9f1-038ff276012f", "_uuid": "cf005021b4fb1e4c6017dff493e392f0c162eca0" }, "source": [ "Since our dataset is quite small, we define it manually as numpy matrices. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "de7998c9-e74c-4e3b-88b4-1f0c617801e2", "_uuid": "614454881e906a2fdabf913aa54cee2a46a37b2a", "collapsed": true }, "outputs": [], "source": [ "X = np.array([[0,1,0],\n", "              [1,0,0],\n", "              [1,1,1],\n", "              [0,1,1]])\n", "\n", "y = np.array([[0,1,1,0]]).T" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "dfbeff4b-3f9b-4dec-b820-ebe77f06af27", "_uuid": "6a268f926fcbe7383fb9019458ffcc528f6ea5f9" }, "source": [ "To compute the output of the regressor, we must first do a linear step. We compute the dot product of the input X and the weights W. This is the same as multiplying each value of X with its weight and then taking the sum. To this number, we add the bias b. Afterwards, we do a nonlinear step. In the nonlinear step, we run the linear intermediate product z through an activation function, in this case, the sigmoid function. The sigmoid function squishes input values to outputs between zero and one.\n", "\n", "We can define the sigmoid activation function as a Python function."
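, "\n", "\n", "For reference, the formula our Python function implements is:\n", "\n", "$$S(x) = \\frac{1}{1+e^{-x}}$$"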
] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "32bb5d97-f7ec-4fdc-a74d-187b7d05759a", "_uuid": "9779bc33ff20e7c85acfe36dbd580478a5fbc475", "collapsed": true }, "outputs": [], "source": [ "def sigmoid(x):\n", "    return 1/(1+np.exp(-x))" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "d7a7c4fa-f1c9-4075-a501-4b05cbf521a6", "_uuid": "118964f6be1e93d56c94f6b0d5f8ec50cfab419d" }, "source": [ "So far, so good. Now we need to initialize W. In this case, we actually already know which values W should have. But for other problems, where we do not know the function yet, we cannot know the weights in advance. So, we have to assign weights randomly. The weights are usually assigned randomly with a mean of zero. The bias is usually set to zero by default. NumPy's `random` function expects the shape of the random matrix to be passed as a tuple, so `random((3,1))` creates a 3 by 1 matrix. By default, the random values generated are uniformly distributed between zero and one, with a mean of 0.5. We want the random values to be centered around zero, between -1 and 1, so we first multiply the values generated by 2 and then subtract 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "3d41e2be-9901-4364-b634-15291170bff4", "_uuid": "cd743c057fa9e6f605211e4c1ed2df16d38a6267", "collapsed": true }, "outputs": [], "source": [ "W = 2*np.random.random((3,1)) - 1\n", "b = 0" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "56ed955e-c9cc-4456-8159-41722e5f1935", "_uuid": "eab32cd3df612d9090d98716a1b453b0c5233fb1" }, "source": [ "Now that all variables are set, we can do the linear step:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "6fec8791-2ce4-438c-99a5-d0591c879f63", "_uuid": "81bfee2776e814a145078f63b817dcf1e02c1041", "collapsed": true }, "outputs": [], "source": [ "z = X.dot(W) + b" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "08a24799-089d-44eb-bb1e-d535090e59bd", "_uuid": "92b7faa422f944f5e410a1de3732ae19c0339153" }, "source": [ "And the nonlinear step:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "9cd49ab4-ed5b-4443-b789-a81621301938", "_uuid": "32b92af15b0435cc0e5a2146b7fcdb721f285847", "collapsed": true }, "outputs": [], "source": [ "A = sigmoid(z)" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "c765821c-3e89-4032-b5bd-700435a46834", "_uuid": "9f8b6096880a97684495dbdb447fbb502e0ca4ea" }, "source": [ "If we print out `A` now, we get the following output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "133c2191-c288-4211-8e91-c47f3ea912b5", "_uuid": "5cd4078e451e5ca4c3dd5de26ccafb2e1f871638", "collapsed": true }, "outputs": [], "source": [ "print(A)" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "03a8397f-4ac6-4ab4-8909-a8735cc99477", "_uuid": "20320c839128de4f458bef5c8189627fb5502c44" }, "source": [ "This looks nothing like our desired output y at all! Clearly, our regressor is representing some function, but it is quite far away from the function we want. \n", "To better approximate our desired function, we have to tweak the weights W and the bias b to get better results.\n", "\n", "In this case, our problem is a binary classification problem, so we will use the binary cross entropy loss:\n", "$$D_{BCE}(y,\\hat y) = -\\frac{1}{N} \\sum_{i=1}^N[y_i \\log(\\hat y_i) + (1-y_i)\\log(1-\\hat y_i)]$$\n", "\n", "Let's go through this step by step.\n", " \n", " 1. 
$D_{BCE}(y,\\hat y)$ is the distance function for binary cross entropy loss.\n", " \n", " \n", " 2. $-\\frac{1}{N} \\sum_{i=1}^N$ The loss over a batch of N examples is the average loss of all examples. \n", " \n", " \n", " 3. $y_i \\log(\\hat y_i)$ This part of the loss only comes into play if the true value $y_i$ is 1. If $y_i$ is 1, we want $\\hat y_i$ to be as close to 1 as possible, to achieve a low loss.\n", " \n", " \n", " 4. $(1-y_i)\\log(1-\\hat y_i)$ This part of the loss comes into play if $y_i$ is 0. If so, we want $\\hat y_i$ to be close to 0 as well.\n", "\n", "In Python this loss function is implemented as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "8279f3a2-8267-4cd0-ae79-86fec4f8e6b6", "_uuid": "09899ff3bc2bc00559076582b9be0d13c137c3e5", "collapsed": true }, "outputs": [], "source": [ "def bce_loss(y,y_hat):\n", "    N = y.shape[0]\n", "    loss = -1/N * np.sum((y*np.log(y_hat) + (1 - y)*np.log(1-y_hat)))\n", "    return loss " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "40220ad1-492b-4d0c-974e-b7e55aa0870e", "_uuid": "534a98b74b1bdf0aa04be51dd1b71a46c4c7d89e", "collapsed": true }, "outputs": [], "source": [ "bce_loss(y,A)" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "0a4a118c-0b90-4a17-a57c-cfb5833e820a", "_uuid": "b8c557e218549361c10ae29590f8a5ca4a6be467" }, "source": [ "## Backpropagation \n", "\n", "To update the parameters, we need to calculate the derivative of the loss function with respect to the weights and biases. If you imagine the parameters of our model as the geo coordinates in our mountain analogy, calculating the loss derivative with respect to a parameter is like checking the slope of the mountain towards the north to see whether you should go north or south.\n", "\n", "Note: to keep things simple, we refer to the derivative of the loss function with respect to any variable as $d$variable. For example, we write the derivative of the loss function with respect to the weights as $dW$.\n", "\n", "To calculate the gradient with respect to different parameters of our model, we can make use of the chain rule. You might remember the chain rule as:\n", "\n", "$$(f(g(x)))' = g'(x) * f'(g(x))$$\n", "\n", "Sometimes also written as:\n", "$$\\frac{dy}{dx} = \\frac{dy}{du} \\frac{du}{dx}$$\n", "\n", "What the chain rule basically says is that if you want to take the derivative through a number of nested functions, you multiply the derivative of the inner function with the derivative of the outer function. This is useful since neural networks, and our logistic regressor, are nested functions. The input goes through the linear step, a function of input, weights and biases. The output of the linear step, $z$, goes through the activation function.\n", "\n", "So when we compute the loss derivative with respect to the weights and bias, we first compute the loss derivative with respect to the output of the linear step, $z$, and use it to compute $dW$. In code it looks like this:\n", "```python\n", "dz = (A - y)\n", "\n", "dW = 1/N * np.dot(X.T,dz)\n", "\n", "db = 1/N * np.sum(dz,axis=0,keepdims=True) \n", "```\n", "\n", "## Parameter updates\n", "Now that we have the gradients, how do we improve our model? Or, to stay with our mountain analogy, now that we know that the mountain goes up in the North direction and up in the East direction, where do we go? To the South and to the West of course! Mathematically speaking, we go in the opposite direction of the gradient. 
If the gradient is positive with respect to a parameter, that is, the slope is upward, we reduce the parameter. If it is negative, that is, the slope is downward, we increase it. The steeper the slope, the bigger the step we take.\n", "\n", "The update rule for a parameter $p$ then is:\n", "$$p = p - \\alpha * dp$$\n", "\n", "Where $p$ is a model parameter (either a weight or a bias), $dp$ is the loss derivative with respect to $p$, and $\\alpha$ is the **learning rate**. The learning rate is something like the gas pedal in a car. It determines how strongly we apply the gradient updates. It is one of those hyperparameters that we have to set manually. We will discuss it in the next chapter." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "b3f4b8a7-a442-40e5-a3e0-de9d4366c32d", "_uuid": "1779b7f373ed05c5c759d55488171c67153adf4f", "collapsed": true }, "outputs": [], "source": [ "\n", "# Randomly initialize the weights\n", "W = 2*np.random.random((3,1)) - 1\n", "b = 0\n", "\n", "# Set the learning rate alpha to 1\n", "alpha = 1\n", "\n", "# We will train for 20 epochs\n", "epochs = 20\n", "\n", "# Count the number of training examples we have (4)\n", "N = y.shape[0]" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "a1f74d6b6c36231785f975c59c7f14117e07fb8c" }, "source": [ "In the loop below, we do multiple forward and backward passes and apply the gradient descent update rule." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "ccf1ee14-839e-4034-8314-bb740fc83e24", "_uuid": "4a845c79eede5385b2ea71ddc996fd8a94ebd3f3", "collapsed": true }, "outputs": [], "source": [ "losses = []\n", "for i in range(epochs):\n", "    # Do the linear step\n", "    z = X.dot(W) + b\n", "    \n", "    # Do the nonlinear step\n", "    A = sigmoid(z)\n", "    \n", "    # Calculate the loss\n", "    loss = bce_loss(y,A)\n", "    \n", "    # Keep track of the loss\n", "    print('Epoch:',i,'Loss:',loss)\n", "    losses.append(loss)\n", "    \n", "    # Backpropagate\n", "    dz = (A - y)\n", "    \n", "    # ... calculate loss derivative with respect to weights\n", "    dW = 1/N * np.dot(X.T,dz)\n", "    \n", "    # ... calculate loss derivative with respect to bias\n", "    db = 1/N * np.sum(dz,axis=0,keepdims=True) \n", "    \n", "    # Update parameters\n", "    W -= alpha * dW\n", "    b -= alpha * db" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "48353065-068a-4ddb-a127-7b10909519f3", "_uuid": "53d633ce62032696a4a29ac4d1415e7e855d5496", "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "plt.plot(losses)\n", "plt.xlabel('epoch')\n", "plt.ylabel('loss')\n", "plt.show()\n", "#fig.savefig('loss.jpg')" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "559906ba-4e52-44af-9742-38637de1d9df", "_uuid": "94c81a93ccfc890affbe2e6ea619e3ef1dc91ddd" }, "source": [ "# A deeper network \n", "We established earlier that in order to approximate more complex functions, we need bigger, deeper networks. Creating a deeper network works by stacking layers on top of each other.\n", "\n", "In this section, we will build a two-layer neural network.\n", "\n", "The input gets multiplied with the first set of weights $W_1$, producing an intermediate product $z_1$, which is then run through an activation function to produce the first layer's activations $A_1$. These activations then get multiplied with a second set of weights $W_2$, producing an intermediate product $z_2$, which gets run through a second activation function to produce the output $A_2$ of our neural net."
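, "\n", "\n", "In formulas, using tanh for the first activation function and sigmoid for the second (as in the code below), the forward pass reads:\n", "\n", "$$z_1 = X W_1 + b_1 \\qquad A_1 = \\tanh(z_1)$$\n", "\n", "$$z_2 = A_1 W_2 + b_2 \\qquad A_2 = \\text{sigmoid}(z_2)$$"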
] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "42cb8462-2a03-4baf-a97c-fe501539698d", "_uuid": "eb77fc3065551acf8ea3f4ca9986650d8cad7f40", "collapsed": true }, "outputs": [], "source": [ "# Package imports\n", "# Matplotlib is a MATLAB-like plotting library\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "# Numpy handles matrix operations\n", "import numpy as np\n", "# SciKitLearn is a useful machine learning utilities library\n", "import sklearn\n", "# The sklearn datasets module helps with generating datasets\n", "import sklearn.datasets\n", "import sklearn.linear_model\n", "\n", "\n", "# Display plots inline and change default figure size\n", "%matplotlib inline\n", "matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "134351ea-2cc0-47ec-be11-e5a38c13624f", "_uuid": "5c20db78cc7010644ca479dc87b9b3b9f41f8bb9", "collapsed": true }, "outputs": [], "source": [ "# Just some helper functions carried over from the first part of this chapter\n", "# sigmoid function\n", "def sigmoid(x):\n", "    '''\n", "    Calculates the sigmoid activation of a given input x\n", "    See: https://en.wikipedia.org/wiki/Sigmoid_function\n", "    '''\n", "    return 1/(1+np.exp(-x))\n", "\n", "# Log Loss function\n", "def bce_loss(y,y_hat):\n", "    '''\n", "    Calculates the logistic loss between a prediction y_hat and the labels y\n", "    See: http://wiki.fast.ai/index.php/Log_Loss\n", "\n", "    We need to clip values that get too close to zero to avoid zeroing out. \n", "    Zeroing out is when a number gets so small that the computer replaces it with 0.\n", "    Therefore, we clip numbers to a minimum value.\n", "    '''\n", "    minval = 0.000000000001\n", "    N = y.shape[0]\n", "    l = -1/N * np.sum(y * np.log(y_hat.clip(min=minval)) + (1-y) * np.log((1-y_hat).clip(min=minval)))\n", "    return l\n", "\n", "# Log loss derivative\n", "def bce_loss_derivative(y,y_hat):\n", "    '''\n", "    Calculates the gradient (derivative) of the log loss between point y and y_hat\n", "    See: https://stats.stackexchange.com/questions/219241/gradient-for-logistic-loss-function\n", "    '''\n", "    return (y_hat-y)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "19f6e41f-1bba-419a-8151-440bfcffdbf8", "_uuid": "c8353a528a86b5dc906ac2f54f376fdabe26e685", "collapsed": true }, "outputs": [], "source": [ "def forward_prop(model,a0):\n", "    '''\n", "    Forward propagates through the model, stores results in cache.\n", "    See: https://stats.stackexchange.com/questions/147954/neural-network-forward-propagation\n", "    a0 is the activation at layer zero; it is the same as X\n", "    '''\n", "    \n", "    # Load parameters from model\n", "    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']\n", "    \n", "    # Linear step\n", "    z1 = a0.dot(W1) + b1\n", "    \n", "    # First activation function\n", "    a1 = np.tanh(z1)\n", "    \n", "    # Second linear step\n", "    z2 = a1.dot(W2) + b2\n", "    \n", "    # Second activation function\n", "    a2 = sigmoid(z2)\n", "    cache = {'a0':a0,'z1':z1,'a1':a1,'z2':z2,'a2':a2}\n", "    return cache" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "b6083996-4d46-4602-a836-8c662da48d0b", "_uuid": "69e4a3e36f4f58445c96e8865c85bac8d76fec83", "collapsed": true }, "outputs": [], "source": [ "def tanh_derivative(x):\n", "    '''\n", "    Calculates the derivative of the tanh function that is used as the first activation function\n", "    See: https://socratic.org/questions/what-is-the-derivative-of-tanh-x\n", "    
'''\n", "    return (1 - np.power(x, 2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "4591d3ae-c6d9-4834-bcd0-2caca210cb5b", "_uuid": "f854f7c2a6e84c336dbdac600a487d3b8b51c23b", "collapsed": true }, "outputs": [], "source": [ "def backward_prop(model,cache,y):\n", "    '''\n", "    Backward propagates through the model to calculate gradients.\n", "    Stores gradients in grads dictionary.\n", "    See: https://en.wikipedia.org/wiki/Backpropagation\n", "    '''\n", "    # Load parameters from model\n", "    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']\n", "    \n", "    # Load forward propagation results\n", "    a0, a1, a2 = cache['a0'],cache['a1'],cache['a2']\n", "    \n", "    # Backpropagation\n", "    # Calculate loss derivative with respect to output\n", "    dz2 = bce_loss_derivative(y=y,y_hat=a2)\n", "    \n", "    # Calculate loss derivative with respect to second layer weights\n", "    dW2 = (a1.T).dot(dz2)\n", "    \n", "    # Calculate loss derivative with respect to second layer bias\n", "    db2 = np.sum(dz2, axis=0, keepdims=True)\n", "    \n", "    # Calculate loss derivative with respect to first layer\n", "    dz1 = dz2.dot(W2.T) * tanh_derivative(a1)\n", "    \n", "    # Calculate loss derivative with respect to first layer weights\n", "    dW1 = np.dot(a0.T, dz1)\n", "    \n", "    # Calculate loss derivative with respect to first layer bias\n", "    db1 = np.sum(dz1, axis=0)\n", "    \n", "    # Store gradients\n", "    grads = {'dW2':dW2,'db2':db2,'dW1':dW1,'db1':db1}\n", "    return grads" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "5107f9bb-f269-4f3f-a39d-cb9930a98941", "_uuid": "91cf13cfbb8f2a89fdf96213b051a0ca43fa75ed", "collapsed": true }, "outputs": [], "source": [ "# Helper function to plot a decision boundary.\n", "# If you don't fully understand this function, don't worry; it just generates the contour plot below.\n", "def plot_decision_boundary(pred_func):\n", "    # Set min and max values and give it some padding\n", "    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n", "    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n", "    h = 0.01\n", "    # Generate a grid of points with distance h between them\n", "    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n", "    # Predict the function value for the whole grid\n", "    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])\n", "    Z = Z.reshape(xx.shape)\n", "    # Plot the contour and training examples\n", "    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)\n", "    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap=plt.cm.Spectral)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "e8c6884c-0316-4b1f-966a-61a639dc4e5a", "_uuid": "44585d9c25209b777018eeec73db699439be156b", "collapsed": true }, "outputs": [], "source": [ "# Generate a dataset and plot it\n", "np.random.seed(0)\n", "X, y = sklearn.datasets.make_moons(200, noise=0.15)\n", "y = y.reshape(200,1)\n", "plt.scatter(X[:,0], X[:,1], s=40, c=y.flatten(), cmap=plt.cm.Spectral)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "ffce597f-5590-4a26-b5fe-47683eb80ace", "_uuid": "ae30101b945a165faf90289371bbedf345b3882f", "collapsed": true }, "outputs": [], "source": [ "def predict(model, x):\n", "    '''\n", "    Predicts y_hat as 1 or 0 for a given input X\n", "    '''\n", "    # Do forward pass\n", "    c = forward_prop(model,x)\n", "    # Get y_hat\n", "    y_hat = c['a2']\n", "    \n", "    # Turn values to either 1 or 0\n", "    y_hat[y_hat > 0.5] = 1\n", "    y_hat[y_hat <= 0.5] = 0\n", "    return y_hat" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "cc3b15d5-0946-4e18-833e-42b1ce8b4dcd", "_uuid": "8c92efe436d670a14306b9a7ae507e5e1b48587d", "collapsed": true }, "outputs": [], "source": [ "def calc_accuracy(model,x,y):\n", "    '''\n", "    Calculates the accuracy of the model given an input x and a correct output y.\n", "    The accuracy is the percentage of examples our model classified correctly\n", "    '''\n", "    # Get total number of examples\n", "    m = y.shape[0]\n", "    # Do a prediction with the model\n", "    pred = predict(model,x)\n", "    # Ensure prediction and truth vector y have the same shape\n", "    pred = pred.reshape(y.shape)\n", "    # Calculate the number of wrong examples\n", "    error = np.sum(np.abs(pred-y))\n", "    # Calculate accuracy\n", "    return (m - error)/m * 100" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "69ba935e-485b-40ec-abd8-73f30c5014ff", "_uuid": "9c8d58e9ad4a9d0003539d3132a9ea6cbf626989", "collapsed": true }, "outputs": [], "source": [ "def initialize_parameters(nn_input_dim,nn_hdim,nn_output_dim):\n", "    '''\n", "    Initializes weights with random numbers between -1 and 1\n", "    Initializes biases with 0\n", "    Assigns weights and parameters to model\n", "    '''\n", "    # First layer weights\n", "    W1 = 2 * np.random.random((nn_input_dim, nn_hdim)) - 1\n", "    \n", "    # First layer bias\n", "    b1 = np.zeros((1, nn_hdim))\n", "    \n", "    # Second layer weights\n", "    W2 = 2 * np.random.random((nn_hdim, nn_output_dim)) - 1\n", "    \n", "    # Second layer bias\n", "    b2 = np.zeros((1, nn_output_dim))\n", "    \n", "    # Package and return model\n", "    model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}\n", "    return model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "d4a9eeba-b5fb-4210-9e8c-ffecab6cf405", "_uuid": "a3814c843ef102fa60fd0f593f26ea82c8dd3773", "collapsed": true }, "outputs": [], "source": [ "def update_parameters(model,grads,learning_rate):\n", "    '''\n", "    Updates parameters according to the gradient descent algorithm\n", "    See: https://en.wikipedia.org/wiki/Gradient_descent\n", "    '''\n", "    # Load parameters\n", "    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']\n", "    \n", "    # Update parameters\n", "    W1 -= learning_rate * grads['dW1']\n", "    b1 -= learning_rate * grads['db1']\n", "    W2 -= learning_rate * grads['dW2']\n", "    b2 -= learning_rate * grads['db2']\n", "    \n", "    # Store and return parameters\n", "    model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}\n", "    return model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "0b04246b-2480-459d-889b-4dbc55b3498e", "_uuid": "2b43cd9f48a8427ec0e1c50be50e49045bf08dbd", "collapsed": true }, "outputs": [], "source": [ "def train(model,X_,y_,learning_rate, num_passes=20000, print_loss=False):\n", "    # Gradient descent. 
For each batch...\n", "    for i in range(0, num_passes):\n", "\n", "        # Forward propagation\n", "        cache = forward_prop(model,X_)\n", "        #a1, probs = cache['a1'],cache['a2']\n", "        # Backpropagation\n", "        \n", "        grads = backward_prop(model,cache,y_)\n", "        # Gradient descent parameter update\n", "        # Assign new parameters to the model\n", "        model = update_parameters(model=model,grads=grads,learning_rate=learning_rate)\n", "        \n", "        # Print loss & accuracy every 100 iterations\n", "        if print_loss and i % 100 == 0:\n", "            y_hat = cache['a2']\n", "            print('Loss after iteration',i,':',bce_loss(y_,y_hat))\n", "            print('Accuracy after iteration',i,':',calc_accuracy(model,X_,y_),'%')\n", "    \n", "    return model" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "e48b246c4d9ea7223da31e9d61df1ac78a13db54" }, "source": [ "## Little noise and a good hidden size\n", "In this section, we will fit a model with a good hidden layer size to data with little noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "f8868a6f-74a2-4721-9a83-6bebd3af6268", "_uuid": "b3e8c73d4d979af3ea2243228502735348bed88f", "collapsed": true }, "outputs": [], "source": [ "# Hyperparameters\n", "hidden_layer_size = 3\n", "# I picked this value because it showed good results in my experiments\n", "learning_rate = 0.01" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "1d0c98cf-6d11-444f-a8a3-5af4348e2714", "_uuid": "8aad1905514f7dbc4e08d2c4dfa491f8adfd3a72", "collapsed": true }, "outputs": [], "source": [ "# Initialize the parameters to random values. We need to learn these.\n", "np.random.seed(0)\n", "# This is what we return at the end\n", "model = initialize_parameters(nn_input_dim=2, nn_hdim=hidden_layer_size, nn_output_dim=1)\n", "model = train(model,X,y,learning_rate=learning_rate,num_passes=1000,print_loss=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "d7f7d4e6-6a58-4f1e-bcd9-0fb920e1ccfe", "_uuid": "25a49d37c17895c081cea3638e6c9be78655fd25", "collapsed": true }, "outputs": [], "source": [ "# Plot the decision boundary\n", "plot_decision_boundary(lambda x: predict(model,x))\n", "plt.title(\"Decision Boundary for hidden layer size 3\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "cf2f13d1-8283-40a4-8f70-c53740e7aade", "_uuid": "9144f9cee2da0346db210687b438f129226d0f16", "collapsed": true }, "outputs": [], "source": [ "# Now with more noise\n", "# Generate a dataset and plot it\n", "np.random.seed(0)\n", "# The data generator allows us to regulate the noise level\n", "X, y = sklearn.datasets.make_moons(200, noise=0.3)\n", "y = y.reshape(200,1)\n", "plt.scatter(X[:,0], X[:,1], s=40, c=y.flatten(), cmap=plt.cm.Spectral)" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "e289cbae8d9a3959fde0373f9a68f175e89449d3" }, "source": [ "## Too small hidden size\n", "In this section, the hidden layer size is 1, which is too small. The data also has more noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "69fd9502-b9c3-41a8-a38e-eacca3ea062d", "_uuid": "8d0d8147bbeba2578e2e1e97c9712f8fdd825ab4", "collapsed": true }, "outputs": [], "source": [ "# Hyperparameters\n", "hidden_layer_size = 1\n", "# I picked this value because it showed good results in my experiments\n", "learning_rate = 0.01\n", "\n", "# Initialize the parameters to random values. 
We need to learn these.\n", "np.random.seed(0)\n", "# This is what we return at the end\n", "model = initialize_parameters(nn_input_dim=2, nn_hdim=hidden_layer_size, nn_output_dim=1)\n", "model = train(model,X,y,learning_rate=learning_rate,num_passes=1000,print_loss=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "2db33060-d2d4-4a95-b94d-2ca229585bfb", "_uuid": "98f7aecd488d4c5adf407535b9e0850b8fdd1537", "collapsed": true }, "outputs": [], "source": [ "# Plot the decision boundary\n", "plot_decision_boundary(lambda x: predict(model,x))\n", "plt.title(\"Decision Boundary for hidden layer size 1\")" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "67c3a6552c9a68dc62deb86c044c1ae95e379061" }, "source": [ "## Too large hidden layer size\n", "In this section, the hidden layer size is too large and the model fits the noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "f525bb30-cf17-433b-9b57-9aa7123c642b", "_uuid": "41032216395e35f90beae9b08ab8469b0f22f645", "collapsed": true }, "outputs": [], "source": [ "# Hyperparameters\n", "hidden_layer_size = 500\n", "# I picked this value because it showed good results in my experiments\n", "learning_rate = 0.01\n", "\n", "# Initialize the parameters to random values. We need to learn these.\n", "np.random.seed(0)\n", "# This is what we return at the end\n", "model = initialize_parameters(nn_input_dim=2, nn_hdim=hidden_layer_size, nn_output_dim=1)\n", "model = train(model,X,y,learning_rate=learning_rate,num_passes=1000,print_loss=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "1daafc22-b44a-427c-ad0d-a6d8bf9afd41", "_uuid": "c634bb1bdec81a3ed0ebcaea3042ce86618845c7", "collapsed": true }, "outputs": [], "source": [ "# Plot the decision boundary\n", "# This might take a little while as our model is very big now\n", "plot_decision_boundary(lambda x: predict(model,x))\n", "plt.title(\"Decision Boundary for hidden layer size 500\")" ] }, { "cell_type": "markdown", "metadata": { "_cell_guid": "6638cac0-c971-44a7-a4e4-7fa9489a249d", "_uuid": "496fae1682f45472dbc2c9d96652cee80c5f0831" }, "source": [ "# Keras\n", "In this section, we will build the same model with the Keras Sequential API." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "5aa70286-1357-4ef5-ae0c-4ee377e7747a", "_uuid": "8254c838591e8e40be3d86b629e531716cb63698", "collapsed": true }, "outputs": [], "source": [ "# Generate a dataset and plot it\n", "np.random.seed(0)\n", "X, y = sklearn.datasets.make_moons(200, noise=0.15)\n", "y = y.reshape(200,1)" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "679b7bdfbfca3a1b96dd5477656837cab2ac4f97" }, "source": [ "## Importing Keras \n", "When importing Keras, we usually just import the modules we will use. In this case, we need two types of layers. The `Dense` layer is the plain layer that we have gotten to know in this chapter. The `Activation` layer allows us to add an activation function. We can import them like this:\n", "\n", "```Python \n", "from keras.layers import Dense, Activation\n", "```\n", "\n", "Keras offers two ways to build models: the sequential and the functional API. The sequential API is easier to use and allows more rapid building of models, so we will use it in most of the book. In later chapters, we will take a look at the functional API as well. 
We can access the sequential API like this:\n", "```Python \n", "from keras.models import Sequential\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "1dfcce6c-8cda-4f4a-aef2-0eef8bae882a", "_uuid": "bbace2f5fa286aa0d5206d9da474db2c35b676b1", "collapsed": true }, "outputs": [], "source": [ "from keras.layers import Dense, Activation\n", "from keras.models import Sequential" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "24dc003637cf206b5b67ed9f5c0b762ff5b5938f" }, "source": [ "## A two layer model in Keras \n", "Building a neural network in the sequential API works as follows:\n", "\n", "### Stacking layers\n", "First, we create an empty sequential model with no layers:\n", "```Python\n", "model = Sequential()\n", "```\n", "Then we can add layers to this model, just like stacking a layer cake, with `model.add()`. For the first layer, we have to specify the input dimensions of the layer. In our case, the data has two features, the coordinates of the point. We can add a hidden layer with hidden layer size 3 like this: \n", "\n", "```Python \n", "model.add(Dense(3,input_dim=2))\n", "```\n", "Note how we nest the functions: inside `model.add()` we specify the `Dense` layer. The positional argument is the size of the layer. This `Dense` layer now only does the linear step. To add a tanh activation function, we call:\n", "\n", "```Python \n", "model.add(Activation('tanh'))\n", "```\n", "\n", "We add the linear step and the activation function of the output layer in the same way:\n", "```Python \n", "model.add(Dense(1))\n", "model.add(Activation('sigmoid'))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "127bad17-867d-444a-aa36-f11e77ced4f6", "_uuid": "97346e9a852f680213393e18445e1d5239dfda6c", "collapsed": true }, "outputs": [], "source": [ "model = Sequential()\n", "model.add(Dense(3,input_dim=2))\n", "model.add(Activation('tanh'))\n", "model.add(Dense(1))\n", "model.add(Activation('sigmoid'))" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "301393592b5c5ca2ccf63f0cc2d01593850d9adf" }, "source": [ "### Compiling the model \n", "\n", "Before we can start training, we have to specify how exactly we want to train the model. Most importantly, we need to specify which optimizer and which loss function we want to use. The simple optimizer we have used so far is called 'Stochastic Gradient Descent', or SGD; for more optimizers, see chapter 2. The loss function we use for this binary classification problem is called 'binary crossentropy'. We can also specify which metrics we want to track during training. In our case, accuracy, or just 'acc' to keep it short, is interesting to track."
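, "\n", "\n", "As a side note, if we want to set the optimizer's learning rate explicitly instead of relying on the default, we can pass an optimizer instance rather than a string. A minimal sketch (assuming the standalone `keras` package used in this notebook) looks like this:\n", "\n", "```Python \n", "from keras.optimizers import SGD\n", "\n", "# Stochastic gradient descent with an explicitly chosen learning rate\n", "model.compile(optimizer=SGD(lr=0.01),\n", "              loss='binary_crossentropy',\n", "              metrics=['acc'])\n", "```"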
] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "e36efbd5-3ee8-4261-843f-94aca039ad4a", "_uuid": "9c63e075132fbb4ceec65aee1c2743e04e4ca899", "collapsed": true }, "outputs": [], "source": [ "model.compile(optimizer='sgd',loss='binary_crossentropy',metrics=['acc'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "c28f4b4b-b572-45d6-bdcb-ee9ba8fd879f", "_uuid": "43b6e76b0c367dc8bef1d3ae9325f80d1a8df271", "collapsed": true }, "outputs": [], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "_uuid": "042fa4c2909ce028d71688d5f9387777e437aaf5" }, "source": [ "### Training the model \n", "Now we are ready to run the training process:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "5c8c417e-8914-4cdf-a022-856693147e6f", "_uuid": "22db54d490a464cb3b0c2c22ecf6b563b91b8d34", "collapsed": true }, "outputs": [], "source": [ "history = model.fit(X,y,epochs=900)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "_cell_guid": "97d2f920-30cc-4f74-b205-a53021f5a7de", "_uuid": "632dc87f87a6cc6810591c369f607af080ae31c5", "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 1 }