{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Networks Learning\n",
    "\n",
    "**NOTE: The example and sample data is being taken from the \"Machine Learning course by Andrew Ng\" in Coursera.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction\n",
    "In the previous exercise, you implemented feedforward propagation for neural networks and used it to predict handwritten digits with the weights we provided. In this exercise, you will implement the backpropagation algorithm\n",
    "to learn the parameters for the neural network.\n",
    "The provided script, ex4.m, will help you step through this exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initial imports\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# imports from my models\n",
    "from models.data_preprocessing import add_bias_unit\n",
    "from models.logistic_regression import sigmoid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualizing the data\n",
    "Firstly we will load the data and display it on a 2-dimensional plot (Figure 1).\n",
    "\n",
    "There are\n",
    "5000 training examples in ex3data1.mat, where each training example is a\n",
    "20 pixel by 20 pixel grayscale image of the digit. Each pixel is represented by\n",
    "a floating point number indicating the grayscale intensity at that location.\n",
    "The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each\n",
    "of these training examples becomes a single row in our data matrix X. This\n",
    "gives us a 5000 by 400 matrix X where every row is a training example for a\n",
    "handwritten digit image."
   ]
  },
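  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The unrolling described above can be sketched in a few lines. This is only an illustration on a synthetic 20 by 20 array, not the course data. It assumes, as the `.T` in the plotting code of this notebook suggests, that each image was unrolled column by column, the way MATLAB stores arrays.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# synthetic 20x20 'image' standing in for one digit (not real data)\n",
    "image = np.arange(400, dtype=float).reshape(20, 20)\n",
    "\n",
    "# unroll column by column (MATLAB order) into a 400-dimensional row\n",
    "row = image.flatten(order='F')\n",
    "\n",
    "# undo it the way the plotting code does: reshape, then transpose\n",
    "restored = row.reshape(20, 20).T\n",
    "\n",
    "print(row.shape)                        # (400,)\n",
    "print(np.array_equal(restored, image))  # True\n",
    "```\n",
    "\n",
    "The transpose is what keeps the digits upright when a row of X is reshaped with NumPy's default row-major order."
   ]
  },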
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# loading dataset\n",
    "import scipy.io as sio  # sio for loading matlab file .mat\n",
    "data = sio.loadmat('data/ex4data1.mat')\n",
    "X = data['X']\n",
    "y = data['y']\n",
    "y[y==10] = 0 # mapping zeroes in y to 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# setting up variables we will be using for this example\n",
    "input_layer_size  = 400  # 20x20 Input Images of Digits\n",
    "hidden_layer_size = 25   # 25 hidden units\n",
    "num_labels = 10          # 10 labels, from 1 to 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading and Visualizing Data ...\n",
      "Randomly selecting 100 data points to display\n"
     ]
    }
   ],
   "source": [
    "print('Loading and Visualizing Data ...')\n",
    "\n",
    "m = X.shape[0]\n",
    "\n",
    "print(\"Randomly selecting 100 data points to display\")\n",
    "rand_indices = np.random.choice(range(0,m), 100)\n",
    "rand_samples = X[rand_indices, :]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 720x720 with 100 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# displaying the 100 random samples using matplotlib\n",
    "sns.set_style('white')\n",
    "fig, axis = plt.subplots(10,10,sharex=True, sharey=True, figsize=(10,10))\n",
    "fig.subplots_adjust(wspace=0.1, hspace=0.1)\n",
    "axis_flt = axis.flatten()\n",
    "for i in range(100):\n",
    "    axis_flt[i].imshow(rand_samples[i, :].reshape([20,20]).T, cmap='gray')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model representation\n",
    "Our neural network is shown in Figure 2. It has 3 layers – an input layer, a\n",
    "hidden layer and an output layer. Recall that our inputs are pixel values of\n",
    "digit images. Since the images are of size 20×20, this gives us 400 input layer\n",
    "units (excluding the extra bias unit which always outputs +1). As before,\n",
    "the training data will be loaded into the variables X and y.\n",
    "You have been provided with a set of network parameters (Θ(1) , Θ(2))\n",
    "already trained by us. These are stored in ex3weights.mat and will be\n",
    "loaded by scipy.io into Theta1 and Theta2 The parameters have dimensions\n",
    "that are sized for a neural network with 25 units in the second layer and 10\n",
    "output units (corresponding to the 10 digit classes)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src='data/nn.jpg'>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Pameters\n",
    "\n",
    "In this part of the exercise, we load some pre-initialized neural network parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading Saved Neural Network Parameters ...\n"
     ]
    }
   ],
   "source": [
    "print('Loading Saved Neural Network Parameters ...')\n",
    "\n",
    "# Load the weights into variables Theta1 and Theta2\n",
    "weights = sio.loadmat('data/ex4weights.mat')\n",
    "theta1 = weights['Theta1']  # theta1 = numpy array of shape 25x401\n",
    "theta2 = weights['Theta2']  # theta2 = numpy array of shape 10x26\n",
    "\n",
    "# swap first and last columns of Theta2, due to legacy from MATLAB indexing\n",
    "# since the weight file ex3weights.mat was saved based on MATLAB indexing\n",
    "theta2 = np.roll(theta2, 1, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compute Cost (Feedforward)\n",
    "\n",
    "For easy convenience i have made a cost function in the nural_network file in models which we will be inheriting here"
   ]
  },
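  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The source of `cost_function` is not shown in this notebook, so the following is only a sketch of what a regularized cross-entropy cost for a 3-layer network typically looks like. The name `nn_cost_sketch` and its signature are made up for illustration, and the labels are assumed to be integers from 0 to K-1.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def nn_cost_sketch(theta1, theta2, X, y, lam):\n",
    "    # hypothetical helper, not the function from the models package\n",
    "    m = X.shape[0]\n",
    "    a1 = np.hstack([np.ones((m, 1)), X])                        # add bias unit\n",
    "    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])   # hidden layer + bias\n",
    "    h = sigmoid(a2 @ theta2.T)                                  # output activations\n",
    "    Y = np.eye(theta2.shape[0])[y.ravel()]                      # one-hot targets\n",
    "    cost = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m\n",
    "    # regularization skips the bias column of each weight matrix\n",
    "    reg = lam / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))\n",
    "    return cost + reg\n",
    "\n",
    "# toy check: 2 inputs, 4 hidden units, 3 classes, all weights zero\n",
    "toy_cost = nn_cost_sketch(np.zeros((4, 3)), np.zeros((3, 5)),\n",
    "                          np.ones((5, 2)), np.array([0, 1, 2, 0, 1]), lam=0)\n",
    "print(np.isclose(toy_cost, 3 * np.log(2)))  # True\n",
    "```\n",
    "\n",
    "With all weights at zero every output is 0.5, so the unregularized cost is K times log 2, which makes a handy sanity check."
   ]
  },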
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# importing the cost function\n",
    "from models.neural_network import cost_function\n",
    "\n",
    "# joining and flattening the weights into 1D Vector for inputing in the function\n",
    "nn_params = np.concatenate([theta1.flatten(), theta2.flatten()], axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cost without regularizatiion at parameters (loaded from ex4weights): [[0.28762917]] \n",
      "(this value should be about 0.287629)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# testing the cost function without regularization\n",
    "lamda = 0\n",
    "J, grad = cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lamda)\n",
    "\n",
    "print('Cost without regularizatiion at parameters (loaded from ex4weights): {} \\n(this value should be about 0.287629)\\n'.format(J))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cost without regularizatiion at parameters (loaded from ex4weights): [[0.38376986]] \n",
      "(this value should be about 0.383770)\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# testing the cost function with regularization\n",
    "lamda = 1\n",
    "J, grad = cost_function(nn_params, input_layer_size, hidden_layer_size, num_labels, X, y, lamda)\n",
    "\n",
    "print('Cost without regularizatiion at parameters (loaded from ex4weights): {} \\n(this value should be about 0.383770)\\n'.format(J))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluating sigmoid gradient\n",
    "\n",
    "Here we are testing the sigmoid_gradient function which returns the gradient of the sigmoid function evaluated at z.\n",
    "\n",
    "For large values (both positive and negative) of z, the gradient should be close to 0. When z = 0, the gradient should be exactly 0.25"
   ]
  },
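  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sigmoid gradient has the closed form g'(z) = g(z)(1 - g(z)), where g is the sigmoid. The sketch below is one way to write it. The name `sigmoid_gradient_sketch` is hypothetical, and the `sigmoid_gradient` in the models package may be implemented differently.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid_gradient_sketch(z):\n",
    "    # works element-wise on scalars, vectors and matrices\n",
    "    s = 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))\n",
    "    return s * (1.0 - s)\n",
    "\n",
    "print(sigmoid_gradient_sketch(0.0))                      # 0.25\n",
    "print(sigmoid_gradient_sketch(np.array([-50.0, 50.0])))  # both values close to 0\n",
    "```"
   ]
  },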
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sigmoid gradient evaluated at [-1.  -0.5  0.   0.5  1. ] is: \n",
      " [0.19661193 0.23500371 0.25       0.23500371 0.19661193]\n"
     ]
    }
   ],
   "source": [
    "# generating temporary data for finding gradient\n",
    "z_temp = np.array([-1, -0.5, 0, 0.5, 1])\n",
    "\n",
    "from models.neural_network import sigmoid_gradient\n",
    "g = sigmoid_gradient(z_temp)\n",
    "print(\"Sigmoid gradient evaluated at {} is: \\n {}\".format(z_temp, g))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Initializing Parameters\n",
    "\n",
    "When training neural networks, it is important to randomly initialize the parameters for symmetry breaking. One effective strategy for random initialization is to randomly select values for Θ(l) uniformly in the range [−epsilon , epsilon ]. We are using epsilon = 0.12. This range of values ensures that the parameters\n",
    "are kept small and makes the learning more efficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "epsilon = 0.12\n",
    "initial_theta1 = np.random.uniform(-epsilon, epsilon, [hidden_layer_size, input_layer_size+1])\n",
    "initial_theta2 = np.random.uniform(-epsilon, epsilon, [num_labels, hidden_layer_size+1])\n",
    "\n",
    "# unrolling parameters\n",
    "initial_nn_params = np.concatenate([initial_theta1.flatten(), initial_theta2.flatten()], axis=0)"
   ]
  },
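  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The value 0.12 is not arbitrary. The course notes suggest choosing epsilon as sqrt(6) / sqrt(L_in + L_out), where L_in and L_out are the numbers of units in the two layers that the weight matrix connects. A quick check for the first weight matrix, as a sketch (the variable names are only for illustration):\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "L_in, L_out = 400, 25   # input layer size and hidden layer size\n",
    "eps_init = float(np.sqrt(6) / np.sqrt(L_in + L_out))\n",
    "print(round(eps_init, 2))  # 0.12\n",
    "```"
   ]
  },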
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementing Backpropagation\n",
    "\n",
    "Given a training example (x (t) , y (t) ), we will first run a “forward pass” to compute all the activations throughout the network, including the output value of the hypothesis h Θ (x). Then, for each node j in layer l, we would like to compute δ an “error term” δ that measures how much that node was “responsible” for any errors in our output.\n",
    "\n",
    "For an output node, we can directly measure the difference between the\n",
    "δ(3) network’s activation and the true target value, and use that to define δ(j)\n",
    "(since layer 3 is the output layer). For the hidden units, you will compute δ(j) based on a weighted average of the error terms of the nodes in layer (l + 1)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"data/back_nn.jpg\">"
   ]
  },
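  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The forward and backward passes described above can be sketched in vectorized form. This is not the code from the models package, which is not shown here. `backprop_sketch` is a hypothetical name, labels are assumed to be integers from 0 to K-1, and the regularization term is added to every column except the bias column.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def backprop_sketch(theta1, theta2, X, y, lam):\n",
    "    m = X.shape[0]\n",
    "    # forward pass\n",
    "    a1 = np.hstack([np.ones((m, 1)), X])       # inputs plus bias\n",
    "    a2 = sigmoid(a1 @ theta1.T)                # hidden activations\n",
    "    a2b = np.hstack([np.ones((m, 1)), a2])     # hidden activations plus bias\n",
    "    h = sigmoid(a2b @ theta2.T)                # output activations\n",
    "    Y = np.eye(theta2.shape[0])[y.ravel()]     # one-hot targets\n",
    "    # regularized cross-entropy cost\n",
    "    J = -np.sum(Y * np.log(h) + (1 - Y) * np.log(1 - h)) / m\n",
    "    J += lam / (2 * m) * (np.sum(theta1[:, 1:] ** 2) + np.sum(theta2[:, 1:] ** 2))\n",
    "    # backward pass: error terms for the output and hidden layers\n",
    "    delta3 = h - Y\n",
    "    delta2 = (delta3 @ theta2[:, 1:]) * a2 * (1 - a2)\n",
    "    grad1 = delta2.T @ a1 / m\n",
    "    grad2 = delta3.T @ a2b / m\n",
    "    # regularization is added after backpropagation, skipping the bias column\n",
    "    grad1[:, 1:] += lam / m * theta1[:, 1:]\n",
    "    grad2[:, 1:] += lam / m * theta2[:, 1:]\n",
    "    return J, grad1, grad2\n",
    "\n",
    "# toy network: 3 inputs, 4 hidden units, 3 classes\n",
    "rng = np.random.default_rng(0)\n",
    "t1 = rng.uniform(-0.12, 0.12, (4, 4))\n",
    "t2 = rng.uniform(-0.12, 0.12, (3, 5))\n",
    "X_toy = rng.normal(size=(6, 3))\n",
    "y_toy = np.array([0, 1, 2, 0, 1, 2])\n",
    "J, g1, g2 = backprop_sketch(t1, t2, X_toy, y_toy, lam=1.0)\n",
    "print(g1.shape, g2.shape)  # (4, 4) (3, 5)\n",
    "```\n",
    "\n",
    "Each gradient has the same shape as the weight matrix it belongs to, which is what allows both to be flattened and concatenated in the same order as the parameters."
   ]
  },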
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1.23148991e-02  1.23148991e-02]\n",
      " [ 4.32118297e-05  4.32118324e-05]\n",
      " [-3.41523148e-04 -3.41523149e-04]\n",
      " [-5.22636696e-04 -5.22636697e-04]\n",
      " [ 3.92411900e-03  3.92411900e-03]\n",
      " [-1.85358706e-04 -1.85358706e-04]\n",
      " [-2.76203793e-05 -2.76203785e-05]\n",
      " [-4.23866853e-05 -4.23866877e-05]\n",
      " [-8.08387428e-03 -8.08387428e-03]\n",
      " [-2.44172342e-04 -2.44172343e-04]\n",
      " [ 3.11973323e-04  3.11973322e-04]\n",
      " [ 4.77620381e-04  4.77620379e-04]\n",
      " [-1.26677103e-02 -1.26677103e-02]\n",
      " [-7.80554021e-05 -7.80554002e-05]\n",
      " [ 3.64940025e-04  3.64940024e-04]\n",
      " [ 5.58582427e-04  5.58582428e-04]\n",
      " [-5.59364413e-03 -5.59364413e-03]\n",
      " [ 1.59500657e-04  1.59500657e-04]\n",
      " [ 8.20818258e-05  8.20818265e-05]\n",
      " [ 1.25696296e-04  1.25696295e-04]\n",
      " [ 3.09340815e-01  3.09340815e-01]\n",
      " [ 1.62091400e-01  1.62091400e-01]\n",
      " [ 1.46266029e-01  1.46266029e-01]\n",
      " [ 1.58240774e-01  1.58240774e-01]\n",
      " [ 1.58413849e-01  1.58413849e-01]\n",
      " [ 1.46212932e-01  1.46212932e-01]\n",
      " [ 1.08132809e-01  1.08132809e-01]\n",
      " [ 5.62398975e-02  5.62398975e-02]\n",
      " [ 5.13370615e-02  5.13370615e-02]\n",
      " [ 5.54625023e-02  5.54625023e-02]\n",
      " [ 5.49725002e-02  5.49725002e-02]\n",
      " [ 5.14873520e-02  5.14873520e-02]\n",
      " [ 1.06276909e-01  1.06276909e-01]\n",
      " [ 5.50211796e-02  5.50211796e-02]\n",
      " [ 5.08837124e-02  5.08837124e-02]\n",
      " [ 5.42040970e-02  5.42040970e-02]\n",
      " [ 5.40011159e-02  5.40011159e-02]\n",
      " [ 5.09460155e-02  5.09460155e-02]]\n",
      "The above two columns you get should be very similar.\n",
      "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n",
      "\n",
      "If your backpropagation implementation is correct, then \n",
      "the relative difference will be small (less than 1e-9). \n",
      "Relative Difference: 2.08634e-11\n"
     ]
    }
   ],
   "source": [
    "from models.neural_network import check_nn_gradients, cost_function\n",
    "\n",
    "check_nn_gradients(cost_function, 0)"
   ]
  },
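  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The idea behind the check above is to compare the backpropagation gradient with a central-difference estimate of the same gradient. The source of `check_nn_gradients` is not shown here, so the following is only a sketch of the technique on a toy function. `numerical_gradient_sketch` is a hypothetical name, and the parameter vector is assumed to be 1-D.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def numerical_gradient_sketch(f, theta, eps=1e-4):\n",
    "    # central difference, one parameter at a time\n",
    "    num_grad = np.zeros_like(theta, dtype=float)\n",
    "    for i in range(theta.size):\n",
    "        step = np.zeros_like(theta, dtype=float)\n",
    "        step[i] = eps\n",
    "        num_grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)\n",
    "    return num_grad\n",
    "\n",
    "# toy cost with a known gradient: f(t) = sum(t^2), so the gradient is 2t\n",
    "f = lambda t: np.sum(t ** 2)\n",
    "theta = np.array([1.0, -2.0, 0.5])\n",
    "num_grad = numerical_gradient_sketch(f, theta)\n",
    "ana_grad = 2 * theta\n",
    "\n",
    "# relative difference between the two gradients\n",
    "rel_diff = np.linalg.norm(num_grad - ana_grad) / np.linalg.norm(num_grad + ana_grad)\n",
    "print(rel_diff < 1e-9)  # True\n",
    "```\n",
    "\n",
    "Because every parameter needs two cost evaluations, this check is only practical on a small network, not on the full 400-25-10 model."
   ]
  },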
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Testing Regularization\n",
    "\n",
    "To account for regularization, it turns out that you can add this as an additional term after computing the gradients using backpropagation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Checking Backpropagation with regularization\n",
      "[[ 0.01231391  0.01231391]\n",
      " [ 0.05517882  0.05517882]\n",
      " [ 0.00868734  0.00868734]\n",
      " [-0.04577885 -0.04577885]\n",
      " [ 0.00392563  0.00392563]\n",
      " [-0.01680889 -0.01680889]\n",
      " [ 0.03955839  0.03955839]\n",
      " [ 0.05947046  0.05947046]\n",
      " [-0.00808597 -0.00808597]\n",
      " [-0.03331116 -0.03331116]\n",
      " [-0.06006927 -0.06006927]\n",
      " [-0.03170477 -0.03170477]\n",
      " [-0.01266695 -0.01266695]\n",
      " [ 0.05875679  0.05875679]\n",
      " [ 0.03880249  0.03880249]\n",
      " [-0.01685445 -0.01685445]\n",
      " [-0.00559433 -0.00559433]\n",
      " [-0.04512375 -0.04512375]\n",
      " [ 0.00883054  0.00883054]\n",
      " [ 0.05474083  0.05474083]\n",
      " [ 0.30936235  0.30936235]\n",
      " [ 0.2171389   0.2171389 ]\n",
      " [ 0.15479044  0.15479044]\n",
      " [ 0.1123028   0.1123028 ]\n",
      " [ 0.10154926  0.10154926]\n",
      " [ 0.12913713  0.12913713]\n",
      " [ 0.10815189  0.10815189]\n",
      " [ 0.11588885  0.11588885]\n",
      " [ 0.07604573  0.07604573]\n",
      " [ 0.02258916  0.02258916]\n",
      " [-0.00467339 -0.00467339]\n",
      " [ 0.01909496  0.01909496]\n",
      " [ 0.10629201  0.10629201]\n",
      " [ 0.11424367  0.11424367]\n",
      " [ 0.09025339  0.09025339]\n",
      " [ 0.03670767  0.03670767]\n",
      " [-0.00372026 -0.00372026]\n",
      " [ 0.00618246  0.00618246]]\n",
      "The above two columns you get should be very similar.\n",
      "(Left-Your Numerical Gradient, Right-Analytical Gradient)\n",
      "\n",
      "If your backpropagation implementation is correct, then \n",
      "the relative difference will be small (less than 1e-9). \n",
      "Relative Difference: 2.22115e-11\n"
     ]
    }
   ],
   "source": [
    "print(\"Checking Backpropagation with regularization\")\n",
    "check_nn_gradients(cost_function, lamda=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "the training ended with final cost of nural network = 0.35886562944784284\n"
     ]
    }
   ],
   "source": [
    "### Training Neural Network\n",
    "\n",
    "# creating short hand for cost function\n",
    "c_fun = lambda p: cost_function(p, input_layer_size, hidden_layer_size, num_labels, X, y, lamda=1)\n",
    "\n",
    "# importing scipy.optimize.minimize() function\n",
    "from scipy.optimize import minimize\n",
    "\n",
    "result = minimize(fun=c_fun, x0=initial_nn_params, jac=True, method=\"CG\", options={'maxiter':100})\n",
    "\n",
    "nn_params = result.x\n",
    "\n",
    "# Obtain Theta1 and Theta2 back from nn_params\n",
    "theta1 = np.reshape(nn_params[:hidden_layer_size * (input_layer_size + 1)],\n",
    "                    (hidden_layer_size, (input_layer_size + 1)))\n",
    "\n",
    "theta2 = np.reshape(nn_params[(hidden_layer_size * (input_layer_size + 1)):],\n",
    "                    (num_labels, (hidden_layer_size + 1)))\n",
    "\n",
    "print(\"the training ended with final cost of nural network = {}\".format(result.fun))"
   ]
  },
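  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With `jac=True`, `scipy.optimize.minimize` expects the objective to return a pair: the cost and its gradient. The toy example below shows that calling convention on a simple quadratic. The function `toy_cost` is made up for illustration and has nothing to do with the network.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.optimize import minimize\n",
    "\n",
    "def toy_cost(p):\n",
    "    # returns (cost, gradient), as required when jac=True\n",
    "    cost = np.sum((p - 3.0) ** 2)\n",
    "    grad = 2.0 * (p - 3.0)\n",
    "    return cost, grad\n",
    "\n",
    "result = minimize(fun=toy_cost, x0=np.zeros(2), jac=True,\n",
    "                  method='CG', options={'maxiter': 100})\n",
    "print(result.x)  # should be close to [3, 3]\n",
    "```"
   ]
  },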
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualizing Weights\n",
    "\n",
    "One way to understand what your neural network is learning is to visualize\n",
    "what the representations captured by the hidden units. Informally, given a\n",
    "particular hidden unit, one way to visualize what it computes is to find an\n",
    "input x that will cause it to activate \n",
    "\n",
    "For the neural network you trained, notice that the i<sup>th</sup> row\n",
    "of Θ (1) is a 401-dimensional vector that represents the parameter for the i<sup>th</sup>\n",
    "hidden unit. If we discard the bias term, we get a 400 dimensional vector\n",
    "that represents the weights from each input pixel to the hidden unit.\n",
    "Thus, one way to visualize the “representation” captured by the hidden\n",
    "unit is to reshape this 400 dimensional vector into a 20 × 20 image and\n",
    "display it.\n",
    "\n",
    "In our trained network, you should find that the hidden units corre-\n",
    "sponds roughly to detectors that look for strokes and other patterns in the\n",
    "input.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 720x720 with 25 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# displaying the 100 random samples using matplotlib\n",
    "temp = theta1[:, 1:]\n",
    "sns.set_style('white')\n",
    "fig1, axis1 = plt.subplots(5,5,sharex=True, sharey=True, figsize=(10,10))\n",
    "fig1.subplots_adjust(wspace=0.01, hspace=0.01)\n",
    "axis_flt1 = axis1.flatten()\n",
    "for i in range(temp.shape[0]):\n",
    "    axis_flt1[i].imshow(temp[i, :].reshape([20,20]).T, cmap='gray')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predicting Labels with the trained model"
   ]
  },
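  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Prediction is a forward pass followed by picking the output unit with the largest activation. The sketch below shows this on a tiny hand-made network so the result is easy to verify by eye. `predict_sketch` and the toy weights are hypothetical and are not part of the models package.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid(z):\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "def predict_sketch(theta1, theta2, X):\n",
    "    m = X.shape[0]\n",
    "    a1 = np.hstack([np.ones((m, 1)), X])                        # inputs plus bias\n",
    "    a2 = np.hstack([np.ones((m, 1)), sigmoid(a1 @ theta1.T)])   # hidden layer plus bias\n",
    "    h = sigmoid(a2 @ theta2.T)                                  # output activations\n",
    "    return np.argmax(h, axis=1)                                 # index of the strongest output\n",
    "\n",
    "# toy network with 2 inputs, 2 hidden units and 2 classes;\n",
    "# each layer simply passes along whichever unit is switched on\n",
    "toy_theta1 = np.array([[-10.0, 20.0, 0.0], [-10.0, 0.0, 20.0]])\n",
    "toy_theta2 = np.array([[-10.0, 20.0, 0.0], [-10.0, 0.0, 20.0]])\n",
    "toy_X = np.array([[1.0, 0.0], [0.0, 1.0]])\n",
    "toy_y = np.array([[0], [1]])\n",
    "\n",
    "toy_pred = predict_sketch(toy_theta1, toy_theta2, toy_X).reshape(toy_y.shape)\n",
    "print(np.mean(toy_pred == toy_y) * 100)  # 100.0\n",
    "```\n",
    "\n",
    "Reshaping the predictions to the shape of the label array before comparing avoids accidental broadcasting between a flat vector and a column vector."
   ]
  },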
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calculating accuracy of the neural network ...\n"
     ]
     }
   ],
   "source": [
    "print(\"Calculating accuracy of the neural network ...\")\n",
    "h1 = sigmoid(add_bias_unit(X) @ theta1.T)\n",
    "h2 = sigmoid(add_bias_unit(h1) @ theta2.T)\n",
    "prediction = np.argmax(h2, axis=1).reshape(y.shape)\n",
    "\n",
    "print(\"Accuracy of neural network is {:.2f}%\".format(np.mean(prediction==y)*100))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}