{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "rF2trPuyzm9C" }, "source": [ "# Exercise 4.1\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "ipcsUFDUzm9C" }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": { "id": "MCJe_ITJzm9G" }, "source": [ "**Disclaimer**\n", "\n", "The book mistakently refers to the page for Exercise 4.2 when introducting Exercise 4.1, etc. Of course, these numbers should match: Book Exercise 4.1 is discussed under Exercise 4.1 \n", "\n", "**Simple Network**\n", "\n", "We continue with the dataset first encountered in Exercise 3.2. Please refer to the discussion there for an introduction to the data and the learning objective.\n", "\n", "Here, we manually implement a simple network architecture" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "NopU99AT9G7s", "outputId": "d7e8848e-b9c0-4eb4-8f18-5acda9d8c343" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/bin/sh: wget: command not found\r\n" ] } ], "source": [ "# The code snippet below is responsible for downloading the dataset\n", "# - for example when running via Google Colab.\n", "#\n", "# You can also directly download the file using the link if you work\n", "# with a local setup (in that case, ignore the !wget)\n", "\n", "!wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5ONqeI5Uzm9H", "outputId": "d31ba8d4-cf0a-4f25-8a93-9091c0dd041a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data: (4898, 12)\n", "First example:\n", "Features: [5.900e+00 3.400e-01 2.200e-01 2.400e+00 3.000e-02 1.900e+01 1.350e+02\n", " 9.894e-01 3.410e+00 7.800e-01 1.390e+01]\n", "Quality: 7.0\n" ] } ], "source": [ "# Before working with the data, \n", "# we download and prepare all features\n", "\n", "# load all examples from the file\n", "data = np.genfromtxt('winequality-white.csv',delimiter=\";\",skip_header=1)\n", "\n", "print(\"data:\", data.shape)\n", "\n", "# Prepare for proper training\n", "np.random.shuffle(data) # randomly sort examples\n", "\n", "# take the first 3000 examples for training\n", "# (remember array slicing from last week)\n", "X_train = data[:3000,:11] # all features except last column\n", "y_train = data[:3000,11] # quality column\n", "\n", "# and the remaining examples for testing\n", "X_test = data[3000:,:11] # all features except last column\n", "y_test = data[3000:,11] # quality column\n", "\n", "print(\"First example:\")\n", "print(\"Features:\", X_train[0])\n", "print(\"Quality:\", y_train[0])" ] }, { "cell_type": "markdown", "metadata": { "id": "jiwnyNHpzm9L" }, "source": [ "# Problems\n", "\n", "The goal is to implement the training of a neural network with one input layer, one hidden layer, and one output layer using gradient descent. We first (below) define the matrices and initialise with random values. We need W, b, W' and b'. 
" * W: (number of hidden nodes, number of inputs) named `W`\n", " * b: (number of hidden nodes) named `b`\n", " * W': (number of hidden nodes) named `Wp`\n", " * b': (a single value) named `bp`\n", "\n", "Your tasks are:\n", " * Implement a forward pass of the network as `dnn` (see below)\n", " * Implement a function that uses one data point to update the weights using gradient descent. You can follow the `update_weights` skeleton below\n", " * Now you can use the code below (training loop and evaluation) to train the network on multiple data points and even over several epochs. Try to find a set of hyperparameters (number of nodes in the hidden layer, learning rate, number of training epochs) that gives stable results. What is the best result (as measured by the loss on the training sample) you can get?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(50, 11)\n" ] } ], "source": [ "# Initialise weights with suitable random distributions\n", "hidden_nodes = 50 # number of nodes in the hidden layer\n", "n_inputs = 11 # input features in the dataset\n", "\n", "# See section 4.3 of the book for more information on\n", "# how to initialise network parameters\n", "W = np.random.randn(hidden_nodes, n_inputs)*np.sqrt(2./n_inputs)\n", "b = np.random.randn(hidden_nodes)*np.sqrt(2./n_inputs)\n", "Wp = np.random.randn(hidden_nodes)*np.sqrt(2./hidden_nodes)\n", "bp = np.random.randn(1)\n", "\n", "print(W.shape)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# You can use this implementation of the ReLU activation function\n", "def relu(x):\n", "    return np.maximum(x, 0)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def dnn(x, W, b, Wp, bp):\n", "    # TODO: Calculate and return the network output of the forward pass\n", "    # See Hint 1 for additional information\n", "    return -1 # change to the calculated output" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def update_weights(x, y, W, b, Wp, bp):\n", "\n", "    learning_rate = 0.01\n", "\n", "    # TODO: Calculate the network output (use the function dnn defined above)\n", "\n", "    # TODO: Derive the gradient for each of W, b, Wp, bp by taking the partial\n", "    # derivative of the loss function with respect to the variable and\n", "    # then implement the resulting weight-update procedure\n", "    # See Hint 2 for additional information\n", "\n", "    # You might need these numpy functions:\n", "    # np.dot, np.outer, np.heaviside\n", "    # Hint: Use .shape and print statements to make sure all operations\n", "    # do what you want them to\n", "\n", "    # TODO: Update the weights/biases following the rule:\n", "    # weight_new = weight_old - learning_rate * gradient\n", "\n", "    return None # no return value needed; you can modify the weights in-place" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Training loop and evaluation below" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The code below implements the training.\n", "# If you correctly implement dnn and update_weights above,\n", "# you should not need to change anything below,\n",
\n", "# (apart from increasing the number of epochs)\n", "\n", "train_losses = []\n", "test_losses = []\n", "\n", "# How many epochs to train\n", "# This will just train for one epoch\n", "# You will want a higher number once everything works\n", "n_epochs = 1 \n", "\n", "# Loop over the epochs\n", "for ep in range(n_epochs):\n", " \n", " # Each epoch is a complete over the training data\n", " for i in range(X_train.shape[0]):\n", " \n", " # pick one example\n", " x = X_train[i]\n", " y = y_train[i]\n", "\n", " # use it to update the weights\n", " update_weights(x,y,W,b,Wp,bp)\n", " \n", " # Calculate predictions for the full training and testing sample\n", " y_pred_train = [dnn(x,W,b,Wp,bp)[0] for x in X_train]\n", " y_pred = [dnn(x,W,b,Wp,bp)[0] for x in X_test]\n", "\n", " # Calculate aver loss / example over the epoch\n", " train_loss = sum((y_pred_train-y_train)**2) / y_train.shape[0]\n", " test_loss = sum((y_pred-y_test)**2) / y_test.shape[0] \n", " \n", " # print some information\n", " print(\"Epoch:\",ep, \"Train Loss:\", train_loss, \"Test Loss:\", test_loss)\n", " \n", " # and store the losses for later use\n", " train_losses.append(train_loss)\n", " test_losses.append(test_loss)\n", " \n", " \n", "# After the training:\n", " \n", "# Prepare scatter plot\n", "y_pred = [dnn(x,W,b,Wp,bp)[0] for x in X_test]\n", "\n", "print(\"Best loss:\", min(test_losses), \"Final loss:\", test_losses[-1])\n", "\n", "print(\"Correlation coefficient:\", np.corrcoef(y_pred,y_test)[0,1])\n", "plt.scatter(y_pred_train,y_train)\n", "plt.xlabel(\"Predicted\")\n", "plt.ylabel(\"True\")\n", "plt.show()\n", "\n", "# Prepare and loss over time\n", "plt.plot(train_losses,label=\"train\")\n", "plt.plot(test_losses,label=\"test\")\n", "plt.legend()\n", "plt.xlabel(\"Epoch\")\n", "plt.ylabel(\"Loss\")\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Hint 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want a network with one hidden layer. As activiation in the hidden layer $\\sigma$ we apply element-wise ReLu, while no activation is used for the output layer. The forward pass of the network then reads:\n", "$$\\hat{y}=\\mathbf{W}^{\\prime} \\sigma(\\mathbf{W} \\vec{x}+\\vec{b})+b^{\\prime}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Hint 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the regression problem the objective function is the mean squared error between the prediction and the true label $y$: \n", "$$\n", "L=(\\hat{y}-y)^{2}\n", "$$\n", "\n", "Taking the partial derivatives - and diligently the applying chain rule - with respect to the different objects yields:\n", "\n", "$$\n", "\\begin{aligned}\n", "\\frac{\\partial L}{\\partial b^{\\prime}} &=2(\\hat{y}-y) \\\\\n", "\\frac{\\partial L}{\\partial b_{k}} &=2(\\hat{y}-y) \\mathbf{W}_{k}^{\\prime} \\theta\\left(\\sum_{i} \\mathbf{W}_{i k} x_{i}+b_{k}\\right) \\\\\n", "\\frac{\\partial L}{\\partial \\mathbf{W}_{k}^{\\prime}} &=2(\\hat{y}-y) \\sigma\\left(\\sum_{i} \\mathbf{W}_{i k} x_{i}+b_{k}\\right) \\\\\n", "\\frac{\\partial L}{\\partial \\mathbf{W}_{k m}} &=2(\\hat{y}-y) \\mathbf{W}_{m}^{\\prime} \\theta\\left(\\sum_{i} \\mathbf{W}_{i k} x_{i}+b_{m}\\right) x_{k}\n", "\\end{aligned}\n", "$$\n", "\n", "Here, $\\Theta$ denotes the Heaviside step function." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "Exercise 4", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 1 }