{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "+ [Part 1: Linear Regression From Scratch](#) <--- (You Are Here)\n",
    "    + [Inputs](#Inputs)\n",
    "    + [Model](#Model)\n",
    "        + [Parameters](#Parameters)\n",
    "        + [Operations](#Operations)\n",
    "        + [Cost Function](#Cost-Function)\n",
    "    + [Optimization](#Optimization)\n",
    "        + [Gradient Descent](#Gradient-Descent)\n",
    "    + [Training](#Training)\n",
    "    + [Testing](#Testing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Gentle Introduction to Deep Learning\n",
    "\n",
    "### Michelle Fullwood\n",
    "PyCon 2017 talk\n",
    "\n",
    "This Jupyter notebook only contains code for the talk."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: Linear Regression From Scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inputs\n",
    "Here's a typical linear regression problem. We're trying to `predict prices of individual houses`. And we're given three pieces of information about each house, three features:\n",
    "+ `floor area`\n",
    "+ `public transport distance`\n",
    "+ `room number`\n",
    "\n",
    "And we're going to represent our features in a matrix with as many rows as we have houses and three columns, one for each of our input features. We'll call that matrix $X$. And we're trying to predict this vector $Y$, which represents the housing prices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# getting numpy setup\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# setting up training data for X and Y\n",
    "X_train = np.array([\n",
    "    [1250, 350, 3],\n",
    "    [1700, 900, 6],\n",
    "    [1400, 600, 3]\n",
    "])\n",
    "\n",
    "Y_train = np.array([345000, 580000, 360000])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model\n",
    "\n",
    "Next we have to consider our `model`. The model is the set of functions that we're going to consider in mapping $X$ to $Y$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Parameters\n",
    "So the parameters of this model will be the three weights that correspond to each feature, and the intercept:\n",
    "```python\n",
    "weights = np.array([300, -10, -1])\n",
    "intercept = -26497\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Operations\n",
    "And the key operation of this model will be matrix multiplication of $X$ by the `weights`. Then we'll add the intercept element-wise to get our final prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def model(X, weights, intercept):\n",
    "    return X.dot(weights) + intercept"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Cost Function\n",
    "Now the next ingredient we'll need is a `cost function`, also called a *loss function*. We need this to measure how good or bad a set of parameters is, how close our predictions are getting to the actual values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cost(Y_hat, Y):\n",
    "    return np.sum((Y_hat - Y)**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Optimization\n",
    "Now we need to actually find the parameters that give us the best fit. In other words, holding $X$ and $Y$ constant, we'll adjust our parameters to minimize the cost. Each set of parameters will yield a cost, so we can plot cost against parameter values. Our goal in optimization is to find the parameters that correspond to that lowest point.\n",
    "\n",
    "And we're going to do that by trial and error. By this I don't mean just trying random sets of parameters and seeing what works best, but the trial and error you do when you're, say, practising how to shoot hoops and you're trying to adjust your angle. So you shoot, and you miss by a couple inches. You're too far to the right. So you adjust your angle to the left and try again.\n",
    "                \n",
    "That's what we're going to do also. We'll try a set of parameters, then we'll calculate our cost, and then we'll follow the gradient of the cost curve at that point down towards the minimum. This process is called `gradient descent`.                "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "#### Gradient Descent\n",
    "$$\n",
    "\\hat{y} = w_0x_0 + w_1x_1 + w_2x_2 + b\n",
    "\\\\\n",
    "\\epsilon = (y-\\hat{y})^2\n",
    "\\\\\n",
    "Goal:(\\frac{\\partial\\epsilon}{\\partial w_i}, \\frac{\\partial\\epsilon}{\\partial b})\n",
    "$$\n",
    "So we need to be able to calculate the gradient of the cost, $\\epsilon$, with respect to each of the weights and the intercept:\n",
    "\n",
    "$$ Chain Rule: (\\frac{\\partial\\epsilon}{\\partial w_i} = \\frac{d\\epsilon}{d\\hat{y}}\\frac{\\partial\\hat{y}}{\\partial w_i})$$\n",
    "Applying the chain rule, we can break that up into two pieces: the gradient of the cost with respect to $\\hat{y}$ (i.e. the predicted $y$), and the gradient of $\\hat{y}$ with respect to the weight. So let's calculate those:\n",
    "\n",
    "$$\n",
    "\\hat{y} = w_0x_0 + w_1x_1 + w_2x_2 + b\n",
    "\\\\\n",
    "\\frac{\\partial\\hat{y}}{\\partial w_0} = x_0\n",
    "$$\n",
    "The gradient of $\\hat{y}$ with respect to $w_0$ is pretty simple. All the terms are constant with respect to $w_0$ so those go to zero and we're left with $x_0$.\n",
    "\n",
    "$$\n",
    "\\epsilon = (y-\\hat{y})^2\n",
    "\\\\\n",
    "\\frac{d\\epsilon}{d\\hat{y}} = -2(y-\\hat{y})\n",
    "$$\n",
    "For the second gradient, we bring down the power and then apply the chain rule again to bring out that negative sign.\n",
    "\n",
    "$$\n",
    "\\frac{\\partial\\hat{y}}{\\partial w_0} = x_0\n",
    "\\\\\n",
    "\\frac{d\\epsilon}{d\\hat{y}} = -2(y-\\hat{y})\n",
    "\\\\\n",
    "\\frac{\\partial\\epsilon}{\\partial w_0} = -2(y-\\hat{y})x_0\n",
    "$$\n",
    "So to get our desired gradient, we multiply those together to get this expression. And that goes for all the weights.\n",
    "\n",
    "$$\n",
    "\\hat{y} = w_0x_0 + w_1x_1 + w_2x_2 + b\\cdot 1\n",
    "\\\\\n",
    "\\frac{\\partial\\epsilon}{\\partial b} = -2(y-\\hat{y})\\cdot 1\n",
    "$$\n",
    "As for the intercept $b$, we can consider that a special weight where the $x$ it corresponds to is always 1. So that's the form the gradient will take with respect to $b$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training\n",
    "Putting all that together, here's how training goes. Whatever our current weights and intercept are, we *calculate our prediction*, *calculate our error*, *calculate the gradients*, and *update our parameters* by gradient descent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def training_round(x, y, weights, intercept, alpha=0.05):\n",
    "    # calculate our prediction\n",
    "    y_hat = model(x, weights, intercept)\n",
    "\n",
    "    # calculate error\n",
    "    delta_y = y - y_hat\n",
    "\n",
    "    # calculate gradients\n",
    "    gradient_weights = -2 * delta_y * x\n",
    "    gradient_intercept = -2 * delta_y\n",
    "\n",
    "    # update parameters\n",
    "    weights = weights - alpha * gradient_weights\n",
    "    intercept = intercept - alpha * gradient_intercept\n",
    "\n",
    "    return weights, intercept"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That was a **single round of training**. The entire training process involves first *initializing our parameters* and doing some *number of training rounds*. Whatever the weights and intercept are at the end, that's what we'll use to predict with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "NUM_EPOCHS = 10\n",
    "\n",
    "def train(X, Y):\n",
    "    # initialize parameters\n",
    "    weights = np.random.randn(3)\n",
    "    intercept = 0\n",
    "\n",
    "    # training rounds\n",
    "    for i in range(NUM_EPOCHS):\n",
    "        for (x, y) in zip(X, Y):\n",
    "            weights, intercept = training_round(x, y, weights, intercept)\n",
    "            \n",
    "    # return weights and intercept\n",
    "    return weights, intercept"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Testing\n",
    "And testing is simple: we get our estimate and figure out how far off we were on average."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test(X_test, Y_test, weights, intercept):\n",
    "    Y_predicted = model(X_test, weights, intercept)\n",
    "    error = cost(Y_predicted, Y_test)\n",
    "    return np.sqrt(np.mean(error))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}