{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Currently we are using a naive minimization algorithm. It will approximate the gradient and move in that direction. This requires lots of function calls. \n",
    "- We have no way to get the Hessian - which is needed for the variance estimates of our parameters. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Automatic differentiation - how to differentiate an algorithm\n",
    "\n",
    "Symbolic differentiation is what we learned in school, and software like SymPy, Wolfram and Mathematica do this well. But I want to differentiate the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "from matplotlib import pyplot as plt\n",
    "\n",
    "def f(x):\n",
    "    y = 0.\n",
    "    for i in range(100):\n",
    "        y = np.sin(y + x)\n",
    "    return y\n",
    "\n",
    "x = np.linspace(-2, 2, 250)\n",
    "plt.plot(x, [f(_) for _ in x]);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mathematically, it's something like:\n",
    "\n",
    "$$f(x) = \\sin(x + \\sin(x + \\sin(x + ...\\sin(x)))...)$$\n",
    "\n",
    "Good luck differentiating that and getting a nice closed form. If this is not complicated enough for you, feel free to add some `if` statements. \n",
    "\n",
    "We can use `autograd`, an automatical diff package, to compute _exact, pointwise_ derivatives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autograd import grad\n",
    "from autograd import numpy as np\n",
    "\n",
    "def f(x):\n",
    "    y = 0.\n",
    "    for i in range(100):\n",
    "        # this np. is now from autograd - important! \n",
    "        y = np.sin(y + x)\n",
    "    return y\n",
    "\n",
    "grad_f = grad(f)\n",
    "grad_f(1.)\n",
    "# magic! "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### We just differentiated an algorithm. Pick your brain pieces off the walls please. \n",
    "\n",
    "At a high level, autograd has a lookup of simple functions and their derivatives, and uses repeated use of the chain rule + calculus rules \n",
    "\n",
    "Of course, you can string together these pointwise values into a function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.linspace(-2, 2, 250)\n",
    "plt.plot(x, [grad_f(x_) for x_ in x])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To use this in our optimization routines:\n",
    "\n",
    " - we can automatically compute gradients that the optimizer can use. \n",
    " - Hessians can be exactly calculated (we will do this later)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "T = (np.random.exponential(size=1000)/1.5) ** 2.3\n",
    "E = np.random.binomial(1, 0.95, size=1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cumulative_hazard(params, t):\n",
    "    lambda_, rho_ = params\n",
    "    return (t / lambda_) ** rho_\n",
    "\n",
    "def log_hazard(params, t):\n",
    "    lambda_, rho_ = params\n",
    "    return np.log(rho_) - np.log(lambda_) + (rho_ - 1) * (np.log(t) - np.log(lambda_))\n",
    "\n",
    "def log_likelihood(params, t, e):\n",
    "    return np.sum(e * log_hazard(params, t)) - np.sum(cumulative_hazard(params, t))\n",
    "\n",
    "def negative_log_likelihood(params, t, e):\n",
    "    return -log_likelihood(params, t, e)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So `grad(negative_log_likelihood)` will find the gradient of `negative_log_likelihood` with respect to the first parameter, `params`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grad_negative_log_likelihood = # what goes here?\n",
    "\n",
    "print(grad_negative_log_likelihood(np.array([1., 1.]), T, E))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.optimize import minimize\n",
    "\n",
    "results = minimize(negative_log_likelihood, \n",
    "        x0 = np.array([1.0, 1.0]),\n",
    "        method=None, \n",
    "        args=(T, E),\n",
    "        jac=grad_negative_log_likelihood,\n",
    "        bounds=((0.00001, None), (0.00001, None)))\n",
    "\n",
    "print(results)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autograd import value_and_grad\n",
    "\n",
    "results = minimize(\n",
    "        # fill this in. \n",
    "        x0 = np.array([1.0, 1.0]),\n",
    "        method=None, \n",
    "        args=(T, E),\n",
    "        jac=True, # notice this set to True now.\n",
    "        bounds=((0.00001, None), (0.00001, None)))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's continue this analytical-train 🚂 to Part 4! "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}