{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "HTML(\"\"\"\n", "\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using PyPlot, PyCall\n", "using AIBECS, WorldOceanAtlasTools\n", "using LinearAlgebra" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", " \n", "\n", "
\n", "

The F-1 algorithm



\n", " Benoît Pasquier and François Primeau
\n", " University of California, Irvine\n", "
\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "**Paper**: *The F-1 algorithm for efficient computation of the Hessian matrix of an objective function defined implicitly by the solution of a steady-state problem*\n", "(under review)\n", "\n", "**Code**: **F1Method.jl** (Julia package — check it out on GitHub!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "$\\newcommand{\\state}{\\boldsymbol{x}}$\n", "$\\newcommand{\\sol}{\\boldsymbol{s}}$\n", "$\\newcommand{\\params}{\\boldsymbol{p}}$\n", "$\\newcommand{\\lambdas}{\\boldsymbol{\\lambda}}$\n", "$\\newcommand{\\statefun}{\\boldsymbol{F}}$\n", "$\\newcommand{\\cost}{f}$\n", "$\\newcommand{\\objective}{\\hat{f}}$\n", "$\\DeclareMathOperator*{\\minimize}{minimize}$\n", "$\\newcommand{\\vece}{\\boldsymbol{e}}$\n", "$\\newcommand{\\matI}{\\mathbf{I}}$\n", "$\\newcommand{\\matA}{\\mathbf{A}}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## F-1 algorithm – Outline\n", "\n", "1. Motivation \\& Context \n", "1. Autodifferentiation\n", "1. What is the F-1 algorithm?\n", "1. Benchmarks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "As we go through, I will demo some Julia code!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "\n", "# Motivation\n", "\n", "For **parameter optimization** and parameter sensitivity estimation!\n", "\n", "The [**AIBECS**](https://github.com/briochemc/AIBECS.jl) \n", " *for building global marine steady-state biogeochemistry models in just a few commands* \n", " (yesterday's CCRC seminar)\n", " \n", "And the **F-1 algorithm** to facilitate optimization of biogeochemical parameters.\n", "\n", "But the context of the **F-1 algorithm** is more general..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Context\n", "**Steady-state** equation\n", "\n", "$$\\frac{\\partial \\state}{\\partial t} = \\statefun(\\state,\\params) = 0$$\n", "\n", "for some state $\\state$ (size $n \\sim 400\\,000$) and parameters $\\params$ (size $m \\sim 10$)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Constrained optimization** problem\n", "\n", "$$\\left\\{\\begin{aligned}\n", " \\minimize_{\\boldsymbol{x}, \\boldsymbol{p}} & & \\cost(\\state, \\params) \\\\\n", " \\textrm{subject to} & & \\statefun(\\state, \\params) = 0\n", " \\end{aligned}\\right.$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "function constrained_optimization_plot()\n", " figure(figsize=(10,6))\n", " f(x,p) = (x - 1)^2 + (p - 1)^2 + 1\n", " s(p) = 1 + 0.9atan(p - 0.3)\n", " xs, ps = -1.2:0.1:3, -1.2:0.1:3.2\n", " plt = plot_surface(xs, ps, [f(x,p) for p in ps, x in xs], alpha = 0.5, cmap=:viridis_r)\n", " gca(projection=\"3d\")\n", " P = repeat(reshape(ps, length(ps), 1), outer = [1, length(xs)])\n", " X = repeat(reshape(xs, 1, length(xs)), outer = [length(ps), 1])\n", " contour(X, P, [f(x,p) for p in ps, x in xs], levels = 2:6, colors=\"black\", alpha=0.5, linewidths=0.5)\n", " plot([s(p) for p in ps], ps)\n", " legend((\"\\$F(x,p) = 0\\$\",))\n", " xlabel(\"state \\$x\\$\"); ylabel(\"parameters \\$p\\$\"); zlabel(\"objective \\$f(x,p)\\$\")\n", " xticks(unique(round.(xs))); yticks(unique(round.(ps))); zticks(2:6)\n", " return plt, f, s, xs, ps\n", "end\n", "constrained_optimization_plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Unconstrained optimization** along the manifold of steady-state solutions.\n", "\n", "$$\\minimize_\\params \\objective(\\params)$$\n", "\n", "where \n", "$$\\objective(\\params) \\equiv \\cost\\big(\\sol(\\params), \\params\\big)$$ \n", "is the **objective** function.\n", "\n", "And $\\sol(\\params)$ is the **steady-state solution** for parameters $\\params$, i.e., such that\n", "\n", "$$\\statefun\\left(\\sol(\\params),\\params\\right) = 0$$\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "function unconstrained_optimization_plot()\n", " plt, f, s, xs, ps = constrained_optimization_plot()\n", " plot([s(p) for p in ps], ps, [f(s(p),p) for p in ps], color=\"black\", linewidth=3)\n", " plot(maximum(xs) * ones(length(ps)), ps, [f(s(p),p) for p in ps], color=\"red\")\n", " legend((\"\\$x=s(p) \\\\Longleftrightarrow F(x,p)=0\\$\",\"\\$f(x,p)\\$ with \\$x = s(p)\\$\", \"\\$\\\\hat{f}(p) = f(s(p),p)\\$\"))\n", " for p in ps[1:22:end]\n", " plot(s(p) * [1,1], p * [1,1], f(s(p),p) * [0,1], color=\"black\", alpha = 0.3, linestyle=\"--\")\n", " plot([s(p), maximum(xs)], p * [1,1], f(s(p),p) * [1,1], color=\"black\", alpha = 0.3, linestyle=\"--\", marker=\"o\")\n", " end\n", " return plt\n", "end\n", "unconstrained_optimization_plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "Two **nested** iterative algorithms\n", "\n", "**Inner solver** with Newton step\n", "\n", "$$\\Delta \\state \\equiv - \\left[\\nabla_\\state \\statefun(\\state,\\params)\\right]^{-1} \\statefun(\\state,\\params)$$\n", "\n", "Outer **Optimizer** with Newton step\n", "\n", "$$\\Delta\\params \\equiv - \\left[\\nabla^2\\objective(\\params)\\right]^{-1}\\nabla \\objective(\\params)$$\n", "\n", "requires the Hessian of the objective function, \n", "\n", "$$\\nabla^2 \\objective(\\params)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 
Autodifferentiation\n", "\n", "Take the **Taylor expansion** of $\\objective(\\params)$ in the $h\\vece_j$ direction for a given $h$:\n", "\n", "$$\\objective(\\params + h \\vece_j)\n", " = \\objective(\\params)\n", " + h \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?} \n", " + \\frac{h^2}{2} \\, \\left[\\vece_j^\\mathsf{T} \\, \\nabla^2\\objective(\\params) \\, \\vece_j\\right]\n", " + \\ldots$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A standard solution is to use **Finite differences**:\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j \n", " = \\frac{\\objective(\\params + h \\vece_j) - \\objective(\\params)}{h}\n", " + \\mathcal{O}(h)$$\n", " \n", "But truncation and round-off errors!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A better solution is to use **Complex** numbers!
\n", "Taylor-expand in the $ih\\vece_j$ direction:\n", "$$\\objective(\\params + i h \\vece_j)\n", " = \\objective(\\params)\n", " + i h \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?}\n", " - \\frac{h^2}{2} \\, \\left[\\vece_j^\\mathsf{T} \\, \\nabla^2\\objective(\\params) \\, \\vece_j\\right]\n", " + \\ldots$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Because when taking the imaginary part, the convergence is faster and there are no more round-off errors:\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j = \\Im\\left[\\frac{\\objective(\\params + i h \\vece_j)}{h}\\right] + \\mathcal{O}(h^2)$$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "𝑓(x) = cos(x^2) + exp(x)\n", "∇𝑓(x) = -2x * sin(x^2) + exp(x)\n", "finite_differences(f, x, h) = (f(x + h) - f(x)) / h\n", "centered_differences(f, x, h) = (f(x + h) - f(x - h)) / 2h\n", "complex_step_method(f, x, h) = imag(f(x + im * h)) / h\n", "relative_error(m, f, ∇f, x, h) = Float64(abs(BigFloat(m(f, x, h)) - ∇f(BigFloat(x))) / abs(∇f(x)))\n", "x, step_sizes = 2.0, 10 .^ (-20:0.02:0)\n", "numerical_schemes = [finite_differences, centered_differences, complex_step_method]\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])\n", "loglog(), legend(string.(numerical_schemes)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"There are better alternatives to finite differences\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An even better solution is to use **Dual** numbers! \n", "\n", "Define $\\varepsilon \\ne 0$ s.t. $\\varepsilon^2 = 0$, then the complete Taylor expansion in the $\\varepsilon \\vece_j$ direction is\n", "\n", "$$\\objective(\\params + \\varepsilon \\vece_j)\n", " = \\objective(\\params)\n", " + \\varepsilon \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, 1st derivatives are given **exactly** by\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j = \\mathfrak{D}\\left[\\objective(\\params + \\varepsilon \\vece_j)\\right]$$\n", "\n", "where $\\mathfrak{D}$ is the dual part (the $\\varepsilon$ coefficient)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The dual number $\\varepsilon$ behaves like an **infinitesimal** and it gives very accurate derivatives!
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using DualNumbers\n", "dual_step_method(f, x, h) = dualpart(f(x + ε))\n", "push!(numerical_schemes, dual_step_method)\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])\n", "loglog(), legend(string.(numerical_schemes)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"There are even better alternatives to complex-step differentiation\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Just like complex identify with $\\mathbb{R}[X]/(X^2+1)$,
\n", "dual numbers identify with $\\mathbb{R}[X]/(X^2)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **gradient** of the objective function can be computed in $m$ dual evaluations of the objective function, via\n", "\n", "$$\\nabla\\objective(\\params) = \\mathfrak{D} \\left( \\left[\\begin{matrix}\n", "\t\t\\objective(\\params + \\varepsilon \\vece_1) \\\\\n", "\t\t\\objective(\\params + \\varepsilon \\vece_2) \\\\\n", "\t\t\\vdots \\\\\n", " \\objective(\\params + \\varepsilon \\vece_{m})\n", " \\end{matrix} \\right]^\\mathsf{T} \\right)$$\n", " \n", "where $m$ is the number of parameters." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For second derivatives, we can use hyperdual numbers.
\n", "Let $\\varepsilon_1$ and $\\varepsilon_2$ be the hyperdual units defined by $\\varepsilon_1^2 = \\varepsilon_2^2 = 0$ but $\\varepsilon_1 \\varepsilon_2 \\ne 0$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let $\\params_{jk} = \\params + \\varepsilon_1 \\vece_j + \\varepsilon_2 \\vece_k$, for which the Taylor expansion of $\\objective$ is\n", "\n", "$$\\objective(\\params_{jk})\n", " = \\objective(\\params)\n", " + \\varepsilon_1 \\nabla\\objective(\\params) \\vece_j\n", " + \\varepsilon_2 \\nabla\\objective(\\params) \\vece_k\n", " + \\varepsilon_1 \\varepsilon_2 \\underbrace{\\vece_j^\\mathsf{T} \\nabla^2\\objective(\\params) \\vece_k}_{?}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Taking the \"hyperdual\" part gives the second derivative:\n", "\n", "$$\\vece_j^\\mathsf{T} \\nabla^2\\objective(\\params) \\vece_k = \\mathfrak{H}\\left[\\objective(\\params_{jk})\\right]$$\n", "\n", "where $\\mathfrak{H}$ stands for hyperdual part (the $\\varepsilon_1 \\varepsilon_2$ coefficient)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Hyperdual steps also give very accurate derivatives!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "∇²𝑓(x) = -2sin(x^2) - 4x^2 * cos(x^2) + exp(x)\n", "using HyperDualNumbers\n", "finite_differences_2(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2\n", "hyperdual_step_method(f, x, h) = ε₁ε₂part(f(x + ε₁ + ε₂))\n", "numerical_schemes_2 = [finite_differences_2, hyperdual_step_method]\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇²𝑓, x, h) for h in step_sizes, m in numerical_schemes_2])\n", "loglog(), legend(string.(numerical_schemes_2)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"HyperDual Numbers for second derivatives\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Autodifferentiating through an iterative solver is expensive\n", "\n", "The Hessian of $\\objective$ can be computed in $\\frac{m(m+1)}{2}$ hyperdual evaluations:\n", "\n", "$$\\nabla^2\\objective(\\params) = \\mathfrak{H} \\left( \\left[\\begin{matrix}\n", "\t\t\\objective(\\params_{11})\n", " & \\objective(\\params_{12})\n", " & \\cdots\n", " & \\objective(\\params_{1m})\n", " \\\\\n", "\t\t\\objective(\\params_{12})\n", " & \\objective(\\params_{22})\n", " & \\cdots\n", " & \\objective(\\params_{2m})\n", " \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\t\t\\objective(\\params_{1m})\n", " & \\objective(\\params_{2m})\n", " & \\cdots\n", " & \\objective(\\params_{mm})\n", " \\end{matrix} \\right] \\right)$$\n", "\n", "\n", "But this requires $\\frac{m(m+1)}{2}$ calls to the inner solver, which requires hyperdual factorizations, and fine-tuned non-real tolerances!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is the F-1 algorithm\n", "\n", "### What does it do?\n", "\n", "The F-1 algorithm allows you to calculate the **gradient** and **Hessian** of an objective function, $\\objective(\\params)$, defined implicitly by the solution of a steady-state problem.\n", "\n", "### How does it work?\n", "\n", "It leverages analytical shortcuts, combined with **dual** and **hyperdual** numbers, to avoid calls to the inner solver and unnecessary factorizations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Analytical gradient\n", "\n", "Differentiate the objective function $\\objective(\\params) = \\cost\\left(\\sol(\\params), \\params\\right)$:\n", "\n", "\n", "\n", "$$\\color{forestgreen}{\\underbrace{\\nabla\\objective(\\params)}_{1 \\times m}}\n", " = \\color{royalblue}{\\underbrace{\\nabla_\\state\\cost(\\sol, \\params)_{\\vphantom{\\params}}}_{1 \\times n}} \\,\n", " \\color{red}{\\underbrace{\\nabla \\sol(\\params)_{\\vphantom{\\params}}}_{n \\times m}}\n", " + \\color{DarkOrchid}{\\underbrace{\\nabla_\\params \\cost(\\sol, \\params)}_{1 \\times m}}$$\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Differentiate the steady-state equation, $\\statefun\\left(\\sol(\\params),\\params\\right)=0$: \n", "\n", "\n", "\n", "$$\\color{royalblue}{\\underbrace{\\matA_{\\vphantom{\\params}}}_{n \\times n}} \\,\n", " \\color{red}{\\underbrace{\\nabla\\sol(\\params)_{\\vphantom{\\params}}}_{n \\times m}}\n", " + \\color{forestgreen}{\\underbrace{\\nabla_\\params \\statefun(\\sol, \\params)}_{n\\times m}} = 0$$\n", " \n", "where $\\matA \\equiv \\nabla_\\state\\statefun(\\sol,\\params)$ is the **Jacobian** of the steady-state system (a large, sparse matrix)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Analytical Hessian\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Differentiate $\\objective(\\params) = \\cost\\left(\\sol(\\params), \\params\\right)$ twice:\n", "\n", "$$\\begin{split}\n", " \t\\nabla^2 \\objective(\\params)\n", " \t&= \\nabla_{\\state\\state}\\cost(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\nabla\\sol)\n", " + \\nabla_{\\state\\params}\\cost(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\matI_\\params) \\\\\n", " \t&\\quad+ \\nabla_{\\params\\state}\\cost(\\sol, \\params) \\, (\\matI_\\params \\otimes \\nabla\\sol)\n", " + \\nabla_{\\params\\params}\\cost(\\sol, \\params) \\, (\\matI_\\params \\otimes \\matI_\\params) \\\\\n", " \t&\\quad+ \\nabla_\\state\\cost(\\sol, \\params) \\, \\nabla^2\\sol,\n", "\t\\end{split}$$\n", " \n", "Differentiate the steady-state equation, $\\statefun\\left(\\sol(\\params),\\params\\right)=0$, twice:\n", "\n", "$$\\begin{split}\n", " 0 & = \\nabla_{\\state\\state}\\statefun(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\nabla\\sol)\n", " + \\nabla_{\\state\\params}\\statefun(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\matI_\\params)\\\\\n", "\t\t& \\quad + \\nabla_{\\params\\state}\\statefun(\\sol, \\params) \\, (\\matI_\\params \\otimes \\nabla\\sol)\n", " + \\nabla_{\\params\\params}\\statefun(\\sol, \\params) \\, (\\matI_\\params \\otimes \\matI_\\params) \\\\\n", "\t\t& \\quad + \\matA \\, \\nabla^2\\sol.\n", "\t\\end{split}$$\n", " \n", "(Tensor notation of Manton [2012])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### F-1 Gradient and Hessian\n", "\n", "1. Find the steady state solution $\\sol(\\params)$\n", "\n", "1. Factorize the Jacobian $\\matA = \\nabla_\\state \\statefun\\left(\\sol(\\params), \\params\\right)$ ($1$ factorization)\n", "\n", "1. Compute $\\nabla\\sol(\\params) = -\\matA^{-1} \\nabla_\\params \\statefun(\\sol, \\params)$ ($m$ forward and back substitutions)\n", "\n", "1. Compute the **gradient**\n", "\n", " $$\\nabla\\objective(\\params)\n", " = \\nabla_\\state\\cost(\\sol, \\params) \\,\n", " \\nabla \\sol(\\params)\n", " + \\nabla_\\params \\cost(\\sol, \\params)$$\n", "\n", "1. Compute the **Hessian** ($1$ forward and back substitution)\n", "\n", " $$[\\nabla^2\\objective(\\params)]_{jk}\n", " = \\mathfrak{H}\\big[\\cost(\\state_{jk}, \\params_{jk})\\big]\n", " - \\mathfrak{H}\\big[\\statefun(\\state_{jk}, \\params_{jk})^\\mathsf{T}\\big]\n", " \\, \\matA^{-\\mathsf{T}} \\,\\nabla_\\state\\cost(\\sol,\\params)^\\mathsf{T}$$\n", " \n", " where $\\state_{jk} \\equiv \\sol + \\varepsilon_1 \\nabla\\sol \\, \\vece_j +\\varepsilon_2 \\nabla\\sol \\, \\vece_k$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Demo" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us first quickly build a model using the AIBECS\n", "\n", "This will create $\\statefun(\\state,\\params)$ and $\\cost(\\state,\\params)$ and some derivatives, such that we have an **expensive objective function**, $\\objective(\\params)$, defined implicitly by the solution $\\sol(\\params)$.\n", "\n", "Below I will skip the details of the model but I will run the code that generates `F`, `∇ₓF`, `f`, and `∇ₓf`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "### Simple phosphorus-cycling model:
\n", "\\- Dissolved inorganic phosphorus (**DIP**)
\n", "\\- Particulate organic phosphorus (**POP**)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "const wet3D, grd, T_OCIM = OCIM0.load()\n", "T_DIP(p) = T_OCIM\n", "const ztop = vector_of_top_depths(wet3D, grd) .|> ustrip\n", "const iwet, v = findall(vec(wet3D)), vector_of_volumes(wet3D, grd) .|> ustrip\n", "const nb, z = length(iwet), grd.depth_3D[iwet] .|> ustrip\n", "const DIV, Iabove = buildDIV(wet3D, iwet, grd), buildIabove(wet3D, iwet)\n", "const S₀, S′ = buildPFD(ones(nb), DIV, Iabove), buildPFD(ztop, DIV, Iabove)\n", "T_POP(p) = p.w₀ * S₀ + p.w′ * S′\n", "function G_DIP!(dx, DIP, POP, p)\n", " τ, k, z₀, κ, xgeo, τgeo = p.τ, p.k, p.z₀, p.κ, p.xgeo, p.τgeo\n", " dx .= @. (xgeo - DIP) / τgeo - (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) + κ * POP\n", "end\n", "function G_POP!(dx, DIP, POP, p)\n", " τ, k, z₀, κ = p.τ, p.k, p.z₀, p.κ\n", " dx .= @. (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) - κ * POP\n", "end\n", "const iDIP = 1:nb\n", "const iPOP = nb .+ iDIP\n", "t = empty_parameter_table()\n", "add_parameter!(t, :xgeo, 2.17u\"mmol/m^3\", optimizable = true)\n", "add_parameter!(t, :τgeo, 1.0u\"Myr\")\n", "add_parameter!(t, :k, 5.0u\"μmol/m^3\", optimizable = true)\n", "add_parameter!(t, :z₀, 80.0u\"m\")\n", "add_parameter!(t, :w₀, 0.5u\"m/d\", optimizable = true)\n", "add_parameter!(t, :w′, 0.1u\"1/d\", optimizable = true)\n", "add_parameter!(t, :κ, 0.3u\"1/d\", optimizable = true)\n", "add_parameter!(t, :τ, 100.0u\"d\", optimizable = true)\n", "initialize_Parameters_type(t, \"Pcycle_Parameters\")\n", "p = Pcycle_Parameters()\n", "F!, ∇ₓF = inplace_state_function_and_Jacobian((T_DIP,T_POP), (G_DIP!,G_POP!), nb)\n", "x = p.xgeo * ones(2nb)\n", "const μDIPobs3D, σ²DIPobs3D = WorldOceanAtlasTools.fit_to_grid(grd, 2018, \"phosphate\", \"annual\", \"1°\", \"an\")\n", "const μDIPobs, σ²DIPobs = μDIPobs3D[iwet], σ²DIPobs3D[iwet]\n", "const μx, σ²x = (μDIPobs, missing), (σ²DIPobs, missing)\n", "const ωs, ωp = [1.0, 0.0], 1e-4\n", "f, ∇ₓf, ∇ₚf = generate_objective_and_derivatives(ωs, μx, σ²x, v, ωp, mean_obs(p), variance_obs(p))\n", "F(x::Vector{Tx}, p::Pcycle_Parameters{Tp}) where {Tx,Tp} = F!(Vector{promote_type(Tx,Tp)}(undef,length(x)),x,p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "Tracer equations:\n", "\n", "$$\\left[\\frac{\\partial}{\\partial t} + \\nabla \\cdot (\\boldsymbol{u} + \\mathbf{K}\\cdot\\nabla )\\right] x_\\mathsf{DIP} = -U(x_\\mathsf{DIP}) + R(x_\\mathsf{POP})$$\n", "\n", "$$\\left[\\frac{\\partial}{\\partial t} + \\nabla \\cdot \\boldsymbol{w}\\right] x_\\mathsf{POP} = U(x_\\mathsf{DIP}) - R(x_\\mathsf{POP})$$\n", "\n", "- DIP→POP: $U=\\frac{x_\\mathsf{DIP}}{\\tau} \\, \\frac{x_\\mathsf{DIP}}{x_\\mathsf{DIP} + k} \\, (z < z_0)$\n", "\n", "- POP→DIP: $R = \\kappa \\, x_\\mathsf{POP}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### 6 parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "p" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "p[:]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Using the F-1 algorithm is easy" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using F1Method\n", "mem = F1Method.initialize_mem(x, p) \n", "objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")\n", "gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")\n", "hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "That's it, we have defined functions that \"autodifferentiate\" through the iterative solver!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Testing the F-1 algorithm\n", "\n", "We simply call the objective," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "objective(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "the gradient," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "gradient(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and the Hessian," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "hessian(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And do it again after updating the parameters, this time also recording the time spent, for the objective," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time objective(1.1p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "the gradient," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time gradient(1.1p) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and the Hessian," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time hessian(1.1p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Factorizations are the bottleneck" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time factorize(∇ₓF(mem.s, mem.p))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "length(p), length(p)^2 * 20.313170 * u\"s\" |> u\"minute\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Benchmark the F-1 algorithm
(for a full optimization run)\n", "\n", "\n", "| Algorithm | Definition | Factorizations |\n", "|:------------|:------------------------------------------------------|:-------------------|\n", "| F-1 | F-1 algorithm | $\\mathcal{O}(1)$ |\n", "| DUAL | Dual step for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| COMPLEX | Complex step for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| FD1 | Finite differences for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| HYPER | Hyperdual step for Hessian and dual step for gradient | $\\mathcal{O}(m^2)$ |\n", "| FD2 | Finite differences for Hessian and gradient | $\\mathcal{O}(m^2)$ |\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Computation time of a full optimization\n", "\n", "*(figure)*\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Partition of computation time\n", "\n", "*(figure)*
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "

Conclusions

\n", "\n", "The F-1 algorithm is \n", "- **easy** to use and understand\n", "- **accurate** (machine-precision)\n", "- **fast!**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "Check it on Github
at [briochemc/F1Method.jl](https://github.com/briochemc/F1Method.jl)!\n" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 1.1.1", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.1" }, "rise": { "scroll": true } }, "nbformat": 4, "nbformat_minor": 3 }