{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "HTML(\"\"\"\n", "\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using PyPlot, PyCall\n", "using AIBECS, WorldOceanAtlasTools\n", "using LinearAlgebra" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", " \n", "\n", "
\n", "

The F-1 algorithm



\n", " Benoît Pasquier and François Primeau
\n", " University of California, Irvine\n", "
\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "**Paper**: *The F-1 algorithm for efficient computation of the Hessian matrix of an objective function defined implicitly by the solution of a steady-state problem*\n", "(under review)\n", "\n", "**Code**: **F1Method.jl** (Julia package — check it out on GitHub!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "$\\newcommand{\\state}{\\boldsymbol{x}}$\n", "$\\newcommand{\\sol}{\\boldsymbol{s}}$\n", "$\\newcommand{\\params}{\\boldsymbol{p}}$\n", "$\\newcommand{\\lambdas}{\\boldsymbol{\\lambda}}$\n", "$\\newcommand{\\statefun}{\\boldsymbol{F}}$\n", "$\\newcommand{\\cost}{f}$\n", "$\\newcommand{\\objective}{\\hat{f}}$\n", "$\\DeclareMathOperator*{\\minimize}{minimize}$\n", "$\\newcommand{\\vece}{\\boldsymbol{e}}$\n", "$\\newcommand{\\matI}{\\mathbf{I}}$\n", "$\\newcommand{\\matA}{\\mathbf{A}}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## F-1 algorithm – Outline\n", "\n", "1. Motivation \\& Context \n", "1. Autodifferentiation\n", "1. What is the F-1 algorithm?\n", "1. Benchmarks" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "As we go through, I will demo some Julia code!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "\n", "# Motivation\n", "\n", "For **parameter optimization** and parameter sensitivity estimation!\n", "\n", "The [**AIBECS**](https://github.com/briochemc/AIBECS.jl) \n", " *for building global marine steady-state biogeochemistry models in just a few commands* \n", " (yesterday's CCRC seminar)\n", " \n", "And the **F-1 algorithm** to facilitate optimization of biogeochemical parameters.\n", "\n", "But the context of the **F-1 algorithm** is more general..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Context\n", "**Steady-state** equation\n", "\n", "$$\\frac{\\partial \\state}{\\partial t} = \\statefun(\\state,\\params) = 0$$\n", "\n", "for some state $\\state$ (size $n \\sim 400\\,000$) and parameters $\\params$ (size $m \\sim 10$)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Constrained optimization** problem\n", "\n", "$$\\left\\{\\begin{aligned}\n", " \\minimize_{\\boldsymbol{x}, \\boldsymbol{p}} & & \\cost(\\state, \\params) \\\\\n", " \\textrm{subject to} & & \\statefun(\\state, \\params) = 0\n", " \\end{aligned}\\right.$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "function constrained_optimization_plot()\n", " figure(figsize=(10,6))\n", " f(x,p) = (x - 1)^2 + (p - 1)^2 + 1\n", " s(p) = 1 + 0.9atan(p - 0.3)\n", " xs, ps = -1.2:0.1:3, -1.2:0.1:3.2\n", " plt = plot_surface(xs, ps, [f(x,p) for p in ps, x in xs], alpha = 0.5, cmap=:viridis_r)\n", " gca(projection=\"3d\")\n", " P = repeat(reshape(ps, length(ps), 1), outer = [1, length(xs)])\n", " X = repeat(reshape(xs, 1, length(xs)), outer = [length(ps), 1])\n", " contour(X, P, [f(x,p) for p in ps, x in xs], levels = 2:6, colors=\"black\", alpha=0.5, linewidths=0.5)\n", " plot([s(p) for p in ps], ps)\n", " legend((\"\\$F(x,p) = 0\\$\",))\n", " xlabel(\"state \\$x\\$\"); ylabel(\"parameters \\$p\\$\"); zlabel(\"objective \\$f(x,p)\\$\")\n", " xticks(unique(round.(xs))); yticks(unique(round.(ps))); zticks(2:6)\n", " return plt, f, s, xs, ps\n", "end\n", "constrained_optimization_plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Unconstrained optimization** along the manifold of steady-state solutions.\n", "\n", "$$\\minimize_\\params \\objective(\\params)$$\n", "\n", "where \n", "$$\\objective(\\params) \\equiv \\cost\\big(\\sol(\\params), \\params\\big)$$ \n", "is the **objective** function.\n", "\n", "And $\\sol(\\params)$ is the **steady-state solution** for parameters $\\params$, i.e., such that\n", "\n", "$$\\statefun\\left(\\sol(\\params),\\params\\right) = 0$$\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "function unconstrained_optimization_plot()\n", " plt, f, s, xs, ps = constrained_optimization_plot()\n", " plot([s(p) for p in ps], ps, [f(s(p),p) for p in ps], color=\"black\", linewidth=3)\n", " plot(maximum(xs) * ones(length(ps)), ps, [f(s(p),p) for p in ps], color=\"red\")\n", " legend((\"\\$x=s(p) \\\\Longleftrightarrow F(x,p)=0\\$\",\"\\$f(x,p)\\$ with \\$x = s(p)\\$\", \"\\$\\\\hat{f}(p) = f(s(p),p)\\$\"))\n", " for p in ps[1:22:end]\n", " plot(s(p) * [1,1], p * [1,1], f(s(p),p) * [0,1], color=\"black\", alpha = 0.3, linestyle=\"--\")\n", " plot([s(p), maximum(xs)], p * [1,1], f(s(p),p) * [1,1], color=\"black\", alpha = 0.3, linestyle=\"--\", marker=\"o\")\n", " end\n", " return plt\n", "end\n", "unconstrained_optimization_plot()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "Two **nested** iterative algorithms\n", "\n", "**Inner solver** with Newton step\n", "\n", "$$\\Delta \\state \\equiv - \\left[\\nabla_\\state \\statefun(\\state,\\params)\\right]^{-1} \\statefun(\\state,\\params)$$\n", "\n", "Outer **Optimizer** with Newton step\n", "\n", "$$\\Delta\\params \\equiv - \\left[\\nabla^2\\objective(\\params)\\right]^{-1}\\nabla \\objective(\\params)$$\n", "\n", "requires the Hessian of the objective function, \n", "\n", "$$\\nabla^2 \\objective(\\params)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# 
Autodifferentiation\n", "\n", "Take the **Taylor expansion** of $\\objective(\\params)$ in the $h\\vece_j$ direction for a given $h$:\n", "\n", "$$\\objective(\\params + h \\vece_j)\n", " = \\objective(\\params)\n", " + h \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?} \n", " + \\frac{h^2}{2} \\, \\left[\\vece_j^\\mathsf{T} \\, \\nabla^2\\objective(\\params) \\, \\vece_j\\right]\n", " + \\ldots$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A standard solution is to use **Finite differences**:\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j \n", " = \\frac{\\objective(\\params + h \\vece_j) - \\objective(\\params)}{h}\n", " + \\mathcal{O}(h)$$\n", " \n", "But truncation and round-off errors!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A better solution is to use **Complex** numbers!
\n", "Taylor-expand in the $ih\\vece_j$ direction:\n", "$$\\objective(\\params + i h \\vece_j)\n", " = \\objective(\\params)\n", " + i h \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?}\n", " - \\frac{h^2}{2} \\, \\left[\\vece_j^\\mathsf{T} \\, \\nabla^2\\objective(\\params) \\, \\vece_j\\right]\n", " + \\ldots$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Because when taking the imaginary part, the convergence is faster and there are no more round-off errors:\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j = \\Im\\left[\\frac{\\objective(\\params + i h \\vece_j)}{h}\\right] + \\mathcal{O}(h^2)$$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "𝑓(x) = cos(x^2) + exp(x)\n", "∇𝑓(x) = -2x * sin(x^2) + exp(x)\n", "finite_differences(f, x, h) = (f(x + h) - f(x)) / h\n", "centered_differences(f, x, h) = (f(x + h) - f(x - h)) / 2h\n", "complex_step_method(f, x, h) = imag(f(x + im * h)) / h\n", "relative_error(m, f, ∇f, x, h) = Float64(abs(BigFloat(m(f, x, h)) - ∇f(BigFloat(x))) / abs(∇f(x)))\n", "x, step_sizes = 2.0, 10 .^ (-20:0.02:0)\n", "numerical_schemes = [finite_differences, centered_differences, complex_step_method]\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])\n", "loglog(), legend(string.(numerical_schemes)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"There are better alternatives to finite differences\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An even better solution is to use **Dual** numbers! \n", "\n", "Define $\\varepsilon \\ne 0$ s.t. $\\varepsilon^2 = 0$, then the complete Taylor expansion in the $\\varepsilon \\vece_j$ direction is\n", "\n", "$$\\objective(\\params + \\varepsilon \\vece_j)\n", " = \\objective(\\params)\n", " + \\varepsilon \\underbrace{\\nabla\\objective(\\params) \\, \\vece_j}_{?}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, 1st derivatives are given **exactly** by\n", "\n", "$$\\nabla\\objective(\\params) \\, \\vece_j = \\mathfrak{D}\\left[\\objective(\\params + \\varepsilon \\vece_j)\\right]$$\n", "\n", "where $\\mathfrak{D}$ is the dual part (the $\\varepsilon$ coefficient)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The dual number $\\varepsilon$ behaves like an **infinitesimal** and it gives very accurate derivatives!
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using DualNumbers\n", "dual_step_method(f, x, h) = dualpart(f(x + ε))\n", "push!(numerical_schemes, dual_step_method)\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇𝑓, x, h) for h in step_sizes, m in numerical_schemes])\n", "loglog(), legend(string.(numerical_schemes)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"There are even better alternatives to complex-step differentiation\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "notes" } }, "source": [ "Just like complex identify with $\\mathbb{R}[X]/(X^2+1)$,
\n", "dual numbers identify with $\\mathbb{R}[X]/(X^2)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The **gradient** of the objective function can be computed in $m$ dual evaluations of the objective function, via\n", "\n", "$$\\nabla\\objective(\\params) = \\mathfrak{D} \\left( \\left[\\begin{matrix}\n", "\t\t\\objective(\\params + \\varepsilon \\vece_1) \\\\\n", "\t\t\\objective(\\params + \\varepsilon \\vece_2) \\\\\n", "\t\t\\vdots \\\\\n", " \\objective(\\params + \\varepsilon \\vece_{m})\n", " \\end{matrix} \\right]^\\mathsf{T} \\right)$$\n", " \n", "where $m$ is the number of parameters." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For second derivatives, we can use hyperdual numbers.
\n", "Let $\\varepsilon_1$ and $\\varepsilon_2$ be the hyperdual units defined by $\\varepsilon_1^2 = \\varepsilon_2^2 = 0$ but $\\varepsilon_1 \\varepsilon_2 \\ne 0$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let $\\params_{jk} = \\params + \\varepsilon_1 \\vece_j + \\varepsilon_2 \\vece_k$, for which the Taylor expansion of $\\objective$ is\n", "\n", "$$\\objective(\\params_{jk})\n", " = \\objective(\\params)\n", " + \\varepsilon_1 \\nabla\\objective(\\params) \\vece_j\n", " + \\varepsilon_2 \\nabla\\objective(\\params) \\vece_k\n", " + \\varepsilon_1 \\varepsilon_2 \\underbrace{\\vece_j^\\mathsf{T} \\nabla^2\\objective(\\params) \\vece_k}_{?}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Taking the \"hyperdual\" part gives the second derivative:\n", "\n", "$$\\vece_j^\\mathsf{T} \\nabla^2\\objective(\\params) \\vece_k = \\mathfrak{H}\\left[\\objective(\\params_{jk})\\right]$$\n", "\n", "where $\\mathfrak{H}$ stands for hyperdual part (the $\\varepsilon_1 \\varepsilon_2$ coefficient)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Hyperdual steps also give very accurate derivatives!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "∇²𝑓(x) = -2sin(x^2) - 4x^2 * cos(x^2) + exp(x)\n", "using HyperDualNumbers\n", "finite_differences_2(f, x, h) = (f(x + h) - 2f(x) + f(x - h)) / h^2\n", "hyperdual_step_method(f, x, h) = ε₁ε₂part(f(x + ε₁ + ε₂))\n", "numerical_schemes_2 = [finite_differences_2, hyperdual_step_method]\n", "plot(step_sizes, [relative_error(m, 𝑓, ∇²𝑓, x, h) for h in step_sizes, m in numerical_schemes_2])\n", "loglog(), legend(string.(numerical_schemes_2)), xlabel(\"step size, \\$h\\$\"), ylabel(\"Relative Error, \\$\\\\frac{|\\\\bullet - \\\\nabla f(x)|}{|\\\\nabla f(x)|}\\$\")\n", "title(\"HyperDual Numbers for second derivatives\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Autodifferentiating through an iterative solver is expensive\n", "\n", "The Hessian of $\\objective$ can be computed in $\\frac{m(m+1)}{2}$ hyperdual evaluations:\n", "\n", "$$\\nabla^2\\objective(\\params) = \\mathfrak{H} \\left( \\left[\\begin{matrix}\n", "\t\t\\objective(\\params_{11})\n", " & \\objective(\\params_{12})\n", " & \\cdots\n", " & \\objective(\\params_{1m})\n", " \\\\\n", "\t\t\\objective(\\params_{12})\n", " & \\objective(\\params_{22})\n", " & \\cdots\n", " & \\objective(\\params_{2m})\n", " \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\t\t\\objective(\\params_{1m})\n", " & \\objective(\\params_{2m})\n", " & \\cdots\n", " & \\objective(\\params_{mm})\n", " \\end{matrix} \\right] \\right)$$\n", "\n", "\n", "But this requires $\\frac{m(m+1)}{2}$ calls to the inner solver, which requires hyperdual factorizations, and fine-tuned non-real tolerances!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is the F-1 algorithm\n", "\n", "### What does it do?\n", "\n", "The F-1 algorithm allows you to calculate the **gradient** and **Hessian** of an objective function, $\\objective(\\params)$, defined implicitly by the solution of a steady-state problem.\n", "\n", "### How does it work?\n", "\n", "It leverages analytical shortcuts, combined with **dual** and **hyperdual** numbers, to avoid calls to the inner solver and unnecessary factorizations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Analytical gradient\n", "\n", "Differentiate the objective function $\\objective(\\params) = \\cost\\left(\\sol(\\params), \\params\\right)$:\n", "\n", "\n", "\n", "$$\\color{forestgreen}{\\underbrace{\\nabla\\objective(\\params)}_{1 \\times m}}\n", " = \\color{royalblue}{\\underbrace{\\nabla_\\state\\cost(\\sol, \\params)_{\\vphantom{\\params}}}_{1 \\times n}} \\,\n", " \\color{red}{\\underbrace{\\nabla \\sol(\\params)_{\\vphantom{\\params}}}_{n \\times m}}\n", " + \\color{DarkOrchid}{\\underbrace{\\nabla_\\params \\cost(\\sol, \\params)}_{1 \\times m}}$$\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Differentiate the steady-state equation, $\\statefun\\left(\\sol(\\params),\\params\\right)=0$: \n", "\n", "\n", "\n", "$$\\color{royalblue}{\\underbrace{\\matA_{\\vphantom{\\params}}}_{n \\times n}} \\,\n", " \\color{red}{\\underbrace{\\nabla\\sol(\\params)_{\\vphantom{\\params}}}_{n \\times m}}\n", " + \\color{forestgreen}{\\underbrace{\\nabla_\\params \\statefun(\\sol, \\params)}_{n\\times m}} = 0$$\n", " \n", "where $\\matA \\equiv \\nabla_\\state\\statefun(\\sol,\\params)$ is the **Jacobian** of the steady-state system (a large, sparse matrix)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Analytical Hessian\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Differentiate $\\objective(\\params) = \\cost\\left(\\sol(\\params), \\params\\right)$ twice:\n", "\n", "$$\\begin{split}\n", " \t\\nabla^2 \\objective(\\params)\n", " \t&= \\nabla_{\\state\\state}\\cost(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\nabla\\sol)\n", " + \\nabla_{\\state\\params}\\cost(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\matI_\\params) \\\\\n", " \t&\\quad+ \\nabla_{\\params\\state}\\cost(\\sol, \\params) \\, (\\matI_\\params \\otimes \\nabla\\sol)\n", " + \\nabla_{\\params\\params}\\cost(\\sol, \\params) \\, (\\matI_\\params \\otimes \\matI_\\params) \\\\\n", " \t&\\quad+ \\nabla_\\state\\cost(\\sol, \\params) \\, \\nabla^2\\sol,\n", "\t\\end{split}$$\n", " \n", "Differentiate the steady-state equation, $\\statefun\\left(\\sol(\\params),\\params\\right)=0$, twice:\n", "\n", "$$\\begin{split}\n", " 0 & = \\nabla_{\\state\\state}\\statefun(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\nabla\\sol)\n", " + \\nabla_{\\state\\params}\\statefun(\\sol, \\params) \\, (\\nabla\\sol \\otimes \\matI_\\params)\\\\\n", "\t\t& \\quad + \\nabla_{\\params\\state}\\statefun(\\sol, \\params) \\, (\\matI_\\params \\otimes \\nabla\\sol)\n", " + \\nabla_{\\params\\params}\\statefun(\\sol, \\params) \\, (\\matI_\\params \\otimes \\matI_\\params) \\\\\n", "\t\t& \\quad + \\matA \\, \\nabla^2\\sol.\n", "\t\\end{split}$$\n", " \n", "(Tensor notation of Manton [2012])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### F-1 Gradient and Hessian\n", "\n", "1. Find the steady state solution $\\sol(\\params)$\n", "\n", "1. Factorize the Jacobian $\\matA = \\nabla_\\state \\statefun\\left(\\sol(\\params), \\params\\right)$ ($1$ factorization)\n", "\n", "1. Compute $\\nabla\\sol(\\params) = -\\matA^{-1} \\nabla_\\params \\statefun(\\sol, \\params)$ ($m$ forward and back substitutions)\n", "\n", "1. Compute the **gradient**\n", "\n", " $$\\nabla\\objective(\\params)\n", " = \\nabla_\\state\\cost(\\sol, \\params) \\,\n", " \\nabla \\sol(\\params)\n", " + \\nabla_\\params \\cost(\\sol, \\params)$$\n", "\n", "1. Compute the **Hessian** ($1$ forward and back substitution)\n", "\n", " $$[\\nabla^2\\objective(\\params)]_{jk}\n", " = \\mathfrak{H}\\big[\\cost(\\state_{jk}, \\params_{jk})\\big]\n", " - \\mathfrak{H}\\big[\\statefun(\\state_{jk}, \\params_{jk})^\\mathsf{T}\\big]\n", " \\, \\matA^{-\\mathsf{T}} \\,\\nabla_\\state\\cost(\\sol,\\params)^\\mathsf{T}$$\n", " \n", " where $\\state_{jk} \\equiv \\sol + \\varepsilon_1 \\nabla\\sol \\, \\vece_j +\\varepsilon_2 \\nabla\\sol \\, \\vece_k$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Demo" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us first quickly build a model using the AIBECS\n", "\n", "This will create $\\statefun(\\state,\\params)$ and $\\cost(\\state,\\params)$ and some derivatives, such that we have an **expensive objective function**, $\\objective(\\params)$, defined implicitly by the solution $\\sol(\\params)$.\n", "\n", "Below I will skip the details of the model but I will run the code that generates `F`, `∇ₓF`, `f`, and `∇ₓf`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "### Simple phosphorus-cycling model:
\n", "\\- Dissolved inorganic phosphorus (**DIP**)
\n", "\\- Particulate organic phosphorus (**POP**)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "const wet3D, grd, T_OCIM = OCIM0.load()\n", "T_DIP(p) = T_OCIM\n", "const ztop = vector_of_top_depths(wet3D, grd) .|> ustrip\n", "const iwet, v = findall(vec(wet3D)), vector_of_volumes(wet3D, grd) .|> ustrip\n", "const nb, z = length(iwet), grd.depth_3D[iwet] .|> ustrip\n", "const DIV, Iabove = buildDIV(wet3D, iwet, grd), buildIabove(wet3D, iwet)\n", "const S₀, S′ = buildPFD(ones(nb), DIV, Iabove), buildPFD(ztop, DIV, Iabove)\n", "T_POP(p) = p.w₀ * S₀ + p.w′ * S′\n", "function G_DIP!(dx, DIP, POP, p)\n", " τ, k, z₀, κ, xgeo, τgeo = p.τ, p.k, p.z₀, p.κ, p.xgeo, p.τgeo\n", " dx .= @. (xgeo - DIP) / τgeo - (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) + κ * POP\n", "end\n", "function G_POP!(dx, DIP, POP, p)\n", " τ, k, z₀, κ = p.τ, p.k, p.z₀, p.κ\n", " dx .= @. (DIP ≥ 0) / τ * DIP^2 / (DIP + k) * (z ≤ z₀) - κ * POP\n", "end\n", "const iDIP = 1:nb\n", "const iPOP = nb .+ iDIP\n", "t = empty_parameter_table()\n", "add_parameter!(t, :xgeo, 2.17u\"mmol/m^3\", optimizable = true)\n", "add_parameter!(t, :τgeo, 1.0u\"Myr\")\n", "add_parameter!(t, :k, 5.0u\"μmol/m^3\", optimizable = true)\n", "add_parameter!(t, :z₀, 80.0u\"m\")\n", "add_parameter!(t, :w₀, 0.5u\"m/d\", optimizable = true)\n", "add_parameter!(t, :w′, 0.1u\"1/d\", optimizable = true)\n", "add_parameter!(t, :κ, 0.3u\"1/d\", optimizable = true)\n", "add_parameter!(t, :τ, 100.0u\"d\", optimizable = true)\n", "initialize_Parameters_type(t, \"Pcycle_Parameters\")\n", "p = Pcycle_Parameters()\n", "F!, ∇ₓF = inplace_state_function_and_Jacobian((T_DIP,T_POP), (G_DIP!,G_POP!), nb)\n", "x = p.xgeo * ones(2nb)\n", "const μDIPobs3D, σ²DIPobs3D = WorldOceanAtlasTools.fit_to_grid(grd, 2018, \"phosphate\", \"annual\", \"1°\", \"an\")\n", "const μDIPobs, σ²DIPobs = μDIPobs3D[iwet], σ²DIPobs3D[iwet]\n", "const μx, σ²x = (μDIPobs, missing), (σ²DIPobs, missing)\n", "const ωs, ωp = [1.0, 0.0], 1e-4\n", "f, ∇ₓf, ∇ₚf = generate_objective_and_derivatives(ωs, μx, σ²x, v, ωp, mean_obs(p), variance_obs(p))\n", "F(x::Vector{Tx}, p::Pcycle_Parameters{Tp}) where {Tx,Tp} = F!(Vector{promote_type(Tx,Tp)}(undef,length(x)),x,p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "Tracer equations:\n", "\n", "$$\\left[\\frac{\\partial}{\\partial t} + \\nabla \\cdot (\\boldsymbol{u} + \\mathbf{K}\\cdot\\nabla )\\right] x_\\mathsf{DIP} = -U(x_\\mathsf{DIP}) + R(x_\\mathsf{POP})$$\n", "\n", "$$\\left[\\frac{\\partial}{\\partial t} + \\nabla \\cdot \\boldsymbol{w}\\right] x_\\mathsf{POP} = U(x_\\mathsf{DIP}) - R(x_\\mathsf{POP})$$\n", "\n", "- DIP→POP: $U=\\frac{x_\\mathsf{DIP}}{\\tau} \\, \\frac{x_\\mathsf{DIP}}{x_\\mathsf{DIP} + k} \\, (z < z_0)$\n", "\n", "- POP→DIP: $R = \\kappa \\, x_\\mathsf{POP}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### 6 parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "p" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "p[:]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Using the F-1 algorithm is easy" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "using F1Method\n", "mem = F1Method.initialize_mem(x, p) \n", "objective(p) = F1Method.objective(f, F, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")\n", "gradient(p) = F1Method.gradient(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")\n", "hessian(p) = F1Method.hessian(f, F, ∇ₓf, ∇ₓF, mem, p, CTKAlg(); preprint=\" \")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "That's it, we have defined functions that \"autodifferentiate\" through the iterative solver!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Testing the F-1 algorithm\n", "\n", "We simply call the objective," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "objective(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "the gradient," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "gradient(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and the Hessian," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "hessian(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And do it again after updating the parameters, this time also recording the time spent, for the objective," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time objective(1.1p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "the gradient," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time gradient(1.1p) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and the Hessian," ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time hessian(1.1p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Factorizations are the bottleneck" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@time factorize(∇ₓF(mem.s, mem.p))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "length(p), length(p)^2 * 20.313170 * u\"s\" |> u\"minute\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Benchmark the F-1 algorithm
(for a full optimization run)\n", "\n", "\n", "| Algorithm | Definition | Factorizations |\n", "|:------------|:------------------------------------------------------|:-------------------|\n", "| F-1 | F-1 algorithm | $\\mathcal{O}(1)$ |\n", "| DUAL | Dual step for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| COMPLEX | Complex step for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| FD1 | Finite differences for Hessian + analytical gradient | $\\mathcal{O}(m)$ |\n", "| HYPER | Hyperdual step for Hessian and dual step for gradient | $\\mathcal{O}(m^2)$ |\n", "| FD2 | Finite differences for Hessian and gradient | $\\mathcal{O}(m^2)$ |\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Computation time of a full optimization\n", "\n", "*(figure)*\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "# Partition of computation time\n", "\n", "*(figure)*
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "

Conclusions

\n", "\n", "The F-1 algorithm is \n", "- **easy** to use and understand\n", "- **accurate** (machine-precision)\n", "- **fast!**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "\n", "Check it on Github
at [briochemc/F1Method.jl](https://github.com/briochemc/F1Method.jl)!\n" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 1.1.1", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.1" }, "rise": { "scroll": true } }, "nbformat": 4, "nbformat_minor": 3 }