{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Adaptive PDE discretizations on Cartesian grids\n", "## Volume : Algorithmic tools\n", "## Part : Automatic differentiation\n", "## Chapter : Sparse automatic differentiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates automatic differentiation, with an application to an simple one-dimensional denoising problem. More precisely, we use the following flavors of automatic differentiation:\n", "* *First* and *Second* order. The variables used encode gradient and (sometimes) hessian information.\n", "* *Sparse*. The Taylor expansion information - gradient and hessian - is stored in sparse form. This is practical for functions with low complexity, but an input of arbitrary dimension.\n", "* *Forward*. The Taylor expansion information is propagated forward along with the computations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[**Summary**](Summary.ipynb) of volume Algorithmic tools, this series of notebooks.\n", "\n", "[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations \n", "\tbook of notebooks, including the other volumes.\n", "\n", "# Table of contents\n", " * [1. A word on functional optimization](#1.-A-word-on-functional-optimization)\n", " * [2. Sparse automatic differentiation](#2.-Sparse-automatic-differentiation)\n", " * [2.1 Elementary symbolic perturbation](#2.1-Elementary-symbolic-perturbation)\n", " * [2.2 Evaluating the objective function](#2.2-Evaluating-the-objective-function)\n", " * [2.3 Evaluating the PDE residue](#2.3-Evaluating-the-PDE-residue)\n", " * [2.4 Other operations and mathematical functions](#2.4-Other-operations-and-mathematical-functions)\n", " * [3. Composition of Dense and Sparse AD](#3.-Composition-of-Dense-and-Sparse-AD)\n", " * [3.1 The problem : growth of the Sparse AD information](#3.1-The-problem-:-growth-of-the-Sparse-AD-information)\n", " * [3.2 A first solution: regular simplifications](#3.2-A-first-solution:-regular-simplifications)\n", " * [3.3 Another possible solution: the combination with dense automatic differentiation](#3.3-Another-possible-solution:-the-combination-with-dense-automatic-differentiation)\n", " * [4. Differentiating extrema using the envelope theorem](#4.-Differentiating-extrema-using-the-envelope-theorem)\n", "\n", "\n", "\n", "**Acknowledgement.** Some of the experiments presented in these notebooks are part of \n", "ongoing research with Ludovic Métivier and Da Chen.\n", "\n", "Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Known bugs and incompatibilities.** Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in notebook [ADBugs](ADBugs.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Importing the required libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.682733Z", "iopub.status.busy": "2024-04-30T08:47:01.682444Z", "iopub.status.idle": "2024-04-30T08:47:01.692504Z", "shell.execute_reply": "2024-04-30T08:47:01.692041Z" } }, "outputs": [], "source": [ "import sys; sys.path.insert(0,\"..\") # Allow import of agd from parent directory (useless if conda package installed)\n", "#from Miscellaneous import TocTools; print(TocTools.displayTOC('Sparse','Algo'))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.695207Z", "iopub.status.busy": "2024-04-30T08:47:01.695012Z", "iopub.status.idle": "2024-04-30T08:47:01.745388Z", "shell.execute_reply": "2024-04-30T08:47:01.744997Z" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.747174Z", "iopub.status.busy": "2024-04-30T08:47:01.747018Z", "iopub.status.idle": "2024-04-30T08:47:01.752226Z", "shell.execute_reply": "2024-04-30T08:47:01.751937Z" } }, "outputs": [], "source": [ "import agd.AutomaticDifferentiation as ad\n", "import agd.FiniteDifferences as fd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.753702Z", "iopub.status.busy": "2024-04-30T08:47:01.753612Z", "iopub.status.idle": "2024-04-30T08:47:01.755820Z", "shell.execute_reply": "2024-04-30T08:47:01.755560Z" } }, "outputs": [], "source": [ "def LInfNorm(a): return np.max(np.abs(a))\n", "def LInfNorm_AD(a): return np.max((LInfNorm(a.value),LInfNorm(a.coef)))\n", "def LInfNorm_AD2(a): return np.max((LInfNorm(a.value),LInfNorm(a.coef1),LInfNorm(a.coef2)))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.757224Z", "iopub.status.busy": "2024-04-30T08:47:01.757136Z", "iopub.status.idle": "2024-04-30T08:47:01.759162Z", "shell.execute_reply": "2024-04-30T08:47:01.758911Z" } }, "outputs": [], "source": [ "def reload_packages():\n", " from Miscellaneous.rreload import rreload\n", " global ad,fd\n", " [ad,fd] = rreload([ad,fd],rootdir='..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. A word on functional optimization\n", "\n", "We illustrate sparse automatic differentiation with an optimization problem of the form\n", "$$\n", " \\min_u \\int_\\Omega F(x,\\nabla u(x)) + G(x,u(x)) \\, dx,\n", "$$\n", "where the unknown $u$ is a function defined over a given domain $\\Omega$, with arbitrary boundary conditions.\n", "\n", "The cost functions $F$ and $G$ are given as well. Classicaly, $F$ is a regularization term, whereas $G$ is a data fidelity term.\n", "The simplest case would be \n", "$$\n", " F_0(x,v) := \\frac 1 2 \\|v\\|^2, \\qquad G_0(x,y) := \\frac 1 2 \\|y-g_0(x)\\|^2,\n", "$$\n", "where $g_0(x)$ is some given data.\n", "However, one may also think of numerous variants, involving for instance and without loss of generality:\n", "* Inhomogeneous and Anisotropic norms, such as $F(x,v) = \\frac 1 2 $ where $D(x)$ is a positive definite matrix. This allows, in dimension $d\\geq 2$, to enforce regularity more or less strongly depending on the position and the orientation.\n", "* $L^1$ penalization, such as $F(x,v) = \\|v\\|$ or $G(x,y)=|y-g_0(x)|$. These convex yet weaker cost functions promote 'sparsity' in certain contexts. \n", "* Entropic penalization, for positive functions : $G(x,y) = y\\ln(y/u_0(x))$.\n", "\n", "**Costs involving a kernel.** Applications such as deblurring often involve an additional cost of the form \n", "$$\n", " H(x, (\\rho\\star u)(x)),\n", "$$\n", "where $\\rho$ is a convolution kernel. Because convolution is a non-local operator with a large support, sparse automatic differentiation is not a good fit. Backward automatic differentiation would be appropriate here, but it is not supported at this point. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Problem parameters.**\n", "In order to emphasize automatic differentiation tools, rather than a particular application, we consider as a start one of the most basic problem instantiations. See the other notebooks for more advanced examples.\n", "\n", "The domain $\\Omega$ is defined as the (one dimensional) segment $[0,1]$ equipped with periodic boundary conditions.\n", "We let $F_0$ and $G_0$ be the simple instances above defined, with a discontinuous $u_0$." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.760660Z", "iopub.status.busy": "2024-04-30T08:47:01.760580Z", "iopub.status.idle": "2024-04-30T08:47:01.763097Z", "shell.execute_reply": "2024-04-30T08:47:01.762850Z" } }, "outputs": [], "source": [ "def F0(x,v):\n", " return 0.5*v**2\n", "\n", "def g0(x):\n", " return (np.abs(x-0.5)<0.3) + np.sin(2.*np.pi*x)\n", "def G0(x,y):\n", " return 0.5*(y-g0(x))**2\n", "\n", "def Objective(u,x,dx,F,G):\n", "# h=x[1]-x[0] # The grid scale\n", " du = fd.DiffUpwind(u,(1,),dx,padding=None) # padding=None is for periodic bc\n", " return (F(x+0.5*dx,du)+G(x,u)).sum(axis=0) * dx # Integrate over the domain" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.764463Z", "iopub.status.busy": "2024-04-30T08:47:01.764382Z", "iopub.status.idle": "2024-04-30T08:47:01.766210Z", "shell.execute_reply": "2024-04-30T08:47:01.765976Z" } }, "outputs": [], "source": [ "x,dx=np.linspace(0,1,10,endpoint=False,retstep=True)\n", "u=np.zeros(x.shape)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.767477Z", "iopub.status.busy": "2024-04-30T08:47:01.767404Z", "iopub.status.idle": "2024-04-30T08:47:01.770643Z", "shell.execute_reply": "2024-04-30T08:47:01.770404Z" } }, "outputs": [ { "data": { "text/plain": [ "0.5" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Objective(u,x,dx,F0,G0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The optimality condition of the variational problem, for above specific choices of $F_0$ and $G_0$, is \n", "$$\n", " -\\Delta u + u-g_0 = 0.\n", "$$\n", "We can therefore, alternatively, solve directly this PDE instead going through the minimization problem." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.772054Z", "iopub.status.busy": "2024-04-30T08:47:01.771970Z", "iopub.status.idle": "2024-04-30T08:47:01.773958Z", "shell.execute_reply": "2024-04-30T08:47:01.773722Z" } }, "outputs": [], "source": [ "def Scheme0(u,x):\n", " h=float(x[1]-x[0])\n", " d2u = fd.Diff2(u,(1,),h,padding=None)\n", " return -d2u +u -g0(x)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.775295Z", "iopub.status.busy": "2024-04-30T08:47:01.775205Z", "iopub.status.idle": "2024-04-30T08:47:01.777398Z", "shell.execute_reply": "2024-04-30T08:47:01.777181Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0. , -0.58778525, -0.95105652, -1.95105652, -1.58778525,\n", " -1. , -0.41221475, -0.04894348, 0.95105652, 0.58778525])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Scheme0(u,x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Sparse automatic differentiation\n", "\n", "The sparse automatic differentiation classes, first and second order, allow to compute gradients, jacobian matrices, and hessians, represented in a sparse manner." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Elementary symbolic perturbation\n", "\n", "Assume that $u^0 = (u^0_i)_{0 \\leq i < n}$ is some array. \n", "Using the `ad.Sparse.identity` constructor, one can create an augmented variable $u^1$ with components\n", "$$\n", " u^1_i = u^0_i + \\delta_i + o(\\delta_i)\n", "$$\n", "where $(\\delta_i)_{0 \\leq i < n}$ are independent first order symbolic perturbations. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.778814Z", "iopub.status.busy": "2024-04-30T08:47:01.778726Z", "iopub.status.idle": "2024-04-30T08:47:01.780473Z", "shell.execute_reply": "2024-04-30T08:47:01.780247Z" } }, "outputs": [], "source": [ "x = np.linspace(0,1,5,endpoint=False)\n", "u0 = np.cos(2.*np.pi*x)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.781823Z", "iopub.status.busy": "2024-04-30T08:47:01.781747Z", "iopub.status.idle": "2024-04-30T08:47:01.783372Z", "shell.execute_reply": "2024-04-30T08:47:01.783150Z" } }, "outputs": [], "source": [ "u1 = ad.Sparse.identity(constant=u0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Elementary properties of the `np.ndarray` class are inherited" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.784682Z", "iopub.status.busy": "2024-04-30T08:47:01.784606Z", "iopub.status.idle": "2024-04-30T08:47:01.786525Z", "shell.execute_reply": "2024-04-30T08:47:01.786267Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ndim : 1 , shape : (5,) , size : 5 , len : 5\n" ] } ], "source": [ "print(\"ndim :\",u1.ndim,\", shape :\",u1.shape,\", size :\",u1.size,\", len :\",len(u1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, internal data includes coefficient and index arrays, with one additional dimension. For now, this additional dimension is a singleton, because the perturbations of each component $u_i^1$ is the elementary $\\delta_i$." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.787839Z", "iopub.status.busy": "2024-04-30T08:47:01.787768Z", "iopub.status.idle": "2024-04-30T08:47:01.789655Z", "shell.execute_reply": "2024-04-30T08:47:01.789426Z" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u1.size_ad" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.791020Z", "iopub.status.busy": "2024-04-30T08:47:01.790947Z", "iopub.status.idle": "2024-04-30T08:47:01.792763Z", "shell.execute_reply": "2024-04-30T08:47:01.792521Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Constant term : [ 1. 0.30901699 -0.80901699 -0.80901699 0.30901699]\n", "Coefficients of the symbolic perturbation : [[1.]\n", " [1.]\n", " [1.]\n", " [1.]\n", " [1.]]\n", "Indices of the symbolic perturbation : [[0]\n", " [1]\n", " [2]\n", " [3]\n", " [4]]\n" ] } ], "source": [ "print(\"Constant term :\", u1.value)\n", "print(\"Coefficients of the symbolic perturbation :\", u1.coef)\n", "print(\"Indices of the symbolic perturbation :\", u1.index)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.794016Z", "iopub.status.busy": "2024-04-30T08:47:01.793943Z", "iopub.status.idle": "2024-04-30T08:47:01.796069Z", "shell.execute_reply": "2024-04-30T08:47:01.795844Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([ 1. , 0.30901699, -0.80901699, -0.80901699, 0.30901699]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u1 # Scalar values, gradient coefficients, gradient indices." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Second order sparse automatic differentiation is based on the same principles. \n", "Using the `ad.Sparse2.identity` constructor, one generates an augmented variable $u^2$ with components\n", "$$\n", " u^2_i = u^0_i + \\delta_i + o(\\delta_i^2)\n", "$$\n", "where $(\\delta_i)_{0 \\leq i < n}$ are independent first order symbolic perturbations. The only difference with the first order case here lies in the remainder $o(\\delta_i^2)$: the second order coefficients are known and explicitly set to zero." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.797425Z", "iopub.status.busy": "2024-04-30T08:47:01.797346Z", "iopub.status.idle": "2024-04-30T08:47:01.798993Z", "shell.execute_reply": "2024-04-30T08:47:01.798760Z" } }, "outputs": [], "source": [ "u2 = ad.Sparse2.identity(constant=u0)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.800212Z", "iopub.status.busy": "2024-04-30T08:47:01.800133Z", "iopub.status.idle": "2024-04-30T08:47:01.802156Z", "shell.execute_reply": "2024-04-30T08:47:01.801937Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Constant term : [ 1. 0.30901699 -0.80901699 -0.80901699 0.30901699]\n", "First order coefficients : [[1.]\n", " [1.]\n", " [1.]\n", " [1.]\n", " [1.]]\n", "First order indices : [[0]\n", " [1]\n", " [2]\n", " [3]\n", " [4]]\n", "\n", "Second order coefficients : []\n", "Second order row indices : []\n", "Second order column indices : []\n" ] } ], "source": [ "print(\"Constant term :\", u2.value)\n", "print(\"First order coefficients :\", u2.coef1)\n", "print(\"First order indices :\", u2.index)\n", "print()\n", "\n", "print(\"Second order coefficients :\",u2.coef2)\n", "print(\"Second order row indices :\",u2.index_row)\n", "print(\"Second order column indices :\",u2.index_col)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.803400Z", "iopub.status.busy": "2024-04-30T08:47:01.803324Z", "iopub.status.idle": "2024-04-30T08:47:01.805494Z", "shell.execute_reply": "2024-04-30T08:47:01.805272Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD2(array([ 1. , 0.30901699, -0.80901699, -0.80901699, 0.30901699]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]), array([], shape=(5, 0), dtype=float64), array([], shape=(5, 0), dtype=int64), array([], shape=(5, 0), dtype=int64))" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Evaluating the objective function \n", "Arithmetic operations, such as left and right multiplication, addition, substraction, division, work as usual.\n", "In particular, we can evaluate our objective function." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.806844Z", "iopub.status.busy": "2024-04-30T08:47:01.806771Z", "iopub.status.idle": "2024-04-30T08:47:01.808615Z", "shell.execute_reply": "2024-04-30T08:47:01.808394Z" } }, "outputs": [], "source": [ "obj1 = Objective(u1,x,dx,F0,F0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is an `array scalar`, i.e. an array whose shape is the empty tuple, but with substantial AD information." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.809942Z", "iopub.status.busy": "2024-04-30T08:47:01.809868Z", "iopub.status.idle": "2024-04-30T08:47:01.811958Z", "shell.execute_reply": "2024-04-30T08:47:01.811743Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array(17.39957514), array([ -6.90983006, -11.18033989, 0. , 11.18033989,\n", " 6.90983006, 6.90983006, 11.18033989, -0. ,\n", " -11.18033989, -6.90983006, 0.1 , 0.0309017 ,\n", " -0.0809017 , -0.0809017 , 0.0309017 ]), array([1, 2, 3, 4, 0, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dense representation of the AD information would be more adequate for this particular variable. A conversion is obtained as follows." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.813246Z", "iopub.status.busy": "2024-04-30T08:47:01.813166Z", "iopub.status.idle": "2024-04-30T08:47:01.815232Z", "shell.execute_reply": "2024-04-30T08:47:01.814982Z" } }, "outputs": [ { "data": { "text/plain": [ "denseAD(array(17.39957514),\n", "array([ 13.91966011, 4.30141153, -11.26124159, -11.26124159,\n", " 4.30141153]))" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "obj1.to_dense()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note also that the variable $b^1=$ `obj1` has the form\n", "$$\n", " b^1 = b^0 + \\sum_{0 \\leq i " ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.864402Z", "iopub.status.busy": "2024-04-30T08:47:01.864323Z", "iopub.status.idle": "2024-04-30T08:47:01.866156Z", "shell.execute_reply": "2024-04-30T08:47:01.865918Z" } }, "outputs": [], "source": [ "def fun_with_loop(y):\n", " r=y.copy()\n", " for i in range(3):\n", " r=r+r**2\n", " return r" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.867412Z", "iopub.status.busy": "2024-04-30T08:47:01.867337Z", "iopub.status.idle": "2024-04-30T08:47:01.869707Z", "shell.execute_reply": "2024-04-30T08:47:01.869478Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[ 1.00000000e+00, 2.00000000e+00, 4.00000000e+00,\n", " 8.00000000e+00, 1.20000000e+01, 2.40000000e+01,\n", " 4.80000000e+01, 9.60000000e+01],\n", " [ 1.00000000e+00, 6.18033989e-01, 8.09016994e-01,\n", " 5.00000000e-01, 1.13627124e+00, 7.02254249e-01,\n", " 9.19262746e-01, 5.68135621e-01],\n", " [ 1.00000000e+00, -1.61803399e+00, -3.09016994e-01,\n", " 5.00000000e-01, -2.61271243e-01, 4.22745751e-01,\n", " 8.07372542e-02, -1.30635621e-01],\n", " [ 1.00000000e+00, -1.61803399e+00, -3.09016994e-01,\n", " 5.00000000e-01, -2.61271243e-01, 4.22745751e-01,\n", " 8.07372542e-02, -1.30635621e-01],\n", " [ 1.00000000e+00, 6.18033989e-01, 8.09016994e-01,\n", " 5.00000000e-01, 1.13627124e+00, 7.02254249e-01,\n", " 9.19262746e-01, 5.68135621e-01]]), array([[0, 0, 0, 0, 0, 0, 0, 0],\n", " [1, 1, 1, 1, 1, 1, 1, 1],\n", " [2, 2, 2, 2, 2, 2, 2, 2],\n", " [3, 3, 3, 3, 3, 3, 3, 3],\n", " [4, 4, 4, 4, 4, 4, 4, 4]]))" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f1 = fun_with_loop(u1)\n", "f1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After each binary operation, the AD information is as big as the concatenation of the input AD informations. In a loop, it therefore grows exponentially and becomes extremely redundant. Here, the result can be much simplified.\n", "\n", "AD information is however simplified only at the request of the user, because that is a costly operation, involving some sorting and array resizing. And simplifying the final result only partially solves the problem, due to the cost of computing the intermediate results." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.870994Z", "iopub.status.busy": "2024-04-30T08:47:01.870918Z", "iopub.status.idle": "2024-04-30T08:47:01.873351Z", "shell.execute_reply": "2024-04-30T08:47:01.873129Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ],\n", " [ 6.25297484],\n", " [ -0.31547484],\n", " [ -0.31547484],\n", " [ 6.25297484]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f1.simplify_ad()\n", "f1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 A first solution: regular simplifications\n", "\n", "When a loop is present, the AD information grows exponentially and its size is quickly our of control, as illustrated above. A partial solution is to simplify the AD information at each iteration." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.874683Z", "iopub.status.busy": "2024-04-30T08:47:01.874607Z", "iopub.status.idle": "2024-04-30T08:47:01.876428Z", "shell.execute_reply": "2024-04-30T08:47:01.876195Z" } }, "outputs": [], "source": [ "def fun_with_loop_simpl(y):\n", " r=y.copy()\n", " for i in range(3):\n", " r=r+r**2\n", " ad.simplify_ad(r) # Simplify AD information\n", " return r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Complexity, is now linear w.r.t. the number of iterations, instead of exponential." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.877748Z", "iopub.status.busy": "2024-04-30T08:47:01.877674Z", "iopub.status.idle": "2024-04-30T08:47:01.880175Z", "shell.execute_reply": "2024-04-30T08:47:01.879943Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ],\n", " [ 6.25297484],\n", " [ -0.31547484],\n", " [ -0.31547484],\n", " [ 6.25297484]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fun_with_loop_simpl(u1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function had to be slightly modified, but it still applies transparently to non AD variables." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.881450Z", "iopub.status.busy": "2024-04-30T08:47:01.881374Z", "iopub.status.idle": "2024-04-30T08:47:01.883394Z", "shell.execute_reply": "2024-04-30T08:47:01.883138Z" } }, "outputs": [ { "data": { "text/plain": [ "array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fun_with_loop_simpl(u0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Another possible solution: the combination with dense automatic differentiation\n", "\n", "Dense automatic differentiation is not subject to the same limitations: it is efficient for evaluating complex functions depending only on a few independent variables. \n", "\n", "When such a function needs to be executed on a sparse AD variable, the recommended approach is to compose dense and sparse automatic differentiation. This is achieved using the `ad.apply` command and indicating that the last dimensions are bound together." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.884746Z", "iopub.status.busy": "2024-04-30T08:47:01.884669Z", "iopub.status.idle": "2024-04-30T08:47:01.886300Z", "shell.execute_reply": "2024-04-30T08:47:01.886086Z" } }, "outputs": [], "source": [ "u1 = ad.Sparse.identity(constant=u0)\n", "u2 = ad.Sparse2.identity(constant=u0)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.887546Z", "iopub.status.busy": "2024-04-30T08:47:01.887471Z", "iopub.status.idle": "2024-04-30T08:47:01.889875Z", "shell.execute_reply": "2024-04-30T08:47:01.889649Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ],\n", " [ 6.25297484],\n", " [ -0.31547484],\n", " [ -0.31547484],\n", " [ 6.25297484]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ad.apply(fun_with_loop,u1,shape_bound=u1.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following command achieves the same result \"by hand\", in a less automated way." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.891216Z", "iopub.status.busy": "2024-04-30T08:47:01.891139Z", "iopub.status.idle": "2024-04-30T08:47:01.893792Z", "shell.execute_reply": "2024-04-30T08:47:01.893583Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]), array([[195. ],\n", " [ 6.25297484],\n", " [ -0.31547484],\n", " [ -0.31547484],\n", " [ 6.25297484]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# These dimensions can be \"bound together\", a.k.a. not regarded as independent.\n", "shape_bound = u1.shape\n", "# Generate an adequately dimensioned denseAD variable\n", "u1d = ad.Dense.identity(constant=u1.value,shape_bound=shape_bound)\n", "# Evaluate the complex function \n", "fu1d = fun_with_loop(u1d)\n", "# Compose the dense en sparse AD\n", "ad.compose(fu1d,np.expand_dims(u1,axis=0),shape_bound=shape_bound)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Composition also works between *dense* AD types. It may speed up computations in the case of the evaluation of a complex function on a dense AD type with many components." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.895220Z", "iopub.status.busy": "2024-04-30T08:47:01.895142Z", "iopub.status.idle": "2024-04-30T08:47:01.897512Z", "shell.execute_reply": "2024-04-30T08:47:01.897274Z" } }, "outputs": [ { "data": { "text/plain": [ "denseAD(array([42. , 0.89091371, -0.11356996, -0.11356996, 0.89091371]),\n", "array([[195. , 0. , 0. , 0. ,\n", " 0. ],\n", " [ 0. , 6.25297484, 0. , 0. ,\n", " 0. ],\n", " [ 0. , 0. , -0.31547484, 0. ,\n", " 0. ],\n", " [ 0. , 0. , 0. , -0.31547484,\n", " 0. ],\n", " [ 0. , 0. , 0. , 0. ,\n", " 6.25297484]]))" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ad.apply(fun_with_loop,u1.to_dense(),shape_bound=u1.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Composition also works for second order automatic differentiation." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.898833Z", "iopub.status.busy": "2024-04-30T08:47:01.898754Z", "iopub.status.idle": "2024-04-30T08:47:01.900542Z", "shell.execute_reply": "2024-04-30T08:47:01.900311Z" } }, "outputs": [], "source": [ "result = fun_with_loop(u2.to_dense())" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.901868Z", "iopub.status.busy": "2024-04-30T08:47:01.901789Z", "iopub.status.idle": "2024-04-30T08:47:01.906335Z", "shell.execute_reply": "2024-04-30T08:47:01.906104Z" } }, "outputs": [ { "data": { "text/plain": [ "(1.4210854715202004e-14, 7.105427357601002e-15, 0.0)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuple(LInfNorm_AD2(result - a.to_dense()) for a in (\n", " fun_with_loop(u2),\n", " fun_with_loop_simpl(u2),\n", " ad.apply(fun_with_loop,u2,shape_bound=u2.shape),\n", "))" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.907686Z", "iopub.status.busy": "2024-04-30T08:47:01.907606Z", "iopub.status.idle": "2024-04-30T08:47:01.910073Z", "shell.execute_reply": "2024-04-30T08:47:01.909845Z" } }, "outputs": [ { "data": { "text/plain": [ "(0.0,)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tuple(LInfNorm_AD2(result - a) for a in (\n", " ad.apply(fun_with_loop,u2.to_dense(),shape_bound=u2.shape),\n", "))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Differentiating extrema using the envelope theorem\n", "\n", "*The envelope theorem.* Consider a function $f:X \\to R$ defined as the minimum of a bivariate function $F:X \\times A \\to R$ over the second variable:\n", "$$\n", " f(x):=\\min_{\\alpha\\in A} F(x,\\alpha),\n", "$$\n", "for each element $x$ of a domain $X \\subset R^d$, and where $A$ is an arbitrary set. The *envelope theorem* states that, under minor technical assumptions, one has \n", "$$\n", " \\nabla f(x) = \\frac{\\partial F}{\\partial x}(x,\\alpha(x)),\n", "$$\n", "where $\\alpha(x) \\in A$ is the minimizer to the above optimization problem, that is $f(x) = F(x,\\alpha(x))$.\n", "\n", "**Generalizations.**\n", "The function $f$ may be defined by a maximum instead of a minimum, or a combination e.g. $f(x) =\\min_{\\alpha\\in A}\\max_{\\beta \\in B} F(x,\\alpha,\\beta)$. It may also be defined in a piecewise manner, etc.\n", "\n", "**Relevance to automatic differentiation.**\n", "In a typical pratical situation:\n", "* Computing the optimal $\\alpha(x)$ is a costly procedure, involving an exhausitve serach or an iterative optimization method. The AD information is useless in this context, and has an significant performance impact. It is preferable *not to* use AD types part.\n", "* Evaluating $f(x) = F(x,\\alpha(x))$ is rather straightforward once $\\alpha(x)$ is known. Furthermore AD types are needed in this final step in order to effectively differentiate $f$.\n", "\n", "**First order derivatives only !** From a mathematical standpoint, the envelope theorem *does not apply to second order derivatives* in general. Please recall this. Our AD library will not stop you or issue a warning if you attempt to do so. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Practical implementation.**\n", "Implement the function $f$ with the following enhancements:\n", "* *Input*. In addition to the standard parameters (a.k.a. the variable $x$ above), the function input should feature an `oracle` *optional* parameter (corresponding to $\\alpha(x)$). If the oracle is provided, then the optimization step is bypassed.\n", "* *Output*. In addition to the relevant value $f(x)$ the function output should include the optimized parameter $\\alpha(x)$, which may be used as an `oracle` in subsequent calls.\n", "\n", "Then call the function $f$ twice, first with raw `np.ndarray` data types, and then with the desired AD type. We provide a helper function in the AD library to automate this process." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.911460Z", "iopub.status.busy": "2024-04-30T08:47:01.911385Z", "iopub.status.idle": "2024-04-30T08:47:01.913049Z", "shell.execute_reply": "2024-04-30T08:47:01.912818Z" } }, "outputs": [], "source": [ "u1 = ad.Sparse.identity(constant=u0)\n", "u2 = ad.Sparse2.identity(constant=u0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define a function whose evaluation involves a minimization by exhaustive search in a large table, here with $100$ entries." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.914368Z", "iopub.status.busy": "2024-04-30T08:47:01.914293Z", "iopub.status.idle": "2024-04-30T08:47:01.916039Z", "shell.execute_reply": "2024-04-30T08:47:01.915799Z" } }, "outputs": [], "source": [ "def f(x,A,B):\n", " return np.min(A*x+B,axis=0) # Exhaustive search" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.917307Z", "iopub.status.busy": "2024-04-30T08:47:01.917232Z", "iopub.status.idle": "2024-04-30T08:47:01.919123Z", "shell.execute_reply": "2024-04-30T08:47:01.918898Z" } }, "outputs": [], "source": [ "A=np.linspace(0,1,100)**0.5; B=np.linspace(1,0,100)**2\n", "A,B = (np.expand_dims(e,axis=-1) for e in (A,B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A little bit of code rewrite is needed to make this function compatible with the oracle interface.\n", "For convenience, we provide a `min_argmin` function, and likewise a `max_argmax` function. (Note that the minimizer index given in the output of `f_oracle` will be erroneous if an oracle is given, but this shall not be an issue.)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.920413Z", "iopub.status.busy": "2024-04-30T08:47:01.920341Z", "iopub.status.idle": "2024-04-30T08:47:01.922265Z", "shell.execute_reply": "2024-04-30T08:47:01.922065Z" } }, "outputs": [], "source": [ "def f_oracle(x,A,B,oracle=None):\n", " if not (oracle is None): \n", " A,B = (np.take_along_axis(e,np.expand_dims(oracle,axis=0),axis=0) for e in (A,B)) \n", " return ad.min_argmin(A*x+B,axis=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no need to use the AD types to compute the minimizer." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.923569Z", "iopub.status.busy": "2024-04-30T08:47:01.923491Z", "iopub.status.idle": "2024-04-30T08:47:01.925242Z", "shell.execute_reply": "2024-04-30T08:47:01.925012Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Value (without AD) : [ 0.92667447 0.30279844 -0.80901699 -0.80901699 0.30279844] \n", "and minimizer [69 91 99 99 91]\n" ] } ], "source": [ "value,minimizer = f_oracle(u0,A,B)\n", "print(\"Value (without AD) : \", value, \"\\nand minimizer \", minimizer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the minimizer, one can efficiently evaluate $f$ on the AD type, bypassing most computations." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.926577Z", "iopub.status.busy": "2024-04-30T08:47:01.926502Z", "iopub.status.idle": "2024-04-30T08:47:01.928759Z", "shell.execute_reply": "2024-04-30T08:47:01.928528Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([ 0.92667447, 0.30279844, -0.80901699, -0.80901699, 0.30279844]), array([[0.83484711],\n", " [0.95874497],\n", " [1. ],\n", " [1. ],\n", " [0.95874497]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]]))" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f_oracle(u1,A,B,oracle=minimizer)[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `ad.apply`, with the keyword argument `envelope=True`, to automatically apply the above procedure." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:47:01.930079Z", "iopub.status.busy": "2024-04-30T08:47:01.930004Z", "iopub.status.idle": "2024-04-30T08:47:01.932349Z", "shell.execute_reply": "2024-04-30T08:47:01.932105Z" } }, "outputs": [ { "data": { "text/plain": [ "(spAD(array([ 0.92667447, 0.30279844, -0.80901699, -0.80901699, 0.30279844]), array([[0.83484711],\n", " [0.95874497],\n", " [1. ],\n", " [1. ],\n", " [0.95874497]]), array([[0],\n", " [1],\n", " [2],\n", " [3],\n", " [4]])),\n", " array([69, 91, 99, 99, 91]))" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ad.apply(f_oracle,u1,A,B,envelope=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }