{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Adaptive PDE discretizations on Cartesian grids\n", "## Volume : Algorithmic tools\n", "## Part : Automatic differentiation\n", "## Chapter : Reverse automatic differentiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates *reverse* automatic differentiation, of *first* and *second* order. Recall that this approach is recommended for functions from a large dimensional space, to a small dimensional space, and whose jacobian does not have a sparse structure. It is typically appropriate for large scale optimization problems.\n", "\n", "**Disclaimer.** First order reverse automatic differentiation is a classical feature, found in many packages better optimized and better maintained than this one. If you only need this specific feature, then it could be wise to use another implementation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[**Summary**](Summary.ipynb) of volume Algorithmic tools, this series of notebooks.\n", "\n", "[**Main summary**](../Summary.ipynb) of the Adaptive Grid Discretizations \n", "\tbook of notebooks, including the other volumes.\n", "\n", "# Table of contents\n", " * [1. Generating variables and requesting an expression's derivatives](#1.-Generating-variables-and-requesting-an-expression's-derivatives)\n", " * [1.1 Gradient](#1.1-Gradient)\n", " * [1.2 Hessian operator](#1.2-Hessian-operator)\n", " * [1.3 Simplification of the AD information](#1.3-Simplification-of-the-AD-information)\n", " * [2. Linear operators and their adjoints](#2.-Linear-operators-and-their-adjoints)\n", " * [2.1 Linear mapping](#2.1-Linear-mapping)\n", " * [2.2 Linear inversion](#2.2-Linear-inversion)\n", " * [3. Non-linear operators](#3.-Non-linear-operators)\n", " * [3.1 Defining an operator and its adjoint](#3.1-Defining-an-operator-and-its-adjoint)\n", " * [3.2 Defining an operator from a block of instructions](#3.2-Defining-an-operator-from-a-block-of-instructions)\n", " * [3.3 Loops](#3.3-Loops)\n", " * [4. Format of variables](#4.-Format-of-variables)\n", " * [4.1 Scalars and multi-dimensional arrays](#4.1-Scalars-and-multi-dimensional-arrays)\n", " * [4.2 Lists and dictionnaries](#4.2-Lists-and-dictionnaries)\n", "\n", "\n", "\n", "**Acknowledgement.** Some of the experiments presented in these notebooks are part of \n", "ongoing research with Ludovic Métivier and Da Chen.\n", "\n", "Copyright Jean-Marie Mirebeau, Centre Borelli, ENS Paris-Saclay, CNRS, University Paris-Saclay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Known bugs and incompatibilities.** Our implementation of automatic differentiation is based on subclassing the numpy array class. While simple and powerful, this approach suffers from a few pitfalls, described in notebook [ADBugs](ADBugs.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Importing the required libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.787343Z", "iopub.status.busy": "2024-04-30T08:46:16.786980Z", "iopub.status.idle": "2024-04-30T08:46:16.796642Z", "shell.execute_reply": "2024-04-30T08:46:16.796200Z" } }, "outputs": [], "source": [ "import sys; sys.path.insert(0,\"..\") # Allow importing agd from parent directory\n", "#from Miscellaneous import TocTools; print(TocTools.displayTOC('Reverse','Algo'))" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.799123Z", "iopub.status.busy": "2024-04-30T08:46:16.798925Z", "iopub.status.idle": "2024-04-30T08:46:16.898632Z", "shell.execute_reply": "2024-04-30T08:46:16.898303Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import scipy.sparse; import scipy.sparse.linalg" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.900362Z", "iopub.status.busy": "2024-04-30T08:46:16.900244Z", "iopub.status.idle": "2024-04-30T08:46:16.904787Z", "shell.execute_reply": "2024-04-30T08:46:16.904535Z" } }, "outputs": [], "source": [ "import agd.AutomaticDifferentiation as ad\n", "import agd.FiniteDifferences as fd\n", "import agd.LinearParallel as lp" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.906180Z", "iopub.status.busy": "2024-04-30T08:46:16.906095Z", "iopub.status.idle": "2024-04-30T08:46:16.907846Z", "shell.execute_reply": "2024-04-30T08:46:16.907622Z" } }, "outputs": [], "source": [ "def LInfNorm(a): return np.max(np.abs(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Generating variables and requesting an expression's derivatives\n", "\n", "Reverse automatic differentiation works by keeping a history of the computations. The sensitivity of the result w.r.t some components is obtained by propagating the sensitivities backward in the computation queue and using the operator adjoints.\n", "\n", "Our implementation of the reverseAD is differs from the denseAD or sparseAD classes: here we do not define an data type overloading the arithmetic operators and the basic special functions. Instead, the reverseAD class is only meant to keep a history of user specified computations.\n", "\n", "*Note to reader.* This section only shows how to generate variables, and request an expression's derivatives. See the next section for an actual use of automatic differentiation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Gradient\n", "\n", "We first create some basic numpy arrays." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.909184Z", "iopub.status.busy": "2024-04-30T08:46:16.909102Z", "iopub.status.idle": "2024-04-30T08:46:16.910976Z", "shell.execute_reply": "2024-04-30T08:46:16.910751Z" } }, "outputs": [], "source": [ "x0=np.linspace(0,np.pi,4)\n", "gridScale=x0[1]-x0[0]\n", "u0=np.sin(x0)\n", "v0=np.cos(x0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we generate a reverseAD history, and register the variables of interest." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.912456Z", "iopub.status.busy": "2024-04-30T08:46:16.912378Z", "iopub.status.idle": "2024-04-30T08:46:16.914359Z", "shell.execute_reply": "2024-04-30T08:46:16.914111Z" } }, "outputs": [], "source": [ "# Create an empty history\n", "rev = ad.Reverse.empty()\n", "# Create AD variables w.r.t which the gradient will be required\n", "u1 = rev.identity(constant=u0)\n", "v1 = rev.identity(constant=v0)\n", "\n", "# Equivalently : \n", "#rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our implementation, the generated AD variables are of sparse AD type." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.915739Z", "iopub.status.busy": "2024-04-30T08:46:16.915659Z", "iopub.status.idle": "2024-04-30T08:46:16.917811Z", "shell.execute_reply": "2024-04-30T08:46:16.917545Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "u1: spAD(array([0.00000000e+00, 8.66025404e-01, 8.66025404e-01, 1.22464680e-16]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[0],\n", " [1],\n", " [2],\n", " [3]]))\n", "v1: spAD(array([ 1. , 0.5, -0.5, -1. ]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[4],\n", " [5],\n", " [6],\n", " [7]]))\n" ] } ], "source": [ "print(\"u1:\",u1)\n", "print(\"v1:\",v1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we make some computation using the variables registered in the reverseAD class, then we can request the gradient of the final expression." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.919181Z", "iopub.status.busy": "2024-04-30T08:46:16.919097Z", "iopub.status.idle": "2024-04-30T08:46:16.920751Z", "shell.execute_reply": "2024-04-30T08:46:16.920517Z" } }, "outputs": [], "source": [ "result = (u1**2+v1**2).sum()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.922031Z", "iopub.status.busy": "2024-04-30T08:46:16.921956Z", "iopub.status.idle": "2024-04-30T08:46:16.925491Z", "shell.execute_reply": "2024-04-30T08:46:16.925272Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 1.73205081e+00, 1.73205081e+00, 2.44929360e-16,\n", " 2.00000000e+00, 1.00000000e+00, -1.00000000e+00, -2.00000000e+00])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rev.gradient(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If needed, this expression can be reshaped similar to the input variables." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.926854Z", "iopub.status.busy": "2024-04-30T08:46:16.926781Z", "iopub.status.idle": "2024-04-30T08:46:16.928595Z", "shell.execute_reply": "2024-04-30T08:46:16.928359Z" } }, "outputs": [], "source": [ "u_grad,v_grad = rev.to_inputshapes(rev.gradient(result))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.929873Z", "iopub.status.busy": "2024-04-30T08:46:16.929791Z", "iopub.status.idle": "2024-04-30T08:46:16.931513Z", "shell.execute_reply": "2024-04-30T08:46:16.931283Z" } }, "outputs": [], "source": [ "assert LInfNorm(v_grad-2*v0) < 1e-6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2 Hessian operator\n", "\n", "The hessian operator may be accessed as well, using the Reverse2 submodule. Note that the hessian matrix itself is never assembled, since in applications it would typically be dense and of large size." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.932908Z", "iopub.status.busy": "2024-04-30T08:46:16.932826Z", "iopub.status.idle": "2024-04-30T08:46:16.934921Z", "shell.execute_reply": "2024-04-30T08:46:16.934676Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))\n", "result = (u1**2+v1**2).sum()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.936264Z", "iopub.status.busy": "2024-04-30T08:46:16.936186Z", "iopub.status.idle": "2024-04-30T08:46:16.937950Z", "shell.execute_reply": "2024-04-30T08:46:16.937724Z" } }, "outputs": [], "source": [ "grad = rev.gradient(result) # Gradient\n", "hess = rev.hessian(result) # Hessian operator" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.939220Z", "iopub.status.busy": "2024-04-30T08:46:16.939143Z", "iopub.status.idle": "2024-04-30T08:46:16.941226Z", "shell.execute_reply": "2024-04-30T08:46:16.940993Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 1.73205081e+00, 1.73205081e+00, 2.44929360e-16,\n", " 2.00000000e+00, 1.00000000e+00, -1.00000000e+00, -2.00000000e+00])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grad" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this specific example, the hessian operator is twice the identity." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.942666Z", "iopub.status.busy": "2024-04-30T08:46:16.942593Z", "iopub.status.idle": "2024-04-30T08:46:16.944923Z", "shell.execute_reply": "2024-04-30T08:46:16.944672Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 3.46410162e+00, 3.46410162e+00, 4.89858720e-16,\n", " 4.00000000e+00, 2.00000000e+00, -2.00000000e+00, -4.00000000e+00])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hess(grad)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3 Simplification of the AD information\n", "\n", "Our implementation of reverse AD does rely on sparse AD, for the elementary operations, such as :\n", "* arithmetic operators $+,-,*,/$.\n", "* special functions $\\cos$, $\\sin$, $\\ln$, $\\exp$.\n", "* variable reshaping.\n", "\n", "One may recall that sparseness is easily lost, as in the example below, which may result in complexity issues." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.946277Z", "iopub.status.busy": "2024-04-30T08:46:16.946203Z", "iopub.status.idle": "2024-04-30T08:46:16.948488Z", "shell.execute_reply": "2024-04-30T08:46:16.948252Z" } }, "outputs": [], "source": [ "n=4\n", "rev = ad.Reverse2.empty()\n", "u1 = rev.identity(shape=(n,n))\n", "u2 = (1.+u1+u1**2).sum(axis=0)\n", "result = (u2**2).sum(axis=0)\n", "\n", "grad,hess = rev.gradient(result),rev.hessian(result)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.949803Z", "iopub.status.busy": "2024-04-30T08:46:16.949723Z", "iopub.status.idle": "2024-04-30T08:46:16.951458Z", "shell.execute_reply": "2024-04-30T08:46:16.951213Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result AD size : 272\n" ] } ], "source": [ "print(\"Result AD size : \",result.size_ad2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sparse AD simplification may limit sparsity loss, but it is a costly process (involving sorting), which is not always very efficient." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.952771Z", "iopub.status.busy": "2024-04-30T08:46:16.952695Z", "iopub.status.idle": "2024-04-30T08:46:16.956218Z", "shell.execute_reply": "2024-04-30T08:46:16.956004Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result AD size with sparse AD simplification : 80\n" ] } ], "source": [ "u2.simplify_ad()\n", "result = (u2**2).sum(axis=0)\n", "print(\"Result AD size with sparse AD simplification : \",result.size_ad2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take advantage of reverse AD to reintroduce some sparsity. \n", "\n", "**Important.** \n", "Simplification based on reverse AD should not be confused with simplification based on sparse AD. Indeed, it does not involve sorting, and it is always optimally efficient." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.957614Z", "iopub.status.busy": "2024-04-30T08:46:16.957536Z", "iopub.status.idle": "2024-04-30T08:46:16.960023Z", "shell.execute_reply": "2024-04-30T08:46:16.959799Z" } }, "outputs": [], "source": [ "rev = ad.Reverse2.empty()\n", "u1 = rev.identity(shape=(n,n))\n", "u2 = (1.+u1+u1**2).sum(axis=0)\n", "u2_simpl = rev.simplify(u2) # Intermediate simplification\n", "result = (u2_simpl**2).sum(axis=0)\n", "\n", "grad_simpl,hess_simpl = rev.gradient(result),rev.hessian(result)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.961336Z", "iopub.status.busy": "2024-04-30T08:46:16.961256Z", "iopub.status.idle": "2024-04-30T08:46:16.962940Z", "shell.execute_reply": "2024-04-30T08:46:16.962718Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result AD size with reverse AD simplification : 4\n" ] } ], "source": [ "print(\"Result AD size with reverse AD simplification : \",result.size_ad2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reverse AD simplification proceeds by replacing the the complicated symbolic perturbations in $u_2$ with placeholders, in $u_2\\_$simpl, with negative indices." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.964212Z", "iopub.status.busy": "2024-04-30T08:46:16.964135Z", "iopub.status.idle": "2024-04-30T08:46:16.966346Z", "shell.execute_reply": "2024-04-30T08:46:16.966112Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD2(array([4., 4., 4., 4.]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[-1],\n", " [-2],\n", " [-3],\n", " [-4]]), array([], shape=(4, 0), dtype=float64), array([], shape=(4, 0), dtype=int64), array([], shape=(4, 0), dtype=int64))" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u2_simpl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The relative dependency of the old and new symbolic perturbations is registered, which allows computing derivatives." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.967733Z", "iopub.status.busy": "2024-04-30T08:46:16.967656Z", "iopub.status.idle": "2024-04-30T08:46:16.970174Z", "shell.execute_reply": "2024-04-30T08:46:16.969942Z" } }, "outputs": [], "source": [ "assert LInfNorm(grad-grad_simpl) < 1e-13\n", "dir_hessian = grad**2+1.\n", "assert LInfNorm(hess(dir_hessian)-hess_simpl(dir_hessian)) < 1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Linear operators and their adjoints\n", "\n", "We illustrate Reverse automatic differentiation, in its intended use case involving operators operators whose jacobians are expected to be both high-dimensional and non-sparse. Note that linear operators often fit this desciption. For instance:\n", "* A linear mapping, given in sparse form, but iterated many times.\n", "* The inverse of a linear mapping, given in sparse form.\n", "* The fast fourier transform, etc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Linear mapping\n", "\n", "We first construct some sparse matrix, for the exposition, based on a transport scheme implemented using upwind finite differences." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.971595Z", "iopub.status.busy": "2024-04-30T08:46:16.971519Z", "iopub.status.idle": "2024-04-30T08:46:16.973235Z", "shell.execute_reply": "2024-04-30T08:46:16.973018Z" } }, "outputs": [], "source": [ "def TransportScheme(u,h,speed,dt):\n", " return u+dt*speed*fd.DiffUpwind(u,(1,),h,padding=0.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to construct the matrix, we evaluate the scheme on a sparse AD variable." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.974546Z", "iopub.status.busy": "2024-04-30T08:46:16.974470Z", "iopub.status.idle": "2024-04-30T08:46:16.976404Z", "shell.execute_reply": "2024-04-30T08:46:16.976164Z" } }, "outputs": [], "source": [ "speed = 1.; T=1.5; nsteps=5; dt = T/nsteps;\n", "transport_ad = TransportScheme(ad.Sparse.identity(u0.shape),gridScale,speed,dt)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.977715Z", "iopub.status.busy": "2024-04-30T08:46:16.977639Z", "iopub.status.idle": "2024-04-30T08:46:16.979417Z", "shell.execute_reply": "2024-04-30T08:46:16.979198Z" } }, "outputs": [], "source": [ "transport_matrix = scipy.sparse.coo_matrix(transport_ad.triplets()).tocsr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use this matrix in the course of reverse AD computations, mixing both linear and non-linear steps." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.980751Z", "iopub.status.busy": "2024-04-30T08:46:16.980671Z", "iopub.status.idle": "2024-04-30T08:46:16.982656Z", "shell.execute_reply": "2024-04-30T08:46:16.982435Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0)) # Create the reverse AD environnement\n", "u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps) #Implement the desired method\n", "result_Reverse = (u2*v1).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Due to the multiple iterations, the variable $u_2$ does not depend in a sparse manner on $u_1$.\n", "In our implementation, the variable $u_2$ is of sparseAD type but features negative placeholder indices. \n", "Likewise for the final computation result." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.983978Z", "iopub.status.busy": "2024-04-30T08:46:16.983903Z", "iopub.status.idle": "2024-04-30T08:46:16.986075Z", "shell.execute_reply": "2024-04-30T08:46:16.985851Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array([5.02050076e-01, 4.17158585e-01, 1.38706040e-01, 2.77367654e-33]), array([[1.],\n", " [1.],\n", " [1.],\n", " [1.]]), array([[-1],\n", " [-2],\n", " [-3],\n", " [-4]]))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u2" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.987396Z", "iopub.status.busy": "2024-04-30T08:46:16.987316Z", "iopub.status.idle": "2024-04-30T08:46:16.989423Z", "shell.execute_reply": "2024-04-30T08:46:16.989201Z" } }, "outputs": [ { "data": { "text/plain": [ "spAD(array(0.64127635), array([ 1.00000000e+00, 5.00000000e-01, -5.00000000e-01, -1.00000000e+00,\n", " 5.02050076e-01, 4.17158585e-01, 1.38706040e-01, 2.77367654e-33]), array([-1, -2, -3, -4, 4, 5, 6, 7]))" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result_Reverse" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The negative indices are newly introduced symbolic perturbations, whose depedence on the original ones is known thanks to the registered adjoint. This is exploited for computing the gradient of the result." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.990735Z", "iopub.status.busy": "2024-04-30T08:46:16.990654Z", "iopub.status.idle": "2024-04-30T08:46:16.993056Z", "shell.execute_reply": "2024-04-30T08:46:16.992841Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 8.03222547e-01, 6.77741743e-01, -2.49367755e-17,\n", " 5.02050076e-01, 4.17158585e-01, 1.38706040e-01, 2.77367654e-33])" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rev.gradient(result_Reverse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check the validity of this implementation using dense automatic differentiation, since this specific instance is small." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.994383Z", "iopub.status.busy": "2024-04-30T08:46:16.994308Z", "iopub.status.idle": "2024-04-30T08:46:16.996632Z", "shell.execute_reply": "2024-04-30T08:46:16.996407Z" } }, "outputs": [], "source": [ "n=x0.size\n", "u1 = ad.Dense2.identity(constant=u0,shift=(0,n))\n", "v1 = ad.Dense2.identity(constant=v0,shift=(n,0))\n", "u2 = ad.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps)\n", "result_Dense2 = (u2*v1).sum()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:16.997935Z", "iopub.status.busy": "2024-04-30T08:46:16.997859Z", "iopub.status.idle": "2024-04-30T08:46:16.999734Z", "shell.execute_reply": "2024-04-30T08:46:16.999522Z" } }, "outputs": [], "source": [ "assert LInfNorm(rev.gradient(result_Reverse) - result_Dense2.coef1) < 1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Second order reverse automatic differentiation is also implemented." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.001016Z", "iopub.status.busy": "2024-04-30T08:46:17.000935Z", "iopub.status.idle": "2024-04-30T08:46:17.003050Z", "shell.execute_reply": "2024-04-30T08:46:17.002825Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))\n", "u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps) \n", "result_Reverse2 = (u2*v1).sum()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.004340Z", "iopub.status.busy": "2024-04-30T08:46:17.004260Z", "iopub.status.idle": "2024-04-30T08:46:17.006087Z", "shell.execute_reply": "2024-04-30T08:46:17.005867Z" } }, "outputs": [], "source": [ "assert LInfNorm(rev.gradient(result_Reverse2) - result_Dense2.coef1) < 1e-13" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.007361Z", "iopub.status.busy": "2024-04-30T08:46:17.007286Z", "iopub.status.idle": "2024-04-30T08:46:17.009171Z", "shell.execute_reply": "2024-04-30T08:46:17.008945Z" } }, "outputs": [], "source": [ "grad = rev.gradient(result_Reverse2)\n", "hess = rev.hessian(result_Reverse2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the hessian is provided as an operator, that can be applied to arbitrary vectors, and not in a matrix form. This is enough to run e.g. the conjugate gradient method." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.010501Z", "iopub.status.busy": "2024-04-30T08:46:17.010421Z", "iopub.status.idle": "2024-04-30T08:46:17.012409Z", "shell.execute_reply": "2024-04-30T08:46:17.012176Z" } }, "outputs": [ { "data": { "text/plain": [ ".hess_operator(dir_hessian, coef2_init=None, with_grad=False)>" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hess" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.013717Z", "iopub.status.busy": "2024-04-30T08:46:17.013635Z", "iopub.status.idle": "2024-04-30T08:46:17.016069Z", "shell.execute_reply": "2024-04-30T08:46:17.015838Z" } }, "outputs": [], "source": [ "dir_hessian = grad**2+1.\n", "assert LInfNorm(hess(dir_hessian) - np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Linear inversion\n", "\n", "Linear inversion is almost always a non-local procedure, but with a simple and explicit adjoint. It is again a good fit for reverse AD. The syntax is almost identical, except for the choice of a linear solver." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.017378Z", "iopub.status.busy": "2024-04-30T08:46:17.017302Z", "iopub.status.idle": "2024-04-30T08:46:17.020086Z", "shell.execute_reply": "2024-04-30T08:46:17.019843Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))\n", "\n", "# Choose a solver, and request the inversion.\n", "solver = scipy.sparse.linalg.spsolve\n", "u2 = rev.apply_linear_inverse(solver,transport_matrix,u1**2) \n", "\n", "result_Reverse2 = (u2*v1).sum()\n", "\n", "grad = rev.gradient(result_Reverse2)\n", "hess = rev.hessian(result_Reverse2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we control the results in this simple instance using dense AD." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.021461Z", "iopub.status.busy": "2024-04-30T08:46:17.021383Z", "iopub.status.idle": "2024-04-30T08:46:17.023700Z", "shell.execute_reply": "2024-04-30T08:46:17.023478Z" } }, "outputs": [], "source": [ "n=x0.size\n", "u1 = ad.Dense2.identity(constant=u0,shift=(0,n))\n", "v1 = ad.Dense2.identity(constant=v0,shift=(n,0))\n", "u2 = ad.apply_linear_inverse(solver,transport_matrix,u1**2)\n", "result_Dense2 = (u2*v1).sum()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.025032Z", "iopub.status.busy": "2024-04-30T08:46:17.024950Z", "iopub.status.idle": "2024-04-30T08:46:17.027704Z", "shell.execute_reply": "2024-04-30T08:46:17.027490Z" } }, "outputs": [], "source": [ "assert LInfNorm(grad - result_Dense2.coef1) < 1e-13\n", "dir_hessian = grad**2+1.\n", "assert LInfNorm(hess(dir_hessian) - np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Non-linear operators\n", "\n", "In the previous section, we have shown how to apply reverse AD to operators which are either:\n", "- linear\n", "- non-linear but local\n", "- compositions of the above\n", "\n", "We discuss here the general case of arbitrary non-linear operators. These techniques are typically needed in the context of non-linear evolution equations.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Defining an operator and its adjoint\n", "\n", "For an operator to be used in a reverse AD context, two execution paths must be implemented:\n", "* Standard program flow, the output is computed in a forward manner from the inputs.\n", "* Adjoint jacobian, if a co_output is specified. The operator should return a list of pairs, each consisting of an input argument and the relative co_input.\n", "\n", "**Note on the differentiation order.** This specific subsection only applies to first order reverse AD. A slightly more complex implementation is required for second order reverse AD." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.029076Z", "iopub.status.busy": "2024-04-30T08:46:17.029003Z", "iopub.status.idle": "2024-04-30T08:46:17.030942Z", "shell.execute_reply": "2024-04-30T08:46:17.030721Z" } }, "outputs": [], "source": [ "def my_operator(u,co_output=None):\n", " if co_output is None: \n", " return transport_matrix*u # Standard program flow\n", " else:\n", " co_output_value,co_arg_request = co_output\n", " assert len(co_arg_request)==1 and co_arg_request[0] is u\n", " return [(u, transport_matrix.T*co_output_value)] # Adjoint jacobian " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This operator is functionally equivalent to the method `rev.apply_linear_mapping(transport_matrix,...)`" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.032319Z", "iopub.status.busy": "2024-04-30T08:46:17.032242Z", "iopub.status.idle": "2024-04-30T08:46:17.034459Z", "shell.execute_reply": "2024-04-30T08:46:17.034234Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0)) # Create the reverse AD environnement\n", "u2 = rev.apply(my_operator,u1**2) #Implement the desired method\n", "result_Reverse = (u2*v1).sum()\n", "\n", "grad = rev.gradient(result_Reverse)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.035752Z", "iopub.status.busy": "2024-04-30T08:46:17.035675Z", "iopub.status.idle": "2024-04-30T08:46:17.037729Z", "shell.execute_reply": "2024-04-30T08:46:17.037525Z" } }, "outputs": [ { "data": { "text/plain": [ "array([ 0.00000000e+00, 1.11412341e+00, -3.69829398e-01, -2.09845813e-16,\n", " 2.14859173e-01, 7.50000000e-01, 5.35140827e-01, 1.07011025e-32])" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grad" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.039035Z", "iopub.status.busy": "2024-04-30T08:46:17.038957Z", "iopub.status.idle": "2024-04-30T08:46:17.041245Z", "shell.execute_reply": "2024-04-30T08:46:17.041020Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))\n", "u2 = rev.apply_linear_mapping(transport_matrix,u1**2) #Implement the desired method\n", "result_Reverse = (u2*v1).sum()\n", "\n", "assert LInfNorm(grad-rev.gradient(result_Reverse))<1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Defining an operator from a block of instructions\n", "\n", "For the purposes of illustration, we define an operator which:\n", "- has multiple inputs, each multi-dimensional.\n", "- has a multi-dimensional output.\n", "- is nonlocal (because of the inner loop involving a linear mapping).\n", "\n", "This operator is defined using a block of elementary instructions." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.042669Z", "iopub.status.busy": "2024-04-30T08:46:17.042580Z", "iopub.status.idle": "2024-04-30T08:46:17.044731Z", "shell.execute_reply": "2024-04-30T08:46:17.044511Z" } }, "outputs": [], "source": [ "def my_operator(u,v,co_output=None): \n", " # Generate an operator-like AD environnement\n", " rev,(u1,v1) = ad.Reverse.operator_like(inputs=(u,v),co_output=co_output)\n", " \n", " # Do some computations\n", " u2 = rev.apply_linear_mapping(transport_matrix,u1**2,niter=nsteps)\n", " result = u2*v1\n", " \n", " # Return the suitably converted result\n", " return rev.output(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The non-linear operator defined above can be applied to ordinary variables without AD information, or dense AD variables. (The multiplication with `transport_matrix` forbids the use of sparse AD variables.)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.046053Z", "iopub.status.busy": "2024-04-30T08:46:17.045975Z", "iopub.status.idle": "2024-04-30T08:46:17.047689Z", "shell.execute_reply": "2024-04-30T08:46:17.047484Z" } }, "outputs": [], "source": [ "result_float = (my_operator(u0,u0*v0)*v0).sum()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.048943Z", "iopub.status.busy": "2024-04-30T08:46:17.048870Z", "iopub.status.idle": "2024-04-30T08:46:17.051178Z", "shell.execute_reply": "2024-04-30T08:46:17.050932Z" } }, "outputs": [], "source": [ "n=x0.size\n", "u1 = ad.Dense2.identity(constant=u0,shift=(0,n))\n", "v1 = ad.Dense2.identity(constant=v0,shift=(n,0))\n", "\n", "result_Dense2 = (my_operator(u1,u1*v1)*v1).sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can also be used in a reverse AD context. (Note that this is *recursive* reverse AD.)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.052499Z", "iopub.status.busy": "2024-04-30T08:46:17.052421Z", "iopub.status.idle": "2024-04-30T08:46:17.054985Z", "shell.execute_reply": "2024-04-30T08:46:17.054760Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse.empty(inputs=(u0,v0))\n", "\n", "result_Reverse = (rev.apply(my_operator,u1,u1*v1)*v1).sum()\n", "\n", "grad = rev.gradient(result_Reverse)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.056349Z", "iopub.status.busy": "2024-04-30T08:46:17.056271Z", "iopub.status.idle": "2024-04-30T08:46:17.057921Z", "shell.execute_reply": "2024-04-30T08:46:17.057701Z" } }, "outputs": [], "source": [ "assert LInfNorm(grad - result_Dense2.coef1) < 1e-13" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.059232Z", "iopub.status.busy": "2024-04-30T08:46:17.059156Z", "iopub.status.idle": "2024-04-30T08:46:17.062182Z", "shell.execute_reply": "2024-04-30T08:46:17.061950Z" } }, "outputs": [], "source": [ "rev,(u1,v1) = ad.Reverse2.empty(inputs=(u0,v0))\n", "\n", "result_Reverse2 = (rev.apply(my_operator,u1,u1*v1)*v1).sum()\n", "\n", "grad = rev.gradient(result_Reverse2)\n", "hess = rev.hessian(result_Reverse2)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.063568Z", "iopub.status.busy": "2024-04-30T08:46:17.063486Z", "iopub.status.idle": "2024-04-30T08:46:17.067298Z", "shell.execute_reply": "2024-04-30T08:46:17.067084Z" } }, "outputs": [], "source": [ "assert LInfNorm(grad - result_Dense2.coef1) < 1e-13\n", "dir_hessian = grad**2+1.\n", "assert LInfNorm(hess(dir_hessian)-np.dot(result_Dense2.coef2,dir_hessian)) < 1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Loops\n", "\n", "We illustrate recursive reverse AD in nested loops. This approach allows to limit memory usage, at the cost of some recomputation.\n", "\n", "**Note on complexity** Reverse automatic differentiation requires saving program state at each intermediate computation steps.\n", "If a program contains a loop, this may result in severe memory usage. Therefore, it is common to only store a small proportion of the intermediate states, referred to as keypoints, and to use recomputations for reconstructing the other steps. For best efficiency, this procedure must be made recursive. \n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our first method makes $n$=niter iterations of the arithmetico-geometric algorithm. \n", "\n", "*Side note* : the chosen example is not perfect, because this algorithm converges excessively fast, hence we only use a very small number of iterations, which may seem pointless." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.068782Z", "iopub.status.busy": "2024-04-30T08:46:17.068702Z", "iopub.status.idle": "2024-04-30T08:46:17.071058Z", "shell.execute_reply": "2024-04-30T08:46:17.070838Z" } }, "outputs": [], "source": [ "counter=0 # Iteration counter\n", "def my_loop(a,b,niter,co_output=None):\n", " global counter\n", " rev,(a,b) = ad.Reverse.operator_like(inputs=(a,b),co_output=co_output)\n", " for i in range(niter):\n", " a,b = (a+b)/2., np.sqrt(a*b)\n", " a,b = rev.simplify(a),rev.simplify(b)\n", " counter+=1\n", " result = (a,b)\n", " return rev.output(result)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.072335Z", "iopub.status.busy": "2024-04-30T08:46:17.072258Z", "iopub.status.idle": "2024-04-30T08:46:17.073984Z", "shell.execute_reply": "2024-04-30T08:46:17.073760Z" } }, "outputs": [], "source": [ "a0=np.array([1.,2.,4.,8.])\n", "b0=1/a0" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.075313Z", "iopub.status.busy": "2024-04-30T08:46:17.075239Z", "iopub.status.idle": "2024-04-30T08:46:17.077483Z", "shell.execute_reply": "2024-04-30T08:46:17.077263Z" } }, "outputs": [ { "data": { "text/plain": [ "((array([1. , 1.125 , 1.5625 , 2.53125]),\n", " array([1. , 1.11803399, 1.45773797, 2.01556444])),\n", " 2)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "niter=2; counter=0\n", "my_loop(a0,b0,niter),counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the selected number of iterations, we are quite far from convergence.\n", "We do note want to increase $n$, which would augment the number of intermediate steps saved. Instead, we achieve $n^2$ iterations while saving only $2n$ states, using an outer loop. This comes at the cost of recomputing $2$ times each step in the gradient evaluation.\n", "\n", "With a similar basic recursion, one can achieve $n^k$ iterations, while saving only $k n$ steps, using $k$ nested loops. This comes at the cost of recomputing $k$ times each step in the gradient evaluation.\n", "\n", "With the convention below, $k=$`nrec`$+1$" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.078737Z", "iopub.status.busy": "2024-04-30T08:46:17.078657Z", "iopub.status.idle": "2024-04-30T08:46:17.080855Z", "shell.execute_reply": "2024-04-30T08:46:17.080629Z" } }, "outputs": [], "source": [ "def my_outer_loop(a,b,niter,nrec,operator,co_output=None):\n", " rev,(a,b) = ad.Reverse.operator_like(inputs=(a,b),co_output=co_output)\n", " for i in range(niter):\n", " if nrec==1: a,b = rev.apply(operator,a,b,niter)\n", " else: a,b = rev.apply(my_outer_loop,a,b,niter,nrec-1,operator)\n", " result = (a,b)\n", " return rev.output(result)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.082212Z", "iopub.status.busy": "2024-04-30T08:46:17.082135Z", "iopub.status.idle": "2024-04-30T08:46:17.084421Z", "shell.execute_reply": "2024-04-30T08:46:17.084192Z" } }, "outputs": [ { "data": { "text/plain": [ "((array([1. , 1.12151429, 1.50966462, 2.26607262]),\n", " array([1. , 1.12151429, 1.50966455, 2.26606075])),\n", " 4)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "niter=2; nrec=1; counter=0\n", "my_outer_loop(a0,b0,niter,nrec,my_loop), counter" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.085733Z", "iopub.status.busy": "2024-04-30T08:46:17.085653Z", "iopub.status.idle": "2024-04-30T08:46:17.088010Z", "shell.execute_reply": "2024-04-30T08:46:17.087785Z" } }, "outputs": [ { "data": { "text/plain": [ "((array([1. , 1.12151429, 1.50966459, 2.26606669]),\n", " array([1. , 1.12151429, 1.50966459, 2.26606669])),\n", " 8)" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "niter=2; nrec=2; counter=0\n", "my_outer_loop(a0,b0,niter,nrec,my_loop), counter" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.089304Z", "iopub.status.busy": "2024-04-30T08:46:17.089225Z", "iopub.status.idle": "2024-04-30T08:46:17.096474Z", "shell.execute_reply": "2024-04-30T08:46:17.096220Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration counter for forward evaluation 8\n", "Iteration counter for backward evaluation 24\n" ] } ], "source": [ "niter=2; nrec=2; counter=0; \n", "rev,(a1,b1) = ad.Reverse.empty(inputs=(a0,b0))\n", "\n", "a2,b2 = rev.apply(my_outer_loop,a1,b1,niter,nrec,my_loop)\n", "result = (a2*b2).sum()\n", "print(\"Iteration counter for forward evaluation\", counter)\n", "\n", "counter=0;\n", "grad_rec = rev.gradient(result)\n", "print(\"Iteration counter for backward evaluation\", counter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For verification purposes, we compare our results non-nested iteration." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.097821Z", "iopub.status.busy": "2024-04-30T08:46:17.097740Z", "iopub.status.idle": "2024-04-30T08:46:17.102788Z", "shell.execute_reply": "2024-04-30T08:46:17.102538Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration counter for forward evaluation 8\n", "Iteration counter for backward evaluation 8\n" ] } ], "source": [ "niter2 = niter**(1+nrec); counter=0\n", "rev,(a1,b1) = ad.Reverse.empty(inputs=(a0,b0))\n", "\n", "a2,b2 = rev.apply(my_loop,a1,b1,niter2)\n", "result = (a2*b2).sum()\n", "print(\"Iteration counter for forward evaluation\", counter)\n", "\n", "counter=0;\n", "grad_iter = rev.gradient(result)\n", "print(\"Iteration counter for backward evaluation\", counter)\n", "\n", "assert LInfNorm(grad_rec-grad_iter)<1e-13" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hessian evaluation in second order reverse AD requires even more iterations, because it combines a forward and a backward pass. " ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.104174Z", "iopub.status.busy": "2024-04-30T08:46:17.104091Z", "iopub.status.idle": "2024-04-30T08:46:17.125582Z", "shell.execute_reply": "2024-04-30T08:46:17.125318Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration counter for forward evaluation 8\n", "Iteration counter for gradient evaluation 24\n", "Iteration counter for hessian evaluation 48\n" ] } ], "source": [ "niter=2; nrec=2; counter=0; \n", "rev,(a1,b1) = ad.Reverse2.empty(inputs=(a0,b0))\n", "\n", "a2,b2 = rev.apply(my_outer_loop,a1,b1,niter,nrec,my_loop)\n", "result = (a2*b2).sum()\n", "print(\"Iteration counter for forward evaluation\", counter)\n", "\n", "counter=0;\n", "grad_rec2 = rev.gradient(result)\n", "print(\"Iteration counter for gradient evaluation\", counter)\n", "assert LInfNorm(grad_rec-grad_rec2)<1e-13\n", "\n", "counter=0\n", "hess = rev.hessian(result)\n", "dir_hessian = grad_rec**2+1.\n", "hess_rec2 = hess(dir_hessian)\n", "print(\"Iteration counter for hessian evaluation\", counter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eventually, we check our results using dense AD, and an unrolled loop." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.126931Z", "iopub.status.busy": "2024-04-30T08:46:17.126852Z", "iopub.status.idle": "2024-04-30T08:46:17.129938Z", "shell.execute_reply": "2024-04-30T08:46:17.129709Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration counter for forward evaluation 8\n" ] } ], "source": [ "niter2 = niter**(1+nrec); counter=0\n", "a1 = ad.Dense2.identity(constant=a0,shift=(0,4))\n", "b1 = ad.Dense2.identity(constant=b0,shift=(4,0))\n", "\n", "a2,b2 = my_loop(a1,b1,niter2)\n", "result = (a2*b2).sum()\n", "print(\"Iteration counter for forward evaluation\", counter)\n", "\n", "grad_dense2 = result.coef1\n", "hess_dense2 = np.dot(result.coef2,dir_hessian)\n", "\n", "assert np.allclose(grad_rec2,grad_dense2) \n", "assert np.allclose(hess_rec2,hess_dense2) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Format of variables\n", "\n", "Variables used in reverseAD mode may be scalars, arrays, multidimensional arrays, as well as tuples, lists and dicts. The latter two cases, lists and dicts, must be explicitly requested as they are not considered by default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1 Scalars and multi-dimensional arrays\n", "\n", "Arrays of arbitrary depth are supported, including array scalars.\n", "As usual, please be careful when manipulating array scalars and `numpy.float64` values, see notebook [ADBugs](ADBugs.ipynb)." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.131336Z", "iopub.status.busy": "2024-04-30T08:46:17.131252Z", "iopub.status.idle": "2024-04-30T08:46:17.133102Z", "shell.execute_reply": "2024-04-30T08:46:17.132878Z" } }, "outputs": [], "source": [ "x0=1.;x1=np.array([1.,2.]);x2=np.array([[3.,4.],[5.,6.]]);x3=np.zeros((2,2,2))" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.134430Z", "iopub.status.busy": "2024-04-30T08:46:17.134353Z", "iopub.status.idle": "2024-04-30T08:46:17.137443Z", "shell.execute_reply": "2024-04-30T08:46:17.137189Z" } }, "outputs": [ { "data": { "text/plain": [ "(array(1.),\n", " array([ 0., 256.]),\n", " array([[0., 0.],\n", " [0., 0.]]),\n", " array([[[ 96., 128.],\n", " [160., 192.]],\n", " \n", " [[ 96., 128.],\n", " [160., 192.]]]))" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rev,(y0,y1,y2,y3)=ad.Reverse.empty(inputs=(x0,x1,x2,x3))\n", "z0 = (y1[1]+y2*y3).sum()\n", "t0 = rev.simplify(z0)\n", "grad_rev = rev.gradient(y0+t0**2)\n", "rev.to_inputshapes(grad_rev)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.138728Z", "iopub.status.busy": "2024-04-30T08:46:17.138647Z", "iopub.status.idle": "2024-04-30T08:46:17.141833Z", "shell.execute_reply": "2024-04-30T08:46:17.141609Z" } }, "outputs": [], "source": [ "rev,(y0,y1,y2,y3)=ad.Reverse2.empty(inputs=(x0,x1,x2,x3))\n", "z0 = (y1[1]+y2*y3).sum()\n", "t0 = rev.simplify(z0)\n", "obj = y0+t0**2 \n", "grad_rev2 = rev.gradient(obj)\n", "hess_rev2 = rev.hessian(obj)(grad_rev2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We next check the result using dense automatic differentiation." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.143191Z", "iopub.status.busy": "2024-04-30T08:46:17.143111Z", "iopub.status.idle": "2024-04-30T08:46:17.145379Z", "shell.execute_reply": "2024-04-30T08:46:17.145155Z" } }, "outputs": [], "source": [ "y0,y1,y2,y3 = ad.Dense2.register(inputs=(x0,x1,x2,x3))\n", "z0=(y1[1]+y2*y3).sum()\n", "obj=y0+z0**2\n", "grad_dens = obj.gradient()\n", "hess_dens = lp.dot_AV(obj.hessian(),grad_dens)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.146686Z", "iopub.status.busy": "2024-04-30T08:46:17.146610Z", "iopub.status.idle": "2024-04-30T08:46:17.154607Z", "shell.execute_reply": "2024-04-30T08:46:17.154387Z" } }, "outputs": [], "source": [ "y0,y1,y2,y3 = ad.Sparse2.register(inputs=(x0,x1,x2,x3))\n", "z0=(y1[1]+y2*y3).sum()\n", "obj=(y0+z0**2).to_dense()\n", "grad_sp = obj.gradient()\n", "hess_sp = lp.dot_AV(obj.hessian(),grad_dens)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.155956Z", "iopub.status.busy": "2024-04-30T08:46:17.155880Z", "iopub.status.idle": "2024-04-30T08:46:17.157751Z", "shell.execute_reply": "2024-04-30T08:46:17.157524Z" } }, "outputs": [], "source": [ "assert(all(grad_dens==grad_rev))\n", "assert(all(grad_dens==grad_rev2))\n", "assert(all(grad_dens==grad_sp))\n", "assert(all(hess_dens==hess_rev2))\n", "assert(all(hess_dens==hess_sp))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 Lists and dictionnaries\n", "\n", "By default, the provided reverseAD library does not look into input lists and dictionaries for AD information : only tuples are explored. Lists and dictionnaries are regarded as opaque objects, whose contents are not differentiable.\n", "This behavior can be changed, as described below." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.159156Z", "iopub.status.busy": "2024-04-30T08:46:17.159075Z", "iopub.status.idle": "2024-04-30T08:46:17.161095Z", "shell.execute_reply": "2024-04-30T08:46:17.160848Z" } }, "outputs": [], "source": [ "def myfun(mylist,mydict,revType=ad.Reverse,co_output=None):\n", " rev,(l,dx,dy) = revType.operator_like(\n", " inputs=(mylist,mydict['x'],mydict['y']),co_output=co_output,\n", " input_iterables=(tuple,list),\n", " output_iterables=(list,))\n", " return rev.output( [l[0]*dx+dy,l[1]*dy] )" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.162401Z", "iopub.status.busy": "2024-04-30T08:46:17.162319Z", "iopub.status.idle": "2024-04-30T08:46:17.165354Z", "shell.execute_reply": "2024-04-30T08:46:17.165126Z" } }, "outputs": [], "source": [ "a0=[1.,np.array([2.,3.])]\n", "b0={'x':np.array([4.,5.]),'y':6.}\n", "rev,[a1,b1] = ad.Reverse.empty(inputs=[a0,b0], # list of list and dict\n", " input_iterables=(list,dict),\n", " output_iterables=(list,)) \n", "u,v= rev.apply(myfun,a1,b1)\n", "result = lp.dot_VV(u,v)\n", "grad_rev1 = rev.gradient(result)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.166730Z", "iopub.status.busy": "2024-04-30T08:46:17.166654Z", "iopub.status.idle": "2024-04-30T08:46:17.171879Z", "shell.execute_reply": "2024-04-30T08:46:17.171649Z" } }, "outputs": [], "source": [ "revType=ad.Reverse2\n", "rev,[a1,b1] = ad.Reverse2.empty(inputs=[a0,b0], # list of list and dict\n", " input_iterables=(list,dict),\n", " output_iterables=(list,)) \n", "u,v= rev.apply(myfun,a1,b1,ad.Reverse2)\n", "result = lp.dot_VV(u,v)\n", "grad_rev2 = rev.gradient(result)\n", "hess_rev2 = rev.hessian(result)(grad_rev2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking with denseAD" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.173238Z", "iopub.status.busy": "2024-04-30T08:46:17.173153Z", "iopub.status.idle": "2024-04-30T08:46:17.175435Z", "shell.execute_reply": "2024-04-30T08:46:17.175201Z" } }, "outputs": [], "source": [ "a1,b1 = ad.Dense2.register([a0,b0],iterables=(list,dict))\n", "u,v= myfun(a1,b1)\n", "result = lp.dot_VV(u,v)\n", "grad_dense2 = result.gradient()\n", "hess_dense2 = lp.dot_AV(result.hessian(),grad_dense2)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2024-04-30T08:46:17.176764Z", "iopub.status.busy": "2024-04-30T08:46:17.176687Z", "iopub.status.idle": "2024-04-30T08:46:17.178417Z", "shell.execute_reply": "2024-04-30T08:46:17.178197Z" } }, "outputs": [], "source": [ "assert(all(grad_dense2==grad_rev1))\n", "assert(all(grad_dense2==grad_rev2))\n", "assert(all(hess_dense2==hess_rev2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.8" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }