{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", " \n", " \"QuantEcon\"\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Solvers, Optimizers, and Automatic Differentiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Solvers, Optimizers, and Automatic Differentiation](#Solvers,-Optimizers,-and-Automatic-Differentiation) \n", " - [Overview](#Overview) \n", " - [Introduction to Differentiable Programming](#Introduction-to-Differentiable-Programming) \n", " - [Optimization](#Optimization) \n", " - [Systems of Equations and Least Squares](#Systems-of-Equations-and-Least-Squares) \n", " - [LeastSquaresOptim.jl](#LeastSquaresOptim.jl) \n", " - [Additional Notes](#Additional-Notes) \n", " - [Exercises](#Exercises) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for quantitative work in economics." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using InstantiateFromURL\n", "# optionally add arguments to force installation: instantiate = true, precompile = true\n", "github_project(\"QuantEcon/quantecon-notebooks-julia\", version = \"0.7.0\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using LinearAlgebra, Statistics\n", "using ForwardDiff, Zygote, Optim, JuMP, Ipopt, BlackBoxOptim, Roots, NLsolve, LeastSquaresOptim\n", "using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to Differentiable Programming\n", "\n", "The promise of differentiable programming is that we can move towards taking the derivatives of almost arbitrarily\n", "complicated computer programs, rather than simply thinking about the derivatives of mathematical functions. Differentiable\n", "programming is the natural evolution of automatic differentiation (AD, sometimes called algorithmic differentiation).\n", "\n", "Stepping back, there are three ways to calculate the gradient or Jacobian\n", "\n", "- Analytic derivatives / Symbolic differentiation \n", " \n", " - You can sometimes calculate the derivative on pen-and-paper, and potentially simplify the expression. \n", " - In effect, repeated applications of the chain rule, product rule, etc. \n", " - It is sometimes, though not always, the most accurate and fastest option if there are algebraic simplifications. \n", " - Sometimes symbolic integration on the computer a good solution, if the package can handle your functions. Doing algebra by hand is tedious and error-prone, but\n", " is sometimes invaluable. \n", " \n", "- Finite differences \n", " \n", " - Evaluate the function at least $ N+1 $ times to get the gradient – Jacobians are even worse. \n", " - Large $ \\Delta $ is numerically stable but inaccurate, too small of $ \\Delta $ is numerically unstable but more accurate. \n", " - Choosing the $ \\Delta $ is hard, so use packages such as [DiffEqDiffTools.jl](https://github.com/JuliaDiffEq/DiffEqDiffTools.jl). \n", " - If a function is $ R^N \\to R $ for a large $ N $, this requires $ O(N) $ function evaluations. \n", " \n", "\n", "\n", "$$\n", "\\partial_{x_i}f(x_1,\\ldots x_N) \\approx \\frac{f(x_1,\\ldots x_i + \\Delta,\\ldots x_N) - f(x_1,\\ldots x_i,\\ldots x_N)}{\\Delta}\n", "$$\n", "\n", "- Automatic Differentiation \n", " \n", " - The same as analytic/symbolic differentiation, but where the **chain rule** is calculated **numerically** rather than symbolically. \n", " - Just as with analytic derivatives, can establish rules for the derivatives of individual functions (e.g. $ d\\left(sin(x)\\right) $ to $ cos(x) dx $) for intrinsic derivatives. \n", " \n", "\n", "\n", "AD has two basic approaches, which are variations on the order of evaluating the chain rule: reverse and forward mode (although mixed mode is possible).\n", "\n", "1. If a function is $ R^N \\to R $, then **reverse-mode** AD can find the gradient in $ O(1) $ sweep (where a “sweep” is $ O(1) $ function evaluations). \n", "1. If a function is $ R \\to R^N $, then **forward-mode** AD can find the jacobian in $ O(1) $ sweeps. \n", "\n", "\n", "We will explore two types of automatic differentiation in Julia (and discuss a few packages which implement them). For both, remember the [chain rule](https://en.wikipedia.org/wiki/Chain_rule)\n", "\n", "$$\n", "\\frac{dy}{dx} = \\frac{dy}{dw} \\cdot \\frac{dw}{dx}\n", "$$\n", "\n", "Forward-mode starts the calculation from the left with $ \\frac{dy}{dw} $ first, which then calculates the product with $ \\frac{dw}{dx} $. On the other hand, reverse mode starts on the right hand side with $ \\frac{dw}{dx} $ and works backwards.\n", "\n", "Take an example a function with fundamental operations and known analytical derivatives\n", "\n", "$$\n", "f(x_1, x_2) = x_1 x_2 + \\sin(x_1)\n", "$$\n", "\n", "And rewrite this as a function which contains a sequence of simple operations and temporaries." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "f (generic function with 1 method)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function f(x_1, x_2)\n", " w_1 = x_1\n", " w_2 = x_2\n", " w_3 = w_1 * w_2\n", " w_4 = sin(w_1)\n", " w_5 = w_3 + w_4\n", " return w_5\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we can identify all of the underlying functions (`*, sin, +`), and see if each has an\n", "intrinsic derivative. While these are obvious, with Julia we could come up with all sorts of differentiation rules for arbitrarily\n", "complicated combinations and compositions of intrinsic operations. In fact, there is even [a package](https://github.com/JuliaDiff/ChainRules.jl) for registering more." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Forward-Mode Automatic Differentiation\n", "\n", "In forward-mode AD, you first fix the variable you are interested in (called “seeding”), and then evaluate the chain rule in left-to-right order.\n", "\n", "For example, with our $ f(x_1, f_2) $ example above, if we wanted to calculate the derivative with respect to $ x_1 $ then\n", "we can seed the setup accordingly. $ \\frac{\\partial w_1}{\\partial x_1} = 1 $ since we are taking the derivative of it, while $ \\frac{\\partial w_1}{\\partial x_1} = 0 $.\n", "\n", "Following through with these, redo all of the calculations for the derivative in parallel with the function itself.\n", "\n", "$$\n", "\\begin{array}{l|l}\n", "f(x_1, x_2) &\n", "\\frac{\\partial f(x_1,x_2)}{\\partial x_1}\n", "\\\\\n", "\\hline\n", "w_1 = x_1 &\n", "\\frac{\\partial w_1}{\\partial x_1} = 1 \\text{ (seed)}\\\\\n", "w_2 = x_2 &\n", "\\frac{\\partial w_2}{\\partial x_1} = 0 \\text{ (seed)}\n", "\\\\\n", "w_3 = w_1 \\cdot w_2 &\n", "\\frac{\\partial w_3}{\\partial x_1} = w_2 \\cdot \\frac{\\partial w_1}{\\partial x_1} + w_1 \\cdot \\frac{\\partial w_2}{\\partial x_1}\n", "\\\\\n", "w_4 = \\sin w_1 &\n", "\\frac{\\partial w_4}{\\partial x_1} = \\cos w_1 \\cdot \\frac{\\partial w_1}{\\partial x_1}\n", "\\\\\n", "w_5 = w_3 + w_4 &\n", "\\frac{\\partial w_5}{\\partial x_1} = \\frac{\\partial w_3}{\\partial x_1} + \\frac{\\partial w_4}{\\partial x_1}\n", "\\end{array}\n", "$$\n", "\n", "Since these two could be done at the same time, we say there is “one pass” required for this calculation.\n", "\n", "Generalizing a little, if the function was vector-valued, then that single pass would get the entire row of the Jacobian in that single pass. Hence for a $ R^N \\to R^M $ function, requires $ N $ passes to get a dense Jacobian using forward-mode AD.\n", "\n", "How can you implement forward-mode AD? It turns out to be fairly easy with a generic programming language to make a simple example (while the devil is in the details for\n", "a high-performance implementation)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Forward-Mode with Dual Numbers\n", "\n", "One way to implement forward-mode AD is to use [dual numbers](https://en.wikipedia.org/wiki/Dual_number).\n", "\n", "Instead of working with just a real number, e.g. $ x $, we will augment each with an infinitesimal $ \\epsilon $ and use $ x + \\epsilon $.\n", "\n", "From Taylor’s theorem,\n", "\n", "$$\n", "f(x + \\epsilon) = f(x) + f'(x)\\epsilon + O(\\epsilon^2)\n", "$$\n", "\n", "where we will define the infinitesimal such that $ \\epsilon^2 = 0 $.\n", "\n", "With this definition, we can write a general rule for differentiation of $ g(x,y) $ as the chain rule for the total derivative\n", "\n", "$$\n", "g(x + \\epsilon, y + \\epsilon) = g(x, y) + (\\partial_x g(x,y) + \\partial_y g(x,y))\\epsilon\n", "$$\n", "\n", "But, note that if we keep track of the constant in front of the $ \\epsilon $ terms (e.g. a $ x' $ and $ y' $)\n", "\n", "$$\n", "g(x + x'\\epsilon, y + y'\\epsilon) = g(x, y) + (\\partial_x g(x,y)x' + \\partial_y g(x,y)y')\\epsilon\n", "$$\n", "\n", "This is simply the chain rule. A few more examples\n", "\n", "$$\n", "\\begin{aligned}\n", " (x + x'\\epsilon) + (y + y'\\epsilon) &= (x + y) + (x' + y')\\epsilon\\\\\n", "(x + x'\\epsilon)\\times(y + y'\\epsilon) &= (xy) + (x'y + y'x)\\epsilon\\\\\n", "\\exp(x + x'\\epsilon) &= \\exp(x) + (x'\\exp(x))\\epsilon\\\\\n", " \\end{aligned}\n", "$$\n", "\n", "Using the generic programming in Julia, it is easy to define a new dual number type which can encapsulate the pair $ (x, x') $ and provide a definitions for\n", "all of the basic operations. Each definition then has the chain-rule built into it.\n", "\n", "With this approach, the “seed” process is simple the creation of the $ \\epsilon $ for the underlying variable.\n", "\n", "So if we have the function $ f(x_1, x_2) $ and we wanted to find the derivative $ \\partial_{x_1} f(3.8, 6.9) $ then then we would seed them with the dual numbers $ x_1 \\to (3.8, 1) $ and $ x_2 \\to (6.9, 0) $.\n", "\n", "If you then follow all of the same scalar operations above with a seeded dual number, it will calculate both the function value and the derivative in a single “sweep” and without modifying any of your (generic) code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ForwardDiff.jl\n", "\n", "Dual-numbers are at the heart of one of the AD packages we have already seen." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ForwardDiff.gradient(h, x) = [26.354764961030977 16.663053156992284]\n" ] }, { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 2.170407065095802\n", " 1.8623366421227727\n", " 2.45745946239708\n", " 1.8599838132723714\n", " 3.382562928646287" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using ForwardDiff\n", "h(x) = sin(x[1]) + x[1] * x[2] + sinh(x[1] * x[2]) # multivariate.\n", "x = [1.4 2.2]\n", "@show ForwardDiff.gradient(h,x) # use AD, seeds from x\n", "\n", "#Or, can use complicated functions of many variables\n", "f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x)\n", "g = (x) -> ForwardDiff.gradient(f, x); # g() is now the gradient\n", "g(rand(5)) # gradient at a random point\n", "# ForwardDiff.hessian(f,x') # or the hessian" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can even auto-differentiate complicated functions with embedded iterations." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "1.4142135623730951" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function squareroot(x) #pretending we don't know sqrt()\n", " z = copy(x) # Initial starting point for Newton’s method\n", " while abs(z*z - x) > 1e-13\n", " z = z - (z*z-x)/(2z)\n", " end\n", " return z\n", "end\n", "squareroot(2.0)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "0.35355339059327373" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using ForwardDiff\n", "dsqrt(x) = ForwardDiff.derivative(squareroot, x)\n", "dsqrt(2.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Zygote.jl\n", "\n", "Unlike forward-mode auto-differentiation, reverse-mode is very difficult to implement efficiently, and there are many variations on the best approach.\n", "\n", "Many reverse-mode packages are connected to machine-learning packages, since the efficient gradients of $ R^N \\to R $ loss functions are necessary for the gradient descent optimization algorithms used in machine learning.\n", "\n", "One recent package is [Zygote.jl](https://github.com/FluxML/Zygote.jl), which is used in the Flux.jl framework." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "(25.0, 2.0)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Zygote\n", "\n", "h(x, y) = 3x^2 + 2x + 1 + y*x - y\n", "gradient(h, 3.0, 5.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we see that Zygote has a gradient function as the interface, which returns a tuple.\n", "\n", "You could create this as an operator if you wanted to.," ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "-0.6536436208636119" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D(f) = x-> gradient(f, x)[1] # returns first in tuple\n", "\n", "D_sin = D(sin)\n", "D_sin(4.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For functions of one (Julia) variable, we can find the by simply using the `'` after a function name" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.3333333333333333\n", " 0.3333333333333333\n", " -0.3333333333333333" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Statistics\n", "p(x) = mean(abs, x)\n", "p'([1.0, 3.0, -2.0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, using the complicated iterative function we defined for the squareroot," ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "0.3535533905932737" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "squareroot'(2.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can Zygote supports combinations of vectors and scalars as the function parameters." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "([0.13736056394868904, 0.5494422557947561, 0.8241633836921343], -1.2725553130925444 + 0.0im)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h(x,n) = (sum(x.^n))^(1/n)\n", "gradient(h, [1.0, 4.0, 6.0], 2.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The gradients can be very high dimensional. For example, to do a simple nonlinear optimization problem\n", "with 1 million dimensions, solved in a few seconds." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum = 5.771603309891095 with in 2 iterations\n" ] } ], "source": [ "using Optim, LinearAlgebra\n", "N = 1000000\n", "y = rand(N)\n", "λ = 0.01\n", "obj(x) = sum((x .- y).^2) + λ*norm(x)\n", "\n", "x_iv = rand(N)\n", "function g!(G, x)\n", " G .= obj'(x)\n", "end\n", "\n", "results = optimize(obj, g!, x_iv, LBFGS()) # or ConjugateGradient()\n", "println(\"minimum = $(results.minimum) with in \"*\n", "\"$(results.iterations) iterations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Caution: while Zygote is the most exciting reverse-mode AD implementation in Julia, it has many rough edges.\n", "\n", "- If you write a function, take its gradient, and then modify the function, you need to call `Zygote.refresh()` or else the gradient will be out of sync. This may not apply for Julia 1.3+. \n", "- It provides no features for getting Jacobians, so you would have to ask for each row of the Jacobian separately. That said, you\n", " probably want to use `ForwardDiff.jl` for Jacobians if the dimension of the output is similar to the dimension of the input. \n", "- You cannot, in the current release, use mutating functions (e.g. modify a value in an array/etc.) although that feature is in progress. \n", "- Compiling can be very slow for complicated functions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimization\n", "\n", "There are a large number of packages intended to be used for optimization in Julia.\n", "\n", "Part of the reason for the diversity of options is that Julia makes it possible to efficiently implement a large number of variations on optimization routines.\n", "\n", "The other reason is that different types of optimization problems require different algorithms." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optim.jl\n", "\n", "A good pure-Julia solution for the (unconstrained or box-bounded) optimization of\n", "univariate and multivariate function is the [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) package.\n", "\n", "By default, the algorithms in `Optim.jl` target minimization rather than\n", "maximization, so if a function is called `optimize` it will mean minimization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Univariate Functions on Bounded Intervals\n", "\n", "[Univariate optimization](http://julianlsolvers.github.io/Optim.jl/stable/user/minimization/#minimizing-a-univariate-function-on-a-bounded-interval)\n", "defaults to a robust hybrid optimization routine called [Brent’s method](https://en.wikipedia.org/wiki/Brent%27s_method)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Results of Optimization Algorithm\n", " * Algorithm: Brent's Method\n", " * Search Interval: [-2.000000, 1.000000]\n", " * Minimizer: 0.000000e+00\n", " * Minimum: 0.000000e+00\n", " * Iterations: 5\n", " * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true\n", " * Objective Function Calls: 6" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Optim\n", "using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions\n", "\n", "result = optimize(x-> x^2, -2.0, 1.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Always check if the results converged, and throw errors otherwise" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "converged(result) || error(\"Failed to converge in $(iterations(result)) iterations\")\n", "xmin = result.minimizer\n", "result.minimum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first line is a logical OR between `converged(result)` and `error(\"...\")`.\n", "\n", "If the convergence check passes, the logical sentence is true, and it will proceed to the next line; if not, it will throw the error.\n", "\n", "Or to maximize" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "-0.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = -x^2\n", "result = maximize(f, -2.0, 1.0)\n", "converged(result) || error(\"Failed to converge in $(iterations(result)) iterations\")\n", "xmin = maximizer(result)\n", "fmax = maximum(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Notice that we call `optimize` results using `result.minimizer`, and `maximize` results using `maximizer(result)`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unconstrained Multivariate Optimization\n", "\n", "There are a variety of [algorithms and options](http://julianlsolvers.github.io/Optim.jl/stable/user/minimization/#_top) for multivariate optimization.\n", "\n", "From the documentation, the simplest version is" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ " * Status: success\n", "\n", " * Candidate solution\n", " Minimizer: [1.00e+00, 1.00e+00]\n", " Minimum: 3.525527e-09\n", "\n", " * Found with\n", " Algorithm: Nelder-Mead\n", " Initial Point: [0.00e+00, 0.00e+00]\n", "\n", " * Convergence measures\n", " √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08\n", "\n", " * Work counters\n", " Seconds run: 0 (vs limit Inf)\n", " Iterations: 60\n", " f(x) calls: 118\n" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv) # i.e. optimize(f, x_iv, NelderMead())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default algorithm in `NelderMead`, which is derivative-free and hence requires many function evaluations.\n", "\n", "To change the algorithm type to [L-BFGS](http://julianlsolvers.github.io/Optim.jl/stable/algo/lbfgs/)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum = 5.3784046148998115e-17 with argmin = [0.9999999926662393, 0.9999999853324786] in 24 iterations\n" ] } ], "source": [ "results = optimize(f, x_iv, LBFGS())\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this has fewer iterations.\n", "\n", "As no derivative was given, it used [finite differences](https://en.wikipedia.org/wiki/Finite_difference) to approximate the gradient of `f(x)`.\n", "\n", "However, since most of the algorithms require derivatives, you will often want to use auto differentiation or pass analytical gradients if possible." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum = 5.191703158437428e-27 with argmin = [0.999999999999928, 0.9999999999998559] in 24 iterations\n" ] } ], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv, LBFGS(), autodiff=:forward) # i.e. use ForwardDiff.jl\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we did not need to use `ForwardDiff.jl` directly, as long as our `f(x)` function was written to be generic (see the [generic programming lecture](generic_programming.html) ).\n", "\n", "Alternatively, with an analytical gradient" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "minimum = 5.191703158437428e-27 with argmin = [0.999999999999928, 0.9999999999998559] in 24 iterations\n" ] } ], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "function g!(G, x)\n", " G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\n", " G[2] = 200.0 * (x[2] - x[1]^2)\n", "end\n", "\n", "results = optimize(f, g!, x_iv, LBFGS()) # or ConjugateGradient()\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For derivative-free methods, you can change the algorithm – and have no need to provide a gradient" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ " * Status: failure (reached maximum number of iterations) (line search failed)\n", "\n", " * Candidate solution\n", " Minimizer: [9.50e-01, 9.04e-01]\n", " Minimum: 3.202099e-03\n", "\n", " * Found with\n", " Algorithm: Simulated Annealing\n", " Initial Point: [0.00e+00, 0.00e+00]\n", "\n", " * Convergence measures\n", " |x - x'| = NaN ≰ 0.0e+00\n", " |x - x'|/|x'| = NaN ≰ 0.0e+00\n", " |f(x) - f(x')| = NaN ≰ 0.0e+00\n", " |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00\n", " |g(x)| = NaN ≰ 1.0e-08\n", "\n", " * Work counters\n", " Seconds run: 0 (vs limit Inf)\n", " Iterations: 1000\n", " f(x) calls: 1001\n" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv, SimulatedAnnealing()) # or ParticleSwarm() or NelderMead()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, you will note that this did not converge, as stochastic methods typically require many more iterations as a tradeoff for their global-convergence properties.\n", "\n", "See the [maximum likelihood](http://julianlsolvers.github.io/Optim.jl/stable/examples/generated/maxlikenlm/)\n", "example and the accompanying [Jupyter notebook](https://nbviewer.jupyter.org/github/JuliaNLSolvers/Optim.jl/blob/gh-pages/v0.15.3/examples/generated/maxlikenlm.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### JuMP.jl\n", "\n", "The [JuMP.jl](https://github.com/JuliaOpt/JuMP.jl) package is an ambitious implementation of a modelling language for optimization problems in Julia.\n", "\n", "In that sense, it is more like an AMPL (or Pyomo) built on top of the Julia\n", "language with macros, and able to use a variety of different commerical and open source solvers.\n", "\n", "If you have a linear, quadratic, conic, mixed-integer linear, etc. problem then this will likely be the ideal “meta-package” for calling various solvers.\n", "\n", "For nonlinear problems, the modelling language may make things difficult for complicated functions (as it is not designed to be used as a general-purpose nonlinear optimizer).\n", "\n", "See the [quick start guide](http://www.juliaopt.org/JuMP.jl/0.18/quickstart.html) for more details on all of the options.\n", "\n", "The following is an example of calling a linear objective with a nonlinear constraint (provided by an external function).\n", "\n", "Here `Ipopt` stands for `Interior Point OPTimizer`, a [nonlinear solver](https://github.com/JuliaOpt/Ipopt.jl) in Julia" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hide-output": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: `with_optimizer` is deprecated. Adapt the following example to update your code:\n", "│ `with_optimizer(Ipopt.Optimizer)` becomes `Ipopt.Optimizer`.\n", "│ caller = top-level scope at In[21]:13\n", "└ @ Core In[21]:13\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "******************************************************************************\n", "This program contains Ipopt, a library for large-scale nonlinear optimization.\n", " Ipopt is released as open source code under the Eclipse Public License (EPL).\n", " For more information visit http://projects.coin-or.org/Ipopt\n", "******************************************************************************\n", "\n", "This is Ipopt version 3.12.10, running with linear solver mumps.\n", "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", "\n", "Number of nonzeros in equality constraint Jacobian...: 0\n", "Number of nonzeros in inequality constraint Jacobian.: 2\n", "Number of nonzeros in Lagrangian Hessian.............: 3\n", "\n", "Total number of variables............................: 2\n", " variables with only lower bounds: 0\n", " variables with lower and upper bounds: 0\n", " variables with only upper bounds: 0\n", "Total number of equality constraints.................: 0\n", "Total number of inequality constraints...............: 1\n", " inequality constraints with only lower bounds: 0\n", " inequality constraints with lower and upper bounds: 0\n", " inequality constraints with only upper bounds: 1\n", "\n", "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", " 0 -1.0000000e+00 0.00e+00 2.07e-01 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0\n", " 1 -1.4100714e+00 0.00e+00 5.48e-02 -1.7 3.94e-01 - 1.00e+00 7.36e-01f 1\n", " 2 -1.4113851e+00 0.00e+00 2.83e-08 -2.5 9.29e-04 - 1.00e+00 1.00e+00f 1\n", " 3 -1.4140632e+00 0.00e+00 1.50e-09 -3.8 1.89e-03 - 1.00e+00 1.00e+00f 1\n", " 4 -1.4142117e+00 0.00e+00 1.84e-11 -5.7 1.05e-04 - 1.00e+00 1.00e+00f 1\n", " 5 -1.4142136e+00 0.00e+00 8.23e-09 -8.6 1.30e-06 - 1.00e+00 1.00e+00f 1\n", "\n", "Number of Iterations....: 5\n", "\n", " (scaled) (unscaled)\n", "Objective...............: -1.4142135740093271e+00 -1.4142135740093271e+00\n", "Dual infeasibility......: 8.2280586788385790e-09 8.2280586788385790e-09\n", "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", "Complementarity.........: 2.5059035815063646e-09 2.5059035815063646e-09\n", "Overall NLP error.......: 8.2280586788385790e-09 8.2280586788385790e-09\n", "\n", "\n", "Number of objective function evaluations = 6\n", "Number of objective gradient evaluations = 6\n", "Number of equality constraint evaluations = 0\n", "Number of inequality constraint evaluations = 6\n", "Number of equality constraint Jacobian evaluations = 0\n", "Number of inequality constraint Jacobian evaluations = 6\n", "Number of Lagrangian Hessian evaluations = 5\n", "Total CPU secs in IPOPT (w/o function evaluations) = 1.647\n", "Total CPU secs in NLP function evaluations = 1.529\n", "\n", "EXIT: Optimal Solution Found.\n", "JuMP.optimize!(m) = nothing\n" ] } ], "source": [ "using JuMP, Ipopt\n", "# solve\n", "# max( x[1] + x[2] )\n", "# st sqrt(x[1]^2 + x[2]^2) <= 1\n", "\n", "function squareroot(x) # pretending we don't know sqrt()\n", " z = x # Initial starting point for Newton’s method\n", " while abs(z*z - x) > 1e-13\n", " z = z - (z*z-x)/(2z)\n", " end\n", " return z\n", "end\n", "m = Model(with_optimizer(Ipopt.Optimizer))\n", "# need to register user defined functions for AD\n", "JuMP.register(m,:squareroot, 1, squareroot, autodiff=true)\n", "\n", "@variable(m, x[1:2], start=0.5) # start is the initial condition\n", "@objective(m, Max, sum(x))\n", "@NLconstraint(m, squareroot(x[1]^2+x[2]^2) <= 1)\n", "@show JuMP.optimize!(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this is an example of a quadratic objective" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "hide-output": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: `with_optimizer` is deprecated. Adapt the following example to update your code:\n", "│ `with_optimizer(Ipopt.Optimizer)` becomes `Ipopt.Optimizer`.\n", "│ caller = top-level scope at In[22]:6\n", "└ @ Core In[22]:6\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "This is Ipopt version 3.12.10, running with linear solver mumps.\n", "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", "\n", "Number of nonzeros in equality constraint Jacobian...: 0\n", "Number of nonzeros in inequality constraint Jacobian.: 0\n", "Number of nonzeros in Lagrangian Hessian.............: 3\n", "\n", "Total number of variables............................: 2\n", " variables with only lower bounds: 0\n", " variables with lower and upper bounds: 0\n", " variables with only upper bounds: 0\n", "Total number of equality constraints.................: 0\n", "Total number of inequality constraints...............: 0\n", " inequality constraints with only lower bounds: 0\n", " inequality constraints with lower and upper bounds: 0\n", " inequality constraints with only upper bounds: 0\n", "\n", "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", " 0 1.0000000e+00 0.00e+00 2.00e+00 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0\n", " 1 9.5312500e-01 0.00e+00 1.25e+01 -1.0 1.00e+00 - 1.00e+00 2.50e-01f 3\n", " 2 4.8320569e-01 0.00e+00 1.01e+00 -1.0 9.03e-02 - 1.00e+00 1.00e+00f 1\n", " 3 4.5708829e-01 0.00e+00 9.53e+00 -1.0 4.29e-01 - 1.00e+00 5.00e-01f 2\n", " 4 1.8894205e-01 0.00e+00 4.15e-01 -1.0 9.51e-02 - 1.00e+00 1.00e+00f 1\n", " 5 1.3918726e-01 0.00e+00 6.51e+00 -1.7 3.49e-01 - 1.00e+00 5.00e-01f 2\n", " 6 5.4940990e-02 0.00e+00 4.51e-01 -1.7 9.29e-02 - 1.00e+00 1.00e+00f 1\n", " 7 2.9144630e-02 0.00e+00 2.27e+00 -1.7 2.49e-01 - 1.00e+00 5.00e-01f 2\n", " 8 9.8586451e-03 0.00e+00 1.15e+00 -1.7 1.10e-01 - 1.00e+00 1.00e+00f 1\n", " 9 2.3237475e-03 0.00e+00 1.00e+00 -1.7 1.00e-01 - 1.00e+00 1.00e+00f 1\n", "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", " 10 2.3797236e-04 0.00e+00 2.19e-01 -1.7 5.09e-02 - 1.00e+00 1.00e+00f 1\n", " 11 4.9267371e-06 0.00e+00 5.95e-02 -1.7 2.53e-02 - 1.00e+00 1.00e+00f 1\n", " 12 2.8189505e-09 0.00e+00 8.31e-04 -2.5 3.20e-03 - 1.00e+00 1.00e+00f 1\n", " 13 1.0095040e-15 0.00e+00 8.68e-07 -5.7 9.78e-05 - 1.00e+00 1.00e+00f 1\n", " 14 1.3288608e-28 0.00e+00 2.02e-13 -8.6 4.65e-08 - 1.00e+00 1.00e+00f 1\n", "\n", "Number of Iterations....: 14\n", "\n", " (scaled) (unscaled)\n", "Objective...............: 1.3288608467480825e-28 1.3288608467480825e-28\n", "Dual infeasibility......: 2.0183854587685121e-13 2.0183854587685121e-13\n", "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", "Overall NLP error.......: 2.0183854587685121e-13 2.0183854587685121e-13\n", "\n", "\n", "Number of objective function evaluations = 36\n", "Number of objective gradient evaluations = 15\n", "Number of equality constraint evaluations = 0\n", "Number of inequality constraint evaluations = 0\n", "Number of equality constraint Jacobian evaluations = 0\n", "Number of inequality constraint Jacobian evaluations = 0\n", "Number of Lagrangian Hessian evaluations = 14\n", "Total CPU secs in IPOPT (w/o function evaluations) = 0.079\n", "Total CPU secs in NLP function evaluations = 0.011\n", "\n", "EXIT: Optimal Solution Found.\n", "x = 0.9999999999999899 y = 0.9999999999999792\n", "This is Ipopt version 3.12.10, running with linear solver mumps.\n", "NOTE: Other linear solvers might be more efficient (see Ipopt documentation).\n", "\n", "Number of nonzeros in equality constraint Jacobian...: 2\n", "Number of nonzeros in inequality constraint Jacobian.: 0\n", "Number of nonzeros in Lagrangian Hessian.............: 3\n", "\n", "Total number of variables............................: 2\n", " variables with only lower bounds: 0\n", " variables with lower and upper bounds: 0\n", " variables with only upper bounds: 0\n", "Total number of equality constraints.................: 1\n", "Total number of inequality constraints...............: 0\n", " inequality constraints with only lower bounds: 0\n", " inequality constraints with lower and upper bounds: 0\n", " inequality constraints with only upper bounds: 0\n", "\n", "iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls\n", " 0 1.0000000e+00 1.00e+01 1.00e+00 -1.0 0.00e+00 - 0.00e+00 0.00e+00 0\n", " 1 9.6315968e+05 0.00e+00 3.89e+05 -1.0 9.91e+00 - 1.00e+00 1.00e+00h 1\n", " 2 1.6901461e+05 0.00e+00 1.16e+05 -1.0 3.24e+00 - 1.00e+00 1.00e+00f 1\n", " 3 2.5433173e+04 1.78e-15 3.18e+04 -1.0 2.05e+00 - 1.00e+00 1.00e+00f 1\n", " 4 2.6527756e+03 0.00e+00 7.79e+03 -1.0 1.19e+00 - 1.00e+00 1.00e+00f 1\n", " 5 1.1380324e+02 0.00e+00 1.35e+03 -1.0 5.62e-01 - 1.00e+00 1.00e+00f 1\n", " 6 3.3745506e+00 0.00e+00 8.45e+01 -1.0 1.50e-01 - 1.00e+00 1.00e+00f 1\n", " 7 2.8946196e+00 0.00e+00 4.22e-01 -1.0 1.07e-02 - 1.00e+00 1.00e+00f 1\n", " 8 2.8946076e+00 0.00e+00 1.07e-05 -1.7 5.42e-05 - 1.00e+00 1.00e+00f 1\n", " 9 2.8946076e+00 0.00e+00 5.91e-13 -8.6 1.38e-09 - 1.00e+00 1.00e+00f 1\n", "\n", "Number of Iterations....: 9\n", "\n", " (scaled) (unscaled)\n", "Objective...............: 2.8946075504894599e+00 2.8946075504894599e+00\n", "Dual infeasibility......: 5.9130478291535837e-13 5.9130478291535837e-13\n", "Constraint violation....: 0.0000000000000000e+00 0.0000000000000000e+00\n", "Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00\n", "Overall NLP error.......: 5.9130478291535837e-13 5.9130478291535837e-13\n", "\n", "\n", "Number of objective function evaluations = 10\n", "Number of objective gradient evaluations = 10\n", "Number of equality constraint evaluations = 10\n", "Number of inequality constraint evaluations = 0\n", "Number of equality constraint Jacobian evaluations = 1\n", "Number of inequality constraint Jacobian evaluations = 0\n", "Number of Lagrangian Hessian evaluations = 9\n", "Total CPU secs in IPOPT (w/o function evaluations) = 0.002\n", "Total CPU secs in NLP function evaluations = 0.000\n", "\n", "EXIT: Optimal Solution Found.\n", "x = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "2.701147124098218 y = 7.2988528759017814\n" ] } ], "source": [ "# solve\n", "# min (1-x)^2 + 100(y-x^2)^2)\n", "# st x + y >= 10\n", "\n", "using JuMP,Ipopt\n", "m = Model(with_optimizer(Ipopt.Optimizer)) # settings for the solver\n", "@variable(m, x, start = 0.0)\n", "@variable(m, y, start = 0.0)\n", "\n", "@NLobjective(m, Min, (1-x)^2 + 100(y-x^2)^2)\n", "\n", "JuMP.optimize!(m)\n", "println(\"x = \", value(x), \" y = \", value(y))\n", "\n", "# adding a (linear) constraint\n", "@constraint(m, x + y == 10)\n", "JuMP.optimize!(m)\n", "println(\"x = \", value(x), \" y = \", value(y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### BlackBoxOptim.jl\n", "\n", "Another package for doing global optimization without derivatives is [BlackBoxOptim.jl](https://github.com/robertfeldt/BlackBoxOptim.jl).\n", "\n", "To see an example from the documentation" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Starting optimization with optimizer DiffEvoOpt{FitPopulation{Float64},RadiusLimitedSelector,BlackBoxOptim.AdaptiveDiffEvoRandBin{3},RandomBound{ContinuousRectSearchSpace}}\n", "0.00 secs, 0 evals, 0 steps\n", "\n", "Optimization stopped after 10001 steps and 0.08 seconds\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Termination reason: Max number of steps (10000) reached\n", "Steps per second = 123231.78\n", "Function evals per second = 124476.29\n", "Improvements/step = 0.20420\n", "Total function evaluations = 10102\n", "\n", "\n", "Best candidate found: " ] }, { "name": "stdout", "output_type": "stream", "text": [ "[1.0, 1.0]\n", "\n", "Fitness: 0.000000000\n", "\n" ] } ], "source": [ "using BlackBoxOptim\n", "\n", "function rosenbrock2d(x)\n", "return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "end\n", "\n", "results = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example for [parallel execution](https://github.com/robertfeldt/BlackBoxOptim.jl/blob/master/examples/rosenbrock_parallel.jl) of the objective is provided." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Systems of Equations and Least Squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Roots.jl\n", "\n", "A root of a real function $ f $ on $ [a,b] $ is an $ x \\in [a, b] $ such that $ f(x)=0 $.\n", "\n", "For example, if we plot the function\n", "\n", "\n", "\n", "$$\n", "f(x) = \\sin(4 (x - 1/4)) + x + x^{20} - 1 \\tag{1}\n", "$$\n", "\n", "with $ x \\in [0,1] $ we get\n", "\n", "\n", "\n", " \n", "The unique root is approximately 0.408.\n", "\n", "The [Roots.jl](https://github.com/JuliaLang/Roots.jl) package offers `fzero()` to find roots" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "0.40829350427936706" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Roots\n", "f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1\n", "fzero(f, 0, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NLsolve.jl\n", "\n", "The [NLsolve.jl](https://github.com/JuliaNLSolvers/NLsolve.jl/) package provides functions to solve for multivariate systems of equations and fixed points.\n", "\n", "From the documentation, to solve for a system of equations without providing a Jacobian" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Results of Nonlinear Solver Algorithm\n", " * Algorithm: Trust-region with dogleg and autoscaling\n", " * Starting Point: [0.1, 1.2]\n", " * Zero: [-7.775548712324193e-17, 0.9999999999999999]\n", " * Inf-norm of residuals: 0.000000\n", " * Iterations: 4\n", " * Convergence: true\n", " * |x - x'| < 0.0e+00: false\n", " * |f(x)| < 1.0e-08: true\n", " * Function Calls (f): 5\n", " * Jacobian Calls (df/dx): 5" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using NLsolve\n", "\n", "f(x) = [(x[1]+3)*(x[2]^3-7)+18\n", " sin(x[2]*exp(x[1])-1)] # returns an array\n", "\n", "results = nlsolve(f, [ 0.1; 1.2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above case, the algorithm used finite differences to calculate the Jacobian.\n", "\n", "Alternatively, if `f(x)` is written generically, you can use auto-differentiation with a single setting." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "converged=true at root=[-3.487552479724522e-16, 1.0000000000000002] in 4 iterations and 5 function calls\n" ] } ], "source": [ "results = nlsolve(f, [ 0.1; 1.2], autodiff=:forward)\n", "\n", "println(\"converged=$(NLsolve.converged(results)) at root=$(results.zero) in \"*\n", "\"$(results.iterations) iterations and $(results.f_calls) function calls\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Providing a function which operates inplace (i.e., modifies an argument) may help performance for large systems of equations (and hurt it for small ones)." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "converged=true at root=[-3.487552479724522e-16, 1.0000000000000002] in 4 iterations and 5 function calls\n" ] } ], "source": [ "function f!(F, x) # modifies the first argument\n", " F[1] = (x[1]+3)*(x[2]^3-7)+18\n", " F[2] = sin(x[2]*exp(x[1])-1)\n", "end\n", "\n", "results = nlsolve(f!, [ 0.1; 1.2], autodiff=:forward)\n", "\n", "println(\"converged=$(NLsolve.converged(results)) at root=$(results.zero) in \"*\n", "\"$(results.iterations) iterations and $(results.f_calls) function calls\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LeastSquaresOptim.jl\n", "\n", "Many optimization problems can be solved using linear or nonlinear least squares.\n", "\n", "Let $ x \\in R^N $ and $ F(x) : R^N \\to R^M $ with $ M \\geq N $, then the nonlinear least squares problem is\n", "\n", "$$\n", "\\min_x F(x)^T F(x)\n", "$$\n", "\n", "While $ F(x)^T F(x) \\to R $, and hence this problem could technically use any nonlinear optimizer, it is useful to exploit the structure of the problem.\n", "\n", "In particular, the Jacobian of $ F(x) $, can be used to approximate the Hessian of the objective.\n", "\n", "As with most nonlinear optimization problems, the benefits will typically become evident only when analytical or automatic differentiation is possible.\n", "\n", "If $ M = N $ and we know a root $ F(x^*) = 0 $ to the system of equations exists, then NLS is the defacto method for solving large **systems of equations**.\n", "\n", "An implementation of NLS is given in [LeastSquaresOptim.jl](https://github.com/matthieugomez/LeastSquaresOptim.jl).\n", "\n", "From the documentation" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Results of Optimization Algorithm\n", " * Algorithm: Dogleg\n", " * Minimizer: [1.0,1.0]\n", " * Sum of squares at Minimum: 0.000000\n", " * Iterations: 51\n", " * Convergence: true\n", " * |x - x'| < 1.0e-08: false\n", " * |f(x) - f(x')| / |f(x)| < 1.0e-08: true\n", " * |g(x)| < 1.0e-08: false\n", " * Function Calls: 52\n", " * Gradient Calls: 36\n", " * Multiplication Calls: 159\n" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LeastSquaresOptim\n", "function rosenbrock(x)\n", " [1 - x[1], 100 * (x[2]-x[1]^2)]\n", "end\n", "LeastSquaresOptim.optimize(rosenbrock, zeros(2), Dogleg())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Because there is a name clash between `Optim.jl` and this package, to use both we need to qualify the use of the `optimize` function (i.e. `LeastSquaresOptim.optimize`).\n", "\n", "Here, by default it will use AD with `ForwardDiff.jl` to calculate the Jacobian,\n", "but you could also provide your own calculation of the Jacobian (analytical or using finite differences) and/or calculate the function inplace." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Results of Optimization Algorithm\n", " * Algorithm: Dogleg\n", " * Minimizer: [1.0,1.0]\n", " * Sum of squares at Minimum: 0.000000\n", " * Iterations: 51\n", " * Convergence: true\n", " * |x - x'| < 1.0e-08: false\n", " * |f(x) - f(x')| / |f(x)| < 1.0e-08: true\n", " * |g(x)| < 1.0e-08: false\n", " * Function Calls: 52\n", " * Gradient Calls: 36\n", " * Multiplication Calls: 159\n" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function rosenbrock_f!(out, x)\n", " out[1] = 1 - x[1]\n", " out[2] = 100 * (x[2]-x[1]^2)\n", "end\n", "LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),\n", " f! = rosenbrock_f!, output_length = 2))\n", "\n", "# if you want to use gradient\n", "function rosenbrock_g!(J, x)\n", " J[1, 1] = -1\n", " J[1, 2] = 0\n", " J[2, 1] = -200 * x[1]\n", " J[2, 2] = 100\n", "end\n", "LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),\n", " f! = rosenbrock_f!, g! = rosenbrock_g!, output_length = 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Notes\n", "\n", "Watch [this video](https://www.youtube.com/watch?v=vAp6nUMrKYg&feature=youtu.be) from one of Julia’s creators on automatic differentiation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "Doing a simple implementation of forward-mode auto-differentiation is very easy in Julia since it is generic. In this exercise, you\n", "will fill in a few of the operations required for a simple AD implementation.\n", "\n", "First, we need to provide a type to hold the dual." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "hide-output": false }, "outputs": [], "source": [ "struct DualNumber{T} <: Real\n", " val::T\n", " ϵ::T\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we have made it a subtype of `Real` so that it can pass through functions expecting Reals.\n", "\n", "We can add on a variety of chain rule definitions by importing in the appropriate functions and adding DualNumber versions. For example" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "+ (generic function with 265 methods)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import Base: +, *, -, ^, exp\n", "+(x::DualNumber, y::DualNumber) = DualNumber(x.val + y.val, x.ϵ + y.ϵ) # dual addition\n", "+(x::DualNumber, a::Number) = DualNumber(x.val + a, x.ϵ) # i.e. scalar addition, not dual\n", "+(a::Number, x::DualNumber) = DualNumber(x.val + a, x.ϵ) # i.e. scalar addition, not dual" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With that, we can seed a dual number and find simple derivatives," ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "DualNumber{Float64}(8.0, 1.0)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f(x, y) = 3.0 + x + y\n", "\n", "x = DualNumber(2.0, 1.0) # x -> 2.0 + 1.0\\epsilon\n", "y = DualNumber(3.0, 0.0) # i.e. y = 3.0, no derivative\n", "\n", "\n", "# seeded calculates both teh function and the d/dx gradient!\n", "f(x,y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this assignment:\n", "\n", "1. Add in AD rules for the other operations: `*, -, ^, exp`. \n", "1. Come up with some examples of univariate and multivariate functions combining those operations and use your AD implementation to find the derivatives. " ] } ], "metadata": { "date": 1590087462.6997023, "download_nb": 1, "download_nb_path": "https://julia.quantecon.org/", "filename": "optimization_solver_packages.rst", "filename_with_path": "more_julia/optimization_solver_packages", "kernelspec": { "display_name": "Julia 1.4.1", "language": "julia", "name": "julia-1.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.4.1" }, "title": "Solvers, Optimizers, and Automatic Differentiation" }, "nbformat": 4, "nbformat_minor": 2 }