{ "cells": [
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Solvers, Optimizers, and Automatic Differentiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Solvers, Optimizers, and Automatic Differentiation](#Solvers,-Optimizers,-and-Automatic-Differentiation) \n", " - [Overview](#Overview) \n", " - [Introduction to Automatic Differentiation](#Introduction-to-Automatic-Differentiation) \n", " - [Optimization](#Optimization) \n", " - [Systems of Equations and Least Squares](#Systems-of-Equations-and-Least-Squares) \n", " - [LeastSquaresOptim.jl](#LeastSquaresOptim.jl) \n", " - [Additional Notes](#Additional-Notes) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "In this lecture we introduce a few of the Julia libraries that we’ve found particularly useful for quantitative work in economics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using InstantiateFromURL\n", "github_project(\"QuantEcon/quantecon-notebooks-julia\", path = \"more_julia\", version = \"0.1.0\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using LinearAlgebra, Statistics, Compat\n", "using ForwardDiff, Flux, Optim, JuMP, Ipopt, BlackBoxOptim, Roots, NLsolve\n", "using LeastSquaresOptim, Flux.Tracker\n", "using Flux.Tracker: update!\n", "using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to Automatic Differentiation\n", "\n", "Automatic differentiation (AD, sometimes called algorithmic differentiation) is a crucial way to increase the performance of both estimation and solution methods\n", "\n", "There are essentially four ways to calculate the gradient or Jacobian on a computer\n", "\n", "- Calculation by hand \n", " \n", " - Where possible, you can calculate the derivative on “pen and paper” and potentially simplify the expression \n", " - Sometimes, though not always, the most accurate and fastest option if there are algebraic simplifications \n", " - The algebra is error prone for non-trivial setups \n", " \n", "- Finite differences \n", " \n", " - Evaluate the function at least $ N $ times to get the gradient – Jacobians are even worse \n", " - Large $ \\Delta $ is numerically stable but inaccurate, too small of $ \\Delta $ is numerically unstable but more accurate \n", " - Avoid if you can, and use packages (e.g. [DiffEqDiffTools.jl](https://github.com/JuliaDiffEq/DiffEqDiffTools.jl) ) to get a good choice of $ \\Delta $ \n", " \n", "\n", "\n", "$$\n", "\\partial_{x_i}f(x_1,\\ldots x_N) \\approx \\frac{f(x_1,\\ldots x_i + \\Delta,\\ldots x_N) - f(x_1,\\ldots x_i,\\ldots x_N)}{\\Delta}\n", "$$\n", "\n", "- Symbolic differentiation \n", " \n", " - If you put in an expression for a function, some packages will do symbolic differentiation \n", " - In effect, repeated applications of the chain rule, product rule, etc. \n", " - Sometimes a good solution, if the package can handle your functions \n", " \n", "- Automatic Differentiation \n", " \n", " - Essentially the same as symbolic differentiation, just occurring at a different time in the compilation process \n", " - Equivalent to analytical derivatives since it uses the chain rule, etc. 
\n", " \n", "\n", "\n", "We will explore AD packages in Julia rather than the alternatives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Automatic Differentiation\n", "\n", "To summarize here, first recall the chain rule (adapted from [Wikipedia](https://en.wikipedia.org/wiki/Chain_rule))\n", "\n", "$$\n", "\\frac{dy}{dx} = \\frac{dy}{dw} \\cdot \\frac{dw}{dx}\n", "$$\n", "\n", "Consider functions composed of calculations with fundamental operations with known analytical derivatives, such as $ f(x_1, x_2) = x_1 x_2 + \\sin(x_1) $\n", "\n", "To compute $ \\frac{d f(x_1,x_2)}{d x_1} $\n", "\n", "$$\n", "\\begin{array}{l|l}\n", "\\text{Operations to compute value} &\n", "\\text{Operations to compute \\$\\frac{\\partial f(x_1,x_2)}{\\partial x_1}\\$}\n", "\\\\\n", "\\hline\n", "w_1 = x_1 &\n", "\\frac{d w_1}{d x_1} = 1 \\text{ (seed)}\\\\\n", "w_2 = x_2 &\n", "\\frac{d w_2}{d x_1} = 0 \\text{ (seed)}\n", "\\\\\n", "w_3 = w_1 \\cdot w_2 &\n", "\\frac{\\partial w_3}{\\partial x_1} = w_2 \\cdot \\frac{d w_1}{d x_1} + w_1 \\cdot \\frac{d w_2}{d x_1}\n", "\\\\\n", "w_4 = \\sin w_1 &\n", "\\frac{d w_4}{d x_1} = \\cos w_1 \\cdot \\frac{d w_1}{d x_1}\n", "\\\\\n", "w_5 = w_3 + w_4 &\n", "\\frac{\\partial w_5}{\\partial x_1} = \\frac{\\partial w_3}{\\partial x_1} + \\frac{d w_4}{d x_1}\n", "\\end{array}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Dual Numbers\n", "\n", "One way to implement this (used in forward-mode AD) is to use [dual numbers](https://en.wikipedia.org/wiki/Dual_number)\n", "\n", "Take a number $ x $ and augment it with an infinitesimal $ \\epsilon $ such that $ \\epsilon^2 = 0 $, i.e. $ x \\to x + x' \\epsilon $\n", "\n", "All math is then done with this (mathematical, rather than Julia) tuple $ (x, x') $ where the $ x' $ may be hidden from the user\n", "\n", "With this definition, we can write a general rule for differentiation of $ g(x,y) $ as\n", "\n", "$$\n", "g \\big( \\left(x,x'\\right),\\left(y,y'\\right) \\big) = \\left(g(x,y),\\partial_x g(x,y)x' + \\partial_y g(x,y)y' \\right)\n", "$$\n", "\n", "This calculation is simply the chain rule for the total derivative\n", "\n", "An AD library using dual numbers concurrently calculates the function and its derivatives, repeating the chain rule until it hits a set of intrinsic rules such as\n", "\n", "$$\n", "\\begin{align*}\n", "x + y \\to \\left(x,x'\\right) + \\left(y,y'\\right) &= \\left(x + y,\\underbrace{x' + y'}_{\\partial(x + y) = \\partial x + \\partial y}\\right)\\\\\n", "x y \\to \\left(x,x'\\right) \\times \\left(y,y'\\right) &= \\left(x y,\\underbrace{x'y + y'x}_{\\partial(x y) = y \\partial x + x \\partial y}\\right)\\\\\n", "\\exp(x) \\to \\exp(\\left(x, x'\\right)) &= \\left(\\exp(x),\\underbrace{x'\\exp(x)}_{\\partial(\\exp(x)) = \\exp(x)\\partial x} \\right)\n", "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ForwardDiff.jl\n", "\n", "We have already seen one of the AD packages in Julia" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using ForwardDiff\n", "h(x) = sin(x[1]) + x[1] * x[2] + sinh(x[1] * x[2]) # multivariate.\n", "x = [1.4 2.2]\n", "@show ForwardDiff.gradient(h,x) # use AD, seeds from x\n", "\n", "#Or, can use complicated functions of many variables\n", "f(x) = sum(sin, x) + prod(tan, x) * sum(sqrt, x)\n", "g = (x) -> ForwardDiff.gradient(f, x); # g() is now the gradient\n", "@show g(rand(20)); # gradient at a random point\n", "# 
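ForwardDiff.hessian(f,x') # or the hessian" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "To connect this back to the dual numbers described above, here is a toy forward-mode implementation of the intrinsic rules for the example $ f(x_1, x_2) = x_1 x_2 + \\sin(x_1) $ from the table (the names `ToyDual` and `f_toy` are made up for this sketch, and this is not how `ForwardDiff.jl` is implemented internally)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# a minimal dual number type, only for illustration\n", "struct ToyDual\n", "    v::Float64 # value x\n", "    d::Float64 # derivative x'\n", "end\n", "\n", "# the intrinsic rules from the section above\n", "Base.:+(a::ToyDual, b::ToyDual) = ToyDual(a.v + b.v, a.d + b.d)\n", "Base.:*(a::ToyDual, b::ToyDual) = ToyDual(a.v * b.v, a.d * b.v + b.d * a.v)\n", "Base.sin(a::ToyDual) = ToyDual(sin(a.v), cos(a.v) * a.d)\n", "\n", "f_toy(x1, x2) = x1 * x2 + sin(x1) # the example from the table\n", "\n", "# seed x1' = 1 and x2' = 0 to get the partial with respect to x1\n", "df_dx1 = f_toy(ToyDual(1.4, 1.0), ToyDual(2.2, 0.0)).d\n", "df_dx1, 2.2 + cos(1.4) # compare with the hand-calculated x2 + cos(x1)" ] }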
, { "cell_type": "markdown", "metadata": {}, "source": [ "We can even auto-differentiate complicated functions with embedded iterations" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "function squareroot(x) #pretending we don't know sqrt()\n", "    z = copy(x) # Initial starting point for Newton’s method\n", "    while abs(z*z - x) > 1e-13\n", "        z = z - (z*z-x)/(2z)\n", "    end\n", "    return z\n", "end\n", "squareroot(2.0)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using ForwardDiff\n", "dsqrt(x) = ForwardDiff.derivative(squareroot, x)\n", "dsqrt(2.0)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Flux.jl\n", "\n", "Another is [Flux.jl](https://github.com/FluxML/Flux.jl), a machine learning library in Julia\n", "\n", "AD is one of the main reasons that machine learning has become so powerful in\n", "recent years, and is an essential component of any machine learning package" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using Flux\n", "using Flux.Tracker\n", "using Flux.Tracker: update!\n", "\n", "f(x) = 3x^2 + 2x + 1\n", "\n", "# df/dx = 6x + 2\n", "df(x) = Tracker.gradient(f, x)[1]\n", "\n", "df(2) # 14.0 (tracked)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "A = rand(2,2)\n", "f(x) = A * x\n", "x0 = [0.1, 2.0]\n", "f(x0)\n", "Flux.jacobian(f, x0)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "As before, we can differentiate complicated functions" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "dsquareroot(x) = Tracker.gradient(squareroot, x)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "From the documentation, we can use a machine learning approach to linear regression" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "W = rand(2, 5)\n", "b = rand(2)\n", "\n", "predict(x) = W*x .+ b\n", "\n", "function loss(x, y)\n", "    ŷ = predict(x)\n", "    sum((y .- ŷ).^2)\n", "end\n", "\n", "x, y = rand(5), rand(2) # Dummy data\n", "loss(x, y) # ~ 3" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "W = param(W)\n", "b = param(b)\n", "\n", "gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))\n", "\n", "Δ = gs[W]\n", "\n", "# Update the parameter and reset the gradient\n", "update!(W, -0.1Δ)\n", "\n", "loss(x, y) # ~ 2.5" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimization\n", "\n", "There are a large number of packages intended to be used for optimization in Julia\n", "\n", "Part of the reason for the diversity of options is that Julia makes it possible to efficiently implement a large number of variations on optimization routines\n", "\n", "The other reason is that different types of optimization problems require different algorithms" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### Optim.jl\n", "\n", "A good pure-Julia solution for the (unconstrained or box-bounded) optimization of\n", "univariate and multivariate functions is the [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) package\n", "\n", "By default, the algorithms in `Optim.jl` target minimization rather than\n",
"maximization, so if a function is called `optimize` it will mean minimization" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "#### Univariate Functions on Bounded Intervals\n", "\n", "[Univariate optimization](http://julianlsolvers.github.io/Optim.jl/stable/user/minimization/#minimizing-a-univariate-function-on-a-bounded-interval)\n", "defaults to a robust hybrid optimization routine called [Brent’s method](https://en.wikipedia.org/wiki/Brent%27s_method)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using Optim\n", "using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions\n", "\n", "result = optimize(x-> x^2, -2.0, 1.0)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Always check if the results converged, and throw errors otherwise" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "converged(result) || error(\"Failed to converge in $(iterations(result)) iterations\")\n", "xmin = result.minimizer\n", "result.minimum" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "The first line is a short-circuiting logical OR (`||`) between `converged(result)` and `error(\"...\")`\n", "\n", "If the convergence check passes, the whole expression is true, and execution proceeds to the next line; if not, it throws the error\n", "\n", "Or to maximize" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "f(x) = -x^2\n", "result = maximize(f, -2.0, 1.0)\n", "converged(result) || error(\"Failed to converge in $(iterations(result)) iterations\")\n", "xmax = maximizer(result)\n", "fmax = maximum(result)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Notice that we access `optimize` results using `result.minimizer`, and `maximize` results using `maximizer(result)`" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "#### Unconstrained Multivariate Optimization\n", "\n", "There are a variety of [algorithms and options](http://julianlsolvers.github.io/Optim.jl/stable/user/minimization/#_top) for multivariate optimization\n", "\n", "From the documentation, the simplest version is" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv) # i.e. optimize(f, x_iv, NelderMead())" ] }
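,
 { "cell_type": "markdown", "metadata": {}, "source": [ "As emphasized above, it is good practice to confirm convergence before using the results; the accessor functions imported earlier also work here (a small sketch)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# check convergence of the Nelder-Mead results before using them\n", "converged(results) || error(\"Failed to converge in $(iterations(results)) iterations\")\n", "results.minimizer, results.iterations" ] }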
, { "cell_type": "markdown", "metadata": {}, "source": [ "The default algorithm is `NelderMead`, which is derivative-free and hence requires many function evaluations\n", "\n", "To change the algorithm type to [L-BFGS](http://julianlsolvers.github.io/Optim.jl/stable/algo/lbfgs/)" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "results = optimize(f, x_iv, LBFGS())\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this has fewer iterations\n", "\n", "As no derivative was given, it used [finite differences](https://en.wikipedia.org/wiki/Finite_difference) to approximate the gradient of `f(x)`\n", "\n", "However, since most of the algorithms require derivatives, you will often want to use auto differentiation or pass analytical gradients if possible" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv, LBFGS(), autodiff=:forward) # i.e. use ForwardDiff.jl\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we did not need to use `ForwardDiff.jl` directly, as long as our `f(x)` function was written to be generic (see the [generic programming lecture](generic_programming.html) )\n", "\n", "Alternatively, with an analytical gradient" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "function g!(G, x)\n", "    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]\n", "    G[2] = 200.0 * (x[2] - x[1]^2)\n", "end\n", "\n", "results = optimize(f, g!, x_iv, LBFGS()) # or ConjugateGradient()\n", "println(\"minimum = $(results.minimum) with argmin = $(results.minimizer) in \"*\n", "\"$(results.iterations) iterations\")" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "For derivative-free methods, you can change the algorithm – and have no need to provide a gradient" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "x_iv = [0.0, 0.0]\n", "results = optimize(f, x_iv, SimulatedAnnealing()) # or ParticleSwarm() or NelderMead()" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "However, you will note that this did not converge, as stochastic methods typically require many more iterations as a tradeoff for their global-convergence properties\n", "\n", "See the [maximum likelihood](http://julianlsolvers.github.io/Optim.jl/stable/examples/generated/maxlikenlm/)\n", "example and the accompanying [Jupyter notebook](https://nbviewer.jupyter.org/github/JuliaNLSolvers/Optim.jl/blob/gh-pages/v0.15.3/examples/generated/maxlikenlm.ipynb)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "### JuMP.jl\n", "\n", "The [JuMP.jl](https://github.com/JuliaOpt/JuMP.jl) package is an ambitious implementation of a modelling language for optimization problems in Julia\n", "\n", "In that sense, it is more like an AMPL (or Pyomo) built on top of the Julia\n",
"language with macros, and able to use a variety of different commerical and open source solvers\n", "\n", "If you have a linear, quadratic, conic, mixed-integer linear, etc. problem then this will likely be the ideal “meta-package” for calling various solvers\n", "\n", "For nonlinear problems, the modelling language may make things difficult for complicated functions (as it is not designed to be used as a general-purpose nonlinear optimizer)\n", "\n", "See the [quick start guide](http://www.juliaopt.org/JuMP.jl/0.18/quickstart.html) for more details on all of the options\n", "\n", "The following is an example of calling a linear objective with a nonlinear constraint (provided by an external function)\n", "\n", "Here `Ipopt` stands for `Interior Point OPTimizer`, a [nonlinear solver](https://github.com/JuliaOpt/Ipopt.jl) in Julia" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using JuMP, Ipopt\n", "# solve\n", "# max( x[1] + x[2] )\n", "# st sqrt(x[1]^2 + x[2]^2) <= 1\n", "\n", "function squareroot(x) # pretending we don't know sqrt()\n", " z = x # Initial starting point for Newton’s method\n", " while abs(z*z - x) > 1e-13\n", " z = z - (z*z-x)/(2z)\n", " end\n", " return z\n", "end\n", "m = Model(with_optimizer(Ipopt.Optimizer))\n", "# need to register user defined functions for AD\n", "JuMP.register(m,:squareroot, 1, squareroot, autodiff=true)\n", "\n", "@variable(m, x[1:2], start=0.5) # start is the initial condition\n", "@objective(m, Max, sum(x))\n", "@NLconstraint(m, squareroot(x[1]^2+x[2]^2) <= 1)\n", "@show JuMP.optimize!(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this is an example of a quadratic objective" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# solve\n", "# min (1-x)^2 + 100(y-x^2)^2)\n", "# st x + y >= 10\n", "\n", "using JuMP,Ipopt\n", "m = Model(with_optimizer(Ipopt.Optimizer)) # settings for the solver\n", "@variable(m, x, start = 0.0)\n", "@variable(m, y, start = 0.0)\n", "\n", "@NLobjective(m, Min, (1-x)^2 + 100(y-x^2)^2)\n", "\n", "JuMP.optimize!(m)\n", "println(\"x = \", value(x), \" y = \", value(y))\n", "\n", "# adding a (linear) constraint\n", "@constraint(m, x + y == 10)\n", "JuMP.optimize!(m)\n", "println(\"x = \", value(x), \" y = \", value(y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### BlackBoxOptim.jl\n", "\n", "Another package for doing global optimization without derivatives is [BlackBoxOptim.jl](https://github.com/robertfeldt/BlackBoxOptim.jl)\n", "\n", "To see an example from the documentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using BlackBoxOptim\n", "\n", "function rosenbrock2d(x)\n", "return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2\n", "end\n", "\n", "results = bboptimize(rosenbrock2d; SearchRange = (-5.0, 5.0), NumDimensions = 2);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example for [parallel execution](https://github.com/robertfeldt/BlackBoxOptim.jl/blob/master/examples/rosenbrock_parallel.jl) of the objective is provided" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Systems of Equations and Least Squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Roots.jl\n", "\n", "A root of a real function $ f $ on $ [a,b] $ is an $ x \\in [a, b] $ such that $ f(x)=0 $\n", "\n", "For example, if we plot the 
function\n", "\n", "\n", "\n", "$$\n", "f(x) = \\sin(4 (x - 1/4)) + x + x^{20} - 1 \\tag{1}\n", "$$\n", "\n", "with $ x \\in [0,1] $ we get\n", "\n", "\n", "\n", " \n", "The unique root is approximately 0.408\n", "\n", "The [Roots.jl](https://github.com/JuliaLang/Roots.jl) package offers `fzero()` to find roots" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using Roots\n", "f(x) = sin(4 * (x - 1/4)) + x + x^20 - 1\n", "fzero(f, 0, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NLsolve.jl\n", "\n", "The [NLsolve.jl](https://github.com/JuliaNLSolvers/NLsolve.jl/) package provides functions to solve for multivariate systems of equations and fixed points\n", "\n", "From the documentation, to solve for a system of equations without providing a Jacobian" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using NLsolve\n", "\n", "f(x) = [(x[1]+3)*(x[2]^3-7)+18\n", " sin(x[2]*exp(x[1])-1)] # returns an array\n", "\n", "results = nlsolve(f, [ 0.1; 1.2])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above case, the algorithm used finite differences to calculate the Jacobian\n", "\n", "Alternatively, if `f(x)` is written generically, you can use auto-differentiation with a single setting" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "results = nlsolve(f, [ 0.1; 1.2], autodiff=:forward)\n", "\n", "println(\"converged=$(NLsolve.converged(results)) at root=$(results.zero) in \"*\n", "\"$(results.iterations) iterations and $(results.f_calls) function calls\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Providing a function which operates inplace (i.e. 
, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "function f!(F, x) # modifies the first argument\n", "    F[1] = (x[1]+3)*(x[2]^3-7)+18\n", "    F[2] = sin(x[2]*exp(x[1])-1)\n", "end\n", "\n", "results = nlsolve(f!, [ 0.1; 1.2], autodiff=:forward)\n", "\n", "println(\"converged=$(NLsolve.converged(results)) at root=$(results.zero) in \"*\n", "\"$(results.iterations) iterations and $(results.f_calls) function calls\")" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "## LeastSquaresOptim.jl\n", "\n", "Many optimization problems can be solved using linear or nonlinear least squares\n", "\n", "Let $ x \\in R^N $ and $ F(x) : R^N \\to R^M $ with $ M \\geq N $, then the nonlinear least squares problem is\n", "\n", "$$\n", "\\min_x F(x)^T F(x)\n", "$$\n", "\n", "While $ F(x)^T F(x) \\to R $, and hence this problem could technically use any nonlinear optimizer, it is useful to exploit the structure of the problem\n", "\n", "In particular, the Jacobian of $ F(x) $ can be used to approximate the Hessian of the objective (see the short sketch at the end of this section)\n", "\n", "As with most nonlinear optimization problems, the benefits will typically become evident only when analytical or automatic differentiation is possible\n", "\n", "If $ M = N $ and we know a root $ F(x^*) = 0 $ to the system of equations exists, then NLS is the de facto method for solving large **systems of equations**\n", "\n", "An implementation of NLS is given in [LeastSquaresOptim.jl](https://github.com/matthieugomez/LeastSquaresOptim.jl)\n", "\n", "From the documentation" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using LeastSquaresOptim\n", "function rosenbrock(x)\n", "    [1 - x[1], 100 * (x[2]-x[1]^2)]\n", "end\n", "LeastSquaresOptim.optimize(rosenbrock, zeros(2), Dogleg())" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Because there is a name clash between `Optim.jl` and this package, to use both we need to qualify the use of the `optimize` function (i.e. `LeastSquaresOptim.optimize`)\n", "\n", "Here, by default it will use AD with `ForwardDiff.jl` to calculate the Jacobian,\n", "but you could also provide your own calculation of the Jacobian (analytical or using finite differences) and/or calculate the function inplace" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "function rosenbrock_f!(out, x)\n", "    out[1] = 1 - x[1]\n", "    out[2] = 100 * (x[2]-x[1]^2)\n", "end\n", "LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),\n", "                                f! = rosenbrock_f!, output_length = 2))\n", "\n", "# if you want to provide the Jacobian analytically\n", "function rosenbrock_g!(J, x)\n", "    J[1, 1] = -1\n", "    J[1, 2] = 0\n", "    J[2, 1] = -200 * x[1]\n", "    J[2, 2] = 100\n", "end\n", "LeastSquaresOptim.optimize!(LeastSquaresProblem(x = zeros(2),\n", "                                f! = rosenbrock_f!, g! = rosenbrock_g!, output_length = 2))" ] }
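,
 { "cell_type": "markdown", "metadata": {}, "source": [ "To see the structure that these methods exploit, here is a small sketch (not part of the package; the names `F` and `obj` and the evaluation point are made up for this illustration) showing that the Jacobian of the residuals above delivers both the gradient and a Gauss-Newton approximation of the Hessian of the least squares objective" ] },
 { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "using ForwardDiff, LinearAlgebra\n", "\n", "F(x) = [1 - x[1], 100 * (x[2] - x[1]^2)] # the residuals from above\n", "obj(x) = sum(abs2, F(x)) # the least squares objective F(x)' F(x)\n", "\n", "x = [0.5, 0.5]\n", "J = ForwardDiff.jacobian(F, x)\n", "@show norm(2 * J' * F(x) - ForwardDiff.gradient(obj, x)) # the gradient is 2 J' F\n", "@show 2 * J' * J # Gauss-Newton approximation of the Hessian\n", "@show ForwardDiff.hessian(obj, x); # exact Hessian, for comparison" ] }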
, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Notes\n", "\n", "Watch [this video](https://www.youtube.com/watch?v=vAp6nUMrKYg&feature=youtu.be) from one of Julia’s creators on automatic differentiation" ] } ], "metadata": { "filename": "optimization_solver_packages.rst", "kernelspec": { "display_name": "Julia 1.2", "language": "julia", "name": "julia-1.2" }, "title": "Solvers, Optimizers, and Automatic Differentiation" }, "nbformat": 4, "nbformat_minor": 2 }