{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p style=\"font-size:30px; text-align:center\"><b>Newton's Method</b></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: right\">(C) <a href=\"https://github.com/niknow\">Nikolai Nowaczyk</a>, <a href=\"https://github.com/Lapsilago\">Jörg Kienitz</a> 2019-2021</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib widget\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from numpy import polyder, poly1d\n",
    "import ipywidgets as wd\n",
    "from scipy.optimize import newton"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Newton's method is a standard algorithm to solve non-linear equations, i.e. equations of the form\n",
    "$$ F(x)=0, $$\n",
    "where $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ is a non-linear function. Solving such a non-linear equation can be very difficult in general, yet solving linear equations is easy and well understood. The idea of Newton's method is to assume that $F$ is differentiable and thus reducing the problem of solving a non-linear equation to solving a linear equation by approximating $F$ by its differential $\\nabla F$. This only yields an approximate solution, but when applied iteratively, one can hope to converge to the exact solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The 1D case"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We first consider the one-dimensional case of a function $f:\\mathbb{R} \\to \\mathbb{R}$ and we have to assume that $f$ is differentiable at least once. We also assume that we are given a start value $x_0$. The tangent to the graph of $f$ through the point $(x_0, f(x_0))$ is given by\n",
    "$$t_0(x) = f'(x_0)(x-x_0) + f(x_0)$$\n",
    "The root $x_1$ of the tangent can be found easily as\n",
    "$$ 0 = t_0(x_1) = f'(x_0)(x_1-x_0) + f(x_0) \\Longleftrightarrow x_1 = x_0 - \\frac{f(x_0)}{f'(x_0)} $$\n",
    "Thus, we take $x_1$ as our improved guess of the root of $f$. If we are not yet happy with that approximation, we can successively continue this, which yields the following recursively defined sequence, called the **Newton's method sequence**:\n",
    "\n",
    "\\begin{align}\n",
    "    x_{n+1} = x_n - \\frac{f(x_n)}{f'(x_n)}\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Polynomials"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an example we consider 3rd order polynomials, i.e. the functions of the form\n",
    "$$ p(x) = c_3 x^3 + c_2 x^2 +  c_1 x^1 + c_0 $$\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def newton_1d_poly(c, x0, N):\n",
    "    \"\"\"\n",
    "    Calculates the Newton's sequence for a polynomial with coefficients c=(c_0, \\ldots, c_d) at initial value x0.\n",
    "    \n",
    "    params:\n",
    "    c : numpy array with coefficients c[i] corresponds to c_i\n",
    "    x0: initial value\n",
    "    N : number of iterations\n",
    "    \n",
    "    returns: numpy array x with x_0, \\ldots, x_N elements of the Newton sequence\n",
    "    \"\"\"\n",
    "    x = np.zeros(N+1)\n",
    "    x[0] = x0\n",
    "    c = c[::-1] # poly1d and polyder assume that coefficients are in decreasing order\n",
    "    p = poly1d(c)\n",
    "    dp = poly1d(polyder(c))\n",
    "    for n in range(N):\n",
    "        x[n+1] = x[n] - p(x[n]) / dp(x[n])\n",
    "    return x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "16ffa9bc19804c7daeba0955b3c602e0",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to  previous…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "31d3292062ba4901bed9bc351d7282ed",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(FloatSlider(value=2.0, description='c0', max=3.0, min=-3.0), FloatSlider(value=1.0, desc…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig_netwton, ax_newton = plt.subplots()\n",
    "\n",
    "@wd.interact(c0=wd.FloatSlider(min=-3, max=3, value=2),\n",
    "             c1=wd.FloatSlider(min=-3, max=3, value=1),\n",
    "             c2=wd.FloatSlider(min=-3, max=3, value=2),\n",
    "             c3=wd.FloatSlider(min=-3, max=3, value=0.5),\n",
    "             x0=wd.FloatSlider(min=-5, max=5, value=-3),\n",
    "             N=wd.IntSlider(min=1, max=20, value=1),\n",
    "             xlim_min=wd.FloatSlider(min=-10, max=0, value=-7),\n",
    "             xlim_max=wd.FloatSlider(min=0, max=10, value=5),\n",
    "             ylim_min=wd.FloatSlider(min=-50, max=0, value=-10),\n",
    "             ylim_max=wd.FloatSlider(min=0, max=50, value=20),\n",
    "            )\n",
    "def plot_1d_example(c0, c1, c2, c3, x0, N, xlim_min, xlim_max, ylim_min, ylim_max):\n",
    "    ax_newton.clear()\n",
    "    c = np.array([c0, c1, c2, c3])\n",
    "    xgrid = np.linspace(xlim_min, xlim_max, 100)\n",
    "    ygrid = poly1d(c[::-1])(xgrid)\n",
    "    x = newton_1d_poly(c, x0, N)\n",
    "    fig_netwton.suptitle('Newton\\'s Method: Iteration %i' % N)\n",
    "    ax_newton.plot(xgrid, ygrid, label='f')\n",
    "    ax_newton.axhline(y=0, color='k')\n",
    "    ax_newton.scatter(x, np.zeros_like(x), marker='o', facecolors='none', edgecolors='r')\n",
    "    ax_newton.scatter(x[N], 0, marker='x', color='r')\n",
    "    ax_newton.plot(xgrid,poly1d(polyder(c[::-1]))(x[N-1])*(xgrid -x[N-1]) + poly1d(c[::-1])(x[N-1]), color='g', label='tangent')\n",
    "    ax_newton.set_xlim([xlim_min, xlim_max])\n",
    "    ax_newton.set_ylim([ylim_min, ylim_max])\n",
    "    ax_newton.set_xlabel('x')\n",
    "    ax_newton.set_ylabel('y')\n",
    "    ax_newton.legend()\n",
    "    plt.show()\n",
    "    print(\"Solution (current guess): %f\" % x[N])\n",
    "    print(\"Solution (scipy): %f\" % newton(poly1d(c[::-1]), x0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Square Root"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One of the most popular examples of the Newton's method is the calculation of the square root of a number $a \\in \\mathbb{R}_{>0}$. This can be expressed as the root of $f_a:\\mathbb{R}_{>0} \\to \\mathbb{R}$, $x \\mapsto 1 - \\frac{a}{x^2}$. We choose $x_0 := a$ as the initial guess and the Newton sequence in this case is given by\n",
    "$$ x_{n+1} \n",
    "= x_n - \\frac{f_a(x_n)}{f_a'(x_n)}\n",
    "= x_n - \\frac{1-\\frac{a}{x_n^2}}{2ax_n^{-3}}\n",
    "= x_n - \\frac{x_n^3}{2a} + \\frac{x_n}{2}\n",
    "= \\frac{x_n}{2}\\Big( 3 - \\frac{x_n^2}{a} \\Big)$$\n",
    "This sequence is called *Heron's method* and converges quite quickly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "4e354d218191417fa5613d08e78a25a9",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to  previous…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7a847a01691d422c9a5752a32204de07",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(FloatSlider(value=2.0, description='a', max=3.0, min=1.0), IntSlider(value=10, descripti…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig_root, ax_root = plt.subplots()\n",
    "\n",
    "@wd.interact(a=wd.FloatSlider(min=1, max=3, value=2),\n",
    "             N=wd.IntSlider(min=1, max=100, value=10))\n",
    "def plot_square_root(a, N):\n",
    "    fig_root.suptitle('Heron\\'s method for a=%0.2f' % a)\n",
    "    ax_root.clear()\n",
    "    x = np.zeros(N+1)\n",
    "    x[0] = a\n",
    "    for n in range(N):\n",
    "        x[n+1] = x[n]/2*(3 - x[n]**2/a)\n",
    "    ax_root.plot(range(N+1), x, label='heron approx')\n",
    "    ax_root.axhline(y=np.sqrt(a), color='k', label='exact')\n",
    "    ax_root.set_ylim([0, 3])\n",
    "    ax_root.set_xlabel('num iterations')\n",
    "    ax_root.set_ylabel('value')\n",
    "    ax_root.legend()\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multi-Dimensional Newton's Method\n",
    "\n",
    "The Newton's method can be applied to multivariate functions $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ in the same fashion. The Newton sequence in this case is given by\n",
    "\\begin{align}\n",
    "    x_{n+1} = x_n - \\nabla F(x_n)^{-1} F(x_n).\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def F(x):\n",
    "    return np.array([0.2 * x[0]**2 + x[1] - 1, \n",
    "                     0.5*x[1]**3 + x[0] + 1])\n",
    "\n",
    "def dF(x):\n",
    "    return np.array([[0.4 * x[0], 1],\n",
    "                     [1, 1.5*x[1]**2]])\n",
    "\n",
    "def newton_naive(F, dF, x0, N):\n",
    "    X = np.zeros((N+1, x0.shape[0]))\n",
    "    X[0, :] = x0\n",
    "    for n in range(N):\n",
    "        X[n+1, :] = X[n, :] - np.linalg.inv(dF(X[n, :])) @ F(X[n, :]) \n",
    "    return X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "X, Y = np.meshgrid(np.linspace(0, 5, 20), \n",
    "                   np.linspace(-4, 1, 20))\n",
    "X = X.flatten()\n",
    "Y = Y.flatten()\n",
    "ZX, ZY = np.array(np.array([F(np.array([x, y])) for x,y in zip(X,Y)])).T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3b6c41fe915f40d68287d56788c81f4d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to  previous…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5bf075b42dd24891b9e50031fb6a5b15",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(FloatSlider(value=2.1, description='x0x', max=5.0, min=-5.0), FloatSlider(value=-1.1, de…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig_netwton2d, ax_newton2d = plt.subplots()\n",
    "fig_netwton2d.suptitle(\"Newton 2D\")\n",
    "\n",
    "@wd.interact(x0x=wd.FloatSlider(min=-5, max=5, value=2.1),\n",
    "             x0y=wd.FloatSlider(min=-5, max=5, value=-1.1),\n",
    "             N=wd.IntSlider(min=1, max=100, value=1))\n",
    "def plot_newton(x0x, x0y, N):\n",
    "    x0 = np.array([x0x, x0y])\n",
    "    ax_newton2d.clear()\n",
    "    ax_newton2d.quiver(X, Y, ZX, ZY)\n",
    "    Xnwt = newton_naive(F, dF, x0, N)\n",
    "    ax_newton2d.scatter(Xnwt[:,0], Xnwt[:,1], marker='o', facecolors='none', edgecolors='r')\n",
    "    ax_newton2d.set_xlim([0, 5])\n",
    "    ax_newton2d.set_ylim([-4, 1])\n",
    "    plt.show()\n",
    "    print(\"Solution (current guess): \" + str(Xnwt[N, :]))\n",
    "    print(\"Solution (scipy):  \" + str(newton(F, x0, maxiter=100)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Netwon's Method for Optimization\n",
    "Newton's method can be used for optimization: If one wants to find a solution to\n",
    "$$ \\min_{x \\in \\mathbb{R}}{f(x)} $$\n",
    "then a neccessary condition for a local optimum $x^*$ is $f'(x^*)=0$. Thus by applying Newton's method to $f'$ one can find a local optimum of $f$. The Newton sequence is then given by\n",
    "$$ x_{n+1} = x_n - \\frac{f'(x_n)}{f''(x_n)}. $$\n",
    "As we can see, this requires $f$ to be twice differentiable and requires the computation of second order derivatives. This also applies in multiple dimensions: If $f:\\mathbb{R}^m \\to \\mathbb{R}$, then a necessary condition for $x^*$ to be a local optimum is that $\\nabla F: \\mathbb{R}^m \\to \\mathbb{R}^m$ vanishes at $x^*$. The Newton sequence requires the computation of the Hessian $\\nabla^2 F: \\mathbb{R}^m \\to \\mathbb{R}^{m \\times m}$:\n",
    "$$ x_{n+1} = x_n - \\nabla^2 F(x_n)^{-1} \\nabla F(x_n)$$\n",
    "For that reason, other methods such as *gradient descent* are sometimes used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Properties of the Newton's Method\n",
    "\n",
    "**1. Local Quadratic Convergence:** The Newton sequence converges quadratically against the root in a local neighbourhood around the root. Thus, the choice of a good starting value is key. In particular, if $F$ has multiple roots, Newton's method finds one root, not all of them.\n",
    "\n",
    "**2. Singluar Jacobian:** If the Jacobian $\\nabla F$ fails to be invertible, the computation of the next element in the sequence becomes problematic.\n",
    "\n",
    "**3. Cycles:** For certain values, the sequence can become periodic, thus failing to converge.\n",
    "\n",
    "**4. Computational Complexities:** Calculating the full Jacobian $\\nabla F$ and solving the resulting system of linear equations in every step can be quite intense computationally.\n",
    "\n",
    "The last step can be addressed by replacing the full Jacobian $\\nabla F$ by an approximation. Those methods are called *Quasi-Newton methods*. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "heroku",
   "language": "python",
   "name": "heroku"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}