{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Newton's Method

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
(C) Nikolai Nowaczyk, Jörg Kienitz 2019-2021
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib widget\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from numpy import polyder, poly1d\n", "import ipywidgets as wd\n", "from scipy.optimize import newton" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Newton's method is a standard algorithm to solve non-linear equations, i.e. equations of the form\n", "$$ F(x)=0, $$\n", "where $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ is a non-linear function. Solving such a non-linear equation can be very difficult in general, yet solving linear equations is easy and well understood. The idea of Newton's method is to assume that $F$ is differentiable and thus reducing the problem of solving a non-linear equation to solving a linear equation by approximating $F$ by its differential $\\nabla F$. This only yields an approximate solution, but when applied iteratively, one can hope to converge to the exact solution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The 1D case" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first consider the one-dimensional case of a function $f:\\mathbb{R} \\to \\mathbb{R}$ and we have to assume that $f$ is differentiable at least once. We also assume that we are given a start value $x_0$. The tangent to the graph of $f$ through the point $(x_0, f(x_0))$ is given by\n", "$$t_0(x) = f'(x_0)(x-x_0) + f(x_0)$$\n", "The root $x_1$ of the tangent can be found easily as\n", "$$ 0 = t_0(x_1) = f'(x_0)(x_1-x_0) + f(x_0) \\Longleftrightarrow x_1 = x_0 - \\frac{f(x_0)}{f'(x_0)} $$\n", "Thus, we take $x_1$ as our improved guess of the root of $f$. If we are not yet happy with that approximation, we can successively continue this, which yields the following recursively defined sequence, called the **Newton's method sequence**:\n", "\n", "\\begin{align}\n", " x_{n+1} = x_n - \\frac{f(x_n)}{f'(x_n)}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Polynomials" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example we consider 3rd order polynomials, i.e. the functions of the form\n", "$$ p(x) = c_3 x^3 + c_2 x^2 + c_1 x^1 + c_0 $$\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def newton_1d_poly(c, x0, N):\n", " \"\"\"\n", " Calculates the Newton's sequence for a polynomial with coefficients c=(c_0, \\ldots, c_d) at initial value x0.\n", " \n", " params:\n", " c : numpy array with coefficients c[i] corresponds to c_i\n", " x0: initial value\n", " N : number of iterations\n", " \n", " returns: numpy array x with x_0, \\ldots, x_N elements of the Newton sequence\n", " \"\"\"\n", " x = np.zeros(N+1)\n", " x[0] = x0\n", " c = c[::-1] # poly1d and polyder assume that coefficients are in decreasing order\n", " p = poly1d(c)\n", " dp = poly1d(polyder(c))\n", " for n in range(N):\n", " x[n+1] = x[n] - p(x[n]) / dp(x[n])\n", " return x" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "16ffa9bc19804c7daeba0955b3c602e0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "31d3292062ba4901bed9bc351d7282ed", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=2.0, description='c0', max=3.0, min=-3.0), FloatSlider(value=1.0, desc…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig_netwton, ax_newton = plt.subplots()\n", "\n", "@wd.interact(c0=wd.FloatSlider(min=-3, max=3, value=2),\n", " c1=wd.FloatSlider(min=-3, max=3, value=1),\n", " c2=wd.FloatSlider(min=-3, max=3, value=2),\n", " c3=wd.FloatSlider(min=-3, max=3, value=0.5),\n", " x0=wd.FloatSlider(min=-5, max=5, value=-3),\n", " N=wd.IntSlider(min=1, max=20, value=1),\n", " xlim_min=wd.FloatSlider(min=-10, max=0, value=-7),\n", " xlim_max=wd.FloatSlider(min=0, max=10, value=5),\n", " ylim_min=wd.FloatSlider(min=-50, max=0, value=-10),\n", " ylim_max=wd.FloatSlider(min=0, max=50, value=20),\n", " )\n", "def plot_1d_example(c0, c1, c2, c3, x0, N, xlim_min, xlim_max, ylim_min, ylim_max):\n", " ax_newton.clear()\n", " c = np.array([c0, c1, c2, c3])\n", " xgrid = np.linspace(xlim_min, xlim_max, 100)\n", " ygrid = poly1d(c[::-1])(xgrid)\n", " x = newton_1d_poly(c, x0, N)\n", " fig_netwton.suptitle('Newton\\'s Method: Iteration %i' % N)\n", " ax_newton.plot(xgrid, ygrid, label='f')\n", " ax_newton.axhline(y=0, color='k')\n", " ax_newton.scatter(x, np.zeros_like(x), marker='o', facecolors='none', edgecolors='r')\n", " ax_newton.scatter(x[N], 0, marker='x', color='r')\n", " ax_newton.plot(xgrid,poly1d(polyder(c[::-1]))(x[N-1])*(xgrid -x[N-1]) + poly1d(c[::-1])(x[N-1]), color='g', label='tangent')\n", " ax_newton.set_xlim([xlim_min, xlim_max])\n", " ax_newton.set_ylim([ylim_min, ylim_max])\n", " ax_newton.set_xlabel('x')\n", " ax_newton.set_ylabel('y')\n", " ax_newton.legend()\n", " plt.show()\n", " print(\"Solution (current guess): %f\" % x[N])\n", " print(\"Solution (scipy): %f\" % newton(poly1d(c[::-1]), x0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: Square Root" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the most popular examples of the Newton's method is the calculation of the square root of a number $a \\in \\mathbb{R}_{>0}$. This can be expressed as the root of $f_a:\\mathbb{R}_{>0} \\to \\mathbb{R}$, $x \\mapsto 1 - \\frac{a}{x^2}$. We choose $x_0 := a$ as the initial guess and the Newton sequence in this case is given by\n", "$$ x_{n+1} \n", "= x_n - \\frac{f_a(x_n)}{f_a'(x_n)}\n", "= x_n - \\frac{1-\\frac{a}{x_n^2}}{2ax_n^{-3}}\n", "= x_n - \\frac{x_n^3}{2a} + \\frac{x_n}{2}\n", "= \\frac{x_n}{2}\\Big( 3 - \\frac{x_n^2}{a} \\Big)$$\n", "This sequence is called *Heron's method* and converges quite quickly." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4e354d218191417fa5613d08e78a25a9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7a847a01691d422c9a5752a32204de07", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=2.0, description='a', max=3.0, min=1.0), IntSlider(value=10, descripti…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig_root, ax_root = plt.subplots()\n", "\n", "@wd.interact(a=wd.FloatSlider(min=1, max=3, value=2),\n", " N=wd.IntSlider(min=1, max=100, value=10))\n", "def plot_square_root(a, N):\n", " fig_root.suptitle('Heron\\'s method for a=%0.2f' % a)\n", " ax_root.clear()\n", " x = np.zeros(N+1)\n", " x[0] = a\n", " for n in range(N):\n", " x[n+1] = x[n]/2*(3 - x[n]**2/a)\n", " ax_root.plot(range(N+1), x, label='heron approx')\n", " ax_root.axhline(y=np.sqrt(a), color='k', label='exact')\n", " ax_root.set_ylim([0, 3])\n", " ax_root.set_xlabel('num iterations')\n", " ax_root.set_ylabel('value')\n", " ax_root.legend()\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-Dimensional Newton's Method\n", "\n", "The Newton's method can be applied to multivariate functions $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ in the same fashion. The Newton sequence in this case is given by\n", "\\begin{align}\n", " x_{n+1} = x_n - \\nabla F(x_n)^{-1} F(x_n).\n", "\\end{align}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def F(x):\n", " return np.array([0.2 * x[0]**2 + x[1] - 1, \n", " 0.5*x[1]**3 + x[0] + 1])\n", "\n", "def dF(x):\n", " return np.array([[0.4 * x[0], 1],\n", " [1, 1.5*x[1]**2]])\n", "\n", "def newton_naive(F, dF, x0, N):\n", " X = np.zeros((N+1, x0.shape[0]))\n", " X[0, :] = x0\n", " for n in range(N):\n", " X[n+1, :] = X[n, :] - np.linalg.inv(dF(X[n, :])) @ F(X[n, :]) \n", " return X" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "X, Y = np.meshgrid(np.linspace(0, 5, 20), \n", " np.linspace(-4, 1, 20))\n", "X = X.flatten()\n", "Y = Y.flatten()\n", "ZX, ZY = np.array(np.array([F(np.array([x, y])) for x,y in zip(X,Y)])).T" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3b6c41fe915f40d68287d56788c81f4d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5bf075b42dd24891b9e50031fb6a5b15", "version_major": 2, "version_minor": 0 }, "text/plain": [ "interactive(children=(FloatSlider(value=2.1, description='x0x', max=5.0, min=-5.0), FloatSlider(value=-1.1, de…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig_netwton2d, ax_newton2d = plt.subplots()\n", "fig_netwton2d.suptitle(\"Newton 2D\")\n", "\n", "@wd.interact(x0x=wd.FloatSlider(min=-5, max=5, value=2.1),\n", " x0y=wd.FloatSlider(min=-5, max=5, value=-1.1),\n", " N=wd.IntSlider(min=1, max=100, value=1))\n", "def plot_newton(x0x, x0y, N):\n", " x0 = np.array([x0x, x0y])\n", " ax_newton2d.clear()\n", " ax_newton2d.quiver(X, Y, ZX, ZY)\n", " Xnwt = newton_naive(F, dF, x0, N)\n", " ax_newton2d.scatter(Xnwt[:,0], Xnwt[:,1], marker='o', facecolors='none', edgecolors='r')\n", " ax_newton2d.set_xlim([0, 5])\n", " ax_newton2d.set_ylim([-4, 1])\n", " plt.show()\n", " print(\"Solution (current guess): \" + str(Xnwt[N, :]))\n", " print(\"Solution (scipy): \" + str(newton(F, x0, maxiter=100)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Netwon's Method for Optimization\n", "Newton's method can be used for optimization: If one wants to find a solution to\n", "$$ \\min_{x \\in \\mathbb{R}}{f(x)} $$\n", "then a neccessary condition for a local optimum $x^*$ is $f'(x^*)=0$. Thus by applying Newton's method to $f'$ one can find a local optimum of $f$. The Newton sequence is then given by\n", "$$ x_{n+1} = x_n - \\frac{f'(x_n)}{f''(x_n)}. $$\n", "As we can see, this requires $f$ to be twice differentiable and requires the computation of second order derivatives. This also applies in multiple dimensions: If $f:\\mathbb{R}^m \\to \\mathbb{R}$, then a necessary condition for $x^*$ to be a local optimum is that $\\nabla F: \\mathbb{R}^m \\to \\mathbb{R}^m$ vanishes at $x^*$. The Newton sequence requires the computation of the Hessian $\\nabla^2 F: \\mathbb{R}^m \\to \\mathbb{R}^{m \\times m}$:\n", "$$ x_{n+1} = x_n - \\nabla^2 F(x_n)^{-1} \\nabla F(x_n)$$\n", "For that reason, other methods such as *gradient descent* are sometimes used." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Properties of the Newton's Method\n", "\n", "**1. Local Quadratic Convergence:** The Newton sequence converges quadratically against the root in a local neighbourhood around the root. Thus, the choice of a good starting value is key. In particular, if $F$ has multiple roots, Newton's method finds one root, not all of them.\n", "\n", "**2. Singluar Jacobian:** If the Jacobian $\\nabla F$ fails to be invertible, the computation of the next element in the sequence becomes problematic.\n", "\n", "**3. Cycles:** For certain values, the sequence can become periodic, thus failing to converge.\n", "\n", "**4. Computational Complexities:** Calculating the full Jacobian $\\nabla F$ and solving the resulting system of linear equations in every step can be quite intense computationally.\n", "\n", "The last step can be addressed by replacing the full Jacobian $\\nabla F$ by an approximation. Those methods are called *Quasi-Newton methods*. " ] } ], "metadata": { "kernelspec": { "display_name": "heroku", "language": "python", "name": "heroku" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }