"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib widget\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"from numpy import polyder, poly1d\n",
"import ipywidgets as wd\n",
"from scipy.optimize import newton"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Newton's method is a standard algorithm to solve non-linear equations, i.e. equations of the form\n",
"$$ F(x)=0, $$\n",
"where $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ is a non-linear function. Solving such a non-linear equation can be very difficult in general, yet solving linear equations is easy and well understood. The idea of Newton's method is to assume that $F$ is differentiable and thus reducing the problem of solving a non-linear equation to solving a linear equation by approximating $F$ by its differential $\\nabla F$. This only yields an approximate solution, but when applied iteratively, one can hope to converge to the exact solution"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# The 1D case"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first consider the one-dimensional case of a function $f:\\mathbb{R} \\to \\mathbb{R}$ and we have to assume that $f$ is differentiable at least once. We also assume that we are given a start value $x_0$. The tangent to the graph of $f$ through the point $(x_0, f(x_0))$ is given by\n",
"$$t_0(x) = f'(x_0)(x-x_0) + f(x_0)$$\n",
"The root $x_1$ of the tangent can be found easily as\n",
"$$ 0 = t_0(x_1) = f'(x_0)(x_1-x_0) + f(x_0) \\Longleftrightarrow x_1 = x_0 - \\frac{f(x_0)}{f'(x_0)} $$\n",
"Thus, we take $x_1$ as our improved guess of the root of $f$. If we are not yet happy with that approximation, we can successively continue this, which yields the following recursively defined sequence, called the **Newton's method sequence**:\n",
"\n",
"\\begin{align}\n",
" x_{n+1} = x_n - \\frac{f(x_n)}{f'(x_n)}\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example: Polynomials"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an example we consider 3rd order polynomials, i.e. the functions of the form\n",
"$$ p(x) = c_3 x^3 + c_2 x^2 + c_1 x^1 + c_0 $$\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def newton_1d_poly(c, x0, N):\n",
" \"\"\"\n",
" Calculates the Newton's sequence for a polynomial with coefficients c=(c_0, \\ldots, c_d) at initial value x0.\n",
" \n",
" params:\n",
" c : numpy array with coefficients c[i] corresponds to c_i\n",
" x0: initial value\n",
" N : number of iterations\n",
" \n",
" returns: numpy array x with x_0, \\ldots, x_N elements of the Newton sequence\n",
" \"\"\"\n",
" x = np.zeros(N+1)\n",
" x[0] = x0\n",
" c = c[::-1] # poly1d and polyder assume that coefficients are in decreasing order\n",
" p = poly1d(c)\n",
" dp = poly1d(polyder(c))\n",
" for n in range(N):\n",
" x[n+1] = x[n] - p(x[n]) / dp(x[n])\n",
" return x"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "16ffa9bc19804c7daeba0955b3c602e0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "31d3292062ba4901bed9bc351d7282ed",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(FloatSlider(value=2.0, description='c0', max=3.0, min=-3.0), FloatSlider(value=1.0, desc…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig_netwton, ax_newton = plt.subplots()\n",
"\n",
"@wd.interact(c0=wd.FloatSlider(min=-3, max=3, value=2),\n",
" c1=wd.FloatSlider(min=-3, max=3, value=1),\n",
" c2=wd.FloatSlider(min=-3, max=3, value=2),\n",
" c3=wd.FloatSlider(min=-3, max=3, value=0.5),\n",
" x0=wd.FloatSlider(min=-5, max=5, value=-3),\n",
" N=wd.IntSlider(min=1, max=20, value=1),\n",
" xlim_min=wd.FloatSlider(min=-10, max=0, value=-7),\n",
" xlim_max=wd.FloatSlider(min=0, max=10, value=5),\n",
" ylim_min=wd.FloatSlider(min=-50, max=0, value=-10),\n",
" ylim_max=wd.FloatSlider(min=0, max=50, value=20),\n",
" )\n",
"def plot_1d_example(c0, c1, c2, c3, x0, N, xlim_min, xlim_max, ylim_min, ylim_max):\n",
" ax_newton.clear()\n",
" c = np.array([c0, c1, c2, c3])\n",
" xgrid = np.linspace(xlim_min, xlim_max, 100)\n",
" ygrid = poly1d(c[::-1])(xgrid)\n",
" x = newton_1d_poly(c, x0, N)\n",
" fig_netwton.suptitle('Newton\\'s Method: Iteration %i' % N)\n",
" ax_newton.plot(xgrid, ygrid, label='f')\n",
" ax_newton.axhline(y=0, color='k')\n",
" ax_newton.scatter(x, np.zeros_like(x), marker='o', facecolors='none', edgecolors='r')\n",
" ax_newton.scatter(x[N], 0, marker='x', color='r')\n",
" ax_newton.plot(xgrid,poly1d(polyder(c[::-1]))(x[N-1])*(xgrid -x[N-1]) + poly1d(c[::-1])(x[N-1]), color='g', label='tangent')\n",
" ax_newton.set_xlim([xlim_min, xlim_max])\n",
" ax_newton.set_ylim([ylim_min, ylim_max])\n",
" ax_newton.set_xlabel('x')\n",
" ax_newton.set_ylabel('y')\n",
" ax_newton.legend()\n",
" plt.show()\n",
" print(\"Solution (current guess): %f\" % x[N])\n",
" print(\"Solution (scipy): %f\" % newton(poly1d(c[::-1]), x0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example: Square Root"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the most popular examples of the Newton's method is the calculation of the square root of a number $a \\in \\mathbb{R}_{>0}$. This can be expressed as the root of $f_a:\\mathbb{R}_{>0} \\to \\mathbb{R}$, $x \\mapsto 1 - \\frac{a}{x^2}$. We choose $x_0 := a$ as the initial guess and the Newton sequence in this case is given by\n",
"$$ x_{n+1} \n",
"= x_n - \\frac{f_a(x_n)}{f_a'(x_n)}\n",
"= x_n - \\frac{1-\\frac{a}{x_n^2}}{2ax_n^{-3}}\n",
"= x_n - \\frac{x_n^3}{2a} + \\frac{x_n}{2}\n",
"= \\frac{x_n}{2}\\Big( 3 - \\frac{x_n^2}{a} \\Big)$$\n",
"This sequence is called *Heron's method* and converges quite quickly."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4e354d218191417fa5613d08e78a25a9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7a847a01691d422c9a5752a32204de07",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(FloatSlider(value=2.0, description='a', max=3.0, min=1.0), IntSlider(value=10, descripti…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig_root, ax_root = plt.subplots()\n",
"\n",
"@wd.interact(a=wd.FloatSlider(min=1, max=3, value=2),\n",
" N=wd.IntSlider(min=1, max=100, value=10))\n",
"def plot_square_root(a, N):\n",
" fig_root.suptitle('Heron\\'s method for a=%0.2f' % a)\n",
" ax_root.clear()\n",
" x = np.zeros(N+1)\n",
" x[0] = a\n",
" for n in range(N):\n",
" x[n+1] = x[n]/2*(3 - x[n]**2/a)\n",
" ax_root.plot(range(N+1), x, label='heron approx')\n",
" ax_root.axhline(y=np.sqrt(a), color='k', label='exact')\n",
" ax_root.set_ylim([0, 3])\n",
" ax_root.set_xlabel('num iterations')\n",
" ax_root.set_ylabel('value')\n",
" ax_root.legend()\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Dimensional Newton's Method\n",
"\n",
"The Newton's method can be applied to multivariate functions $F:\\mathbb{R}^m \\to \\mathbb{R}^m$ in the same fashion. The Newton sequence in this case is given by\n",
"\\begin{align}\n",
" x_{n+1} = x_n - \\nabla F(x_n)^{-1} F(x_n).\n",
"\\end{align}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def F(x):\n",
" return np.array([0.2 * x[0]**2 + x[1] - 1, \n",
" 0.5*x[1]**3 + x[0] + 1])\n",
"\n",
"def dF(x):\n",
" return np.array([[0.4 * x[0], 1],\n",
" [1, 1.5*x[1]**2]])\n",
"\n",
"def newton_naive(F, dF, x0, N):\n",
" X = np.zeros((N+1, x0.shape[0]))\n",
" X[0, :] = x0\n",
" for n in range(N):\n",
" X[n+1, :] = X[n, :] - np.linalg.inv(dF(X[n, :])) @ F(X[n, :]) \n",
" return X"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"X, Y = np.meshgrid(np.linspace(0, 5, 20), \n",
" np.linspace(-4, 1, 20))\n",
"X = X.flatten()\n",
"Y = Y.flatten()\n",
"ZX, ZY = np.array(np.array([F(np.array([x, y])) for x,y in zip(X,Y)])).T"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3b6c41fe915f40d68287d56788c81f4d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Back to previous…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5bf075b42dd24891b9e50031fb6a5b15",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"interactive(children=(FloatSlider(value=2.1, description='x0x', max=5.0, min=-5.0), FloatSlider(value=-1.1, de…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig_netwton2d, ax_newton2d = plt.subplots()\n",
"fig_netwton2d.suptitle(\"Newton 2D\")\n",
"\n",
"@wd.interact(x0x=wd.FloatSlider(min=-5, max=5, value=2.1),\n",
" x0y=wd.FloatSlider(min=-5, max=5, value=-1.1),\n",
" N=wd.IntSlider(min=1, max=100, value=1))\n",
"def plot_newton(x0x, x0y, N):\n",
" x0 = np.array([x0x, x0y])\n",
" ax_newton2d.clear()\n",
" ax_newton2d.quiver(X, Y, ZX, ZY)\n",
" Xnwt = newton_naive(F, dF, x0, N)\n",
" ax_newton2d.scatter(Xnwt[:,0], Xnwt[:,1], marker='o', facecolors='none', edgecolors='r')\n",
" ax_newton2d.set_xlim([0, 5])\n",
" ax_newton2d.set_ylim([-4, 1])\n",
" plt.show()\n",
" print(\"Solution (current guess): \" + str(Xnwt[N, :]))\n",
" print(\"Solution (scipy): \" + str(newton(F, x0, maxiter=100)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Netwon's Method for Optimization\n",
"Newton's method can be used for optimization: If one wants to find a solution to\n",
"$$ \\min_{x \\in \\mathbb{R}}{f(x)} $$\n",
"then a neccessary condition for a local optimum $x^*$ is $f'(x^*)=0$. Thus by applying Newton's method to $f'$ one can find a local optimum of $f$. The Newton sequence is then given by\n",
"$$ x_{n+1} = x_n - \\frac{f'(x_n)}{f''(x_n)}. $$\n",
"As we can see, this requires $f$ to be twice differentiable and requires the computation of second order derivatives. This also applies in multiple dimensions: If $f:\\mathbb{R}^m \\to \\mathbb{R}$, then a necessary condition for $x^*$ to be a local optimum is that $\\nabla F: \\mathbb{R}^m \\to \\mathbb{R}^m$ vanishes at $x^*$. The Newton sequence requires the computation of the Hessian $\\nabla^2 F: \\mathbb{R}^m \\to \\mathbb{R}^{m \\times m}$:\n",
"$$ x_{n+1} = x_n - \\nabla^2 F(x_n)^{-1} \\nabla F(x_n)$$\n",
"For that reason, other methods such as *gradient descent* are sometimes used."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Properties of the Newton's Method\n",
"\n",
"**1. Local Quadratic Convergence:** The Newton sequence converges quadratically against the root in a local neighbourhood around the root. Thus, the choice of a good starting value is key. In particular, if $F$ has multiple roots, Newton's method finds one root, not all of them.\n",
"\n",
"**2. Singluar Jacobian:** If the Jacobian $\\nabla F$ fails to be invertible, the computation of the next element in the sequence becomes problematic.\n",
"\n",
"**3. Cycles:** For certain values, the sequence can become periodic, thus failing to converge.\n",
"\n",
"**4. Computational Complexities:** Calculating the full Jacobian $\\nabla F$ and solving the resulting system of linear equations in every step can be quite intense computationally.\n",
"\n",
"The last step can be addressed by replacing the full Jacobian $\\nabla F$ by an approximation. Those methods are called *Quasi-Newton methods*. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "heroku",
"language": "python",
"name": "heroku"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.12"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}