{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
\n", " \n", " \"QuantEcon\"\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Krylov Methods and Matrix Conditioning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [Krylov Methods and Matrix Conditioning](#Krylov-Methods-and-Matrix-Conditioning) \n", " - [Overview](#Overview) \n", " - [Ill-Conditioned Matrices](#Ill-Conditioned-Matrices) \n", " - [Stationary Iterative Algorithms for Linear Systems](#Stationary-Iterative-Algorithms-for-Linear-Systems) \n", " - [Krylov Methods](#Krylov-Methods) \n", " - [Iterative Methods for Linear Least Squares](#Iterative-Methods-for-Linear-Least-Squares) \n", " - [Iterative Methods for Eigensystems](#Iterative-Methods-for-Eigensystems) \n", " - [Krylov Methods for Markov-Chain Dynamics](#Krylov-Methods-for-Markov-Chain-Dynamics) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "This lecture takes the structure of [numerical methods for linear algebra](numerical_linear_algebra.html) and builds further\n", "toward working with large, sparse matrices. In the process, we will examine foundational numerical analysis such as\n", "ill-conditioned matrices." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using InstantiateFromURL\n", "# optionally add arguments to force installation: instantiate = true, precompile = true\n", "github_project(\"QuantEcon/quantecon-notebooks-julia\", version = \"0.8.0\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "hide-output": true }, "outputs": [], "source": [ "using LinearAlgebra, Statistics, BenchmarkTools, Random\n", "Random.seed!(42); # seed random numbers for reproducibility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Applications\n", "\n", "In this section, we will consider variations on classic problems\n", "\n", "1. Solving a linear system for a square $ A $ where we will maintain throughout that there is a unique solution to \n", " $$\n", " A x = b\n", " $$\n", "1. [Linear least-squares](https://en.wikipedia.org/wiki/Linear_least_squares) solution, for a rectangular $ A $ \n", " $$\n", " \\min_x \\| Ax -b \\|^2\n", " $$\n", " From theory, we know that if $ A $ has linearly independent columns, then the solution is the [normal equation](https://en.wikipedia.org/wiki/Linear_least_squares#Derivation_of_the_normal_equations) \n", " $$\n", " x = (A'A)^{-1}A'b\n", " $$\n", "1. In the case of a square matrix $ A $, the eigenvalue problem is that of finding $ x $ and $ \\lambda $ such that \n", " $$\n", " A x = \\lambda x\n", " $$\n", " For eigenvalue problems, keep in mind that you do not always require all of the $ \\lambda $, and sometimes the largest (or smallest) would be enough. For example, calculating the spectral radius requires only the eigenvalue with maximum absolute value. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ill-Conditioned Matrices\n", "\n", "An important consideration in numerical linear algebra, and iterative methods in general, is the [condition number](https://en.wikipedia.org/wiki/Condition_number#Matrices).\n", "\n", "An ill-conditioned matrix is one where the basis eigenvectors are close to, but not exactly, collinear. While this poses no problem on pen and paper,\n", "or with infinite-precision numerical methods, it is important in practice, for two reasons:\n", "\n", "1. 
Ill-conditioned matrices introduce numerical errors: as a rough rule of thumb, you lose about as many significant digits of precision as the base-10 log of the condition number. \n", "1. The convergence speed of many iterative methods is based on the spectral properties of the matrices (e.g., the basis formed by the eigenvectors), and hence ill-conditioned systems can converge slowly. \n", "\n", "\n", "The solutions to these problems are to\n", "\n", "- be careful with operations which introduce error based on the condition number (e.g., matrix inversions when the condition number is high) \n", "- choose, where possible, alternative representations which have less collinearity (e.g., an orthogonal polynomial basis rather than a monomial one) \n", "- use a preconditioner for iterative methods, which changes the spectral properties to increase convergence speed " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Condition Number\n", "\n", "First, let’s define and explore the condition number $ \kappa $\n", "\n", "$$\n", "\kappa(A) \equiv \|A\| \|A^{-1}\|\n", "$$\n", "\n", "where you can use the submultiplicative property of the matrix norm (applied to $ A A^{-1} = I $) to show that $ \kappa(A) \geq 1 $. While the condition number can be calculated with any norm, we will focus on the 2-norm.\n", "\n", "First, a warning on calculations: Calculating the condition number for a matrix can be an expensive operation (as would calculating a determinant)\n", "and should be thought of as roughly equivalent to doing an eigendecomposition. So use it for detective work judiciously.\n", "\n", "Let’s look at the condition number of a few matrices using the `cond` function (which allows a choice of the norm, but we’ll stick with the default 2-norm)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = I(2)\n", "cond(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we see an example of the best-conditioned matrix, the identity matrix with its completely orthonormal basis, which has a condition number of 1.\n", "\n", "On the other hand, notice that" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "2.0000000000005004e6" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ϵ = 1E-6\n", "A = [1.0 0.0\n", " 1.0 ϵ]\n", "cond(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "has a condition number of order `1E6` - and hence (taking the base-10 log) you would expect to lose about 6 significant digits of precision if you\n", "are not careful. 
For example, note that the inverse contains entries of hugely different magnitudes, including extremely large positive and negative numbers" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "2×2 Array{Float64,2}:\n", " 1.0 0.0\n", " -1.0e6 1.0e6" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inv(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we know that the determinant of nearly collinear matrices is close to zero, this shows another symptom of poor conditioning" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "1.0e-6" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "det(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, be careful since the determinant has a scale, while the condition number is dimensionless. That is," ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "det(1000A) = 1.0\n", "cond(1000A) = 2.0000000000005001e6\n" ] } ], "source": [ "@show det(1000 * A)\n", "@show cond(1000 * A);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In that case, the determinant of `1000 * A` is 1, while the condition number is unchanged. This example also provides some\n", "intuition that ill-conditioned matrices typically occur when a matrix has radically different scales (e.g., contains both `1` and `1E-6`, or `1000` and `1E-3`). This can occur frequently with both function approximation and linear least squares." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Condition Numbers and Matrix Operations\n", "\n", "Multiplying a matrix by a constant does not change the condition number. What about other operations?\n", "\n", "For this example, we see that the inverse has the same condition number (though this will not always be the case)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cond(A) = 2.0000000000005004e6\n", "cond(inv(A)) = 2.0000000002463197e6\n" ] } ], "source": [ "@show cond(A)\n", "@show cond(inv(A));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The condition number of the product of two matrices can change radically, leaving the product\n", "even more ill-conditioned.\n", "\n", "This comes up frequently when calculating the product of a matrix and its transpose (e.g., forming the covariance matrix). A classic example is the [Läuchli matrix](https://link.springer.com/article/10.1007%2FBF01386022)."
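] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a sense of why forming the product with the transpose is dangerous, recall that for a square, nonsingular matrix the 2-norm condition number of $ A' A $ is the square of the condition number of $ A $ itself. The following is a small sketch of that effect, using an arbitrary, made-up test matrix `B` (not taken from the lecture); the Läuchli matrix constructed next is an extreme case of the same problem." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "B = rand(50, 50) + 10I # arbitrary, reasonably well-conditioned test matrix\n", "@show cond(B)\n", "@show cond(B' * B) # roughly cond(B)^2, up to floating-point error\n", "@show cond(B)^2;"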
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3×4 Array{Float64,2}:\n", " 1.0 1.0e-8 0.0 0.0\n", " 1.0 0.0 1.0e-8 0.0\n", " 1.0 0.0 0.0 1.0e-8" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lauchli(N, ϵ) = [ones(N)'; ϵ * I(N)]'\n", "ϵ = 1E-8\n", "L = lauchli(3, ϵ) |> Matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the condition number increases substantially" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cond(L) = 1.732050807568878e8\n", "cond(L' * L) = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "5.345191558726545e32\n" ] } ], "source": [ "@show cond(L)\n", "@show cond(L' * L);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can show that the analytic eigenvalues of this are $ \\{3 + \\epsilon^2, \\epsilon^2, \\epsilon^2\\} $ but the poor conditioning\n", "means it is difficult to distinguish these from $ 0 $.\n", "\n", "This comes up when conducting [Principal Component Analysis](https://en.wikipedia.org/wiki/Principal_component_analysis#Singular_value_decomposition), which\n", "requires calculations of the eigenvalues of the covariance matrix" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Complex{Float64},1}:\n", " 0.0 + 4.870456104375987e-9im\n", " 4.2146848510894035e-8 + 0.0im\n", " 1.7320508075688772 + 0.0im" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sort(sqrt.(Complex.(eigen(L*L').values)), lt = (x,y) -> abs(x) < abs(y))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that these are significantly different than the known analytic solution and, in particular, are difficult to distinguish from 0." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 1.0e-8\n", " 1.0e-8\n", " 1.7320508075688772" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sqrt.([3 + ϵ^2, ϵ^2, ϵ^2]) |> sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we could calculate these by taking the square of the singular values of $ L $ itself, which is much more accurate\n", "and lets us clearly distinguish from zero" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 9.999999999999997e-9\n", " 1.0e-8\n", " 1.7320508075688774" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "svd(L).S |> sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we are better off calculating least squares directly rather than forming the normal equation (i.e., $ A' A x = A' b $) ourselves" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "2502.05373776057" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 3\n", "A = lauchli(N, 1E-7)' |> Matrix\n", "b = rand(N+1)\n", "x_sol_1 = A \\ b # using a least-squares solver\n", "x_sol_2 = (A' * A) \\ (A' * b) # forming the normal equation ourselves\n", "norm(x_sol_1 - x_sol_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Why a Monomial Basis Is a Bad Idea\n", "\n", "A classic example of poorly conditioned matrices is using a monomial basis of a polynomial with interpolation.\n", "\n", "Take a grid of points, $ x_0, \\ldots x_N $ and values $ y_0, \\ldots y_N $ where we want to calculate the\n", "interpolating polynomial.\n", "\n", "If we were to use the simplest, and most obvious, polynomial basis, then the calculation consists of finding the coefficients $ c_1, \\ldots c_n $ where\n", "\n", "$$\n", "P(x) = \\sum_{i=0}^N c_i x^i\n", "$$\n", "\n", "To solve for the coefficients, we notice that this is a simple system of equations\n", "\n", "$$\n", "\\begin{array}\n", " \\,y_0 = c_0 + c_1 x_0 + \\ldots c_N x_0^N\\\\\n", " \\,\\ldots\\\\\n", " \\,y_N = c_0 + c_1 x_N + \\ldots c_N x_N^N\n", "\\end{array}\n", "$$\n", "\n", "Or, stacking $ c = \\begin{bmatrix} c_0 & \\ldots & c_N\\end{bmatrix}, y = \\begin{bmatrix} y_0 & \\ldots & y_N\\end{bmatrix} $ and\n", "\n", "$$\n", "A = \\begin{bmatrix} 1 & x_0 & x_0^2 & \\ldots &x_0^N\\\\\n", " \\vdots & \\vdots & \\vdots & \\vdots & \\vdots \\\\\n", " 1 & x_N & x_N^2 & \\ldots & x_N^N\n", " \\end{bmatrix}\n", "$$\n", "\n", "We can then calculate the interpolating coefficients as the solution to\n", "\n", "$$\n", "A c = y\n", "$$\n", "\n", "Implementing this for the interpolation of the $ exp(x) $ function" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "1.356966095045209e-9" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 5\n", "f(x) = exp(x)\n", "x = range(0.0, 10.0, length = N+1)\n", "y = f.(x) # generate some data to interpolate\n", "\n", "A = [x_i^n for x_i in x, n in 0:N]\n", "A_inv = inv(A)\n", "c = A_inv * y\n", "norm(A * c - f.(x), Inf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final step just 
checks the interpolation vs. the analytic function at the nodes. Keep in mind that this should be very close to zero\n", "since we are interpolating the function precisely at those nodes.\n", "In our example, the Inf-norm (i.e., maximum difference) of the interpolation errors at the nodes is around `1E-9`, which\n", "is reasonable for many problems.\n", "\n", "But note that with $ N=5 $ the condition number is already of order `1E6`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "564652.3214053963" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we increase the degree of the polynomial with the hope of increasing the precision of the\n", "interpolation?" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "8.61171429278329e-7" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 10\n", "f(x) = exp(x)\n", "x = range(0.0, 10.0, length = N+1)\n", "y = f.(x) # generate some data to interpolate\n", "\n", "A = [x_i^n for x_i in x, n in 0:N]\n", "A_inv = inv(A)\n", "c = A_inv * y\n", "norm(A * c - f.(x), Inf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we see that hoping to increase the precision between points by adding extra polynomial terms is backfiring. By going to a 10th-order polynomial, we have\n", "introduced an error of about `1E-5`, even at the interpolation points themselves.\n", "\n", "This blows up quickly" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "19978.410967681375" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 20\n", "f(x) = exp(x)\n", "x = range(0.0, 10.0, length = N+1)\n", "y = f.(x) # generate some data to interpolate\n", "\n", "A = [x_i^n for x_i in x, n in 0:N]\n", "A_inv = inv(A)\n", "c = A_inv * y\n", "norm(A * c - f.(x), Inf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the source of the problem, note that the condition number is astronomical." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "2.0386741019186427e24" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cond(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, you should be suspicious of the use of `inv(A)`, since we have considered solving\n", "linear systems by taking the inverse as verboten. Indeed, this made things much worse. 
The\n", "error drops dramatically if we solve it as a linear system" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "1.864464138634503e-10" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = A \\ y\n", "norm(A * c - f.(x), Inf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But an error of `1E-10` at the interpolating nodes themselves can be a problem in many applications, and if you increase `N`\n", "then the error will become non-trivial eventually - even without taking the inverse.\n", "\n", "The heart of the issue is that the monomial basis leads to a [Vandermonde matrix](https://en.wikipedia.org/wiki/Vandermonde_matrix), which\n", "is especially ill-conditioned." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Aside on Runge’s Phenomenon\n", "\n", "The monomial basis is also a good opportunity to look at a separate type of error due to [Runge’s Phenomenon](https://en.wikipedia.org/wiki/Runge%27s_phenomenon). It is an important\n", "issue in approximation theory, albeit not one driven by numerical approximation errors.\n", "\n", "It turns out that using a uniform grid of points is, in general, the worst possible choice of interpolation nodes for a polynomial approximation. This phenomenon can be seen with the interpolation of the seemingly innocuous Runge’s function, $ g(x) = \\frac{1}{1 + 25 x^2} $.\n", "\n", "Let’s calculate the interpolation with a monomial basis to find the $ c_i $ such that\n", "\n", "$$\n", "\\frac{1}{1 + 25 x^2} \\approx \\sum_{i=0}^N c_i x^i,\\, \\text{ for } -1 \\leq x \\leq 1\n", "$$\n", "\n", "First, interpolate with $ N = 5 $ and avoid taking the inverse. In that case, as long as we avoid taking an inverse, the numerical errors from the ill-conditioned matrix are manageable." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hide-output": false }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Plots\n", "N_display = 100\n", "g(x) = 1/(1 + 25x^2)\n", "x_display = range(-1, 1, length = N_display)\n", "y_display = g.(x_display)\n", "\n", "# interpolation\n", "N = 5\n", "x = range(-1.0, 1.0, length = N+1)\n", "y = g.(x)\n", "A_5 = [x_i^n for x_i in x, n in 0:N]\n", "c_5 = A_5 \\ y\n", "\n", "# use the coefficients to evaluate on x_display grid\n", "B_5 = [x_i^n for x_i in x_display, n in 0:N] # calculate monomials for display grid\n", "y_5 = B_5 * c_5 # calculates for each in x_display_grid\n", "plot(x_display, y_5, label = \"P_5(x)\")\n", "plot!(x_display, y_display, w = 3, label = \"g(x)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that while the function, $ g(x) $, and the approximation with a 5th-order polynomial, $ P_5(x) $, coincide at the 6 nodes, the\n", "approximation has a great deal of error everywhere else.\n", "\n", "The oscillations near the boundaries are the hallmarks of Runge’s Phenomenon. 
You might guess that increasing the number\n", "of grid points and the order of the polynomial will lead to better approximations:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "hide-output": false }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 9\n", "x = range(-1.0, 1.0, length = N+1)\n", "y = g.(x)\n", "A_9 = [x_i^n for x_i in x, n in 0:N]\n", "c_9 = A_9 \\ y\n", "\n", "# use the coefficients to evaluate on x_display grid\n", "B_9 = [x_i^n for x_i in x_display, n in 0:N] # calculate monomials for display grid\n", "y_9 = B_9 * c_9 # calculates for each in x_display_grid\n", "plot(x_display, y_9, label = \"P_9(x)\")\n", "plot!(x_display, y_display, w = 3, label = \"g(x)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the approximation is better near `x=0`, the oscillations near the boundaries have become worse. Adding on extra polynomial terms will not\n", "globally increase the quality of the approximation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Using an Orthogonal Polynomial Basis\n", "\n", "We can minimize the numerical problems of an ill-conditioned basis matrix by choosing a different basis for the polynomials.\n", "\n", "For example, [Chebyshev polynomials](https://en.wikipedia.org/wiki/Chebyshev_polynomials) form an orthonormal basis under an appropriate inner product, and we can form precise high-order approximations, with very little numerical error" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(g_approx.(x) - g.(x), Inf) = 4.440892098500626e-16\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using ApproxFun\n", "N = 10000\n", "S = Chebyshev(-1.0..1.0) # form Chebyshev basis\n", "x = points(S, N) # chooses Chebyshev nodes\n", "y = g.(x)\n", "g_approx = Fun(S,ApproxFun.transform(S,y)) # transform fits the polynomial\n", "@show norm(g_approx.(x) - g.(x), Inf)\n", "plot(x_display, g_approx.(x_display), label = \"P_10000(x)\")\n", "plot!(x_display, g.(x_display), w = 3, label = \"g(x)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides the use of a different polynomial basis, we are approximating at different nodes (i.e., [Chebyshev nodes](https://en.wikipedia.org/wiki/Chebyshev_nodes)). Interpolation with Chebyshev polynomials at the Chebyshev nodes ends up minimizing (but not eliminating) Runge’s Phenomenon." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Lessons for Approximation and Interpolation\n", "\n", "To summarize:\n", "\n", "1. Check the condition number on systems you suspect might be ill-conditioned (based on intuition of collinearity). \n", "1. 
If you are working with ill-conditioned matrices, be especially careful not to take the inverse or multiply by the transpose. \n", "1. Avoid a monomial polynomial basis. Instead, use polynomials (e.g., Chebyshev or Lagrange) orthogonal under an appropriate inner product, or use a non-global basis such as cubic splines. \n", "1. If possible, avoid using a uniform grid for interpolation and approximation, and choose nodes appropriate for the basis. \n", "\n", "\n", "However, sometimes you can’t avoid ill-conditioned matrices. This is especially common with discretization of PDEs and with linear least squares." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stationary Iterative Algorithms for Linear Systems\n", "\n", "As before, consider solving the equation\n", "\n", "$$\n", "A x = b\n", "$$\n", "\n", "We will now\n", "focus on cases where $ A $ is both massive (e.g., potentially millions of equations) and sparse, and sometimes ill-conditioned - but where there is always a unique solution.\n", "\n", "While this may seem excessive, it occurs in practice due to the curse of dimensionality, discretizations\n", "of PDEs, and when working with big data.\n", "\n", "The methods in the previous lectures (e.g., factorization and approaches similar to Gaussian elimination) are called direct methods, and are able\n", "in theory to converge to the exact solution in a finite number of steps while directly working with the matrix in memory.\n", "\n", "Instead, iterative solutions start with a guess on a solution and iterate until convergence. The benefit will be that\n", "each iteration uses a lower-order operation (e.g., an $ O(N^2) $ matrix-vector product) which will make it possible to\n", "\n", "1. solve much larger systems, even if done less precisely. \n", "1. define linear operators in terms of the matrix-vector products, rather than storing as a matrix. \n", "1. get approximate solutions in progress prior to the completion of all algorithm steps, unlike the direct methods, which provide a solution only at the end. \n", "\n", "\n", "Of course, there is no free lunch, and the computational order of the iterations themselves would be comparable to the direct methods for a given level of tolerance (e.g., $ O(N^3) $ operations may be required to solve a dense unstructured system).\n", "\n", "There are two types of iterative methods we will consider. The first type is stationary methods, which iterate on a map in a way that’s similar to fixed-point problems, and the second type is [Krylov](https://en.wikipedia.org/wiki/Krylov_subspace) methods, which iteratively solve using left-multiplications of the linear operator.\n", "\n", "For our main examples, we will use the valuation of the continuous-time Markov chain from the [numerical methods for linear algebra](numerical_linear_algebra.html) lecture. That is, given a payoff vector $ r $, a\n", "discount rate $ \\rho $, and the infinitesimal generator of the Markov chain $ Q $, solve the equation\n", "\n", "$$\n", "\\rho v = r + Q v\n", "$$\n", "\n", "With the sizes and types of matrices here, iterative methods are inappropriate in practice, but they will help us understand\n", "the characteristics of convergence and how they relate to matrix conditioning." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stationary Methods\n", "\n", "First, we will solve with a direct method, which will give the solution to machine precision." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "100.00000000000004" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LinearAlgebra, IterativeSolvers, Statistics\n", "α = 0.1\n", "N = 100\n", "Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))\n", "\n", "r = range(0.0, 10.0, length=N)\n", "ρ = 0.05\n", "\n", "A = ρ * I - Q\n", "v_direct = A \\ r\n", "mean(v_direct)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Without proof, consider that given the discount rate of $ \\rho > 0 $, this problem could be set up as a contraction for solving the Bellman\n", "equation through methods such as value-function iteration.\n", "\n", "The condition we will examine here is called [**diagonal dominance**](https://en.wikipedia.org/wiki/Diagonally_dominant_matrix).\n", "\n", "$$\n", "|A_{ii}| \\geq \\sum_{j\\neq i} |A_{ij}| \\quad\\text{for all } i = 1\\ldots N\n", "$$\n", "\n", "That is, in every row, the diagonal element is weakly greater in absolute value than the sum of all of the other elements in the row. In cases\n", "where it is strictly greater, we say that the matrix is strictly diagonally dominant.\n", "\n", "With our example, given that $ Q $ is the infinitesimal generator of a Markov chain, we know that each row sums to 0, and hence\n", "it is weakly diagonally dominant.\n", "\n", "However, notice that when $ \\rho > 0 $, and since the diagonal of $ Q $ is negative, $ A = ρ I - Q $ makes the matrix strictly diagonally dominant." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Jacobi Iteration\n", "\n", "For matrices that are **strictly diagonally dominant**, you can prove that a simple decomposition and iteration procedure\n", "will converge.\n", "\n", "To solve a system $ A x = b $, split the matrix $ A $ into its diagonal and off-diagonal elements. 
That is,\n", "\n", "$$\n", "A = D + R\n", "$$\n", "\n", "where\n", "\n", "$$\n", "D = \\begin{bmatrix} A_{11} & 0 & \\ldots & 0\\\\\n", " 0 & A_{22} & \\ldots & 0\\\\\n", " \\vdots & \\vdots & \\vdots & \\vdots\\\\\n", " 0 & 0 & \\ldots & A_{NN}\n", " \\end{bmatrix}\n", "$$\n", "\n", "and\n", "\n", "$$\n", "R = \\begin{bmatrix} 0 & A_{12} & \\ldots & A_{1N} \\\\\n", " A_{21} & 0 & \\ldots & A_{2N} \\\\\n", " \\vdots & \\vdots & \\vdots & \\vdots\\\\\n", " A_{N1} & A_{N2} & \\ldots & 0\n", " \\end{bmatrix}\n", "$$\n", "\n", "Rearrange the $ (D + R)x = b $ as\n", "\n", "$$\n", "\\begin{align}\n", "D x &= b - R x\\\\\n", "x &= D^{-1} (b - R x)\n", "\\end{align}\n", "$$\n", "\n", "where, since $ D $ is diagonal, its inverse is trivial to calculate with $ O(N) $ complexity.\n", "\n", "To solve, take an iteration $ x^k $, starting from $ x^0 $, and then form a new guess with\n", "\n", "$$\n", "x^{k+1} = D^{-1}(b - R x^k)\n", "$$\n", "\n", "The complexity here is $ O(N^2) $ for the matrix-vector product, and $ O(N) $ for the vector subtraction and division.\n", "\n", "The package [IterativeSolvers.jl](https://github.com/JuliaMath/IterativeSolvers.jl) package implements this method.\n", "\n", "For our example, we start with a guess and solve for the value function and iterate" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(v - v_direct, Inf) = 0.022858373200932647\n" ] }, { "data": { "text/plain": [ "0.022858373200932647" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using IterativeSolvers, LinearAlgebra, SparseArrays\n", "v = zeros(N)\n", "jacobi!(v, A, r, maxiter = 40)\n", "@show norm(v - v_direct, Inf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this, after 40 iterations we see that the error is in the order of `1E-2`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other Stationary Methods\n", "\n", "In practice, there are many methods that are better than Jacobi iteration. For example [Gauss-Siedel](https://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method)., which\n", "splits the matrix $ A = L + U $ into a lower-triangular matrix $ L $ and an upper-triangular matrix $ U $ without the diagonal.\n", "\n", "The iteration becomes\n", "\n", "$$\n", "L x^{k+1} = b - U x^k\n", "$$\n", "\n", "In that case, since the $ L $ matrix is triangular, the system can be solved in $ O(N^2) $ operations after $ b - U x^k $ is formed" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(v - v_direct, Inf) = 1.5616376089155892e-5\n" ] } ], "source": [ "v = zeros(N)\n", "gauss_seidel!(v, A, r, maxiter = 40)\n", "@show norm(v - v_direct, Inf);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The accuracy increases substantially. 
After 40 iterations, we see that the error is of the order of `1E-5`\n", "\n", "Another example is [Successive Over-relaxation (SOR)](https://en.wikipedia.org/wiki/Successive_over-relaxation), which takes a relaxation parameter $ \\omega > 1 $ and decomposes the matrix as $ A = L + D + U $, where $ L, U $ are strictly upper- and lower-diagonal matrices and $ D $ is diagonal.\n", "\n", "Decompose the $ A $ matrix, multiply the system by $ \\omega $, and rearrange to find\n", "\n", "$$\n", "(D + \\omega L) x^{k+1} = \\omega b - \\left(\\omega U +(\\omega - 1)D \\right)x^k\n", "$$\n", "\n", "In that case, $ D + \\omega L $ is a triangular matrix, and hence the linear solution is $ O(N^2) $." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(v - v_direct, Inf) = 3.745356593753968e-7\n" ] } ], "source": [ "v = zeros(N)\n", "sor!(v, A, r, 1.1, maxiter = 40)\n", "@show norm(v - v_direct, Inf);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The accuracy is now `1E-7`. If we change the parameter to $ \\omega = 1.2 $, the accuracy further increases to `1E-9`.\n", "\n", "This technique is common with iterative methods: Frequently, adding a damping or a relaxation parameter will counterintuitively speed up the convergence process.\n", "\n", "**Note:** The stationary iterative methods are not always used directly, but are sometimes used as a “smoothing” step (e.g., running 5-10 times) prior to using other Krylov methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Krylov Methods\n", "\n", "A more commonly used set of iterative methods is based on [Krylov subspaces](https://en.wikipedia.org/wiki/Krylov_subspace), which involve iterating the $ A^k x $ matrix-vector product, and orthogonalizing to ensure that the resulting iteration is not too collinear.\n", "\n", "The prototypical Krylov method is [Conjugate Gradient](https://en.wikipedia.org/wiki/Conjugate_gradient_method), which requires the $ A $ matrix to be\n", "symmetric and positive definite.\n", "\n", "Solving an example:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "isposdef(A) = true\n" ] }, { "data": { "text/plain": [ "3.5791585364800934e10" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N = 100\n", "A = sprand(100, 100, 0.1) # 10 percent non-zeros\n", "A = A * A' # easy way to generate a symmetric positive-definite matrix\n", "@show isposdef(A)\n", "b = rand(N)\n", "x_direct = A \\ b # sparse direct solver more appropriate here\n", "cond(Matrix(A * A'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the condition numbers tend to be large for large random matrices.\n", "\n", "Solving this system with the conjugate gradient method:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Converged after 174 iterations." 
] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = zeros(N)\n", "sol = cg!(x, A, b, log=true, maxiter = 1000)\n", "sol[end]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction to Preconditioning\n", "\n", "If you tell a numerical analyst that you are using direct methods, their first question may be, “which factorization?” But if you tell them you\n", "are using an iterative method, they may ask “which preconditioner?”.\n", "\n", "As discussed at the beginning of the lecture, the spectral properties of matrices determine the rate of convergence\n", "of iterative methods. In particular, ill-conditioned matrices can converge slowly with iterative methods, for the same\n", "reasons that naive value-function iteration will converge slowly if the discount rate is close to `1`.\n", "\n", "Preconditioning solves this problem by adjusting the spectral properties of the matrix, at the cost of some extra computational\n", "operations.\n", "\n", "To see an example of a right-preconditioner, consider a matrix $ P $ which has a convenient and numerically stable inverse. Then\n", "\n", "$$\n", "\\begin{align}\n", "A x &= b\\\\\n", "A P^{-1} P x &= b\\\\\n", "A P^{-1} y &= b\\\\\n", "P x &= y\n", "\\end{align}\n", "$$\n", "\n", "That is, solve $ (A P^{-1})y = b $ for $ y $, and then solve $ P x = y $ for $ x $.\n", "\n", "There are all sorts of preconditioners, and they are specific to the particular problem at hand. The key features are that they have convenient (and lower-order!) ways to solve the\n", "resulting system and they lower the condition number of the matrix. To see this in action, we can look at a simple preconditioner.\n", "\n", "The diagonal precondition is simply `P = Diagonal(A)`. Depending on the matrix, this can change the condition number a little or a lot." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cond(Matrix(A)) = 189186.6473381337\n", "cond(Matrix(AP)) = 175174.59095330362\n" ] } ], "source": [ "AP = A * inv(Diagonal(A))\n", "@show cond(Matrix(A))\n", "@show cond(Matrix(AP));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But it may or may not decrease the number of iterations" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Converged after 174 iterations." ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Preconditioners\n", "x = zeros(N)\n", "P = DiagonalPreconditioner(A)\n", "sol = cg!(x, A, b, log=true, maxiter = 1000)\n", "sol[end]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another classic preconditioner is the incomplete LU decomposition" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Converged after 86 iterations." 
] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using IncompleteLU\n", "x = zeros(N)\n", "P = ilu(A, τ = 0.1)\n", "sol = cg!(x, A, b, Pl = P, log=true, maxiter = 1000)\n", "sol[end]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `τ` parameter determines the degree of the LU decomposition to conduct, providing a tradeoff between preconditioner and solve speed.\n", "\n", "A good rule of thumb is that you should almost always be using a preconditioner with iterative methods, and you should experiment to find preconditioners that are appropriate for your problem.\n", "\n", "Finally, naively trying another preconditioning approach (called [Algebraic Multigrid](https://en.wikipedia.org/wiki/Multigrid_method#Algebraic_MultiGrid_%28AMG%29)) gives us a further drop in the number of iterations." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Converged after 59 iterations." ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = zeros(N)\n", "P = AMGPreconditioner{RugeStuben}(A)\n", "sol = cg!(x, A, b, Pl = P, log=true, maxiter = 1000)\n", "sol[end]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note:* Preconditioning is also available for stationary, iterative methods (see [this example](https://en.wikipedia.org/wiki/Preconditioner#Preconditioned_iterative_methods)), but\n", "is frequently not implemented since such methods are not often used for the complete solution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Methods for General Matrices\n", "\n", "There are many algorithms which exploit matrix structure (e.g., the conjugate gradient method for positive-definite matrices, and MINRES for matrices that are only symmetric/Hermitian).\n", "\n", "On the other hand, if there is no structure to a sparse matrix, then GMRES is a good approach.\n", "\n", "To experiment with these methods, we will use our ill-conditioned interpolation problem with a monomial basis." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cond(A) = 4.462833495403007e12, Converged after 11 iterations. Norm error 7.62520357966423e-8\n" ] } ], "source": [ "using IterativeSolvers\n", "\n", "N = 10\n", "f(x) = exp(x)\n", "x = range(0.0, 10.0, length = N+1)\n", "y = f.(x) # generate some data to interpolate\n", "A = sparse([x_i^n for x_i in x, n in 0:N])\n", "c = zeros(N+1) # initial guess required for iterative solutions\n", "results = gmres!(c, A, y, log=true, maxiter = 1000)\n", "println(\"cond(A) = $(cond(Matrix(A))), $(results[end]) Norm error $(norm(A*c - y, Inf))\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That method converged in 11 iterations. Now if we try it with an incomplete LU preconditioner, we see that it converges immediately." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converged after 1 iterations. 
Norm error 4.5034175855107605e-7\n" ] } ], "source": [ "N = 10\n", "f(x) = exp(x)\n", "x = range(0.0, 10.0, length = N+1)\n", "y = f.(x) # generate some data to interpolate\n", "A = [x_i^n for x_i in x, n in 0:N]\n", "P = ilu(sparse(A), τ = 0.1)\n", "c = zeros(N+1) # initial guess required for iterative solutions\n", "results = gmres!(c, A, y, Pl = P,log=true, maxiter = 1000)\n", "println(\"$(results[end]) Norm error $(norm(A*c - y, Inf))\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With other preconditioners (e.g., `DiagonalPreconditioner`), we may save only one or two iterations. Keep in mind,\n", "however, to consider the cost of the preconditioning process in your problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix-Free Methods\n", "\n", "First, lets use a Krylov method to solve our simple valuation problem" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converged after 20 iterations.\n" ] } ], "source": [ "α = 0.1\n", "N = 100\n", "Q = Tridiagonal(fill(α, N-1), [-α; fill(-2α, N-2); -α], fill(α, N-1))\n", "\n", "r = range(0.0, 10.0, length=N)\n", "ρ = 0.05\n", "\n", "A = ρ * I - Q\n", "v = zeros(N)\n", "results = gmres!(v, A, r, log=true)\n", "v_sol = results[1]\n", "println(\"$(results[end])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the `A` matrix was important to be kept in memory for direct methods, Krylov methods such as GMRES are built on matrix-vector products, i.e., $ A x $ for iterations on the $ x $.\n", "\n", "This product can be written directly for a given $ x $,\n", "\n", "$$\n", "A x = \\begin{bmatrix} (\\rho + \\alpha) x_1 - \\alpha x_2\\\\\n", " - \\alpha x_1 + (\\rho + 2 \\alpha) x_2 - \\alpha x_3\\\\\n", " \\vdots\\\\\n", " - \\alpha x_{N-2} + (\\rho + 2 \\alpha) x_{N-1} - \\alpha x_{N}\\\\\n", " - \\alpha x_{N-1} + (\\rho + \\alpha) x_N\n", " \\end{bmatrix}\n", "$$\n", "\n", "This can be implemented as a function (either in-place or out-of-place) which calculates $ y = A x $" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(A * x - A_mul(x)) = 0.0\n" ] } ], "source": [ "A_mul(x) = [ (ρ + α) * x[1] - α * x[2];\n", " [-α * x[i-1] + (ρ + 2*α) * x[i] - α * x[i+1] for i in 2:N-1]; # comprehension\n", " - α * x[end-1] + (ρ + α) * x[end]]\n", "\n", "x = rand(N)\n", "@show norm(A * x - A_mul(x)) # compare to matrix;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The final line verifies that the `A_mul` function provides the same result as the matrix multiplication with our original `A` for a random vector.\n", "\n", "In abstract mathematics, a finite-dimensional [linear operator](https://en.wikipedia.org/wiki/Linear_map) is a mapping $ A : R^N \\to R^N $\n", "that satisfies a number of criteria such as $ A (c_1 x_1 + c_2 x_2) = c_1 A x_1 + c_2 A x_2 $ for scalars $ c_i $ and vectors $ x_i $.\n", "\n", "Moving from abstract mathematics to [generic programming](../more_julia/generic_programming.html), we can think of a linear operator\n", "as a map that satisfies a number of requirements (e.g., it has a left-multiply to apply the map `*`, an in-place left-multiply `mul!`, an associated `size`). 
A Julia matrix\n", "is just one possible implementation of the abstract concept of a linear operator.\n", "\n", "Convenience wrappers can provide some of the boilerplate which turns the `A_mul` function into something that behaves like a matrix. One\n", "package is [LinearMaps.jl](https://github.com/Jutho/LinearMaps.jl) and another is [LinearOperators.jl](https://github.com/JuliaSmoothOptimizers/LinearOperators.jl)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "LinearMaps.FunctionMap{Float64}(A_mul, 100, 100; ismutating=false, issymmetric=false, ishermitian=false, isposdef=false)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using LinearMaps\n", "A_map = LinearMap(A_mul, N) # map uses the A_mul function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, with the `A_map` object, we can fulfill many of the operations we would expect from a matrix" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(A_map * x - A * x) = 0.0\n", "norm(y - A * x) = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "0.0\n", "size(A_map) = (100, 100)\n", "norm(Matrix(A_map) - A) = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "0.0\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "nnz(sparse(A_map)) = 298\n" ] } ], "source": [ "x = rand(N)\n", "@show norm(A_map * x - A * x)\n", "y = similar(x)\n", "mul!(y, A_map, x) # in-place multiplication\n", "@show norm(y - A * x)\n", "@show size(A_map)\n", "@show norm(Matrix(A_map) - A)\n", "@show nnz(sparse(A_map));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** In the case of `sparse(A_map)` and `Matrix(A_map)`, the code is using the left-multiplication operator with `N` standard basis vectors to construct\n", "the full matrix. This should be used only for testing purposes.\n", "\n", "But notice that as the linear operator does not have indexing operations, it is not an array or a matrix." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "false" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(A_map) <: AbstractArray" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As long as algorithms using linear operators are written generically (e.g., using the matrix-vector `*` or `mul!` functionss) and the types of functions are not\n", "unnecessarily constrained to be `Matrix` or `AbstractArray` when it isn’t strictly necessary, then the `A_map` type can work in places which would otherwise require a matrix.\n", "\n", "For example, the Krylov methods in `IterativeSolvers.jl` are written for generic left-multiplication" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converged after 20 iterations.\n" ] } ], "source": [ "results = gmres(A_map, r, log = true) # Krylov method using the matrix-free type\n", "println(\"$(results[end])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These methods are typically not competitive with sparse, direct methods unless the problems become very large. In that case,\n", "we often want to work with pre-allocated vectors. 
Instead of using `y = A * x` for matrix-vector products,\n", "we would use the in-place `mul!(y, A, x)` function. The wrappers for linear operators all support in-place non-allocating versions for this purpose." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(A_map_2 * v - A * v) = 0.0\n", "Converged after 20 iterations." ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "function A_mul!(y, x) # in-place version\n", " y[1] = (ρ + α) * x[1] - α * x[2]\n", " for i in 2:N-1\n", " y[i] = -α * x[i-1] + (ρ + 2α) * x[i] -α * x[i+1]\n", " end\n", " y[end] = - α * x[end-1] + (ρ + α) * x[end]\n", " return y\n", "end\n", "A_map_2 = LinearMap(A_mul!, N, ismutating = true) # ismutating == in-place\n", "\n", "v = zeros(N)\n", "@show norm(A_map_2 * v - A * v) # can still call with * and have it allocate\n", "results = gmres!(v, A_map_2, r, log=true) # in-place gmres\n", "println(\"$(results[end])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, keep in mind that the linear operators can compose, so that $ A (c_1 x) + B (c_2 x) + x = (c_1 A + c_2 B + I) x $ is well defined for any linear operators - just as\n", "it would be for matrices $ A, B $ and scalars $ c_1, c_2 $.\n", "\n", "For example, take $ 2 A x + x = (2 A + I) x \\equiv B x $ as a new linear map," ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "LinearMaps.LinearCombination{Float64,Tuple{LinearMaps.CompositeMap{Float64,Tuple{LinearMaps.FunctionMap{Float64,typeof(A_mul),Nothing},LinearMaps.UniformScalingMap{Float64}}},LinearMaps.UniformScalingMap{Bool}}}" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = 2.0 * A_map + I # composite linear operator\n", "B * rand(N) # left-multiply works with the composition\n", "typeof(B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The wrappers, such as `LinearMap` wrappers, make this composition possible by keeping the composition\n", "graph of the expression (i.e., `LinearCombination`) and implementing the left-multiply recursively using the rules of linearity.\n", "\n", "Another example is to solve the $ \\rho v = r + Q v $ equation for $ v $ with composition of matrix-free methods for $ L $\n", "rather than by creating the full $ A = \\rho - Q $ operator, which we implemented as `A_mul`" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(A - sparse(A_composed)) = 0.0\n" ] }, { "data": { "text/plain": [ "Converged after 20 iterations." ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Q_mul(x) = [ -α * x[1] + α * x[2];\n", " [α * x[i-1] - 2*α * x[i] + α*x[i+1] for i in 2:N-1]; # comprehension\n", " α * x[end-1] - α * x[end];]\n", "Q_map = LinearMap(Q_mul, N)\n", "A_composed = ρ * I - Q_map # map composition, performs no calculations\n", "@show norm(A - sparse(A_composed)) # test produces the same matrix\n", "gmres(A_composed, r, log=true)[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, the left-multiply of the `A_composed` used by `gmres` uses the left-multiply of `Q_map` and `I` with the rules\n", "of linearity. 
The `A_composed = ρ * I - Q_map` operation simply creates the `LinearMaps.LinearCombination` type, and doesn’t perform any calculations on its own." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterative Methods for Linear Least Squares\n", "\n", "In theory, the solution to the least-squares problem, $ \\min_x \\| Ax -b \\|^2 $, is simply the solution to the normal equations $ (A'A) x = A'b $.\n", "\n", "We saw, however, that in practice, direct methods use a QR decomposition - in part because an ill-conditioned matrix $ A $ becomes even worse when $ A' A $ is formed.\n", "\n", "For large problems, we can also consider Krylov methods for solving the linear least-squares problem. One formulation is the [LSMR](https://stanford.edu/group/SOL/software/lsmr/LSMR-SISC-2011.pdf) algorithm,\n", "which can solve the regularized\n", "\n", "$$\n", "\\min_x \\| Ax -b \\|^2 + \\| \\lambda x\\|^2\n", "$$\n", "\n", "The purpose of the $ \\lambda \\geq 0 $ parameter is to dampen the iteration process and/or regularize the solution. This isn’t required, but can help convergence for ill-conditioned matrices $ A $. With the\n", "damping parameter, the normalized equations would become $ (A'A + \\lambda^2 I) x = A'b $.\n", "\n", "We can compare solving the least-squares problem with LSMR and direct methods" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(β_direct - β_lsmr) = 9.139228893911292e-6\n", "Converged after 14 iterations.\n" ] } ], "source": [ "M = 1000\n", "N = 10000\n", "σ = 0.1\n", "β = rand(M)\n", "# simulate data\n", "X = sprand(N, M, 0.1)\n", "y = X * β + σ * randn(N)\n", "β_direct = X \\ y\n", "results = lsmr(X, y, log = true)\n", "β_lsmr = results[1]\n", "@show norm(β_direct - β_lsmr)\n", "println(\"$(results[end])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that rather than forming this version of the normal equations, the LSMR algorithm uses the $ A x $ and $ A' y $ (i.e., the matrix-vector product and the matrix-transpose vector product) to implement an iterative\n", "solution. Unlike the previous versions, the left-multiply is insufficient since the least squares also deals with the transpose of the operator. For this reason, in order to use\n", "matrix-free methods, we need to define the `A * x` and `transpose(A) * y` functions separately." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Converged after 14 iterations.\n" ] } ], "source": [ "# Could implement as matrix-free functions.\n", "X_func(u) = X * u # matrix-vector product\n", "X_T_func(v) = X' * v # i.e., adjoint-vector product\n", "\n", "X_map = LinearMap(X_func, X_T_func, N, M)\n", "results = lsmr(X_map, y, log = true)\n", "println(\"$(results[end])\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterative Methods for Eigensystems\n", "\n", "When you use `eigen` on a dense matrix, it calculates an eigendecomposition and provides all the eigenvalues and eigenvectors.\n", "\n", "While this is sometimes necessary, a spectral decomposition of a dense, unstructured matrix is one of the costliest $ O(N^3) $ operations (i.e., it has\n", "one of the largest constants). 
For large matrices, it is often infeasible.\n", "\n", "Luckily, we frequently need only a few eigenvectors/eigenvalues (in some cases just one), which enables a different set of algorithms.\n", "\n", "For example, in the case of a discrete-time Markov chain, in order to find the stationary distribution, we are looking for the\n", "eigenvector associated with the eigenvalue 1. As usual, a little linear algebra goes a long way.\n", "\n", "From the [Perron-Frobenius theorem](https://en.wikipedia.org/wiki/Perron%E2%80%93Frobenius_theorem#Stochastic_matrices), the largest eigenvalue of an irreducible stochastic matrix is 1 - the same eigenvalue we are looking for.\n", "\n", "Iterative methods for solving eigensystems allow targeting the smallest magnitude, the largest magnitude, and many others. The easiest library\n", "to use is [Arpack.jl](https://julialinearalgebra.github.io/Arpack.jl/latest/).\n", "\n", "As an example," ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "λ = Complex{Float64}[1.0000000000000189 + 0.0im]\n", "mean(ϕ) = 0.0010000000000000002\n" ] } ], "source": [ "using Arpack, LinearAlgebra\n", "N = 1000\n", "A = Tridiagonal([fill(0.1, N-2); 0.2], fill(0.8, N), [0.2; fill(0.1, N-2);])\n", "A_adjoint = A'\n", "\n", "λ, ϕ = eigs(A_adjoint, nev=1, which=:LM, maxiter=1000) # Find 1 of the largest magnitude eigenvalue\n", "ϕ = real(ϕ) ./ sum(real(ϕ))\n", "@show λ\n", "@show mean(ϕ);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, `λ` is equal to `1`. If we choose `nev = 2`, it will provide the eigenpairs with the two eigenvalues of largest absolute value.\n", "\n", "*Hint*: If you get errors using `Arpack`, increase the `maxiter` parameter for your problems.\n", "\n", "Iterative methods for eigensystems rely on matrix-vector products rather than decompositions, and are amenable to matrix-free approaches. For example,\n", "take the Markov chain for a simple counting process:\n", "\n", "1. The count starts at $ 1 $ and has a maximum of $ N $. \n", "1. If the count is strictly between $ 1 $ and $ N $, then a new count is added with probability $ \theta \geq 0 $ and an existing count is lost with probability $ \zeta \geq 0 $, where $ \theta + \zeta \leq 1 $. \n", "1. If the count is at $ 1 $, then the only transition is to add a count with probability $ \theta $. \n", "1. If the current count is $ N $, then the only transition is to lose the count with probability $ \zeta $. 
\n", "\n", "\n", "First, finding the transition matrix $ P $ and its adjoint directly as a check" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "5×5 Tridiagonal{Float64,Array{Float64,1}}:\n", " 0.9 0.05 ⋅ ⋅ ⋅ \n", " 0.1 0.85 0.05 ⋅ ⋅ \n", " ⋅ 0.1 0.85 0.05 ⋅ \n", " ⋅ ⋅ 0.1 0.85 0.05\n", " ⋅ ⋅ ⋅ 0.1 0.95" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "θ = 0.1\n", "ζ = 0.05\n", "N = 5\n", "P = Tridiagonal(fill(ζ, N-1), [1-θ; fill(1-θ-ζ, N-2); 1-ζ], fill(θ, N-1))\n", "P'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Implementing the adjoint-vector product directly, and verifying that it gives the same matrix as the adjoint" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(P' - sparse(P_adj_map)) = 0.0\n" ] }, { "data": { "text/plain": [ "0.0" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_adj_mul(x) = [ (1-θ) * x[1] + ζ * x[2];\n", " [θ * x[i-1] + (1-θ-ζ) * x[i] + ζ * x[i+1] for i in 2:N-1]; # comprehension\n", " θ * x[end-1] + (1-ζ) * x[end];]\n", "P_adj_map = LinearMap(P_adj_mul, N)\n", "@show norm(P' - sparse(P_adj_map))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, solving for the stationary distribution using the matrix-free method (which could be verified against the decomposition approach of $ P' $)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "λ = Complex{Float64}[1.0 + 0.0im]\n", "ϕ = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "[0.03225806451612657; 0.06451612903225695; 0.1290322580645172; 0.25806451612903425; 0.516129032258065]\n" ] }, { "data": { "text/plain": [ "5×1 Array{Float64,2}:\n", " 0.03225806451612657\n", " 0.06451612903225695\n", " 0.1290322580645172\n", " 0.25806451612903425\n", " 0.516129032258065" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "λ, ϕ = eigs(P_adj_map, nev=1, which=:LM, maxiter=1000)\n", "ϕ = real(ϕ) ./ sum(real(ϕ))\n", "@show λ\n", "@show ϕ" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, for a problem this simple, the direct eigendecomposition will be significantly faster. Use matrix-free iterative methods only for large systems where\n", "you do not need all of the eigenvalues." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Krylov Methods for Markov-Chain Dynamics\n", "\n", "This example applies the methods in this lecture to a large continuous-time Markov chain, and provides some practice working with arrays of arbitrary dimensions.\n", "\n", "Consider a version of the Markov-chain dynamics in [[Per19]](../zreferences.html#perla2019), where a firm has a discrete number of customers of different types. To keep things as simple as possible, assume that there are $ m=1, \\ldots M $ types of customers and that the firm may have $ n = 1, \\ldots N $ customers of each type.\n", "\n", "To set the notation, let $ n_m \\in \\{1, \\ldots N\\} $ be the number of customers of type $ m $, so that the state of a firm is $ \\{n_1, \\ldots n_m \\ldots, n_M\\} $. 
The cardinality of possible states is then $ \mathbf{N}\equiv N^M $, which can blow up quickly as the number of types increases.\n", "\n", "The stochastic process is a simple counting/forgetting process, as follows:\n", "\n", "1. For every $ 1 \leq n_m(t) < N $, there is a $ \theta $ intensity of arrival of a new customer, so that $ n_m(t+\Delta) = n_m(t) + 1 $. \n", "1. For every $ 1 < n_m(t) \leq N $, there is a $ \zeta $ intensity of losing a customer, so that $ n_m(t+\Delta) = n_m(t) - 1 $. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix-free Infinitesimal Generator\n", "\n", "In order to define an intensity matrix $ Q $ of size $ \mathbf{N}\times \mathbf{N} $, we need to choose a consistent ordering of the states. But\n", "before we enumerate them linearly, take a $ v\in R^{\mathbf{N}} $, interpreted as a multidimensional array, and look at the left product with the linear operator, $ Q v \in R^{\mathbf{N}} $.\n", "\n", "For example, if we were implementing the product at the row of $ Q $ corresponding to the $ (n_1, \ldots, n_M) $ state, then\n", "\n", "$$\n", "\begin{align}\n", " Q_{(n_1, \ldots n_M)} \cdot v &=\n", "\theta \sum_{m=1}^M (n_m < N) v(n_1, \ldots, n_m + 1, \ldots, n_M)\\\\\n", " &+ \zeta \sum_{m=1}^M (1 < n_m) v(n_1, \ldots, n_m - 1, \ldots, n_M)\\\\\n", " &-\left(\theta\, \text{Count}(n_m < N) + \zeta\, \text{Count}( n_m > 1)\right)v(n_1, \ldots, n_M)\n", "\end{align}\n", "$$\n", "\n", "Here:\n", "\n", "- the first term includes all of the arrivals of new customers into the various $ m $ \n", "- the second term is the loss of a customer for the various $ m $ \n", "- the last term is the intensity of all exits from this state (i.e., counting the intensity of all other transitions, to ensure that the row will sum to $ 0 $) \n", "\n", "\n", "In practice, rather than working with $ v $ as a multidimensional array, we will need to enumerate the discrete states linearly, so that we can index the corresponding vector from $ 1 $ to $ \mathbf{N} $. An especially convenient\n", "approach is to enumerate them in the same order as the $ M $-dimensional Cartesian product of the $ N $ states in the multi-dimensional array above.\n", "\n", "This can be done with the `CartesianIndices` function, which is used internally in Julia for the `eachindex` function. 
For example," ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "typeof(v) = Array{Float64,3}\n", "v(1, 1, 1) = 0.639089412234831" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "v(2, 1, 1) = 0.4302368488000152\n", "v(1, 2, 1) = 0.21490768283644002\n", "v(2, 2, 1) = 0.7542051014748841\n", "v(1, 1, 2) = 0.4330861190374067\n", "v(2, 1, 2) = 0.07556766967902084\n", "v(1, 2, 2) = 0.2143739072351467\n", "v(2, 2, 2) = 0.43231874437572815\n" ] } ], "source": [ "N = 2\n", "M = 3\n", "shape = Tuple(fill(N, M))\n", "v = rand(shape...)\n", "@show typeof(v)\n", "for ind in CartesianIndices(v)\n", " println(\"v$(ind.I) = $(v[ind])\") # .I gets the tuple to display\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The added benefit of this approach is that it will be the most efficient way to iterate through vectors in the implementation.\n", "\n", "For the counting process with arbitrary dimensions, we will frequently be incrementing or decrementing the $ m $ unit vectors of the `CartesianIndex` type with" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "3-element Array{CartesianIndex{3},1}:\n", " CartesianIndex(1, 0, 0)\n", " CartesianIndex(0, 1, 0)\n", " CartesianIndex(0, 0, 1)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "e_m = [CartesianIndex((1:M .== i)*1...) for i in 1:M]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and then use the vector to increment. For example, if the current count is `(1, 2, 2)` and we want to add a count of `1` to the first index and remove a count\n", "of `1` from the third index, then" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ind + e_m[1] = CartesianIndex(2, 2, 2)\n", "ind - e_m[3] = CartesianIndex(1, 2, 1)\n" ] } ], "source": [ "ind = CartesianIndex(1, 2, 2) # example counts coming from CartesianIndices\n", "@show ind + e_m[1] # increment the first index\n", "@show ind - e_m[3]; # decrement the third index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works, of course, because the `CartesianIndex` type is written to support efficient addition and subtraction. Finally, to implement the operator, we need to count the indices in the states where increment and decrement occurs." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ind = CartesianIndex(1, 2, 2)\n", "count(ind.I .> 1) = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "2\n", "count(ind.I .< N) = 1\n" ] } ], "source": [ "@show ind\n", "@show count(ind.I .> 1)\n", "@show count(ind.I .< N);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this, we are now able to write the $ Q $ operator on the $ f $ vector, which is enumerated by the Cartesian indices. 
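Before doing so, a quick check (an aside, not part of the original implementation) that the linear enumeration of a vector agrees with the `CartesianIndices` enumeration of the reshaped array - this ordering equivalence is what lets the operator below freely `reshape` between the vector and the $ M $-dimensional array. The names `v_lin`, `v_arr`, and `lin` are used only for this illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# illustration only: reshape preserves the column-major ordering,\n", "# so the linear and Cartesian enumerations coincide\n", "v_lin = rand(N^M)              # vector enumerated linearly from 1 to N^M\n", "v_arr = reshape(v_lin, shape)  # the same data viewed as an M-dimensional array\n", "lin = LinearIndices(v_arr)     # maps a CartesianIndex to its linear position\n", "@show all(v_lin[lin[ind]] == v_arr[ind] for ind in CartesianIndices(v_arr));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "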
First, collect the\n", "parameters in a named tuple generator" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "##NamedTuple_kw#256 (generic function with 2 methods)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Parameters, BenchmarkTools\n", "default_params = @with_kw (θ = 0.1, ζ = 0.05, ρ = 0.03, N = 10, M = 6,\n", " shape = Tuple(fill(N, M)), # for reshaping vector to M-d array\n", " e_m = ([CartesianIndex((1:M .== i)*1...) for i in 1:M]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, implement the in-place matrix-free product" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 53.485 ms (0 allocations: 0 bytes)\n" ] } ], "source": [ "function Q_mul!(dv, v, p)\n", " @unpack θ, ζ, N, M, shape, e_m = p\n", " v = reshape(v, shape) # now can access v, dv as M-dim arrays\n", " dv = reshape(dv, shape)\n", "\n", " @inbounds for ind in CartesianIndices(v)\n", " dv[ind] = 0.0\n", " for m in 1:M\n", " n_m = ind[m]\n", " if(n_m < N)\n", " dv[ind] += θ * v[ind + e_m[m]]\n", " end\n", " if(n_m > 1)\n", " dv[ind] += ζ *v[ind - e_m[m]]\n", " end\n", " end\n", " dv[ind] -= (θ * count(ind.I .< N) + ζ * count(ind.I .> 1)) * v[ind]\n", " end\n", "end\n", "\n", "p = default_params()\n", "v = zeros(p.shape)\n", "dv = similar(v)\n", "@btime Q_mul!($dv, $v, $p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the output of the benchmarking, note that the implementation of the left-multiplication takes less than 100 milliseconds, and allocates little or no memory, even though the Markov chain has a million possible states (i.e., $ N^M = 10^6 $)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solving a Valuation Problem\n", "\n", "As before, we could use this Markov chain to solve a Bellman equation. Assume that the firm discounts at rate $ \\rho > 0 $ and gets a flow payoff of a different $ z_m $ per\n", "customer of type $ m $. For example, if the state of the firm is $ (n_1, n_2, n_3) = (2,3,2) $, then it gets $ \\begin{bmatrix}2 & 3 & 2\\end{bmatrix} \\cdot \\begin{bmatrix}z_1& z_2 & z_3\\end{bmatrix} $ in flow profits.\n", "\n", "Given this profit function, we can write the simple Bellman equation in our standard form of $ \\rho v = r + Q v $, defining the appropriate payoff $ r $. 
For example, if $ z_m = m^2 $, then" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "typeof(r_vec(p)) = Array{Float64,1}\n" ] }, { "data": { "text/plain": [ "250.25" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function r_vec(p)\n", " z = (1:p.M).^2 # payoffs per type m\n", " r = [0.5 * dot(ind.I, z) for ind in CartesianIndices(p.shape)]\n", " return reshape(r, p.N^p.M) # return as a vector\n", "end\n", "@show typeof(r_vec(p))\n", "r_vec(p) |> mean" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the returned $ r $ is a vector, enumerated in the same order as the $ n_m $ states.\n", "\n", "Since the ordering of $ r $ is consistent with that of $ Q $, we can solve $ (\rho I - Q) v = r $ as a linear system.\n", "\n", "Below, we create a linear operator and compare a few different iterative methods [(GMRES, BiCGStab(l), IDR(s), etc.)](https://juliamath.github.io/IterativeSolvers.jl/dev/#What-method-should-I-use-for-linear-systems?-1) with a small problem\n", "of only 10,000 possible states." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 679.258 ms (75 allocations: 181.85 MiB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "norm(gmres(A, r) - v_direct) = 2.390837794373907e-5\n", " " ] }, { "name": "stdout", "output_type": "stream", "text": [ "5.272 ms (226 allocations: 3.37 MiB)\n", "norm(bicgstabl(A, r) - v_direct) = " ] }, { "name": "stdout", "output_type": "stream", "text": [ "1.108634674599104e-5\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 23.129 ms (432 allocations: 9.86 MiB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "norm(idrs(A, r) - v_direct) = 4.992408654237078e-8\n", " " ] }, { "name": "stdout", "output_type": "stream", "text": [ "7.443 ms (312 allocations: 4.91 MiB)\n" ] } ], "source": [ "p = default_params(N=10, M=4)\n", "Q = LinearMap((df, f) -> Q_mul!(df, f, p), p.N^p.M, ismutating = true)\n", "A = p.ρ * I - Q\n", "A_sparse = sparse(A) # expensive: use only in tests\n", "r = r_vec(p)\n", "v_direct = A_sparse \\ r\n", "iv = zero(r)\n", "\n", "@btime $A_sparse \\ $r # direct\n", "@show norm(gmres(A, r) - v_direct)\n", "@btime gmres!(iv, $A, $r) setup = (iv = zero(r))\n", "\n", "@show norm(bicgstabl(A, r) - v_direct)\n", "@btime bicgstabl!(iv, $A, $r) setup = (iv = zero(r))\n", "\n", "@show norm(idrs(A, r) - v_direct)\n", "@btime idrs($A, $r);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we see that even if the $ A $ matrix has been created, the direct sparse solver (which uses a sparse LU or QR) is at least an order of magnitude slower and allocates over an order of magnitude more memory. This is in addition to the allocation for the `A_sparse` matrix itself, which is not needed for iterative methods.\n", "\n", "The different iterative methods have tradeoffs when it comes to accuracy, speed, convergence rate, memory requirements, and usefulness of preconditioning. 
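One way to explore these tradeoffs on a given problem is to ask the solver for its convergence history, as in the short sketch below, which simply reuses the `A` and `r` defined above together with the same `log = true` keyword that we passed to `lsmr` earlier (the `history` name is just for illustration)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# sketch: request the convergence history to compare solvers on this problem\n", "v_gmres, history = gmres(A, r, log = true)\n", "println(\"$(history)\") # e.g., reports the number of iterations to convergence" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "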
Going much above $ \mathbf{N} = 10^4 $, the direct methods quickly become infeasible.\n", "\n", "Putting everything together, we now solve much larger systems with GMRES as our linear solver" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.581 s (270 allocations: 366.23 MiB)\n" ] } ], "source": [ "function solve_bellman(p; iv = zeros(p.N^p.M))\n", " @unpack ρ, N, M = p\n", " Q = LinearMap((df, f) -> Q_mul!(df, f, p), N^M, ismutating = true)\n", " A = ρ * I - Q\n", " r = r_vec(p)\n", "\n", " sol = gmres!(iv, A, r, log = false) # iterative solver, matrix-free\n", " return sol\n", "end\n", "p = default_params(N=10, M=6)\n", "@btime solve_bellman($p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This solves a value function with a Markov chain of a million states in a little over a second! This general approach seems to scale roughly linearly. For example, try $ N=10, M=8 $\n", "for a Markov chain with 100 million possible states, which solves in about 3-4 minutes. Above that order of magnitude, you may need to tinker with the linear solver parameters to ensure that you are not memory limited (e.g., change the `restart` parameter of GMRES)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Markov Chain Steady State and Dynamics\n", "\n", "Recall that given an $ N $-dimensional intensity matrix $ Q $ of a CTMC, the evolution of the pdf from an initial condition $ \psi(0) $ is the system of linear differential equations\n", "\n", "$$\n", "\dot{\psi}(t) = Q^T \psi(t)\n", "$$\n", "\n", "If $ Q $ is a matrix, we could just take its transpose to find the adjoint. However, with matrix-free methods, we need to implement the\n", "adjoint-vector product directly.\n", "\n", "The logic for the adjoint is that for a given $ n = (n_1,\ldots, n_m, \ldots n_M) $, the $ Q^T $ product for that row has terms that enter when\n", "\n", "1. $ 1 < n_m \leq N $, entering into the identical $ n $ except with one fewer customer in the $ m $th position \n", "1. $ 1 \leq n_m < N $, entering into the identical $ n $ except with one more customer in the $ m $th position \n", "\n", "\n", "Implementing this logic, first in math and then in code,\n", "\n", "$$\n", "\begin{align}\n", " Q^T_{(n_1, \ldots, n_M)} \cdot \psi &=\n", "\theta \sum_{m=1}^M (n_m > 1) \psi(n_1, \ldots, n_m - 1, \ldots, n_M)\\\\\n", " &+ \zeta \sum_{m=1}^M (n_m < N) \psi(n_1, \ldots, n_m + 1, \ldots, n_M)\\\\\n", " &-\left(\theta\, \text{Count}(n_m < N) + \zeta\, \text{Count}( n_m > 1)\right)\psi(n_1, \ldots, n_M)\n", "\end{align}\n", "$$" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "Q_T_mul! 
(generic function with 1 method)" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function Q_T_mul!(dψ, ψ, p)\n", " @unpack θ, ζ, N, M, shape, e_m = p\n", " ψ = reshape(ψ, shape)\n", " dψ = reshape(dψ, shape)\n", "\n", " @inbounds for ind in CartesianIndices(ψ)\n", " dψ[ind] = 0.0\n", " for m in 1:M\n", " n_m = ind[m]\n", " if(n_m > 1)\n", " dψ[ind] += θ * ψ[ind - e_m[m]]\n", " end\n", " if(n_m < N)\n", " dψ[ind] += ζ *ψ[ind + e_m[m]]\n", " end\n", " end\n", " dψ[ind] -= (θ * count(ind.I .< N) + ζ * count(ind.I .> 1)) * ψ[ind]\n", " end\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sparse` function for the operator is useful for testing that the function is correct and that it is indeed the adjoint of\n", "our `Q` operator." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm((sparse(Q))' - sparse(Q_T)) = 0.0\n" ] } ], "source": [ "p = default_params(N=5, M=4) # sparse is too slow for the full matrix\n", "Q = LinearMap((df, f) -> Q_mul!(df, f, p), p.N^p.M, ismutating = true)\n", "Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)\n", "@show norm(sparse(Q)' - sparse(Q_T)); # reminder: use sparse only for testing!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As discussed previously, the steady state can be found as the eigenvector associated with the zero eigenvalue (i.e., the one that solves $ Q^T \psi = 0 \psi $). We could\n", "do this with a dense eigenvalue solution for relatively small matrices" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "eig_Q_T.values[end] = -4.163336342344337e-16 + 0.0im\n" ] } ], "source": [ "p = default_params(N=5, M=4)\n", "eig_Q_T = eigen(Matrix(Q_T))\n", "vec = real(eig_Q_T.vectors[:,end])\n", "direct_ψ = vec ./ sum(vec)\n", "@show eig_Q_T.values[end];" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This approach relies on a full factorization of the underlying matrix, delivering the entire spectrum. For our purposes, this is not necessary.\n", "\n", "Instead, we could use the `Arpack.jl` package to target the eigenvalue of smallest absolute value, which relies on an iterative method.\n", "\n", "A final approach in this case is to notice that the $ \mathbf{N}\times\mathbf{N} $ matrix is of\n", "rank $ \mathbf{N} - 1 $ when the Markov chain is irreducible. The stationary solution is a vector in the $ 1 $-dimensional nullspace\n", "of the matrix.\n", "\n", "Using Krylov methods to solve the linear system with a right-hand side of all $ 0 $ values (i.e., $ \min_x \| A x - 0\|_2 $), iteratively from a non-zero initial condition, will converge to a point in the nullspace.\n", "\n", "We can use various Krylov methods for this trick (e.g., if the matrix is symmetric and positive definite, we could use Conjugate Gradient) but in our case we will\n", "use GMRES since we do not have any structure."
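, "\n", "\n", "To see why the initial condition must be non-zero (a point the code below flags with a comment), recall the standard behavior of Krylov solvers: GMRES searches over $ \psi_0 + \text{span}\{r_0, Q^T r_0, \ldots\} $, where the initial residual is $ r_0 = 0 - Q^T \psi_0 $. If $ \psi_0 = 0 $, then $ r_0 = 0 $ and the method stops immediately at the trivial solution $ \psi = 0 $, which cannot be normalized into a distribution. Starting from any non-zero $ \psi_0 $ (below, the uniform distribution) avoids this.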
] }, { "cell_type": "code", "execution_count": 63, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "norm(ψ - direct_ψ) = 6.098250476301026e-11\n" ] } ], "source": [ "p = default_params(N=5, M=4) # sparse is too slow for the full matrix\n", "Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)\n", "ψ = fill(1/(p.N^p.M), p.N^p.M) # can't use 0 as initial guess\n", "sol = gmres!(ψ, Q_T, zeros(p.N^p.M)) # i.e., solve Ax = 0 iteratively\n", "ψ = ψ / sum(ψ)\n", "@show norm(ψ - direct_ψ);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The speed and memory use of these methods can differ by orders of magnitude." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 38.993 ms (270 allocations: 2.28 MiB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 4.485 ms (341 allocations: 1.21 MiB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 180.634 μs (359 allocations: 66.03 KiB)\n" ] } ], "source": [ "p = default_params(N=4, M=4) # Dense and sparse matrices are too slow for the full dataset.\n", "Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)\n", "Q_T_dense = Matrix(Q_T)\n", "Q_T_sparse = sparse(Q_T)\n", "b = zeros(p.N^p.M)\n", "@btime eigen($Q_T_dense)\n", "@btime eigs($Q_T_sparse, nev=1, which=:SM, v0 = iv) setup = (iv = fill(1/(p.N^p.M), p.N^p.M))\n", "@btime gmres!(iv, $Q_T, $b) setup = (iv = fill(1/(p.N^p.M), p.N^p.M));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The differences become even more stark as the matrix grows. With `default_params(N=5, M=5)`, the `gmres` solution is at least 3 orders of magnitude faster, and uses close to 3 orders of magnitude less memory than the dense solver. In addition, the `gmres` solution is about an order of magnitude faster than the iterative sparse eigenvalue solver.\n", "\n", "The algorithm can solve for the steady state of $ 10^5 $ states in a few seconds" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "hide-output": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3.183 s (4880 allocations: 19.32 MiB)\n" ] } ], "source": [ "function stationary_ψ(p)\n", " Q_T = LinearMap((dψ, ψ) -> Q_T_mul!(dψ, ψ, p), p.N^p.M, ismutating = true)\n", " ψ = fill(1/(p.N^p.M), p.N^p.M) # can't use 0 as initial guess\n", " sol = gmres!(ψ, Q_T, zeros(p.N^p.M)) # i.e., solve Ax = 0 iteratively\n", " return ψ / sum(ψ)\n", "end\n", "p = default_params(N=10, M=5)\n", "@btime stationary_ψ($p);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a final demonstration, consider calculating the full evolution $ \psi(t) $ of the Markov chain. For the constant\n", "$ Q' $ matrix, the solution to this system of equations is $ \psi(t) = \exp(Q' t) \psi(0) $.\n", "\n", "Matrix-free Krylov methods using a technique called [exponential integration](https://en.wikipedia.org/wiki/Exponential_integrator) can solve this for high-dimensional problems.\n", "\n", "For this, we can set up a `MatrixFreeOperator` for our `Q_T_mul!` function (equivalent to the `LinearMap`, but with some additional requirements for the ODE solver) and use the [LinearExponential](http://docs.juliadiffeq.org/latest/solvers/ode_solve.html#Exponential-Methods-for-Linear-and-Affine-Problems-1) time-stepping method."
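, "\n", "\n", "The idea behind these Krylov-based exponential integrators (standard background, sketched here rather than taken from the lecture) is to approximate the action of the matrix exponential on the current state, rather than forming $ \exp(Q^T t) $ itself. Building an orthonormal basis $ V_k $ for the Krylov subspace generated by $ Q^T $ and $ \psi(0) $, with $ H_k $ the small Hessenberg matrix produced by the Arnoldi process,\n", "\n", "$$\n", "\exp(Q^T t) \psi(0) \approx \| \psi(0) \| \, V_k \exp(t H_k) e_1\n", "$$\n", "\n", "so only adjoint-vector products - exactly what `Q_T_mul!` provides - are needed at each time step.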
] }, { "cell_type": "code", "execution_count": 66, "metadata": { "hide-output": false }, "outputs": [ { "data": { "text/plain": [ "[plot: expected valuation E_t(v) against t (SVG output omitted)]" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using OrdinaryDiffEq, DiffEqOperators\n", "\n", "function solve_transition_dynamics(p, t)\n", " @unpack N, M = p\n", "\n", " ψ_0 = [1.0; fill(0.0, N^M - 1)]\n", " O! = MatrixFreeOperator((dψ, ψ, p, t) -> Q_T_mul!(dψ, ψ, p), (p, 0.0), size=(N^M,N^M), opnorm=(p)->1.25)\n", "\n", " # define the corresponding ODE problem\n", " prob = ODEProblem(O!,ψ_0,(0.0,t[end]), p)\n", " return solve(prob, LinearExponential(krylov=:simple), tstops = t)\n", "end\n", "t = 0.0:5.0:100.0\n", "p = default_params(N=10, M=6)\n", "sol = solve_transition_dynamics(p, t)\n", "v = solve_bellman(p)\n", "plot(t, [dot(sol(tval), v) for tval in t], xlabel = \"t\", label = [\"E_t(v)\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above plot (1) calculates the full dynamics of the Markov chain from the $ n_m = 1 $ for all $ m $ initial condition; (2) solves the dynamics of a system of a million ODEs; and (3) uses the calculation of the Bellman equation to find the expected valuation during that transition. The entire process takes less than 30 seconds." ] } ], "metadata": { "date": 1591310628.1086543, "download_nb": 1, "download_nb_path": "https://julia.quantecon.org/", "filename": "iterative_methods_sparsity.rst", "filename_with_path": "tools_and_techniques/iterative_methods_sparsity", "kernelspec": { "display_name": "Julia 1.4.2", "language": "julia", "name": "julia-1.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.4.2" }, "title": "Krylov Methods and Matrix Conditioning" }, "nbformat": 4, "nbformat_minor": 2 }