---
title: Iterative Methods for Solving Linear Equations
subtitle: Biostat/Biomath M257
author: Dr. Hua Zhou @ UCLA
date: today
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
jupyter:
  jupytext:
    formats: 'ipynb,qmd'
    text_representation:
      extension: .qmd
      format_name: quarto
      format_version: '1.0'
      jupytext_version: 1.14.5
  kernelspec:
    display_name: Julia 1.8.5
    language: julia
    name: julia-1.8
---

System information (for reproducibility):

```{julia}
versioninfo()
```

Load packages:

```{julia}
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
```

## Introduction

So far we have considered direct methods for solving linear equations.

* **Direct methods** (flops fixed _a priori_) vs **iterative methods**:
    - Direct methods (GE/LU, Cholesky, QR, SVD): good for dense, small to moderate sized $\mathbf{A}$.
    - Iterative methods (Jacobi, Gauss-Seidel, SOR, conjugate gradient, GMRES): good for large, sparse, or structured linear systems, parallel computing, and warm starts.

## PageRank problem

* $\mathbf{A} \in \{0,1\}^{n \times n}$ is the connectivity matrix of webpages with entries
$$
a_{ij} = \begin{cases}
1 & \text{if page $i$ links to page $j$} \\
0 & \text{otherwise}
\end{cases}.
$$
$n \approx 10^9$ in May 2017.

* $r_i = \sum_j a_{ij}$ is the *out-degree* of page $i$.

* [Larry Page](https://en.wikipedia.org/wiki/PageRank) imagines a random surfer wandering on the internet according to the following rules:
    - From a page $i$ with $r_i > 0$:
        * with probability $p$, (s)he randomly chooses a link on page $i$ (uniformly) and follows that link to the next page
        * with probability $1-p$, (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page
    - From a page $i$ with $r_i = 0$ (a dangling page), (s)he randomly chooses one page from the set of all $n$ pages (uniformly) and proceeds to that page.

    The process defines a Markov chain on the space of $n$ pages. The stationary distribution of this Markov chain gives the ranks (probabilities) of the pages.

* The stationary distribution is the top left eigenvector of the transition matrix $\mathbf{P}$ corresponding to eigenvalue 1. Equivalently it can be cast as a linear equation
$$
(\mathbf{I} - \mathbf{P}^T) \mathbf{x} = \mathbf{0}.
$$

* Gene Golub called it the largest matrix computation in the world.

* GE/LU would take $2 \times (10^9)^3/3/10^{12} \approx 6.66 \times 10^{14}$ seconds $\approx 20$ million years on a tera-flop supercomputer!

* Iterative methods come to the rescue.

## Jacobi method

Solve $\mathbf{A} \mathbf{x} = \mathbf{b}$.

* Jacobi iteration:
$$
x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}.
$$

* With the splitting $\mathbf{A} = \mathbf{L} + \mathbf{D} + \mathbf{U}$, the Jacobi iteration can be written as
$$
\mathbf{D} \mathbf{x}^{(t+1)} = - (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{b},
$$
i.e.,
$$
\mathbf{x}^{(t+1)} = - \mathbf{D}^{-1} (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b} = - \mathbf{D}^{-1} \mathbf{A} \mathbf{x}^{(t)} + \mathbf{x}^{(t)} + \mathbf{D}^{-1} \mathbf{b}.
$$

* One round costs $2n^2$ flops with an **unstructured** $\mathbf{A}$. There is a gain over GE/LU if the iteration converges in $o(n)$ rounds. The saving is huge for **sparse** or **structured** $\mathbf{A}$. By structured, we mean that the matrix-vector multiplication $\mathbf{A} \mathbf{v}$ is fast ($O(n)$ or $O(n \log n)$). A minimal hand-coded sketch of this update is shown below.
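To make the matrix-form update $\mathbf{x}^{(t+1)} = \mathbf{D}^{-1} (\mathbf{b} - (\mathbf{L} + \mathbf{U}) \mathbf{x}^{(t)})$ concrete, here is a minimal hand-coded Jacobi sweep. It is a didactic sketch only: the function name `jacobi_sketch` and the toy system `Atest`, `btest` are made up for illustration and are not the IterativeSolvers.jl routine used in the numerical examples later.

```{julia}
using LinearAlgebra

# Minimal Jacobi iteration sketch (illustrative only, fixed iteration count):
# x⁽ᵗ⁺¹⁾ = D⁻¹ (b - (L + U) x⁽ᵗ⁾) = D⁻¹ (b - A x⁽ᵗ⁾ + D x⁽ᵗ⁾)
function jacobi_sketch(A::AbstractMatrix, b::AbstractVector; maxiter::Int = 100)
    d = diag(A)       # the diagonal D, stored as a vector
    x = zero(b)       # start from x⁽⁰⁾ = 0
    for _ in 1:maxiter
        # b - A x + D x equals b - (L + U) x, so dividing by D gives the update
        x = (b - A * x + d .* x) ./ d
    end
    return x
end

# sanity check on a small diagonally dominant system (hypothetical toy data)
Atest = [4.0 1.0; 1.0 3.0]
btest = [1.0, 2.0]
jacobi_sketch(Atest, btest)   # should be ≈ Atest \ btest
```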
## Gauss-Seidel method

* Gauss-Seidel (GS) iteration:
$$
x_i^{(t+1)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)}}{a_{ii}}.
$$

* With the same splitting, $(\mathbf{D} + \mathbf{L}) \mathbf{x}^{(t+1)} = - \mathbf{U} \mathbf{x}^{(t)} + \mathbf{b}$, i.e.,
$$
\mathbf{x}^{(t+1)} = - (\mathbf{D} + \mathbf{L})^{-1} \mathbf{U} \mathbf{x}^{(t)} + (\mathbf{D} + \mathbf{L})^{-1} \mathbf{b}.
$$

* GS converges for any $\mathbf{x}^{(0)}$ when $\mathbf{A}$ is symmetric and positive definite.

* The convergence rate of Gauss-Seidel is governed by the spectral radius of $(\mathbf{D} + \mathbf{L})^{-1} \mathbf{U}$. A hand-coded sketch of the GS sweep appears at the start of the numerical examples below.

## Successive over-relaxation (SOR)

* SOR iteration:
$$
x_i^{(t+1)} = \omega \left( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(t+1)} - \sum_{j=i+1}^n a_{ij} x_j^{(t)} \right) / a_{ii} + (1-\omega) x_i^{(t)},
$$
i.e.,
$$
\mathbf{x}^{(t+1)} = (\mathbf{D} + \omega \mathbf{L})^{-1} [(1-\omega) \mathbf{D} - \omega \mathbf{U}] \mathbf{x}^{(t)} + \omega (\mathbf{D} + \omega \mathbf{L})^{-1} \mathbf{b}.
$$

* The relaxation parameter $\omega \in (0, 2)$ needs to be picked beforehand, with the goal of improving the convergence rate; $\omega = 1$ recovers Gauss-Seidel, and for symmetric positive definite $\mathbf{A}$, SOR converges if and only if $0 < \omega < 2$.

## Conjugate gradient method

* **Conjugate gradient and its variants are the top-notch iterative methods for solving huge, structured linear systems.**

* A UCLA invention! Due to Hestenes and Stiefel in the 1950s.

* Solving $\mathbf{A} \mathbf{x} = \mathbf{b}$, with $\mathbf{A}$ positive definite, is equivalent to minimizing the quadratic function $\frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} - \mathbf{b}^T \mathbf{x}$.

[Kershaw's results](http://www.sciencedirect.com/science/article/pii/0021999178900980?via%3Dihub) for a fusion problem:

| Method                                  | Number of Iterations |
|-----------------------------------------|----------------------|
| Gauss-Seidel                            | 208,000              |
| Block SOR methods                       | 765                  |
| Incomplete Cholesky conjugate gradient  | 25                   |

## MatrixDepot.jl

MatrixDepot.jl is an extensive collection of test matrices in Julia. After installation, we can check the available test matrices by

```{julia}
using MatrixDepot

mdinfo()
```

```{julia}
# List matrices that are positive definite and sparse:
mdlist(:symmetric & :posdef & :sparse)
```

```{julia}
# Get help on Poisson matrix
mdinfo("poisson")
```

```{julia}
# Generate a Poisson matrix of dimension n=10
A = matrixdepot("poisson", 10)
```

```{julia}
using UnicodePlots

spy(A)
```

```{julia}
# Get help on Wathen matrix
mdinfo("wathen")
```

```{julia}
# Generate a Wathen matrix of dimension n=5
A = matrixdepot("wathen", 5)
```

```{julia}
spy(A)
```

## Numerical examples

The [`IterativeSolvers.jl`](https://github.com/JuliaMath/IterativeSolvers.jl) package implements the most commonly used iterative solvers.
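Before turning to the library routines, here is a companion to the Jacobi sketch above: a minimal hand-coded Gauss-Seidel sweep that uses the already-updated entries $x_j^{(t+1)}$ for $j < i$ within the same pass. Again this is a didactic sketch; `gauss_seidel_sketch` is a made-up name for illustration, not the IterativeSolvers.jl function benchmarked below.

```{julia}
# Minimal Gauss-Seidel sweep sketch (illustrative only, fixed iteration count)
function gauss_seidel_sketch(A::AbstractMatrix, b::AbstractVector; maxiter::Int = 100)
    n = length(b)
    x = zero(b)                   # start from x⁽⁰⁾ = 0
    for _ in 1:maxiter
        for i in 1:n
            s = b[i]
            for j in 1:n
                j == i && continue
                s -= A[i, j] * x[j]   # x[j] already holds the new value for j < i
            end
            x[i] = s / A[i, i]
        end
    end
    return x
end

# same hypothetical toy system as in the Jacobi sketch
Atest = [4.0 1.0; 1.0 3.0]
btest = [1.0, 2.0]
gauss_seidel_sketch(Atest, btest)   # should be ≈ Atest \ btest
```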
### Generate test matrix

```{julia}
using BenchmarkTools, IterativeSolvers, LinearAlgebra, MatrixDepot, Random

Random.seed!(280)

n = 100
# Poisson matrix of dimension n^2=10000, pd and sparse
A = matrixdepot("poisson", n)
@show typeof(A)
# dense matrix representation of A
Afull = convert(Matrix, A)
@show typeof(Afull)
# sparsity level
count(!iszero, A) / length(A)
```

```{julia}
spy(A)
```

```{julia}
# storage difference
Base.summarysize(A), Base.summarysize(Afull)
```

### Matrix-vector multiplication

```{julia}
# randomly generated vector of length n^2
b = randn(n^2)
# dense matrix-vector multiplication
@benchmark $Afull * $b
```

```{julia}
# sparse matrix-vector multiplication
@benchmark $A * $b
```

### Dense solve via Cholesky

```{julia}
# record the Cholesky solution
xchol = cholesky(Afull) \ b;
```

```{julia}
# dense solve via Cholesky
@benchmark cholesky($Afull) \ $b
```

### Jacobi solver

It seems that the Jacobi solver doesn't give the correct answer.

```{julia}
xjacobi = jacobi(A, b)
@show norm(xjacobi - xchol)
```

Reading the [documentation](https://juliamath.github.io/IterativeSolvers.jl/dev/linear_systems/stationary/#Jacobi-1) we find that the default value of `maxiter` is 10. A couple of trial runs show that 30,000 Jacobi iterations are enough to get an accurate solution.

```{julia}
xjacobi = jacobi(A, b, maxiter = 30000)
@show norm(xjacobi - xchol)
```

```{julia}
@benchmark jacobi($A, $b, maxiter = 30000)
```

### Gauss-Seidel

```{julia}
# Gauss-Seidel solution is fairly close to Cholesky solution after 15000 iters
xgs = gauss_seidel(A, b, maxiter = 15000)
@show norm(xgs - xchol)
```

```{julia}
@benchmark gauss_seidel($A, $b, maxiter = 15000)
```

### SOR

```{julia}
# symmetric SOR with ω=0.75
xsor = ssor(A, b, 0.75, maxiter = 10000)
@show norm(xsor - xchol)
```

```{julia}
@benchmark ssor($A, $b, 0.75, maxiter = 10000)
```

### Conjugate gradient (preview of next lecture)

```{julia}
# conjugate gradient
xcg = cg(A, b)
@show norm(xcg - xchol)
```

```{julia}
@benchmark cg($A, $b)
```
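As a teaser for the next lecture, we can also ask the solver to report its convergence history. This is a short sketch, assuming the documented `log = true` keyword of IterativeSolvers.jl's `cg`, which returns the solution together with a `ConvergenceHistory` object:

```{julia}
# re-run CG and keep the convergence history
# (log = true makes cg return the solution plus a ConvergenceHistory)
xcg, ch = cg(A, b, log = true)
@show norm(xcg - xchol)
ch
```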