--- title: A list of "easy" linear systems subtitle: Biostat/Biomath M257 author: Dr. Hua Zhou @ UCLA date: today format: html: theme: cosmo embed-resources: true number-sections: true toc: true toc-depth: 4 toc-location: left code-fold: false jupyter: jupytext: formats: 'ipynb,qmd' text_representation: extension: .qmd format_name: quarto format_version: '1.0' jupytext_version: 1.14.5 kernelspec: display_name: Julia (8 threads) 1.8.5 language: julia name: julia-_8-threads_-1.8 --- System information (for reproducibility): ```{julia} versioninfo() ``` Load packages: ```{julia} using Pkg Pkg.activate(pwd()) Pkg.instantiate() Pkg.status() ``` # Introduction Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$. Or, consider matrix inverse (if you want). $\mathbf{A}$ can be huge. Keep massive data in mind: 1000 Genome Project, NetFlix, Google PageRank, finance, spatial statistics, ... We should be alert to many easy linear systems. Don't blindly use `A \ b` and `inv` in Julia or `solve` function in R. **Don't waste computing resources by bad choices of algorithms!** ## Diagonal matrix Diagonal $\mathbf{A}$: $n$ flops. Use `Diagonal` type of Julia. ```{julia} using BenchmarkTools, LinearAlgebra, Random # generate random data Random.seed!(123) n = 1000 A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64} b = randn(n); ``` ```{julia} # should give link to the source code @which A \ b ``` ```{julia} # check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n)) @benchmark $A \ $b ``` ```{julia} # O(n) computation, no extra array allocation @benchmark Diagonal($A) \ $b ``` ## Bidiagonal, tridiagonal, and banded matrices Bidiagonal, tridiagonal, or banded $\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops. * Use [`Bidiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Bidiagonal), [`Tridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.Tridiagonal), [`SymTridiagonal`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.SymTridiagonal) types of Julia. ```{julia} Random.seed!(123) n = 1000 dv = randn(n) ev = randn(n - 1) b = randn(n) # rhs # symmetric tridiagonal matrix A = SymTridiagonal(dv, ev) ``` ```{julia} # convert to a full matrix Afull = Matrix(A) # LU decomposition (2/3) n^3 flops! @benchmark $Afull \ $b ``` ```{julia} # specialized algorithm for tridiagonal matrix, O(n) flops @benchmark $A \ $b ``` ## Triangular matrix Triangular $\mathbf{A}$: $n^2$ flops to solve linear system. ```{julia} Random.seed!(123) n = 1000 A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64} b = randn(n) # check istril() then triangular solve @benchmark $A \ $b ``` ```{julia} # triangular solve directly; save the cost of istril() @benchmark LowerTriangular($A) \ $b ``` ## Block diagonal matrix Block diagonal: Suppose $n = \sum_b n_b$. For linear equations, $(\sum_b n_b)^3$ (without using block diagonal structure) vs $\sum_b n_b^3$ (using block diagonal structure). Julia has a [`blockdiag`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.blockdiag) function that generates a **sparse** matrix. **Anyone interested writing a `BlockDiagonal.jl` package?** ```{julia} using SparseArrays Random.seed!(123) B = 10 # number of blocks ni = 100 A = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...) ``` ```{julia} using UnicodePlots spy(A) ``` ## Kronecker product Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming and doing costly computation on the potentially huge Kronecker $\mathbf{A} \otimes \mathbf{B}$. **Anyone interested writing a package?** ```{julia} using MatrixDepot, LinearAlgebra A = matrixdepot("lehmer", 50) # a pd matrix ``` ```{julia} B = matrixdepot("oscillate", 100) # pd matrix ``` ```{julia} M = kron(A, B) c = ones(size(M, 2)) # rhs # Method 1: form Kronecker product and Cholesky solve x1 = cholesky(Symmetric(M)) \ c; ``` ```{julia} # Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} m, p = size(A, 1), size(B, 1) x2 = vec(transpose(cholesky(Symmetric(A)) \ transpose(cholesky(Symmetric(B)) \ reshape(c, p, m)))); ``` ```{julia} # relative error norm(x1 - x2) / norm(x1) ``` ```{julia} using BenchmarkTools # Method 1: form Kronecker and Cholesky solve @benchmark cholesky(Symmetric(kron($A, $B))) \ c ``` ```{julia} # Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} @benchmark vec(transpose(cholesky(Symmetric($A)) \ transpose(cholesky(Symmetric($B)) \ reshape($c, p, m)))) ``` ## Sparse matrix Sparsity: sparse matrix decomposition or iterative method. * The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible. ```{julia} using MatrixDepot Random.seed!(123) # a 7701-by-7701 sparse pd matrix A = matrixdepot("wathen", 50) # random generated rhs b = randn(size(A, 1)) Afull = Matrix(A) count(!iszero, A) / length(A) # sparsity ``` ```{julia} using UnicodePlots spy(A) ``` ### Matrix-vector multiplication ```{julia} # dense matrix-vector multiplication @benchmark $Afull * $b ``` ```{julia} # sparse matrix-vector multiplication @benchmark $A * $b ``` ### Solve linear equation ```{julia} # solve via dense Cholesky xchol = cholesky(Symmetric(Afull)) \ b @benchmark cholesky($(Symmetric(Afull))) \ $b ``` ```{julia} # solve via sparse Cholesky xcholsp = cholesky(Symmetric(A)) \ b @show norm(xchol - xcholsp) @benchmark cholesky($(Symmetric(A))) \ $b ``` ```{julia} # sparse solve via conjugate gradient using IterativeSolvers xcg = cg(A, b) @show norm(xcg - xchol) @benchmark cg($A, $b) ``` ```{julia} # sparse solve via preconditioned conjugate gradient using Preconditioners xpcg = cg(A, b, Pl = DiagonalPreconditioner(A)) @show norm(xpcg - xchol) @benchmark cg($A, $b, Pl = $(DiagonalPreconditioner(A))) ``` ## Easy plus low rank Easy plus low rank: $\mathbf{U} \in \mathbb{R}^{n \times r}$, $\mathbf{V} \in \mathbb{R}^{r \times n}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V} \mathbf{A}^{-1} \mathbf{U}^T). \end{eqnarray*} * Keep HW3 (multivariate density) and HW4 (PageRank) problems in mind. * [`WoodburyMatrices.jl`](https://github.com/timholy/WoodburyMatrices.jl) package can be useful. ```{julia} using BenchmarkTools, Random, WoodburyMatrices Random.seed!(123) n = 1000 r = 5 A = Diagonal(rand(n)) B = randn(n, r) D = Diagonal(rand(r)) b = randn(n) # Woodbury structure: W = A + B * D * B' W = SymWoodbury(A, B, D) Wfull = Matrix(W) # stored as a Matrix{Float64} ``` ```{julia} # compares storage Base.summarysize(W), Base.summarysize(Wfull) ``` ### Solve linear equation ```{julia} # solve via Cholesky @benchmark cholesky($(Symmetric(Wfull))) \ $b ``` ```{julia} # solve using Woodbury formula @benchmark $W \ reshape($b, n, 1) # hack; need to file an issue ``` ### Matrix-vector multiplication ```{julia} # multiplication without using Woodbury structure @benchmark $Wfull * $b ``` ```{julia} # multiplication using Woodbury structure @benchmark $W * $b ``` ### Determinant ```{julia} # determinant without using Woodbury structure @benchmark det($Wfull) ``` ```{julia} # determinant using Woodbury structure @benchmark det($W) ``` ## Easy plus border Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ **Anyone interested writing a package?** ## Orthogonal matrix Orthogonal $\mathbf{A}$: $n^2$ flops **at most**. Why? Permutation matrix, Householder matrix, Jacobi matrix, ... take less. ## Toeplitz matrix Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops. Durbin algorithm (Yule-Walker equation), Levinson algorithm (general $\mathbf{b}$), Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics. * [`ToeplitzMatrices.jl`](https://github.com/JuliaMatrices/ToeplitzMatrices.jl) package can be useful. ## Circulant matrix Circulant systems: Toeplitz matrix with wraparound $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}, $$ FFT type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform). ## Vandermonde matrix Vandermonde matrix: such as in interpolation and approximation problems $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops. ## Cauchy-like matrix Cauchy-like matrices: $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR. ## Structured-rank matrix Structured-rank problems: semiseparable matrices (LU and QR takes $O(n)$ flops), quasiseparable matrices, ...