---
title: Eigen-decomposition and SVD
subtitle: Biostat/Biomath M257
author: Dr. Hua Zhou @ UCLA
date: today
format:
html:
theme: cosmo
embed-resources: true
number-sections: true
toc: true
toc-depth: 4
toc-location: left
code-fold: false
jupyter:
jupytext:
formats: 'ipynb,qmd'
text_representation:
extension: .qmd
format_name: quarto
format_version: '1.0'
jupytext_version: 1.14.5
kernelspec:
display_name: Julia 1.9.0
language: julia
name: julia-1.9
---
System information (for reproducibility):
```{julia}
versioninfo()
```
Load packages:
```{julia}
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
```
# Introduction
Our last topic on numerical linear algebra is eigen-decomposition and singular value decomposition (SVD). We already saw the wide applications of QR decomposition in least squares problem and solving square and under-determined linear equations. Eigen-decomposition and SVD can be deemed as more thorough orthogonalization of a matrix. We start with a brief review of the related linear algebra.
## Linear algebra review: eigen-decomposition
For a quick review of eigen-decomposition, see [Biostat 216 slides](https://ucla-biostat-216.github.io/2022fall/slides/10-eig/10-eig.html).
* **Eigenvalues** are defined as roots of the characteristic equation $\det(\lambda \mathbf{I}_n - \mathbf{A})=0$.
* If $\lambda$ is an eigenvalue of $\mathbf{A}$, then there exist non-zero $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{A} \mathbf{x} = \lambda \mathbf{x}$ and $\mathbf{y}^T \mathbf{A} = \lambda \mathbf{y}^T$. $\mathbf{x}$ and $\mathbf{y}$ are called the (column) **eigenvector** and **row eigenvector** of $\mathbf{A}$ associated with the eigenvalue $\lambda$.
* $\mathbf{A}$ is singular if and only if it has at least one 0 eigenvalue.
* Eigenvectors associated with distinct eigenvalues are linearly independent.
* Eigenvalues of an upper or lower triangular matrix are its diagonal entries: $\lambda_i = a_{ii}$.
* Eigenvalues of an idempotent matrix are either 0 or 1.
* Eigenvalues of an orthogonal matrix have complex modulus 1.
* In most statistical applications, we deal with eigenvalues/eigenvectors of symmetric matrices.
The eigenvalues and eigenvectors of a real **symmetric** matrix are real.
* Eigenvectors associated with distinct eigenvalues of a symmetry matrix are orthogonal.
* **Eigen-decompostion of a symmetric matrix**: $\mathbf{A} = \mathbf{U} \Lambda \mathbf{U}^T$, where
* $\Lambda = \text{diag}(\lambda_1,\ldots,\lambda_n)$
* columns of $\mathbf{U}$ are the eigenvectors, which are (or can be chosen to be) mutually orthonormal. Thus $\mathbf{U}$ is an orthogonal matrix.
* A real symmetric matrix is positive semidefinite (positive definite) if and only if all eigenvalues are nonnegative (positive).
* **Spectral radius** $\rho(\mathbf{A}) = \max_i |\lambda_i|$.
* $\mathbf{A} \in \mathbb{R}^{n \times n}$ a square matrix (not required to be symmetric), then $\text{tr}(\mathbf{A}) = \sum_i \lambda_i$ and $\det(\mathbf{A}) = \prod_i \lambda_i$.
## Linear algebra review: singular value decomposition (SVD)
For a quick review of SVD, see [Biostat 216 slides](https://ucla-biostat-216.github.io/2022fall/slides/12-svd/12-svd.html).
* **Singular value decomposition (SVD)**: For a rectangular matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, let $p = \min\{m,n\}$, then we have the SVD
$$
\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^T,
$$
where
* $\mathbf{U} = (\mathbf{u}_1,\ldots,\mathbf{u}_m) \in \mathbb{R}^{m \times m}$ is orthogonal
* $\mathbf{V} = (\mathbf{v}_1,\ldots,\mathbf{v}_n) \in \mathbb{R}^{n \times n}$ is orthogonal
* $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$.
$\sigma_i$ are called the **singular values**, $\mathbf{u}_i$ are the **left singular vectors**, and $\mathbf{v}_i$ are the **right singular vectors**.
* **Thin/Skinny SVD**. Assume $m \ge n$. $\mathbf{A}$ can be factored as
$$
\mathbf{A} = \mathbf{U}_n \Sigma_n \mathbf{V}^T = \sum_{i=1}^n \sigma_i \mathbf{u}_i \mathbf{v}_i^T,
$$
where
* $\mathbf{U}_n \in \mathbb{R}^{m \times n}$, $\mathbf{U}_n^T \mathbf{U}_n = \mathbf{I}_n$
* $\mathbf{V} \in \mathbb{R}^{n \times n}$, $\mathbf{V}^T \mathbf{V} = \mathbf{I}_n$
* $\Sigma_n = \text{diag}(\sigma_1,\ldots,\sigma_n) \in \mathbb{R}^{n \times n}$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$
* Denote $\sigma(\mathbf{A})=(\sigma_1,\ldots,\sigma_p)^T$. Then
* $r = \text{rank}(\mathbf{A}) = \# \text{ nonzero singular values} = \|\sigma(\mathbf{A})\|_0$
* $\mathbf{A} = \mathbf{U}_r \Sigma_r \mathbf{V}_r^T = \sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i^T$
* $\|\mathbf{A}\|_{\text{F}} = (\sum_{i=1}^p \sigma_i^2)^{1/2} = \|\sigma(\mathbf{A})\|_2$
* $\|\mathbf{A}\|_2 = \sigma_1 = \|\sigma(\mathbf{A})\|_\infty$
* Assume $\text{rank}(\mathbf{A}) = r$ and partition
$$
\begin{eqnarray*}
\mathbf{U} &=& (\mathbf{U}_r, \tilde{\mathbf{U}}_r) \in \mathbb{R}^{m \times m} \\
\mathbf{V} &=& (\mathbf{V}_r, \tilde{\mathbf{V}}_r) \in \mathbb{R}^{n \times n}.
\end{eqnarray*}
$$
Then
* ${\cal C}(\mathbf{A}) = {\cal C}(\mathbf{U}_r)$, ${\cal N}(\mathbf{A}^T) = {\cal C}(\tilde{\mathbf{U}}_r)$
* ${\cal N}(\mathbf{A}) = {\cal C}(\tilde{\mathbf{V}}_r)$, ${\cal C}(\mathbf{A}^T) = {\cal C}(\mathbf{V}_r)$
* $\mathbf{U}_r \mathbf{U}_r^T$ is the orthogonal projection onto ${\cal C}(\mathbf{A})$
* $\tilde{\mathbf{U}}_r \tilde{\mathbf{U}}_r^T$ is the orthogonal projection onto ${\cal N}(\mathbf{A}^T)$
* $\mathbf{V}_r \mathbf{V}_r^T$ is the orthogonal projection onto ${\cal C}(\mathbf{A}^T)$
* $\tilde{\mathbf{V}}_r \tilde{\mathbf{V}}_r^T$ is the orthogonal projection onto ${\cal N}(\mathbf{A})$
* Relation to eigen-decomposition. Using thin SVD,
$$
\begin{eqnarray*}
\mathbf{A}^T \mathbf{A} &=& \mathbf{V} \Sigma \mathbf{U}^T \mathbf{U} \Sigma \mathbf{V}^T = \mathbf{V} \Sigma^2 \mathbf{V}^T \\
\mathbf{A} \mathbf{A}^T &=& \mathbf{U} \Sigma \mathbf{V}^T \mathbf{V} \Sigma \mathbf{U}^T = \mathbf{U} \Sigma^2 \mathbf{U}^T.
\end{eqnarray*}
$$
In principle we can obtain singular triplets of $\mathbf{A}$ by doing two eigen-decompositions.
* Another relation to eigen-decomposition. Using thin SVD,
$$
\begin{eqnarray*}
\begin{pmatrix} \mathbf{0}_{n \times n} & \mathbf{A}^T \\ \mathbf{A} & \mathbf{0}_{m \times m} \end{pmatrix} = \frac{1}{\sqrt 2} \begin{pmatrix} \mathbf{V} & \mathbf{V} \\ \mathbf{U} & -\mathbf{U} \end{pmatrix} \begin{pmatrix} \Sigma & \mathbf{0}_{n \times n} \\ \mathbf{0}_{n \times n} & - \Sigma \end{pmatrix} \frac{1}{\sqrt 2} \begin{pmatrix} \mathbf{V}^T & \mathbf{U}^T \\ \mathbf{V}^T & - \mathbf{U}^T \end{pmatrix}.
\end{eqnarray*}
$$
Hence any symmetric eigen-solver can produce the SVD of a matrix $\mathbf{A}$ without forming $\mathbf{A} \mathbf{A}^T$ or $\mathbf{A}^T \mathbf{A}$.
* Yet another relation to eigen-decomposition: If the eigen-decomposition of a real symmetric matrix is $\mathbf{A} = \mathbf{W} \Lambda \mathbf{W}^T = \mathbf{W} \text{diag}(\lambda_1, \ldots, \lambda_n) \mathbf{W}^T$, then
$$
\begin{eqnarray*}
\mathbf{A} = \mathbf{W} \Lambda \mathbf{W}^T = \mathbf{W} \begin{pmatrix}
|\lambda_1| & & \\
& \ddots & \\
& & |\lambda_n|
\end{pmatrix} \begin{pmatrix}
\text{sgn}(\lambda_1) & & \\
& \ddots & \\
& & \text{sgn}(\lambda_n)
\end{pmatrix} \mathbf{W}^T
\end{eqnarray*}
$$
is the SVD of $\mathbf{A}$.
## Applications of eigen-decomposition and SVD
### Principal components analysis (PCA).
$\mathbf{X} \in \mathbb{R}^{n \times p}$ is a centered data matrix. Perform SVD $\mathbf{X} = \mathbf{U} \Sigma \mathbf{V}^T$ or equivalently eigendecomposition $\mathbf{X}^T \mathbf{X} = \mathbf{V} \Sigma^2 \mathbf{V}^T$. The linear combinations $\tilde{\mathbf{x}}_i = \mathbf{X} \mathbf{v}_i$ are the **principal components** (PC) and have variance $\sigma_i^2$.
* Dimension reduction: reduce dimensionality $p$ to $q \ll p$. Use top PCs $\tilde{\mathbf{x}}_1, \ldots, \tilde{\mathbf{x}}_q$ in visualization and downstream analysis.
Above picture is from the article [Genes mirror geography within Europe](http://www.nature.com/nature/journal/v456/n7218/full/nature07331.html) by Novembre et al (2008) published in _Nature_.
* Use PCs to adjust for confounding - a serious issue in association studies in large data sets.
* Use of PCA to adjust for confounding in modern genetic studies is proposed in the paper [Principal components analysis corrects for stratification in genome-wide association studies](http://www.nature.com/ng/journal/v38/n8/full/ng1847.html) by Price et al (2006) published in _Nature Genetics_. It has been cited 6,937 times as of May 3, 2019.
### Low rank approximation
For example, image/data compression. Find a low rank approximation of data matrix $\mathbf{x}$.
**Eckart-Young theorem**:
$$
\min_{\text{rank}(\mathbf{Y})=r} \|\mathbf{X} - \mathbf{Y} \|_{\text{F}}^2
$$
is achieved by $\mathbf{Y} = \sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i^T$ with optimal value $\sum_{i=r}^{p} \sigma_i^2$, where $(\sigma_i, \mathbf{u}_i, \mathbf{v}_i)$ are singular values and vectors of $\mathbf{X}$.
* Gene Golub's $897 \times 598$ picture requires $3 \times 897 \times 598 \times 8 = 12,873,744$ bytes (3 RGB channels).
* Rank 50 approximation requires $3 \times 50 \times (897 + 598) \times 8 = 1,794,000$ bytes.
* Rank 12 approximation requires $12 \times (2691+598) \times 8 = 430,560$ bytes.
### Moore-Penrose (MP) inverse
Using thin SVD,
$$
\mathbf{A}^+ = \mathbf{V} \Sigma^+ \mathbf{U}^T,
$$
where $\Sigma^+ = \text{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0)$, $r= \text{rank}(\mathbf{A})$. This is how the [`pinv`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.pinv) function is implemented in Julia.
```{julia}
using Random, LinearAlgebra
Random.seed!(123)
X = randn(5, 3)
pinv(X)
```
```{julia}
# calculation of Moore-Penrose inverse by SVD
@which pinv(X)
```
### Least squares
Given thin SVD $\mathbf{X} = \mathbf{U} \Sigma \mathbf{V}^T$,
$$
\begin{eqnarray*}
\widehat \beta &=& (\mathbf{X}^T \mathbf{X})^- \mathbf{X}^T \mathbf{y} \\
&=& (\mathbf{V} \Sigma^2 \mathbf{V}^T)^+ \mathbf{V} \Sigma \mathbf{U}^T \mathbf{y} \\
&=& \mathbf{V} (\Sigma^{2})^+ \mathbf{V}^T \mathbf{V} \Sigma \mathbf{U}^T \mathbf{y} \\
&=& \mathbf{V}_r \Sigma_r^{-1} \mathbf{U}_r^T \mathbf{y} \\
&=& \sum_{i=1}^r \left( \frac{\mathbf{u}_i^T \mathbf{y}}{\sigma_i} \right) \mathbf{v}_i
\end{eqnarray*}
$$
and
$$
\begin{eqnarray*}
\widehat{\mathbf{y}} &=& \mathbf{X} \widehat \beta = \mathbf{U}_r \mathbf{U}_r^T \mathbf{y}.
\end{eqnarray*}
$$
In general, SVD is more expensive than other approaches (Cholesky, Sweep, QR) we learnt. In some applications, SVD is computed for other purposes then we get least squares solution for free.
### Ridge regression
* In ridge regression, we minimize
$$
\|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \|\beta\|_2^2,
$$
where $\lambda$ is a tuning parameter.
* Ridge regression by augmented linear regression. Ridge regression problem is equivalent to
$$
\left\| \begin{pmatrix} \mathbf{y} \\ \mathbf{0}_p \end{pmatrix} - \begin{pmatrix}
\mathbf{X} \\ \sqrt \lambda \mathbf{I}_p
\end{pmatrix} \beta \right\|_2^2.
$$
Therefore any methods for linear regression can be applied.
* Ridge regression by method of normal equation. The normal equation for the ridge problem is
$$
(\mathbf{X}^T \mathbf{X} + \lambda \mathbf{I}_p) \beta = \mathbf{X}^T \mathbf{y}.
$$
Therefore Cholesky or sweep operator can be used.
* Ridge regression by SVD. If we obtain the (thin) SVD of $\mathbf{X}$
$$
\mathbf{X} = \mathbf{U} \Sigma_{p \times p} \mathbf{V}^T.
$$
Then the normal equation reads
$$
(\Sigma^2 + \lambda \mathbf{I}_p) \mathbf{V}^T \beta = \Sigma \mathbf{U}^T \mathbf{y}
$$
and we get
$$
\widehat \beta (\lambda) = \sum_{i=1}^p \frac{\sigma_i \mathbf{u}_i^T \mathbf{y}}{\sigma_i^2 + \lambda} \mathbf{v}_i = \sum_{i=1}^r \frac{\sigma_i \mathbf{u}_i^T \mathbf{y}}{\sigma_i^2 + \lambda} \mathbf{v}_i, \quad r = \text{rank}(\mathbf{X}).
$$
It is clear that
$$
\begin{eqnarray*}
\lim_{\lambda \to 0} \widehat \beta (\lambda) = \widehat \beta_{\text{OLS}}
\end{eqnarray*}
$$
and $\|\widehat \beta (\lambda)\|_2$ is monotone decreasing as $\lambda$ increases.
* Only one SVD is needed for all $\lambda$ (!), in contrast to the method of augmented linear regression, Cholesky, or sweep.
### Other applications
See Chapters 8-9 of [Numerical Analysis for Statisticians](http://ucla.worldcat.org/title/numerical-analysis-for-statisticians/oclc/793808354&referer=brief_results) by Kenneth Lange (2010) for more applications of eigen-decomposition and SVD.
## Algorithms for eigen-decomposition
### One eigen-pair: power method
* **Power method** iterates according to
$$
\begin{eqnarray*}
\mathbf{x}^{(t)} &\gets& \frac{1}{\|\mathbf{A} \mathbf{x}^{(t-1)}\|_2} \mathbf{A} \mathbf{x}^{(t-1)}
\end{eqnarray*}
$$
from an initial guess $\mathbf{x}^{(0)}$ of unit norm.
* Suppose we arrange $|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$ (the first inequality strict) with corresponding eigenvectors $\mathbf{u}_i$, and expand $\mathbf{x}^{(0)} = c_1 \mathbf{u}_1 + \cdots + c_n \mathbf{u}_n$, then
$$
\begin{eqnarray*}
\mathbf{x}^{(t)} &=& \frac{\left( \sum_i \lambda_i^t \mathbf{u}_i \mathbf{u}_i^T \right) \left( \sum_i c_i \mathbf{u}_i \right)}{\|\left( \sum_i \lambda_i^t \mathbf{u}_i \mathbf{u}_i^T \right) \left( \sum_i c_i \mathbf{u}_i \right)\|_2} \\
&=& \frac{\sum_i c_i \lambda_i^t \mathbf{u}_i}{\|\sum_i c_i \lambda_i^t \mathbf{u}_i\|_2} \\
&=& \frac{c_1 \mathbf{u}_1 + c_2 (\lambda_2/\lambda_1)^t \mathbf{u}_2 + \cdots + c_n (\lambda_n/\lambda_1)^t \mathbf{u}_n}{\|c_1 \mathbf{u}_1 + c_2 (\lambda_2/\lambda_1)^t \mathbf{u}_2 + \cdots + c_n (\lambda_n/\lambda_1)^t \mathbf{u}_n\|_2} \left( \frac{\lambda_1}{|\lambda_1|} \right)^t.
\end{eqnarray*}
$$
Thus $\mathbf{x}^{(t)} - \frac{c_1 \mathbf{u}_1}{\|c_1 \mathbf{u}_1\|_2} \left( \frac{\lambda_1}{|\lambda_1|} \right)^t \to 0$ as $t \to \infty$. The convergence rate is $|\lambda_2|/|\lambda_1|$.
* $\lambda_1^{(t)} = \mathbf{x}^{(t)T} \mathbf{A} \mathbf{x}^{(t)}$ converges to $\lambda_1$.
* **Inverse power method** for finding the eigenvalue of smallest absolute value: Substitute $\mathbf{A}$ by $\mathbf{A}^{-1}$ in the power method. (E.g., pre-compute LU or Cholesky of $\mathbf{A}$).
* **Shifted inverse power**: Substitute $(\mathbf{A} - \mu \mathbf{I})^{-1}$ in the power method. It converges to an eigenvalue close to the given $\mu$.
* **Rayleigh quotient iteration**: Substitute $(\mathbf{A} - \mu^{(t-1)} \mathbf{I})^{-1}$, where $\mu^{(t-1)} = \mathbf{x}^{(t-1)T} \mathbf{A} \mathbf{x}^{(t-1)}$ in the shifted inverse method. Faster convergence.
* Example: PageRank problem seeks top left eigenvector of transition matrix $\mathbf{P}$ and costs $O(n)$ per iteration.
### Top $r$ eigen-pairs: orthogonal iteration
Generalization of power method to higher dimensional invariant subspace.
* **Orthogonal iteration**: Initialize $\mathbf{Q}^{(0)} \in \mathbb{R}^{n \times r}$ with orthonormal columns. For $t=1,2,\ldots$,
$$
\begin{eqnarray*}
\mathbf{Z}^{(t)} &\gets& \mathbf{A} \mathbf{Q}^{(t-1)} \quad \text{($2n^2r$ flops)} \\
\mathbf{Q}^{(t)} \mathbf{R}^{(t)} &\gets& \mathbf{Z}^{(t)} \quad \text{(QR factorization)}%, $nr^2 - r^3/3$ flops)}
\end{eqnarray*}
$$
* $\mathbf{Z}^{(t)}$ converges to the eigenspace of the largest $r$ eigenvalues if they are real and separated from remaining spectrum. The convergence rate is $|\lambda_{r+1}|/|\lambda_r|$.
### (Impractical) full eigen-decomposition: QR iteration
Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ symmetric.
* Take $r=n$ in the orthogonal iteration. Then $\mathbf{Q}^{(t)}$ converges to the eigenspace $\mathbf{U}$ of $\mathbf{A}$. This implies that
$$
\mathbf{T}^{(t)} := \mathbf{Q}^{(t)T} \mathbf{A} \mathbf{Q}^{(t)}
$$
converges to a diagonal form $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$.
* Note how to compute $\mathbf{T}^{(t)}$ from $\mathbf{T}^{(t-1)}$
$$
\begin{eqnarray*}
\mathbf{T}^{(t-1)} &=& \mathbf{Q}^{(t-1)T} \mathbf{A} \mathbf{Q}^{(t-1)} = \mathbf{Q}^{(t-1)T} (\mathbf{A} \mathbf{Q}^{(t-1)}) = (\mathbf{Q}^{(t-1)T} \mathbf{Q}^{(t)}) \mathbf{R}^{(t)} \\\mathbf{A}
\mathbf{T}^{(t)} &=& \mathbf{Q}^{(t)T} \mathbf{A} \mathbf{Q}^{(t)} = \mathbf{Q}^{(t)T} \mathbf{A} \mathbf{Q}^{(t-1)} \mathbf{Q}^{(t-1)T} \mathbf{Q}^{(t)} = \mathbf{R}^{(t)} ( \mathbf{Q}^{(t-1)T} \mathbf{Q}^{(t)}).
\end{eqnarray*}
$$
* **QR iteration**: Initialize $\mathbf{U}^{(0)} \in \mathbb{R}^{n \times n}$ orthogonal and set $\mathbf{T}^{(0)} = \mathbf{U}^{(0)T} \mathbf{A} \mathbf{U}^{(0)}$. \\
For $t=1,2,\ldots$
$$
\begin{eqnarray*}
\mathbf{U}^{(t)} \mathbf{R}^{(t)} &\gets& \mathbf{T}^{(t-1)} \quad \text{(QR factorization)} \\
\mathbf{T}^{(t)} &\gets& \mathbf{R}^{(t)} \mathbf{U}^{(t)}
\end{eqnarray*}
$$
* QR iteration is expensive in general: $O(n^3)$ per iteration and linear convergence rate.
### QR algorithm for symmetric eigen-decomposition
Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ symmetric.
* Reading: [The QR algorithm](http://hua-zhou.github.io/teaching/biostatm280-2018spring/readings/qr.pdf) by Beresford N. Parlett.
* This is the algorithm implemented in LAPACK: used by Julia, Matlab, R.
* Idea: Tri-diagonalization (by Householder) + QR iteration on the tri-diagonal system with implicit shift.
0. Step 1: Householder tri-diagonalization: $4n^3/3$ for eigenvalues only, $8n^3/3$ for both eigenvalues and eigenvectors. (Why can't we apply Householder to make it diagonal directly?)
0. Step 2: QR iteration on the tridiagonal matrix. Implicit shift accelerates convergence rate. On average 1.3-1.6 QR iteration per eigenvalue, $\sim 20n$ flops per QR iteration. So total operation count is about $30n^2$. Eigenvectors need an extra of about $6n^3$ flops.
| Stage | Eigenvalue | Eigenvector |
|------------------------|--------------|-------------|
| Householder reduction | $4n^3/3$ | $4n^3/3$ |
| QR with implicit shift | $\sim 30n^2$ | $\sim 6n^3$ |
* Message: **Don't request eigenvectors unless necessary**. Use [`eigvals`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigvals) in Julia to request only eigenvalues.
* The **unsymmetric QR algorithm** obtains the real Schur decomposition of an asymmetric matrix $\mathbf{A}$.
### Example
Julia functions: [`eigen`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen), [`eigen!`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigen!), [`eigvals`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigvals!), [`eigvecs`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigvecs), [`eigmax`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigmax), [`eigmin`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.eigmin).
```{julia}
Random.seed!(123)
A = Symmetric(randn(5, 5), :U)
Aeig = eigen(A)
```
```{julia}
# eigen-values
Aeig.values
```
```{julia}
# eigen-vectors
Aeig.vectors
```
```{julia}
# inversion by eigen-decomposition
inv(Aeig)
```
```{julia}
@which inv(Aeig)
```
```{julia}
# determinant by eigen-decomposition
det(Aeig)
```
```{julia}
@which det(Aeig)
```
```{julia}
@which eigvals(A)
```
```{julia}
@which eigmax(A)
```
```{julia}
@which eigmin(A)
```
Don't request eigenvectors unless needed.
```{julia}
using BenchmarkTools, Random, LinearAlgebra
Random.seed!(123)
n = 1000
A = Symmetric(randn(n, n), :U)
```
```{julia}
# requesting eigenvalues only is cheaper
@benchmark eigvals($A)
```
```{julia}
# requesting eigenvectors requires extra work
@benchmark eigen($A)
```
```{julia}
@benchmark eigvecs($A)
```
## Algorithm for singular value decomposition (SVD)
Assume $\mathbf{A} \in \mathbb{R}^{m \times n}$ and we seek the SVD $\mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T$.
* **Golub-Kahan-Reinsch algorithm**:
* Stage 1: Transform $\mathbf{A}$ to an upper bidiagonal form $\mathbf{B}$ (by Householder).
* Stage 2: Apply implicit-shift QR iteration to the tridiagonal matrix $\mathbf{B}^T \mathbf{B}$ _implicitly_.
* See Section 8.6 of [Matrix Computation](http://catalog.library.ucla.edu/vwebv/holdingsInfo?bibId=7122088) by Gene Golub and Charles Van Loan (2013) for more details.
* $4m^2 n + 8mn^2 + 9n^3$ flops for a tall $(m \ge n)$ matrix.
### Example
Julia functions: [`svd`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svd), [`svd!`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svd), [`svdvals`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svdvals), [`svdvals!`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.svdvals!).
```{julia}
Random.seed!(123)
A = randn(5, 3)
Asvd = svd(A)
```
```{julia}
Asvd.U
```
```{julia}
# Vt is cheaper to extract than V
Asvd.Vt
```
```{julia}
Asvd.V
```
```{julia}
Asvd.S
```
**Don't request singular vectors unless needed.**
```{julia}
Random.seed!(123)
n, p = 1000, 500
A = randn(n, p)
@benchmark svdvals(A)
```
```{julia}
@benchmark svd(A)
```
## Lanczos/Arnoldi iterative method for top eigen-pairs
* Consider the Google PageRank problem. We want to find the top left eigenvector of the transition matrix $\mathbf{P}$. Direct methods such as (unsymmetric) QR or SVD takes forever. Iterative methods such as power method is feasible. However power method may take a large number of iterations.
* **Krylov subspace methods** are the state-of-art iterative methods for obtaining the top eigen-values/vectors or singular values/vectors of large **sparse** or **structured** matrices.
* **Lanczos method**: top eigen-pairs of a large _symmetric_ matrix.
* **Arnoldi method**: top eigen-pairs of a large _asymmetric_ matrix.
* Both methods are also adapted to obtain top singular values/vectors of large sparse or structured matrices.
* `eigs` and `svds` in Julia [Arpack.jl](https://github.com/JuliaLinearAlgebra/Arpack.jl) package and Matlab are wrappers of the ARPACK package, which implements Lanczos and Arnoldi methods.
```{julia}
using MatrixDepot, SparseArrays
# Download the Boeing/bcsstk38 matrix (sparse, pd, 8032-by-8032) from SuiteSparse collection
# https://www.cise.ufl.edu/research/sparse/matrices/Boeing/bcsstk38.html
A = matrixdepot("Boeing/bcsstk38")
# Change type of A from Symmetric{Float64,SparseMatrixCSC{Float64,Int64}} to SparseMatrixCSC
A = sparse(A)
@show typeof(A)
Afull = Matrix(A)
@show typeof(Afull)
# actual sparsity level
count(!iszero, A) / length(A)
```
```{julia}
using UnicodePlots
spy(A)
```
```{julia}
# top 5 eigenvalues by LAPACK (QR algorithm)
n = size(A, 1)
@time eigvals(Symmetric(Afull), (n-4):n)
```
```{julia}
using Arpack
# top 5 eigenvalues by iterative methods
@time eigs(A; nev=5, ritzvec=false, which=:LM)
```
```{julia}
@benchmark eigs($A; nev=5, ritzvec=false, which=:LM)
```
We see >1000 fold speedup in this case.
## Jacobi method for symmetric eigen-decomposition
Assume $\mathbf{A} \in \mathbf{R}^{n \times n}$ is symmetric and we seek the eigen-decomposition $\mathbf{A} = \mathbf{U} \Lambda \mathbf{U}^T$.
* Idea: Systematically reduce off-diagonal entries
$$
\text{off}(\mathbf{A}) = \sum_i \sum_{j \ne i} a_{ij}^2
$$
by Jacobi rotations.
* Jacobi/Givens rotations:
$$
\begin{eqnarray*}
\mathbf{J}(p,q,\theta) = \begin{pmatrix}
1 & & 0 & & 0 & & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & & \cos(\theta) & & \sin(\theta) & & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & & - \sin(\theta) & & \cos(\theta) & & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & & 0 & & 0 & & 1 \end{pmatrix},
\end{eqnarray*}
$$
$\mathbf{J}(p,q,\theta)$ is orthogonal.
* Consider $\mathbf{B} = \mathbf{J}^T \mathbf{A} \mathbf{J}$. $\mathbf{B}$ preserves the symmetry and eigenvalues of $\mathbf{A}$. Taking
$$
\begin{eqnarray*}
\begin{cases}
\tan (2\theta) = 2a_{pq}/({a_{qq}-a_{pp}}) & \text{if } a_{pp} \ne a_{qq} \\
\theta = \pi/4 & \text{if } a_{pp}=a_{qq}
\end{cases}
\end{eqnarray*}
$$
forces $b_{pq}=0$.
* Since orthogonal transform preserves Frobenius norm, we have
$$
b_{pp}^2 + b_{qq}^2 = a_{pp}^2 + a_{qq}^2 + 2a_{pq}^2.
$$
(Just check the 2-by-2 block)
* Since $\|\mathbf{A}\|_{\text{F}} = \|\mathbf{B}\|_{\text{F}}$, this implies that the off-diagonal part
$$
\text{off}(\mathbf{B}) = \text{off}(\mathbf{A}) - 2a_{pq}^2
$$
is decreased whenever $a_{pq} \ne 0$.
* One Jacobi rotation costs $O(n)$ flops.
* **Classical Jacobi**: search for the largest $|a_{ij}|$ at each iteration. $O(n^2)$ efforts!
* $\text{off}(\mathbf{A}) \le n(n-1) a_{ij}^2$ and $\text{off}(\mathbf{B}) = \text{off}(\mathbf{A}) - 2 a_{ij}^2$ together implies
$$
\text{off}(\mathbf{B}) \le \left( 1 - \frac{2}{n(n-1)} \right) \text{off}(\mathbf{A}).
$$
Thus Jacobi method converges in $O(n^2)$ iterations.
* In practice, cyclic-by-row implementation, to avoid the costly $O(n^2)$ search in the classical Jacobi.
* Jacobi method attracts a lot recent attention because of its rich inherent parallelism.
* **Parallel Jacobi**: "merry-go-round" to generate parallel ordering.