{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Eigen-decomposition and SVD
1.1  Linear algebra review: eigen-decomposition
1.2  Linear algebra review: singular value decomposition (SVD)
1.3  Applications of eigen-decomposition and SVD
1.3.1  Principal components analysis (PCA).
1.3.2  Low rank approximation
1.3.3  Moore-Penrose (MP) inverse
1.3.4  Least squares
1.3.5  Ridge regression
1.3.6  Other applications
1.4  Algorithms for eigen-decomposition
1.4.1  One eigen-pair: power method
1.4.2  Top $r$ eigen-pairs: orthogonal iteration
1.4.3  (Impractical) full eigen-decomposition: QR iteration
1.4.4  QR algorithm for symmetric eigen-decomposition
1.4.5  Example
1.5  Algorithm for singular value decomposition (SVD)
1.5.1  Example
1.6  Lanczos/Arnoldi iterative method for top eigen-pairs
1.7  Jacobi method for symmetric eigen-decomposition
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Eigen-decomposition and SVD\n", "\n", "Our last topic on numerical linear algebra is eigen-decomposition and singular value decomposition (SVD). We already saw the wide applications of QR decomposition in least squares problem and solving square and under-determined linear equations. Eigen-decomposition and SVD can be deemed as more thorough orthogonalization of a matrix. We start with a brief review of the related linear algebra." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear algebra review: eigen-decomposition\n", "\n", "* **Eigenvalues** are defined as roots of the characteristic equation $\\det(\\lambda \\mathbf{I}_n - \\mathbf{A})=0$.\n", "\n", "* If $\\lambda$ is an eigenvalue of $\\mathbf{A}$, then there exist non-zero $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$ such that $\\mathbf{A} \\mathbf{x} = \\lambda \\mathbf{x}$ and $\\mathbf{y}^T \\mathbf{A} = \\lambda \\mathbf{y}^T$. $\\mathbf{x}$ and $\\mathbf{x}$ are called the (column) **eigenvector** and **row eigenvector** of $\\mathbf{A}$ associated with the eigenvalue $\\lambda$.\n", "\n", "* $\\mathbf{A}$ is singular if and only if it has at least one 0 eigenvalue.\n", "\n", "* Eigenvectors associated with distinct eigenvalues are linearly independent.\n", "\n", "* Eigenvalues of an upper or lower triangular matrix are its diagonal entries: $\\lambda_i = a_{ii}$.\n", "\n", "* Eigenvalues of an idempotent matrix are either 0 or 1.\n", "\n", "* Eigenvalues of an orthogonal matrix have complex modulus 1.\n", "\n", "* In most statistical applications, we deal with eigenvalues/eigenvectors of symmetric matrices. \n", "The eigenvalues and eigenvectors of a real **symmetric** matrix are real.\n", "\n", "* Eigenvectors associated with distinct eigenvalues of a symmetry matrix are orthogonal.\n", "\n", "* **Eigen-decompostion of a symmetric matrix**: $\\mathbf{A} = \\mathbf{U} \\Lambda \\mathbf{U}^T$, where\n", " * $\\Lambda = \\text{diag}(\\lambda_1,\\ldots,\\lambda_n)$\n", " * columns of $\\mathbf{U}$ are the eigenvectors, which are (or can be chosen to be) mutually orthonormal. Thus $\\mathbf{U}$ is an orthogonal matrix.\n", "\n", "* A real symmetric matrix is positive semidefinite (positive definite) if and only if all eigenvalues are nonnegative (positive).\n", "\n", "* **Spectral radius** $\\rho(\\mathbf{A}) = \\max_i |\\lambda_i|$.\n", "\n", "* $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ a square matrix (not required to be symmetric), then $\\text{tr}(\\mathbf{A}) = \\sum_i \\lambda_i$ and $\\det(\\mathbf{A}) = \\prod_i \\lambda_i$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear algebra review: singular value decomposition (SVD)\n", "\n", "* **Singular value decomposition (SVD)**: For a rectangular matrix $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$, let $p = \\min\\{m,n\\}$, then we have the SVD\n", "$$\n", "\\mathbf{A} = \\mathbf{U} \\Sigma \\mathbf{V}^T,\n", "$$\n", "where\n", " * $\\mathbf{U} = (\\mathbf{u}_1,\\ldots,\\mathbf{u}_m) \\in \\mathbb{R}^{m \\times m}$ is orthogonal\n", " * $\\mathbf{V} = (\\mathbf{v}_1,\\ldots,\\mathbf{v}_n) \\in \\mathbb{R}^{n \\times n}$ is orthogonal\n", " * $\\Sigma = \\text{diag}(\\sigma_1, \\ldots, \\sigma_p) \\in \\mathbb{R}^{m \\times n}$, $\\sigma_1 \\ge \\sigma_2 \\ge \\cdots \\ge \\sigma_p \\ge 0$. 
\n", "$\\sigma_i$ are called the **singular values**, $\\mathbf{u}_i$ are the **left singular vectors**, and $\\mathbf{v}_i$ are the **right singular vectors**.\n", "\n", "* **Thin/Skinny SVD**. Assume $m \\ge n$. $\\mathbf{A}$ can be factored as \n", "$$\n", "\\mathbf{A} = \\mathbf{U}_n \\Sigma_n \\mathbf{V}^T = \\sum_{i=1}^n \\sigma_i \\mathbf{u}_i \\mathbf{v}_i^T,\n", "$$ \n", "where \n", " * $\\mathbf{U}_n \\in \\mathbb{R}^{m \\times n}$, $\\mathbf{U}_n^T \\mathbf{U}_n = \\mathbf{I}_n$\n", " * $\\mathbf{V} \\in \\mathbb{R}^{n \\times n}$, $\\mathbf{V}^T \\mathbf{V} = \\mathbf{I}_n$\n", " * $\\Sigma_n = \\text{diag}(\\sigma_1,\\ldots,\\sigma_n) \\in \\mathbb{R}^{n \\times n}$, $\\sigma_1 \\ge \\sigma_2 \\ge \\cdots \\ge \\sigma_n \\ge 0$\n", " \n", "* Denote $\\sigma(\\mathbf{A})=(\\sigma_1,\\ldots,\\sigma_p)^T$. Then \n", " * $r = \\text{rank}(\\mathbf{A}) = \\# \\text{ nonzero singular values} = \\|\\sigma(\\mathbf{A})\\|_0$ \n", " * $\\mathbf{A} = \\mathbf{U}_r \\Sigma_r \\mathbf{V}_r^T = \\sum_{i=1}^r \\sigma_i \\mathbf{u}_i \\mathbf{v}_i^T$\n", " * $\\|\\mathbf{A}\\|_{\\text{F}} = (\\sum_{i=1}^p \\sigma_i^2)^{1/2} = \\|\\sigma(\\mathbf{A})\\|_2$\n", " * $\\|\\mathbf{A}\\|_2 = \\sigma_1 = \\|\\sigma(\\mathbf{A})\\|_\\infty$\n", "\n", "* Assume $\\text{rank}(\\mathbf{A}) = r$ and partition \n", "$$\n", "\\begin{eqnarray*}\n", "\\mathbf{U} &=& (\\mathbf{U}_r, \\tilde{\\mathbf{U}}_r) \\in \\mathbb{R}^{m \\times m} \\\\\n", "\\mathbf{V} &=& (\\mathbf{V}_r, \\tilde{\\mathbf{V}}_r) \\in \\mathbb{R}^{n \\times n}.\n", "\\end{eqnarray*}\n", "$$\n", "Then\n", " * ${\\cal C}(\\mathbf{A}) = {\\cal C}(\\mathbf{U}_r)$, ${\\cal N}(\\mathbf{A}^T) = {\\cal C}(\\tilde{\\mathbf{U}}_r)$\n", " * ${\\cal N}(\\mathbf{A}) = {\\cal C}(\\tilde{\\mathbf{V}}_r)$, ${\\cal C}(\\mathbf{A}^T) = {\\cal C}(\\mathbf{V}_r)$\n", " * $\\mathbf{U}_r \\mathbf{U}_r^T$ is the orthogonal projection onto ${\\cal C}(\\mathbf{A})$\n", " * $\\tilde{\\mathbf{U}}_r \\tilde{\\mathbf{U}}_r^T$ is the orthogonal projection onto ${\\cal N}(\\mathbf{A}^T)$\n", " * $\\mathbf{V}_r \\mathbf{V}_r^T$ is the orthogonal projection onto ${\\cal C}(\\mathbf{A}^T)$\n", " * $\\tilde{\\mathbf{V}}_r \\tilde{\\mathbf{V}}_r^T$ is the orthogonal projection onto ${\\cal N}(\\mathbf{A})$\n", "\n", "* Relation to eigen-decomposition. Using thin SVD,\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{A}^T \\mathbf{A} &=& \\mathbf{V} \\Sigma \\mathbf{U}^T \\mathbf{U} \\Sigma \\mathbf{V}^T = \\mathbf{V} \\Sigma^2 \\mathbf{V}^T \\\\\n", "\t\\mathbf{A} \\mathbf{A}^T &=& \\mathbf{U} \\Sigma \\mathbf{V}^T \\mathbf{V} \\Sigma \\mathbf{U}^T = \\mathbf{U} \\Sigma^2 \\mathbf{U}^T.\n", "\\end{eqnarray*}\n", "$$\n", "In principle we can obtain singular triplets of $\\mathbf{A}$ by doing two eigen-decompositions.\n", "\n", "* Another relation to eigen-decomposition. 
Using the SVD (written for $m = n$; for $m \\ne n$ the zero blocks are padded accordingly),\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\begin{pmatrix} \\mathbf{0}_{n \\times n} & \\mathbf{A}^T \\\\ \\mathbf{A} & \\mathbf{0}_{m \\times m} \\end{pmatrix} = \\frac{1}{\\sqrt 2} \\begin{pmatrix} \\mathbf{V} & \\mathbf{V} \\\\ \\mathbf{U} & -\\mathbf{U} \\end{pmatrix} \\begin{pmatrix} \\Sigma & \\mathbf{0}_{n \\times n} \\\\ \\mathbf{0}_{n \\times n} & - \\Sigma \\end{pmatrix} \\frac{1}{\\sqrt 2} \\begin{pmatrix} \\mathbf{V}^T & \\mathbf{U}^T \\\\ \\mathbf{V}^T & - \\mathbf{U}^T \\end{pmatrix}.\n", "\\end{eqnarray*}\n", "$$\n", "Hence any symmetric eigen-solver can produce the SVD of a matrix $\\mathbf{A}$ without forming $\\mathbf{A} \\mathbf{A}^T$ or $\\mathbf{A}^T \\mathbf{A}$.\n", "\n", "* Yet another relation to eigen-decomposition: If the eigen-decomposition of a real symmetric matrix is $\\mathbf{A} = \\mathbf{W} \\Lambda \\mathbf{W}^T = \\mathbf{W} \\text{diag}(\\lambda_1, \\ldots, \\lambda_n) \\mathbf{W}^T$, then\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{A} = \\mathbf{W} \\Lambda \\mathbf{W}^T = \\mathbf{W} \\begin{pmatrix} \n", "\t|\\lambda_1| & & \\\\\n", "\t& \\ddots & \\\\\n", "\t& & |\\lambda_n|\n", "\t\\end{pmatrix} \\begin{pmatrix} \n", "\t\\text{sgn}(\\lambda_1) & & \\\\\n", "\t& \\ddots & \\\\\n", "\t& & \\text{sgn}(\\lambda_n)\n", "\t\\end{pmatrix} \\mathbf{W}^T\n", "\\end{eqnarray*}\n", "$$\n", "is the SVD of $\\mathbf{A}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Applications of eigen-decomposition and SVD\n", "\n", "\n", "### Principal components analysis (PCA). \n", "\n", "$\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$ is a centered data matrix. Perform SVD $\\mathbf{X} = \\mathbf{U} \\Sigma \\mathbf{V}^T$ or equivalently eigendecomposition $\\mathbf{X}^T \\mathbf{X} = \\mathbf{V} \\Sigma^2 \\mathbf{V}^T$. The linear combinations $\\tilde{\\mathbf{x}}_i = \\mathbf{X} \\mathbf{v}_i$ are the **principal components** (PC) and have variance proportional to $\\sigma_i^2$ (sample variance $\\sigma_i^2/(n-1)$, since $\\mathbf{X}$ is centered).\n", "\n", "* Dimension reduction: reduce dimensionality $p$ to $q \\ll p$. Use top PCs $\\tilde{\\mathbf{x}}_1, \\ldots, \\tilde{\\mathbf{x}}_q$ in visualization and downstream analysis.\n", "\n", "*(Figure: the first two PCs of European genetic samples mirror the map of Europe; from the article [Genes mirror geography within Europe](http://www.nature.com/nature/journal/v456/n7218/full/nature07331.html) by Novembre et al (2008) published in _Nature_.)*\n", "\n", "* Use PCs to adjust for confounding - a serious issue in association studies in large data sets.\n", " * Use of PCA to adjust for confounding in modern genetic studies is proposed in the paper [Principal components analysis corrects for stratification in genome-wide association studies](http://www.nature.com/ng/journal/v38/n8/full/ng1847.html) by Price et al (2006) published in _Nature Genetics_. It has been cited 5,312 times as of May 8, 2017.\n", " \n", "### Low rank approximation\n", "\n", "For example, image/data compression. Find a low rank approximation of the data matrix $\\mathbf{X}$. \n", "**Eckart-Young theorem**: \n", "$$\n", "\\min_{\\text{rank}(\\mathbf{Y})=r} \\|\\mathbf{X} - \\mathbf{Y} \\|_{\\text{F}}^2\n", "$$\n", "is achieved by $\\mathbf{Y} = \\sum_{i=1}^r \\sigma_i \\mathbf{u}_i \\mathbf{v}_i^T$ with optimal value $\\sum_{i=r+1}^{p} \\sigma_i^2$, where $(\\sigma_i, \\mathbf{u}_i, \\mathbf{v}_i)$ are singular values and vectors of $\\mathbf{X}$. (A numerical check of this theorem appears in the SVD example section below.)\n", "\n", "* Gene Golub's $897 \\times 598$ picture requires $2691 \\times 598 \\times 8 = 12,873,744$ bytes ($2691 = 3 \\times 897$ rows across 3 RGB channels). 
\n", "* Rank 120 approximation requires $120 \\times (2691+598) \\times 8 = 3,157,440$ bytes. \n", "* Rank 50 approximation requires $50 \\times (2691+598) \\times 8 = 1,315,600$ bytes. \n", "* Rank 12 approximation requires $12 \\times (2691+598) \\times 8 = 315,744$ bytes.\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "### Moore-Penrose (MP) inverse\n", "\n", "Using thin SVD, \n", "$$\n", "\\mathbf{A}^+ = \\mathbf{V} \\Sigma^+ \\mathbf{U}^T,\n", "$$\n", "where $\\Sigma^+ = \\text{diag}(\\sigma_1^{-1}, \\ldots, \\sigma_r^{-1}, 0, \\ldots, 0)$, $r= \\text{rank}(\\mathbf{A})$. This is how the [`pinv`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.pinv) function is implemented in Julia." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×5 Array{Float64,2}:\n", " 0.427013 0.0888668 0.0964091 0.37396 0.26257 \n", " 0.358517 0.0510759 0.214358 -0.420113 0.137496\n", " 0.00805499 -0.080998 -0.50504 0.295226 -0.180994" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X = randn(5, 3)\n", "pinv(X)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "pinv{T}(A::Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray}) at linalg/dense.jl:863" ], "text/plain": [ "pinv(A::Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray}) where T in Base.LinAlg at linalg/dense.jl:863" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculation of Moore-Penrose inverse by SVD\n", "@which pinv(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Least squares\n", "\n", "Given thin SVD $\\mathbf{X} = \\mathbf{U} \\Sigma \\mathbf{V}^T$, \n", "$$\n", "\\begin{eqnarray*}\n", "\t\\widehat \\beta &=& (\\mathbf{X}^T \\mathbf{X})^- \\mathbf{X}^T \\mathbf{y} \\\\\n", "\t&=& (\\mathbf{V} \\Sigma^2 \\mathbf{V}^T)^+ \\mathbf{V} \\Sigma \\mathbf{U}^T \\mathbf{y} \\\\\n", "\t&=& \\mathbf{V} (\\Sigma^{2})^+ \\mathbf{V}^T \\mathbf{V} \\Sigma \\mathbf{U}^T \\mathbf{y} \\\\\n", "\t&=& \\mathbf{V}_r \\Sigma_r^{-1} \\mathbf{U}_r^T \\mathbf{y} \\\\\n", "\t&=& \\sum_{i=1}^r \\left( 
\\frac{\\mathbf{u}_i^T \\mathbf{y}}{\\sigma_i} \\right) \\mathbf{v}_i\n", "\\end{eqnarray*}\n", "$$\n", "and\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\widehat{\\mathbf{y}} &=& \\mathbf{X} \\widehat \\beta = \\mathbf{U}_r \\mathbf{U}_r^T \\mathbf{y}.\n", "\\end{eqnarray*}\n", "$$\n", "In general, SVD is more expensive than the other approaches we learned (Cholesky, sweep, QR). But in some applications the SVD is computed anyway for other purposes, in which case the least squares solution comes for free." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ridge regression\n", "\n", "* In ridge regression, we minimize\n", "$$\n", "\t\\|\\mathbf{y} - \\mathbf{X} \\beta\\|_2^2 + \\lambda \\|\\beta\\|_2^2,\n", "$$\n", "where $\\lambda$ is a tuning parameter.\n", "\n", "* Ridge regression by augmented linear regression. The ridge regression problem is equivalent to\n", "$$\n", "\t\\left\\| \\begin{pmatrix} \\mathbf{y} \\\\ \\mathbf{0}_p \\end{pmatrix} - \\begin{pmatrix}\n", "\t\\mathbf{X} \\\\ \\sqrt \\lambda \\mathbf{I}_p\n", "\t\\end{pmatrix} \\beta \\right\\|_2^2.\n", "$$\n", "Therefore any method for linear regression can be applied.\n", "\n", "* Ridge regression by method of normal equation. The normal equation for the ridge problem is\n", "$$\n", "\t(\\mathbf{X}^T \\mathbf{X} + \\lambda \\mathbf{I}_p) \\beta = \\mathbf{X}^T \\mathbf{y}.\n", "$$\n", "Therefore the Cholesky decomposition or the sweep operator can be used.\n", "\n", "* Ridge regression by SVD. If we obtain the (thin) SVD of $\\mathbf{X}$,\n", "$$\n", "\t\\mathbf{X} = \\mathbf{U} \\Sigma_{p \\times p} \\mathbf{V}^T,\n", "$$\n", "then the normal equation reads\n", "$$\n", "\t(\\Sigma^2 + \\lambda \\mathbf{I}_p) \\mathbf{V}^T \\beta = \\Sigma \\mathbf{U}^T \\mathbf{y}\n", "$$\n", "and we get\n", "$$\n", "\t\\widehat \\beta (\\lambda) = \\sum_{i=1}^p \\frac{\\sigma_i \\mathbf{u}_i^T \\mathbf{y}}{\\sigma_i^2 + \\lambda} \\mathbf{v}_i = \\sum_{i=1}^r \\frac{\\sigma_i \\mathbf{u}_i^T \\mathbf{y}}{\\sigma_i^2 + \\lambda} \\mathbf{v}_i, \\quad r = \\text{rank}(\\mathbf{X}).\n", "$$\n", "It is clear that \n", "$$\n", "\\begin{eqnarray*}\n", "\t\\lim_{\\lambda \\to 0} \\widehat \\beta (\\lambda) = \\widehat \\beta_{\\text{OLS}}\n", "\\end{eqnarray*}\n", "$$\n", "and $\\|\\widehat \\beta (\\lambda)\\|_2$ is monotone decreasing as $\\lambda$ increases.\n", "\n", "* Only one SVD is needed for all $\\lambda$ (!), in contrast to the method of augmented linear regression, Cholesky, or sweep. See the code sketch after the next section." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other applications\n", "\n", "See Chapters 8-9 of [Numerical Analysis for Statisticians](http://ucla.worldcat.org/title/numerical-analysis-for-statisticians/oclc/793808354&referer=brief_results) by Kenneth Lange (2010) for more applications of eigen-decomposition and SVD."
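] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the minimal sketch promised above (not part of the original notes): the whole ridge solution path from a single `svdfact`. The data `X`, `y` and the grid of $\\lambda$ values are simulated/arbitrary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# sketch: the entire ridge path from ONE thin SVD (simulated data)\n", "srand(280)\n", "n, p = 100, 10\n", "X, y = randn(n, p), randn(n)\n", "Xsvd = svdfact(X)\n", "U, S, V = Xsvd[:U], Xsvd[:S], Xsvd[:V]\n", "Uty = U' * y\n", "# beta_hat(lambda) = sum_i sigma_i * (u_i' y) / (sigma_i^2 + lambda) * v_i\n", "betapath = [V * (S .* Uty ./ (S.^2 .+ lam)) for lam in logspace(-2, 2, 5)]\n", "norm.(betapath)  # ||beta_hat(lambda)||_2 decreases as lambda increases"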
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithms for eigen-decomposition\n", "\n", "### One eigen-pair: power method\n", "\n", "* **Power method** iterates according to\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{x}^{(t)} &\\gets& \\frac{1}{\\|\\mathbf{A} \\mathbf{x}^{(t-1)}\\|_2} \\mathbf{A} \\mathbf{x}^{(t-1)}\n", "\\end{eqnarray*}\n", "$$\n", "from an initial guess $\\mathbf{x}^{(0)}$ of unit norm.\n", "\n", "* Suppose we arrange $|\\lambda_1| > |\\lambda_2| \\ge \\cdots \\ge |\\lambda_n|$ (the first inequality strict) with corresponding eigenvectors $\\mathbf{u}_i$, and expand $\\mathbf{x}^{(0)} = c_1 \\mathbf{u}_1 + \\cdots + c_n \\mathbf{u}_n$, then\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{x}^{(t)} &=& \\frac{\\left( \\sum_i \\lambda_i^t \\mathbf{u}_i \\mathbf{u}_i^T \\right) \\left( \\sum_i c_i \\mathbf{u}_i \\right)}{\\|\\left( \\sum_i \\lambda_i^t \\mathbf{u}_i \\mathbf{u}_i^T \\right) \\left( \\sum_i c_i \\mathbf{u}_i \\right)\\|_2} \\\\\n", "\t&=& \\frac{\\sum_i c_i \\lambda_i^t \\mathbf{u}_i}{\\|\\sum_i c_i \\lambda_i^t \\mathbf{u}_i\\|_2}\t\\\\\n", "\t&=& \\frac{c_1 \\mathbf{u}_1 + c_2 (\\lambda_2/\\lambda_1)^t \\mathbf{u}_2 + \\cdots + c_n (\\lambda_n/\\lambda_1)^t \\mathbf{u}_n}{\\|c_1 \\mathbf{u}_1 + c_2 (\\lambda_2/\\lambda_1)^t \\mathbf{u}_2 + \\cdots + c_n (\\lambda_n/\\lambda_1)^t \\mathbf{u}_n\\|_2} \\left( \\frac{\\lambda_1}{|\\lambda_1|} \\right)^t.\n", "\\end{eqnarray*}\n", "$$\n", "Thus $\\mathbf{x}^{(t)} - \\frac{c_1 \\mathbf{u}_1}{\\|c_1 \\mathbf{u}_1\\|_2} \\left( \\frac{\\lambda_1}{|\\lambda_1|} \\right)^t \\to 0$ as $t \\to \\infty$. The convergence rate is $|\\lambda_2|/|\\lambda_1|$.\n", "\n", "* $\\lambda_1^{(t)} = \\mathbf{x}^{(t)T} \\mathbf{A} \\mathbf{x}^{(t)}$ converges to $\\lambda_1$.\n", "\n", "* **Inverse power method** for finding the eigenvalue of smallest absolute value: Substitute $\\mathbf{A}$ by $\\mathbf{A}^{-1}$ in the power method. (E.g., pre-compute LU or Cholesky of $\\mathbf{A}$).\n", "\n", "* **Shifted inverse power**: Substitute $(\\mathbf{A} - \\mu \\mathbf{I})^{-1}$ in the power method. It converges to an eigenvalue close to the given $\\mu$.\n", "\n", "* **Rayleigh quotient iteration**: Substitute $(\\mathbf{A} - \\mu^{(t-1)} \\mathbf{I})^{-1}$, where $\\mu^{(t-1)} = \\mathbf{x}^{(t-1)T} \\mathbf{A} \\mathbf{x}^{(t-1)}$ in the shifted inverse method. Faster convergence.\n", "\n", "* Example: PageRank problem seeks top left eigenvector of transition matrix $\\mathbf{P}$ and costs $O(n)$ per iteration." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Top $r$ eigen-pairs: orthogonal iteration\n", "\n", "Generalization of power method to higher dimensional invariant subspace.\n", "\n", "* **Orthogonal iteration**: Initialize $\\mathbf{Q}^{(0)} \\in \\mathbb{R}^{n \\times r}$ with orthonormal columns. For $t=1,2,\\ldots$, \n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{Z}^{(t)} &\\gets& \\mathbf{A} \\mathbf{Q}^{(t-1)} \\quad \\text{($2n^2r$ flops)} \\\\\n", "\t\\mathbf{Q}^{(t)} \\mathbf{R}^{(t)} &\\gets& \\mathbf{Z}^{(t)} \\quad \\text{(QR factorization)}%, $nr^2 - r^3/3$ flops)}\n", "\\end{eqnarray*}\n", "$$\n", "\n", "* $\\mathbf{Z}^{(t)}$ converges to the eigenspace of the largest $r$ eigenvalues if they are real and separated from remaining spectrum. 
The convergence rate is $|\\lambda_{r+1}|/|\\lambda_r|$.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### (Impractical) full eigen-decomposition: QR iteration\n", "\n", "Assume $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ symmetric.\n", "\n", "* Take $r=n$ in the orthogonal iteration. Then $\\mathbf{Q}^{(t)}$ converges to the eigenvector matrix $\\mathbf{U}$ of $\\mathbf{A}$. This implies that\n", "$$\n", "\t\\mathbf{T}^{(t)} := \\mathbf{Q}^{(t)T} \\mathbf{A} \\mathbf{Q}^{(t)}\n", "$$\n", "converges to the diagonal form $\\Lambda = \\text{diag}(\\lambda_1, \\ldots, \\lambda_n)$.\n", "\n", "* Note how to compute $\\mathbf{T}^{(t)}$ from $\\mathbf{T}^{(t-1)}$:\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{T}^{(t-1)} &=& \\mathbf{Q}^{(t-1)T} \\mathbf{A} \\mathbf{Q}^{(t-1)} = \\mathbf{Q}^{(t-1)T} (\\mathbf{A} \\mathbf{Q}^{(t-1)}) = (\\mathbf{Q}^{(t-1)T} \\mathbf{Q}^{(t)}) \\mathbf{R}^{(t)} \\\\\n", "\t\\mathbf{T}^{(t)} &=& \\mathbf{Q}^{(t)T} \\mathbf{A} \\mathbf{Q}^{(t)} = \\mathbf{Q}^{(t)T} \\mathbf{A} \\mathbf{Q}^{(t-1)} \\mathbf{Q}^{(t-1)T} \\mathbf{Q}^{(t)} = \\mathbf{R}^{(t)} ( \\mathbf{Q}^{(t-1)T} \\mathbf{Q}^{(t)}).\n", "\\end{eqnarray*}\n", "$$\n", "\n", "* **QR iteration**: Initialize $\\mathbf{U}^{(0)} \\in \\mathbb{R}^{n \\times n}$ orthogonal and set $\\mathbf{T}^{(0)} = \\mathbf{U}^{(0)T} \\mathbf{A} \\mathbf{U}^{(0)}$.\n", "For $t=1,2,\\ldots$\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{U}^{(t)} \\mathbf{R}^{(t)} &\\gets& \\mathbf{T}^{(t-1)} \\quad \\text{(QR factorization)} \\\\\n", "\t\\mathbf{T}^{(t)} &\\gets& \\mathbf{R}^{(t)} \\mathbf{U}^{(t)}\n", "\\end{eqnarray*}\n", "$$\n", "\n", "* QR iteration is expensive in general: $O(n^3)$ per iteration and a linear convergence rate." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### QR algorithm for symmetric eigen-decomposition\n", "\n", "Assume $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ symmetric.\n", "\n", "* Reading: [The QR algorithm](http://hua-zhou.github.io/teaching/biostatm280-2018spring/readings/qr.pdf) by Beresford N. Parlett.\n", "\n", "* This is the algorithm implemented in LAPACK and used by Julia, Matlab, and R.\n", "\n", "* Idea: Tri-diagonalization (by Householder) + QR iteration on the tri-diagonal system with implicit shift.\n", " \n", " * Step 1: Householder tri-diagonalization: $4n^3/3$ flops for eigenvalues only, $8n^3/3$ for both eigenvalues and eigenvectors. (Why can't we apply Householder to make it diagonal directly?)\n", " \n", " * Step 2: QR iteration on the tridiagonal matrix. Implicit shifts accelerate convergence. On average 1.3-1.6 QR iterations per eigenvalue, $\\sim 20n$ flops per QR iteration. So the total operation count is about $30n^2$. Eigenvectors need about $6n^3$ extra flops.\n", "\n", "| | Eigenvalue | Eigenvector |\n", "|------------------------|--------------|-------------|\n", "| Householder reduction | $4n^3/3$ | $4n^3/3$ |\n", "| QR with implicit shift | $\\sim 30n^2$ | $\\sim 6n^3$ |\n", "\n", "* Message: Don't request eigenvectors unless necessary. Use [`eigvals`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigvals) in Julia to request only eigenvalues.\n", "\n", "* The **unsymmetric QR algorithm** obtains the real Schur decomposition of an asymmetric matrix $\\mathbf{A}$."
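] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before calling LAPACK, here is the toy power method implementation promised above (a sketch only; the function name `powermethod` and all inputs are ours, and we assume the dominant eigenvalue is positive so the estimate matches `eigmax`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# toy power method (sketch): iterate x <- A * x / ||A * x||_2\n", "function powermethod(A, x; maxiter = 100)\n", "    x = x / norm(x)\n", "    for t in 1:maxiter\n", "        x = A * x\n", "        x = x / norm(x)\n", "    end\n", "    # Rayleigh quotient x' * A * x estimates the dominant eigenvalue\n", "    return dot(x, A * x), x\n", "end\n", "\n", "srand(280)\n", "A = Symmetric(randn(5, 5), :U)\n", "lam1, x1 = powermethod(A, randn(5))\n", "lam1 - eigmax(A)  # approximately 0 (dominant eigenvalue is positive here)"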
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example\n", "\n", "Julia functions: [`eigfact`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigfact), [`eigfact!`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigfact!), [`eig`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eig), [`eigvals`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigvals), [`eigvecs`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigvecs), [`eigmax`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigmax), [`eigmin`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigmin)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Base.LinAlg.Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}([-2.48944, -1.77509, -0.50108, 0.486929, 4.20338], [0.658469 -0.143335 … -0.438088 0.530903; -0.113456 -0.48779 … -0.619946 -0.582809; … ; 0.62722 -0.0304166 … 0.0857872 -0.372504; 0.357966 -0.249277 … 0.589831 -0.326102])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "A = Symmetric(randn(5, 5), :U)\n", "\n", "Aeig = eigfact(A)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "eigfact(A::Union{Hermitian{Complex{T},S}, Hermitian{T,S}, Symmetric{T,S}} where S where T<:Real) at linalg/symmetric.jl:296" ], "text/plain": [ "eigfact(A::Union{Hermitian{Complex{T},S}, Hermitian{T,S}, Symmetric{T,S}} where S where T<:Real) in Base.LinAlg at linalg/symmetric.jl:296" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which eigfact(A)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " -2.48944 \n", " -1.77509 \n", " -0.50108 \n", " 0.486929\n", " 4.20338 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# eigen-values\n", "Aeig[:values]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.658469 -0.143335 -0.268503 -0.438088 0.530903\n", " -0.113456 -0.48779 -0.15871 -0.619946 -0.582809\n", " 0.178891 0.823682 -0.296119 -0.261698 -0.365203\n", " 0.62722 -0.0304166 0.677902 0.0857872 -0.372504\n", " 0.357966 -0.249277 -0.596221 0.589831 -0.326102" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# eigen-vectors\n", "Aeig[:vectors]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.131582 0.389729 0.0498403 0.0706637 -1.00615 \n", " 0.389729 0.680625 0.52453 0.17737 -0.946774 \n", " 0.0498403 0.52453 -0.397678 0.355914 -0.551065 \n", " 0.0706637 0.17737 0.355914 -1.02755 0.84497 \n", " -1.00615 -0.946774 -0.551065 0.84497 -0.0561269" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# inversion by eigen-decomposition\n", "inv(Aeig)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "inv(A::Base.LinAlg.Eigen) at linalg/eigen.jl:285" ], "text/plain": [ "inv(A::Base.LinAlg.Eigen) in Base.LinAlg at linalg/eigen.jl:285" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which inv(Aeig)" ] }, { "cell_type": 
"code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-4.532047740558188" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# determinant by eigen-decomposition\n", "det(Aeig)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "det(A::Base.LinAlg.Eigen) at linalg/eigen.jl:286" ], "text/plain": [ "det(A::Base.LinAlg.Eigen) in Base.LinAlg at linalg/eigen.jl:286" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which det(Aeig)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "eigvals(A::Union{Hermitian{Complex{T},S}, Hermitian{T,S}, Symmetric{T,S}} where S where T<:Real) at linalg/symmetric.jl:352" ], "text/plain": [ "eigvals(A::Union{Hermitian{Complex{T},S}, Hermitian{T,S}, Symmetric{T,S}} where S where T<:Real) in Base.LinAlg at linalg/symmetric.jl:352" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which eigvals(A)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "eigmax(A::Union{Hermitian{#s268,#s267}, Hermitian{Complex{#s268},#s267}, Symmetric{#s268,#s267}} where #s267<:(Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray} where T) where #s268<:Real) at linalg/symmetric.jl:436" ], "text/plain": [ "eigmax(A::Union{Hermitian{#s268,#s267}, Hermitian{Complex{#s268},#s267}, Symmetric{#s268,#s267}} where #s267<:(Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray} where T) where #s268<:Real) in Base.LinAlg at linalg/symmetric.jl:436" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which eigmax(A)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "eigmin(A::Union{Hermitian{#s268,#s267}, Hermitian{Complex{#s268},#s267}, Symmetric{#s268,#s267}} where #s267<:(Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, 
SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray} where T) where #s268<:Real) at linalg/symmetric.jl:437" ], "text/plain": [ "eigmin(A::Union{Hermitian{#s268,#s267}, Hermitian{Complex{#s268},#s267}, Symmetric{#s268,#s267}} where #s267<:(Union{Base.ReshapedArray{T,2,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T}, DenseArray{T,2}, SubArray{T,2,A,I,L} where L} where I<:Tuple{Vararg{Union{Base.AbstractCartesianIndex, Int64, Range{Int64}},N} where N} where A<:Union{Base.ReshapedArray{T,N,A,MI} where MI<:Tuple{Vararg{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},N} where N} where A<:Union{DenseArray, SubArray{T,N,P,I,true} where I<:Tuple{Union{Base.Slice, UnitRange},Vararg{Any,N} where N} where P where N where T} where N where T, DenseArray} where T) where #s268<:Real) in Base.LinAlg at linalg/symmetric.jl:437" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which eigmin(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Don't request eigenvectors unless needed." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 Symmetric{Float64,Array{Float64,2}}:\n", " 0.126238 0.618244 0.0919806 … -0.827714 0.552947 1.4751 \n", " 0.618244 1.10206 -1.22707 -0.329363 0.495627 -0.681543 \n", " 0.0919806 -1.22707 0.448183 0.651512 0.641943 -1.13163 \n", " -0.599229 0.444055 -0.144774 -2.42724 0.531677 -0.16157 \n", " -1.16943 -1.41308 1.04011 -0.149012 0.23343 1.07526 \n", " 0.48837 0.60684 -0.47627 … 0.0689862 0.16161 -0.340787 \n", " 0.291197 0.0233376 -1.54499 1.27717 0.928057 -1.23676 \n", " 0.907608 0.707824 0.169586 1.28848 1.17851 0.0213138\n", " 1.0741 0.303644 -0.966296 -0.273163 -0.984755 0.337243 \n", " -0.701505 -0.129978 0.136473 -1.62665 0.373027 1.45087 \n", " -2.03655 -2.06623 1.22059 … 0.0661588 -0.592486 -0.0623642\n", " 1.31756 0.774051 -0.413113 0.687657 -0.287483 -0.584347 \n", " 0.670946 0.0920452 1.2537 -0.615498 0.287261 1.46858 \n", " ⋮ ⋱ \n", " 1.23791 0.472766 0.541145 1.07576 0.186868 0.997434 \n", " -0.445531 -0.615596 0.615605 -0.241992 -0.785126 -0.693826 \n", " 0.241442 -0.413603 -0.562249 … 0.390124 -0.0762032 -0.258146 \n", " 0.285152 -0.0906967 -1.23907 -0.184222 0.968934 -1.15218 \n", " -1.23843 0.00274229 0.651478 1.21625 0.90555 0.909079 \n", " 0.632244 0.773648 -0.00569263 0.440222 0.187638 0.415562 \n", " -0.99491 1.0403 -0.0498013 -0.380093 1.23168 0.903511 \n", " 0.688573 -0.418917 -0.612049 … -1.02006 -1.04156 1.09417 \n", " -0.428258 -1.35384 -1.18739 0.218391 -0.605399 0.51032 \n", " -0.827714 -0.329363 0.651512 0.2232 -0.893229 0.163704 \n", " 0.552947 0.495627 0.641943 -0.893229 1.39604 0.246911 \n", " 1.4751 -0.681543 -1.13163 0.163704 0.246911 0.108533 " ] }, "execution_count": 14, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "using BenchmarkTools\n", "\n", "srand(280)\n", "\n", "n = 1000\n", "A = Symmetric(randn(n, n), :U)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.99 MiB\n", " allocs estimate: 18\n", " --------------\n", " minimum time: 137.309 ms (0.00% GC)\n", " median time: 155.528 ms (0.00% GC)\n", " mean time: 174.524 ms (0.27% GC)\n", " maximum time: 222.454 ms (0.75% GC)\n", " --------------\n", " samples: 29\n", " evals/sample: 1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# requesting eigenvalues only is cheaper\n", "@benchmark eigvals(A)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 23.25 MiB\n", " allocs estimate: 21\n", " --------------\n", " minimum time: 304.373 ms (0.64% GC)\n", " median time: 310.200 ms (0.63% GC)\n", " mean time: 330.251 ms (1.78% GC)\n", " maximum time: 426.362 ms (0.55% GC)\n", " --------------\n", " samples: 16\n", " evals/sample: 1" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# requesting eigenvectors requires extra work\n", "@benchmark eigfact(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithm for singular value decomposition (SVD)\n", "\n", "\n", "\n", "\n", "\n", "Assume $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$ and we seek the SVD $\\mathbf{A} = \\mathbf{U} \\mathbf{D} \\mathbf{V}^T$. \n", "\n", "\n", "\n", "\n", "\n", "* **Golub-Kahan-Reinsch algorithm**: \n", " * Stage 1: Transform $\\mathbf{A}$ to an upper bidiagonal form $\\mathbf{B}$ (by Householder). \n", " * Stage 2: Apply implicit-shift QR iteration to the tridiagonal matrix $\\mathbf{B}^T \\mathbf{B}$ _implicitly_.\n", "\n", "* See Section 8.6 of [Matrix Computation](http://catalog.library.ucla.edu/vwebv/holdingsInfo?bibId=7122088) by Gene Golub and Charles Van Loan (2013) for more details.\n", "\n", "* $4m^2 n + 8mn^2 + 9n^3$ flops for a tall $(m \\ge n)$ matrix.\n", "\n", "### Example\n", "\n", "Julia functions: [`svdfact`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.svdfact), [`svdfact!`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.svdfact!), [`svd`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.svd), [`svdvals`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.svdvals)." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}([-0.260396 0.707178 0.119138; 0.756045 0.194481 -0.186492; … ; 0.15061 0.0569623 -0.829782; 0.00294694 0.67583 -0.0355871], [4.10862, 1.34446, 1.1401], [-0.720206 0.465188 0.514687; -0.639078 -0.733549 -0.231266; -0.269966 0.495485 -0.825599])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "A = randn(5, 3)\n", "Asvd = svdfact(A)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×3 Array{Float64,2}:\n", " -0.260396 0.707178 0.119138 \n", " 0.756045 0.194481 -0.186492 \n", " -0.58129 -0.0456556 -0.5111 \n", " 0.15061 0.0569623 -0.829782 \n", " 0.00294694 0.67583 -0.0355871" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Asvd[:U]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " -0.720206 0.465188 0.514687\n", " -0.639078 -0.733549 -0.231266\n", " -0.269966 0.495485 -0.825599" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Vt is cheaper to extract than V\n", "Asvd[:Vt]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " -0.720206 -0.639078 -0.269966\n", " 0.465188 -0.733549 0.495485\n", " 0.514687 -0.231266 -0.825599" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Asvd[:V]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 4.10862\n", " 1.34446\n", " 1.1401 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Asvd[:S]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Don't request singular vectors unless needed." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 4.11 MiB\n", " allocs estimate: 12\n", " --------------\n", " minimum time: 60.028 ms (0.00% GC)\n", " median time: 60.798 ms (0.00% GC)\n", " mean time: 61.538 ms (0.32% GC)\n", " maximum time: 77.827 ms (0.00% GC)\n", " --------------\n", " samples: 82\n", " evals/sample: 1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "n, p = 1000, 500\n", "A = randn(n, p)\n", "\n", "@benchmark svdvals(A)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 17.23 MiB\n", " allocs estimate: 16\n", " --------------\n", " minimum time: 116.675 ms (0.00% GC)\n", " median time: 120.056 ms (0.94% GC)\n", " mean time: 121.681 ms (1.87% GC)\n", " maximum time: 167.393 ms (29.17% GC)\n", " --------------\n", " samples: 42\n", " evals/sample: 1" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark svdfact(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lanczos/Arnoldi iterative method for top eigen-pairs\n", "\n", "* Consider the Google PageRank problem. We want to find the top left eigenvector of the transition matrix $\\mathbf{P}$. 
Direct methods such as the (unsymmetric) QR algorithm or the SVD take forever. Iterative methods such as the power method are feasible; however, the power method may take a large number of iterations.\n", "\n", "* **Krylov subspace methods** are the state-of-the-art iterative methods for obtaining the top eigenvalues/vectors or singular values/vectors of large **sparse** or **structured** matrices.\n", "\n", "* **Lanczos method**: top eigen-pairs of a large _symmetric_ matrix. \n", "\n", "* **Arnoldi method**: top eigen-pairs of a large _asymmetric_ matrix.\n", "\n", "* Both methods are also adapted to obtain top singular values/vectors of large sparse or structured matrices.\n", "\n", "* We will give an overview of these methods together with the conjugate gradient method for solving large linear systems.\n", "\n", "* [`eigs`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.eigs-Tuple{Any}) and [`svds`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.svds) in Julia and Matlab are wrappers of the ARPACK package, which implements Lanczos and Arnoldi methods." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "typeof(A) = SparseMatrixCSC{Float64,Int64}\n", "typeof(Afull) = Symmetric{Float64,Array{Float64,2}}\n" ] }, { "data": { "text/plain": [ "0.00425125" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools\n", "\n", "srand(280)\n", "\n", "n = 4000\n", "# a sparse pd matrix, about 0.5% non-zero entries\n", "A = sprand(n, n, 0.002)\n", "A = A + A' + (2n) * I\n", "@show typeof(A)\n", "b = randn(n)\n", "Afull = Symmetric(full(A))\n", "@show typeof(Afull)\n", "# actual sparsity level\n", "countnz(A) / length(A)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 8004.83\n", " 8004.87\n", " 8004.91\n", " 8004.92\n", " 8008.74" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# top 5 eigenvalues by LAPACK (QR algorithm)\n", "eigvals(Afull, (n-4):n)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([8004.83, 8004.87, 8004.91, 8004.92, 8008.74], 5, 21, 282, [0.0886855, -0.0110494, -0.0426668, -0.00949213, 0.0287553, 0.0384508, 0.0794464, -0.0312362, 0.0793456, 0.0167494 … -0.112428, 0.0194448, -0.180572, 0.00167151, 0.0563253, -0.00314476, -0.0484962, 0.0132143, -0.0208838, 0.0275571])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# top 5 eigenvalues by iterative methods\n", "eigs(A; nev = 5, ritzvec = false)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 123.48 MiB\n", " allocs estimate: 22\n", " --------------\n", " minimum time: 5.379 s (0.17% GC)\n", " median time: 5.379 s (0.17% GC)\n", " mean time: 5.379 s (0.17% GC)\n", " maximum time: 5.379 s (0.17% GC)\n", " --------------\n", " samples: 1\n", " evals/sample: 1" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark eigvals(Afull, (n-4):n)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 840.39 KiB\n", " allocs estimate: 1822\n", " --------------\n", " minimum time: 36.341 ms (0.00% GC)\n", 
" median time: 40.440 ms (0.00% GC)\n", " mean time: 42.443 ms (0.11% GC)\n", " maximum time: 58.579 ms (0.00% GC)\n", " --------------\n", " samples: 118\n", " evals/sample: 1" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark eigs(A; nev = 5, ritzvec = false)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "eigs(A) at linalg/arnoldi.jl:90" ], "text/plain": [ "eigs(A) in Base.LinAlg at linalg/arnoldi.jl:90" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@which eigs(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jacobi method for symmetric eigen-decomposition\n", "\n", "Assume $\\mathbf{A} \\in \\mathbf{R}^{n \\times n}$ is symmetric and we seek the eigen-decomposition $\\mathbf{A} = \\mathbf{U} \\Lambda \\mathbf{U}^T$.\n", "\n", "* Idea: Systematically reduce off-diagonal entries \n", "$$\n", "\\text{off}(\\mathbf{A}) = \\sum_i \\sum_{j \\ne i} a_{ij}^2\n", "$$\n", "by Jacobi rotations.\n", "\n", "* Jacobi/Givens rotations:\n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{J}(p,q,\\theta) = \\begin{pmatrix} \n", "\t1 & & 0 & & 0 & & 0 \\\\\n", "\t\\vdots & \\ddots & \\vdots & & \\vdots & & \\vdots \\\\\n", "\t0 & & \\cos(\\theta) & & \\sin(\\theta) & & 0 \\\\ \n", "\t\\vdots & & \\vdots & \\ddots & \\vdots & & \\vdots \\\\\n", "\t0 & & - \\sin(\\theta) & & \\cos(\\theta) & & 0 \\\\\n", "\t\\vdots & & \\vdots & & \\vdots & \\ddots & \\vdots \\\\\n", "\t0 & & 0 & & 0 & & 1 \\end{pmatrix},\n", "\\end{eqnarray*}\n", "$$\n", "$\\mathbf{J}(p,q,\\theta)$ is orthogonal.\n", "\n", "* Consider $\\mathbf{B} = \\mathbf{J}^T \\mathbf{A} \\mathbf{J}$. $\\mathbf{B}$ preserves the symmetry and eigenvalues of $\\mathbf{A}$. Taking \n", "$$\n", "\\begin{eqnarray*}\n", "\\begin{cases}\n", "\\tan (2\\theta) = 2a_{pq}/({a_{qq}-a_{pp}}) & \\text{if } a_{pp} \\ne a_{qq} \\\\\n", "\\theta = \\pi/4 & \\text{if } a_{pp}=a_{qq}\n", "\\end{cases}\n", "\\end{eqnarray*}\n", "$$\n", "forces $b_{pq}=0$.\n", "\n", "* Since orthogonal transform preserves Frobenius norm, we have\n", "$$\n", "b_{pp}^2 + b_{qq}^2 = a_{pp}^2 + a_{qq}^2 + 2a_{pq}^2.\n", "$$ \n", "(Just check the 2-by-2 block)\n", "\n", "* Since $\\|\\mathbf{A}\\|_{\\text{F}} = \\|\\mathbf{B}\\|_{\\text{F}}$, this implies that the off-diagonal part \n", "$$\n", "\t\\text{off}(\\mathbf{B}) = \\text{off}(\\mathbf{A}) - 2a_{pq}^2\n", "$$\n", "is decreased whenever $a_{pq} \\ne 0$.\n", "\n", "* One Jacobi rotation costs $O(n)$ flops.\n", "\n", "* **Classical Jacobi**: search for the largest $|a_{ij}|$ at each iteration. 
$O(n^2)$ efforts!\n", "\n", "* $\\text{off}(\\mathbf{A}) \\le n(n-1) a_{ij}^2$ and $\\text{off}(\\mathbf{B}) = \\text{off}(\\mathbf{A}) - 2 a_{ij}^2$ together implies \n", "$$\n", "\\text{off}(\\mathbf{B}) \\le \\left( 1 - \\frac{2}{n(n-1)} \\right) \\text{off}(\\mathbf{A}).\n", "$$\n", "Thus Jacobi method converges in $O(n^2)$ iterations.\n", "\n", "* In practice, cyclic-by-row implementation, to avoid the costly $O(n^2)$ search in the classical Jacobi.\n", "\n", "* Jacobi method attracts a lot recent attention because of its rich inherent parallelism.\n", "\n", "* **Parallel Jacobi**: ``merry-go-round\" to generate parallel ordering.\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 0.6.2\n", "Commit d386e40c17 (2017-12-13 18:08 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n", " LAPACK: libopenblas64_\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-3.9.1 (ORCJIT, skylake)\n" ] } ], "source": [ "versioninfo()" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "341px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "572px", "left": "0px", "right": "auto", "top": "106px", "width": "232px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }