{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Lecture 14: Iterative methods for large scale eigenvalue problems
1.1  Previous lecture
1.2  Partial eigenvalue problem
1.3  Power method and related methods
1.3.1  Power method
1.3.2  Inverse iteration
1.3.3  Rayleigh quotient (RQ) iteration
1.3.4  Inexact inverse iteration framework
1.3.5  Block power method
1.3.6  Accelerating convergence of the block power method
1.4  Ritz approximation
1.4.1  Properties of the Ritz approximation
1.4.2  Rayleigh-Ritz method
1.5  Lanczos and Arnoldi methods
1.5.1  Why is \\theta_\\max \\approx \\lambda_\\max?
1.5.2  Demo: approximating largest eigenvalue with Lanczos
1.5.3  Practical issues and stability
1.5.4  More problems with the Lanczos method
1.6  PINVIT (preconditioned inverse iteration)
1.6.1  Derivation
1.6.2  Convergence theory
1.6.3  Block case
1.7  LOBPCG (Locally Optimal Block Preconditioned CG)
1.7.1  Locally optimal PCG (not \"Block\" so far :))
1.7.2  LOPCG (stable version)
1.7.3  Locally optimal block PCG
1.7.4  LOBPCG summary
1.8  Jacobi-Davidson (JD) method
1.8.1  JD derivation
1.8.2  Jacobi correction equation
1.8.3  Solving Jacobi correction equation
1.8.4  Connection to the Rayleigh quotient iteration
1.8.5  Preconditioning of Jacobi equation
1.8.6  Subspace acceleration in JD
1.8.7  The block case of JD
1.8.8  Jacobi-Davidson: summary
1.9  Software
1.10  Take-home message
1.11  Next lecture
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Lecture 14: Iterative methods for large scale eigenvalue problems" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Previous lecture\n", "\n", "- Finalizing iterative methods for linear systems (minres, bicg, bicgstab)\n", "\n", "- Jacobi, Gauss-Seidel, SSOR methods as preconditioners\n", "\n", "- Incomplete LU for preconditioning, three flavours: ILU(k), ILUT, ILU2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Partial eigenvalue problem\n", "\n", "- Recall that to find eigenvalues of matrix of size $N\\times N$ one can use, e.g. the QR algorithm.\n", "\n", "- However, in some applications matrix is so large, that we even can not store it exactly.\n", "\n", "- Typically such matrices are given as a **black-box** that is able only to multiply matrix by vector (sometimes even without access to matrix elements). This is what we assume today.\n", "\n", "- In this case the best we can do is to solve partial eigenvalue problem, e.g.\n", "\n", " - Find $k\\ll N$ smallest or largest eigenvalues (and eigenvectors if needed)\n", " - Find $k\\ll N$ eigenvalues closest to a given number $\\sigma$\n", "\n", "- For simplicity we will consider the case when matrix is normal and thus has orthonormal basis of eigenvectors. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Power method and related methods\n", "\n", "### Power method\n", "\n", "Recall that the simplest method to find the largest eigenvalue is the **power method**\n", "\n", "$$\n", " x_{i+1} = \\frac{Ax_{i}}{\\|Ax_{i}\\|}\n", "$$\n", "\n", "The convergence is linear with rate $q = \\left|\\frac{\\lambda_1}{\\lambda_2}\\right|$." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Inverse iteration\n", "\n", "To find the smallest (in modulus) eigenvalue one may run the power method for $A^{-1}$:\n", "\n", "$$x_{i+1} = \\frac{A^{-1}x_{i}}{\\|A^{-1}x_{i}\\|}.$$\n", "\n", "To accelerate convergence the shift-and-invert strategy can be used:\n", "\n", "$$x_{i+1} = \\frac{(A-\\sigma I)^{-1}x_{i}}{\\|(A-\\sigma I)^{-1}x_{i}\\|},$$\n", "\n", "where $\\sigma$ should be close to the eigenvalue we want to find." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Rayleigh quotient (RQ) iteration\n", "\n", "In order to get superlinear convergence one may use adaptive shifts:\n", "\n", "$$x_{i+1} = \\frac{(A-R(x_i) I)^{-1}x_{i}}{\\|(A-R(x_i) I)^{-1}x_{i}\\|},$$\n", "\n", "where $R(x_i) = \\frac{(x_i, Ax_i)}{(x_i, x_i)}$ is the Rayleigh quotient. \n", "\n", "The method converges **cubically for Hermitian matrices** and quadratically in the non-Hermitian case." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Inexact inverse iteration framework\n", "\n", "- Matrices $(A- \\sigma I)$ as well as $(A-R(x_i) I)$ are ill-conditioned if $\\sigma$ or $R(x_i)$ are close to eigenvalues.\n", "\n", "- Thus, if you are not given, e.g., an LU factorization of such a matrix, you might face a problem.\n", "\n", "- In practice you can solve systems only with some accuracy. Recall also that the condition number gives only an upper bound on the error, and this bound is an overestimate for a consistent right-hand side. 
So, even in RQ iteration letting\n", "the shift tend to the eigenvalue [does not harm](http://www.sciencedirect.com/science/article/pii/S0024379505005756) significantly\n", "the performance of the iterative methods.\n", "\n", "- If accuracy of solution of systems increases from iteration to iteration, superlinear convergence for RQ iteration can still be present, see [Theorem 2.1](http://www.sciencedirect.com/science/article/pii/S0024379505005756).\n", "Otherwise, you will get linear convergence." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Block power method\n", "\n", "The block power method (also known as subspace iteration method or simultaneous vector iteration) is a natural generalization of the power method for several largest eigenvalues.
\n", "It reads:\n", "\n", "1. $Y_0$ is an $N\\times k$ matrix of rank $k$, $Y_0 = X_0 R_0$ (QR-decomposition)\n", "2. $Y_i = AX_{i-1}$ \n", "3. $Y_i = X_i R_i$ (QR-decomposition)\n", "\n", "The QR-decomposition plays the role of the normalization in the standard power method. \n", "\n", "Moreover, orthogonalization prevents the columns of $X_i$ from all converging to the eigenvector corresponding to the largest modulus eigenvalue." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Accelerating convergence of the block power method\n", "\n", "* For Hermitian matrices convergence of the $j$-th column is **linear**, as for the power method, with $q=\\frac{|\\lambda_{j+1}|}{|\\lambda_{j}|}$ (eigenvalues ordered by decreasing modulus). \n", "\n", "\n", "* Hence, applying the block power method to the matrix $(A-\\sigma I)^{-1}$ will accelerate convergence (shift-and-invert strategy).\n", "\n", "\n", "* You can also accelerate the convergence by applying the **Rayleigh-Ritz procedure** discussed below." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import copy\n", "%matplotlib inline\n", "\n", "n = 100\n", "k = 10\n", "A = np.diag(1./(1. + np.arange(n))) # diagonal matrix with well-separated maximum eigenvalues\n", "A_clustered = np.diag(1 - 1./(1. + np.arange(n))) # diagonal matrix with clustered maximum eigenvalues\n", "\n", "def subspace_iter(A, Y0, num_iter=100):\n", " Y0, _ = np.linalg.qr(Y0)\n", " Y = Y0.copy()\n", " Y_old = Y0.copy()\n", " err = []\n", " for i in range(num_iter):\n", " X = A.dot(Y)\n", " Y, _ = np.linalg.qr(X)\n", " err.append(np.linalg.norm(Y_old - Y.dot(Y.T.dot(Y_old))))\n", " Y_old = Y.copy()\n", " return Y, err\n", "\n", "Y0 = np.random.random((n, k))\n", "Y, err = subspace_iter(A, Y0, num_iter=100)\n", "Y, err_clustered = subspace_iter(A_clustered, Y0, num_iter=100) #np.diag((diagonal - sigma)**(-1))\n", "plt.semilogy(err, label='Separated eigvals')\n", "plt.semilogy(err_clustered, label='Clustered eigvals')\n", "plt.xlabel('Number of iterations')\n", "plt.ylabel('Error')\n", "plt.legend(loc='best')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Before we go to advanced methods let us discuss the important concept of **Ritz approximation**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Ritz approximation\n", "\n", "Given subspace spanned by columns of unitary matrix $Q_k$ of size $N\\times k$ we consider the projected matrix $Q_k^* A Q_k$.\n", "\n", "Let $\\Theta_k=\\mathrm{diag}(\\theta_1,\\dots,\\theta_k)$ and $S_k=\\begin{bmatrix}s_1 & \\dots & s_k \\end{bmatrix}$ be matrices of eigenvalues and eigenvectors of $(Q_k^* A Q_k)$: \n", "\n", "$$(Q_k^* A Q_k)S_k = S_k \\Theta_k$$\n", "\n", "then $\\{\\theta_i\\}$ are called **Ritz values** and $y_i = Q_k s_i$ - **Ritz vectors**." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Properties of the Ritz approximation\n", "\n", "- Note that the Ritz pairs are not eigenpairs of the initial matrix, i.e. $AY_k\\not= Y_k \\Theta_k$ for $Y_k = Q_k S_k$, but the following equality holds:\n", "\n", " $$Q_k^* (AY_k - Y_k \\Theta_k) = Q_k^* (AQ_k S_k - Q_k S_k \\Theta_k) = 0,$$\n", "\n", " so the residual for the Ritz approximation is **orthogonal** to the subspace spanned by columns of $Q_k$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- For Hermitian $A$, $\\lambda_\\min(A) \\leq \\theta_\\min \\leq \\theta_\\max \\leq \\lambda_\\max(A)$. Indeed, using the Rayleigh quotient:\n", "\n", " $$\\theta_\\min = \\lambda_\\min (Q_k^* A Q_k) = \\min_{x\\not=0} \\frac{x^* (Q_k^* A Q_k) x}{x^* x} = \\min_{y\\not=0:y=Q_k x} \\frac{y^* A y}{y^* y}\\geq \\min_{y\\not= 0} \\frac{y^* A y}{y^* y} = \\lambda_\\min(A).$$\n", "\n", " Obviously, $\\lambda_\\min (Q_k^* A Q_k) = \\lambda_\\min(A)$ if $k=N$, but we want to construct a basis with $k\\ll N$ vectors such that $\\lambda_\\min (Q_k^* A Q_k) \\approx \\lambda_\\min(A)$.\n", "\n", " Similarly, $\\theta_\\max \\leq \\lambda_\\max(A)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Rayleigh-Ritz method\n", "\n", "Thus, if a subspace $V$ approximates the first $k$ eigenvectors, then one can use the **Rayleigh-Ritz method**:\n", "\n", "1. Find an orthonormal basis $Q_k$ in $V$ (e.g. by using the QR decomposition)\n", "2. Calculate $Q_k^*AQ_k$\n", "3. Compute Ritz values and vectors\n", "\n", "Note that alternatively one could use $V$ with no orthogonalization, but then the generalized eigenvalue problem $(V^*AV)s_i = \\theta_i (V^*V)s_i$ has to be solved.\n", "\n", "The question is how to find a good subspace $V$." 
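, "\n", "\n", "The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper name rayleigh_ritz, the random symmetric test matrix and the random subspace are not part of the lecture code.

```python
import numpy as np

def rayleigh_ritz(A, V):
    # Step 1: orthonormal basis of span(V); steps 2-3: project A and solve the small problem
    Q, _ = np.linalg.qr(V)
    theta, S = np.linalg.eigh(Q.T.dot(A.dot(Q)))
    return theta, Q.dot(S), Q

rng = np.random.default_rng(0)
n, k = 30, 5
G = rng.standard_normal((n, n))
A = (G + G.T) / 2  # real symmetric test matrix (an assumption for illustration)
theta, Y, Q = rayleigh_ritz(A, rng.standard_normal((n, k)))

# The residual A Y - Y Theta is orthogonal to the subspace spanned by Q
residual = A.dot(Y) - Y * theta
print(np.linalg.norm(Q.T.dot(residual)))

# Ritz values lie between the extreme eigenvalues of A
lam = np.linalg.eigvalsh(A)
print(lam[0] <= theta[0], theta[-1] <= lam[-1])
```

For a random subspace the Ritz values are poor approximations of the extreme eigenvalues, which is exactly why the choice of the subspace matters."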
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Lanczos and Arnoldi methods\n", "\n", "A good choice for $V$ is the Krylov subspace.\n", "\n", "Recall that in the power method we used only one Krylov vector \n", "\n", "$$x_k = \\frac{A^k x_0}{\\|A^k x_0\\|}.$$\n", "\n", "In this case $\\theta_k = \\frac{x_k^* A x_k}{x_k^* x_k}$ is nothing but a Ritz value. A natural idea is to use a bigger Krylov subspace.\n", "\n", "As a result we can find more eigenvalues (the power method only gives $\\lambda_\\max$). Moreover, convergence of the Ritz value approximating $\\lambda_\\max$ will be faster." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For Hermitian matrices from the Arnoldi relation we have\n", "\n", "$$\n", "Q_k^*AQ_k = T_k,\n", "$$\n", "\n", "where $Q_k$ is an orthonormal basis of the Krylov subspace generated by the Lanczos procedure and $T_k$ is a tridiagonal matrix.\n", "\n", "According to the Rayleigh-Ritz method we expect that eigenvalues of $T_k$ approximate eigenvalues of $A$. This method is called the **Lanczos method**. For nonsymmetric matrices it is called the **Arnoldi method** and instead of the tridiagonal $T_k$ we would get an upper Hessenberg matrix.\n", "\n", "Let us show that $\\theta_\\max \\approx\\lambda_\\max$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Why is $\\theta_\\max \\approx \\lambda_\\max$?\n", "\n", "Let us denote $\\theta_1 \\equiv \\theta_\\max$ and $\\lambda_1 \\equiv \\lambda_\\max$. 
Then\n", "\n", "$$\n", " \\theta_1 = \\max_{y\\in \\mathcal{K}_i, y\\not=0}\\frac{(y,Ay)}{(y,y)} = \\max_{p_{i-1}} \\frac{(p_{i-1}(A)x_0, A p_{i-1}(A)x_0)}{(p_{i-1}(A)x_0, p_{i-1}(A)x_0)},\n", "$$\n", "\n", "where $p_{i-1}$ is a polynomial of degree not greater than $i-1$ such that $p_{i-1}(A)x_0\\not=0$.\n", "\n", "Expand $x_0 = \\sum_{j=1}^N c_j v_j$, where $v_j$ are eigenvectors of $A$ (they form an orthonormal basis) and $\\lambda_1 \\geq \\lambda_2 \\geq \\dots \\geq \\lambda_N$.\n", "\n", "Since $\\theta_1 \\leq \\lambda_1$ we get\n", "$$\n", " \\lambda_1 - \\theta_1 \\leq \\lambda_1 - \\frac{(p_{i-1}(A)x_0, A p_{i-1}(A)x_0)}{(p_{i-1}(A)x_0, p_{i-1}(A)x_0)}\n", "$$\n", "for any polynomial $p_{i-1}$. Hence\n", "$$\n", "\\lambda_1 - \\theta_1 \\leq \\lambda_1 - \\frac{\\sum_{k=1}^N \\lambda_k |p_{i-1}(\\lambda_k)|^2 |c_k|^2}{\\sum_{k=1}^N |p_{i-1}(\\lambda_k)|^2 |c_k|^2} =\n", "$$\n", "$$\n", "= \\frac{\\sum_{k=2}^N (\\lambda_1 - \\lambda_k) |p_{i-1}(\\lambda_k)|^2 |c_k|^2}{|p_{i-1}(\\lambda_1)|^2 |c_1|^2 + \\sum_{k=2}^N |p_{i-1}(\\lambda_k)|^2 |c_k|^2} \\leq \n", "(\\lambda_1 - \\lambda_N) \\frac{\\max_{2\\leq k \\leq N}|p_{i-1}(\\lambda_k)|^2}{|p_{i-1}(\\lambda_1)|^2 }\\gamma, \\quad \\gamma = \\frac{\\sum_{k=2}^N|c_k|^2}{|c_1|^2}\n", "$$\n", "\n", "Since the inequality holds for any polynomial $p_{i-1}$ we will choose a polynomial such that \n", "\n", "$$|p_{i-1}(\\lambda_1)| \\gg \\max_{2\\leq k \\leq N}|p_{i-1}(\\lambda_k)|.$$\n", "\n", "This holds, e.g. for the Chebyshev polynomial on $[\\lambda_N,\\lambda_2]$. Thus, $\\theta_1 \\approx \\lambda_1$ or more precisely (Kaniel-Paige error bound, check it!):\n", "$$\n", " \\lambda_1 - \\theta_1 \\leq \\frac{\\lambda_1 - \\lambda_N}{T_{i-1}^2(1 + 2\\mu)}\\gamma, \\quad \\mu = \\frac{\\lambda_1 - \\lambda_2}{\\lambda_2 - \\lambda_N},\n", "$$\n", "where $T_{i-1}$ is a Chebyshev polynomial." 
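, "\n", "\n", "The error bound above can be checked numerically. The following is a sketch under stated assumptions: the diagonal test matrix with equally spaced eigenvalues, the all-ones starting vector, the subspace size m (playing the role of i) and the helper name krylov_basis are all illustrative choices.

```python
import numpy as np

def krylov_basis(A, x0, m):
    # Orthonormal basis of span{x0, A x0, ..., A^(m-1) x0}, built column by column
    Q = np.zeros((A.shape[0], m))
    Q[:, 0] = x0 / np.linalg.norm(x0)
    for j in range(1, m):
        w = A.dot(Q[:, j - 1])
        for _ in range(2):  # two Gram-Schmidt passes against all previous vectors
            w = w - Q[:, :j].dot(Q[:, :j].T.dot(w))
        Q[:, j] = w / np.linalg.norm(w)
    return Q

# Test problem (an assumption for illustration): known, equally spaced spectrum
N, m = 100, 20
lam = np.linspace(1.0, 0.0, N)   # lam[0] is the largest eigenvalue, lam[-1] the smallest
A = np.diag(lam)
x0 = np.ones(N)                  # eigenvectors are unit vectors, so c_j = x0[j]

Q = krylov_basis(A, x0, m)
theta_max = np.linalg.eigvalsh(Q.T.dot(A.dot(Q)))[-1]
err = lam[0] - theta_max

mu = (lam[0] - lam[1]) / (lam[1] - lam[-1])
gamma = np.sum(x0[1:] ** 2) / x0[0] ** 2
cheb = np.cosh((m - 1) * np.arccosh(1.0 + 2.0 * mu))  # Chebyshev polynomial for argument > 1
bound = (lam[0] - lam[-1]) * gamma / cheb ** 2
print(err, bound)
```

The bound is an upper estimate, so the observed error is expected to be smaller than it, often by a wide margin."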
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Demo: approximating largest eigenvalue with Lanczos" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "k=10, err = 0.1097290322596427\n", "k=20, err = 0.02941322892533016\n", "k=100, err = 4.511147011498906e-11\n" ] } ], "source": [ "import scipy as sp\n", "import scipy.sparse\n", "from scipy.sparse import csc_matrix, csr_matrix\n", "import matplotlib.pyplot as plt\n", "import scipy.linalg\n", "import scipy.sparse.linalg\n", "import copy\n", "n = 40\n", "ex = np.ones(n)\n", "lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr')\n", "e = sp.sparse.eye(n)\n", "A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)\n", "\n", "def lanczos(A, m):\n", " n = A.shape[0]\n", " v = np.random.random((n, 1))\n", " v = v / np.linalg.norm(v)\n", " v_old = np.zeros((n, 1))\n", " beta = np.zeros(m)\n", " alpha = np.zeros(m)\n", " for j in range(m-1):\n", " w = A.dot(v)\n", " alpha[j] = w.T.dot(v)\n", " w = w - alpha[j] * v - beta[j] * v_old\n", " beta[j+1] = np.linalg.norm(w)\n", " v_old = v.copy()\n", " v = w / beta[j+1]\n", " w = A.dot(v)\n", " alpha[m-1] = w.T.dot(v)\n", " A = np.diag(beta[1:], k=-1) + np.diag(beta[1:], k=1) + np.diag(alpha[:], k=0)\n", " l, _ = np.linalg.eigh(A)\n", " return l\n", "\n", "# Approximation of the largest eigenvalue for different k\n", "l_large_exact = sp.sparse.linalg.eigsh(A, k=99, which='LM')[0][0]\n", "print('k=10, err = {}'.format(np.abs(l_large_exact - lanczos(A, 10)[0])))\n", "print('k=20, err = {}'.format(np.abs(l_large_exact - lanczos(A, 20)[0])))\n", "print('k=100, err = {}'.format(np.abs(l_large_exact - lanczos(A, 100)[0])))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Practical issues and stability\n", "\n", "- The Lanczos vectors 
may lose orthogonality during the process due to floating-point errors, thus practical implementations use **restarts**.\n", "\n", "- A very good introduction to the topic is given in the book by **Golub and Van Loan (Matrix Computations)**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### More problems with the Lanczos method\n", "\n", "- Applying Lanczos directly to $A$ may result in very slow convergence if $\\lambda_i\\approx \\lambda_{i+1}$
(typically holds for smallest eigenvalues that are not well-separated)\n", "\n", "\n", "- To accelerate the convergence one may apply Lanczos to $(A-\\sigma I)^{-1}$, but in this case systems have to be solved **very accurately**.
\n", "Otherwise the Arnoldi relation does not hold anymore.\n", "\n", "An alternative to this approach are the so-called preconditioned iterative methods that include:\n", "1. PINVIT (Preconditioned Inverse Iteration)\n", "2. LOBPCG (Locally optimal block preconditioned CG)\n", "3. Jacobi-Davidson method" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## PINVIT (preconditioned inverse iteration)\n", "\n", "### Derivation\n", "\n", "Consider the Rayleigh quotient $R(x) = \\frac{(x,Ax)}{(x,x)}$. Then\n", "$$\n", "\\nabla R(x) = \\frac{2}{(x,x)} (Ax - R(x) x),\n", "$$\n", "\n", "so the simplest gradient descent method with a preconditioner $B$ reads\n", "\n", "$$\n", " x_{i+1} = x_{i} - \\tau_i B^{-1} (Ax_i - R(x_i) x_i),\n", "$$\n", "\n", "$$\n", " x_{i+1} = \\frac{x_{i+1}}{\\|x_{i+1}\\|}.\n", "$$\n", "\n", "Typically $B\\approx (A-\\sigma I)$, where $\\sigma$ is called the shift.\n", "\n", "The closer $\\sigma$ is to the required eigenvalue, the faster the convergence." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "- Parameter $\\tau_i$ is chosen to minimize $R(x_{i+1})$ over $\\tau_i$ (steepest descent method).\n", "\n", "- One can think of this minimization procedure as minimization in the basis $V = [x_i, r_i]$, where $r_{i}=B^{-1} (Ax_i - R(x_i) x_i)$.\n", "\n", "- This results in the generalized eigenvalue problem $(V^*AV)\\begin{bmatrix}1 \\\\ -\\tau_i \\end{bmatrix} = \\theta (V^*V) \\begin{bmatrix}1 \\\\ -\\tau_i \\end{bmatrix}$ (Rayleigh-Ritz procedure with no orthogonalization of $V$). Here $\\theta$ is the Ritz value closest to the required eigenvalue." 
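, "\n", "\n", "One PINVIT step with the step size chosen by the Rayleigh-Ritz procedure can be sketched as below. Everything here is an illustration under stated assumptions: the names rayleigh and pinvit_step, the test matrix (a diagonal plus a small symmetric perturbation), the zero shift and the diagonal (Jacobi) preconditioner are not taken from the lecture. The basis is orthogonalized by QR instead of solving the generalized problem, which gives the same Ritz pairs.

```python
import numpy as np

def rayleigh(A, x):
    return x.dot(A.dot(x)) / x.dot(x)

def pinvit_step(A, B_inv, x):
    # Preconditioned residual w = B^{-1} (A x - R(x) x)
    w = B_inv(A.dot(x) - rayleigh(A, x) * x)
    # Rayleigh-Ritz on span{x, w}: the smallest Ritz pair corresponds to the optimal step
    Q, _ = np.linalg.qr(np.column_stack([x, w]))
    theta, S = np.linalg.eigh(Q.T.dot(A.dot(Q)))
    x_new = Q.dot(S[:, 0])
    return x_new / np.linalg.norm(x_new), theta[0]

# Test problem (an assumption for illustration): nearly diagonal symmetric matrix
rng = np.random.default_rng(0)
n = 100
E = rng.standard_normal((n, n))
A = np.diag(1.0 + np.arange(n)) + 0.01 * (E + E.T)
d = np.diag(A).copy()
B_inv = lambda r: r / d  # B = diag(A), i.e. shift sigma = 0 (illustrative choice)

x = rng.standard_normal(n)
for _ in range(100):
    x, theta = pinvit_step(A, B_inv, x)

lam_min = np.linalg.eigvalsh(A)[0]
print(abs(theta - lam_min))
```

The diagonal preconditioner works here only because the test matrix is close to diagonal; for a general matrix a better approximation of the inverse would be needed."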
] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "slide" } }, "source": [ "### Convergence theory\n", "\n", "**Theorem** ([Knyazev and Neymeyr](http://www.sciencedirect.com/science/article/pii/S002437950100461X)) \n", "\n", "Let \n", "- $R(x_{i})\\in [\\lambda_j,\\lambda_{j+1}]$\n", "- $R(x_{i+1})\\in [R(x_{i}),\\lambda_{j+1}]$ (case $R(x_{i+1})\\in [\\lambda_{j}, R(x_{i})]$ is similar)\n", "- $\\|I - B^{-1} A\\|_A \\leq \\gamma < 1$\n", "\n", "then\n", "\n", "$$\n", "\\left|\\frac{R(x_{i+1}) - \\lambda_j}{R(x_{i+1}) - \\lambda_{j+1}}\\right| < \\left[ 1 - (1-\\gamma)\\left(1 - \\frac{\\lambda_j}{\\lambda_{j+1}}\\right) \\right]^2 \\cdot \\left|\\frac{R(x_{i}) - \\lambda_j}{R(x_{i}) - \\lambda_{j+1}}\\right|\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Block case\n", "\n", "To find, e.g., $k$ eigenvalues one can do one step of PINVIT for each vector:\n", "\n", "\n", "$$\n", " x^{(j)}_{i+1} = x^{(j)}_{i} - \\tau^{(j)}_i B^{-1} (Ax^{(j)}_i - R(x^{(j)}_i) x^{(j)}_i), \\quad j=1,\\dots,k\n", "$$\n", "\n", "$$\n", " x^{(j)}_{i+1} = \\frac{x^{(j)}_{i+1}}{\\|x^{(j)}_{i+1}\\|}.\n", "$$\n", "\n", "and then orthogonalize them using the QR-decomposition. However, it is better to use the Rayleigh-Ritz procedure:\n", "\n", "- Set $X^{i}_k = [x^{(1)}_{i},\\dots, x^{(k)}_{i}]$ and $R^{i}_k = [B^{-1}r^{(1)}_{i},\\dots, B^{-1}r^{(k)}_{i}]$, where $r^{(j)}_{i} = Ax^{(j)}_i - R(x^{(j)}_i) x^{(j)}_i$\n", "\n", "\n", "- Set $V = [X^{i}_k, R^{i}_k]$, use the Rayleigh-Ritz procedure for $V$ to find the new $X^{i+1}_k$." 
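, "\n", "\n", "The block variant with the Rayleigh-Ritz procedure can be sketched in a few lines. As before this is an illustration only: the name block_pinvit, the nearly diagonal test matrix and the diagonal preconditioner are assumptions, and the basis is orthogonalized by QR rather than handled through a generalized eigenvalue problem.

```python
import numpy as np

def block_pinvit(A, B_inv, X, num_iter=200):
    # X holds k orthonormal columns; each sweep is Rayleigh-Ritz on V = [X, B^{-1} R]
    k = X.shape[1]
    for _ in range(num_iter):
        theta = np.sum(X * A.dot(X), axis=0)      # Rayleigh quotients of unit columns
        W = B_inv(A.dot(X) - X * theta)           # preconditioned residuals
        Q, _ = np.linalg.qr(np.hstack([X, W]))    # orthonormal basis of V
        vals, S = np.linalg.eigh(Q.T.dot(A.dot(Q)))
        X = Q.dot(S[:, :k])                       # keep the k smallest Ritz vectors
    return vals[:k], X

# Test problem (an assumption for illustration): nearly diagonal symmetric matrix
rng = np.random.default_rng(0)
n, k = 100, 3
E = rng.standard_normal((n, n))
A = np.diag(1.0 + np.arange(n)) + 0.01 * (E + E.T)
d = np.diag(A).copy()
B_inv = lambda R: R / d[:, None]                  # B = diag(A), illustrative choice

X0, _ = np.linalg.qr(rng.standard_normal((n, k)))
vals, X = block_pinvit(A, B_inv, X0)
exact = np.linalg.eigvalsh(A)[:k]
print(np.max(np.abs(vals - exact)))
```

Keeping the k smallest Ritz vectors targets the smallest eigenvalues; for another part of the spectrum the selection rule and the shift in the preconditioner would have to change."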
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## LOBPCG (Locally Optimal Block Preconditioned CG)\n", "\n", "### Locally optimal PCG (not \"Block\" so far :))\n", "The LOPCG method\n", "\n", "$$\n", " x_{i+1} = x_{i} - \\alpha_i B^{-1} (Ax_i - R(x_i) x_i) + \\beta_i x_{i-1} ,\n", "$$\n", "\n", "$$\n", " x_{i+1} = \\frac{x_{i+1}}{\\|x_{i+1}\\|}.\n", "$$\n", "\n", "\n", "is superior to the PINVIT method as it adds to the basis not only $x_i$ and $r_i$, but also $x_{i-1}$.\n", "\n", "However, this formulation leads to an unstable algorithm as $x_{i}$ becomes collinear to $x_{i-1}$ as the procedure converges." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### LOPCG (stable version)\n", "\n", "Knyazev suggested an equivalent stable version, which introduces new vectors $p_i$ (conjugate gradients)\n", "\n", "$$\n", "p_{i+1} = r_{i} + \\beta_i p_{i},\n", "$$\n", "\n", "$$\n", "x_{i+1} = x_{i} + \\alpha_i p_{i+1}.\n", "$$\n", "\n", "One can check that $\\mathcal{L}(x_{i},x_{i-1},r_{i})=\\mathcal{L}(x_{i},p_{i},r_{i})$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The stable version explains the name of the method:\n", "\n", "In the standard CG method we would minimize the Rayleigh quotient $R$ in the conjugate gradient direction $p_{i+1}$: \n", "\n", "$$\\alpha_i = \\arg\\min_{\\alpha_i} R(x_i + \\alpha_i p_{i+1}).$$\n", "\n", "In the locally-optimal CG we minimize over two parameters: \n", "\n", "$$\\alpha_i, \\beta_i = \\arg\\min_{\\alpha_i,\\beta_i} R\\left(x_i + \\alpha_i p_{i+1}\\right) = \\arg\\min_{\\alpha_i,\\beta_i} R\\left(x_i + \\alpha_i (r_{i} + \\beta_i p_{i})\\right)$$\n", "\n", "and thus obtain a solution that is optimal over a larger local subspace. That is why the method is called **locally optimal**.\n", "\n", "As for PINVIT, the coefficients $\\alpha_i,\\beta_i$ can be found by the Rayleigh-Ritz procedure." 
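, "\n", "\n", "A single-vector sketch of the stable formulation, where the coefficients are found by the Rayleigh-Ritz procedure on the span of x, the preconditioned residual and the direction p. The name lopcg, the test matrix and the diagonal preconditioner are illustrative assumptions. The direction p is stored as the part of the new iterate orthogonal to the old one, which spans the same subspace together with x while avoiding nearly collinear basis vectors.

```python
import numpy as np

def lopcg(A, B_inv, x, num_iter=100):
    x = x / np.linalg.norm(x)
    p = None
    for _ in range(num_iter):
        theta = x.dot(A.dot(x))                      # Rayleigh quotient (x has unit norm)
        w = B_inv(A.dot(x) - theta * x)              # preconditioned residual
        basis = [x, w] if p is None else [x, w, p]
        Q, _ = np.linalg.qr(np.column_stack(basis))  # first column of Q is x up to sign
        vals, S = np.linalg.eigh(Q.T.dot(A.dot(Q)))
        s = S[:, 0]                                  # Ritz vector of the smallest Ritz value
        p = Q[:, 1:].dot(s[1:])                      # part of the update orthogonal to old x
        x = Q.dot(s)                                 # unit norm because Q is orthonormal
    return vals[0], x

# Test problem (an assumption for illustration): nearly diagonal symmetric matrix
rng = np.random.default_rng(0)
n = 100
E = rng.standard_normal((n, n))
A = np.diag(1.0 + np.arange(n)) + 0.01 * (E + E.T)
d = np.diag(A).copy()
B_inv = lambda r: r / d                              # B = diag(A), illustrative choice

theta, x = lopcg(A, B_inv, rng.standard_normal(n))
lam_min = np.linalg.eigvalsh(A)[0]
print(abs(theta - lam_min))
```

Compared with the PINVIT sketch the only change is the third basis vector, so the cost per iteration is almost the same."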
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Locally optimal block PCG\n", "\n", "In the block version, similarly to PINVIT, on each iteration we are given the basis $V=[X^{(i)}_k,B^{-1}R^{(i)}_k, P^{(i)}_k]$ and use the Rayleigh-Ritz procedure.\n", "\n", "The overall algorithm:\n", "\n", "1. Find $\\tilde A = V^* A V$\n", "2. Find $\\tilde M = V^*V$\n", "3. Solve the generalized eigenvalue problem $\\tilde A S_k = \\tilde M S_k \\Theta_k$, where $S_k$ is the $3k\\times k$ matrix of eigenvectors for the $k$ required eigenvalues\n", "4. $P^{(i+1)}_{k} = [B^{-1}R^{(i)}_k, P^{(i)}_k]S_k[k:,:]$ (rows of $S_k$ below the first $k$)\n", "5. $X^{(i+1)}_{k} = X^{(i)}_k S_k[:k,:] + P^{(i+1)}_{k}$ (equivalent to $X^{(i+1)}_{k} = VS_k$)\n", "6. Calculate new $B^{-1}R^{(i+1)}_k$\n", "7. Set $V=[X^{(i+1)}_k,B^{-1}R^{(i+1)}_k, P^{(i+1)}_k]$, goto 1.\n", "\n", "A **deflation technique**, which stops iterating converged eigenvectors, can also be applied here.\n", "\n", "The method also converges linearly, but faster than PINVIT." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### LOBPCG summary\n", "\n", "- Locally optimal preconditioned solver\n", "\n", "- Linear convergence\n", "\n", "- Preconditioner $(A-\\sigma I)$ is not always good for eigenvalue problems\n", "\n", "The next method (Jacobi-Davidson) has smart preconditioning and superlinear convergence (if systems are solved accurately)!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Jacobi-Davidson (JD) method\n", "\n", "The Jacobi-Davidson method is a very popular technique for solving eigenvalue problems (not only symmetric!).\n", "\n", "It consists of two **key ingredients**:\n", "\n", "- Given a preconditioner for $A-R(x_j) I$ it automatically constructs a good preconditioner for the eigenvalue problem:\n", "$$\n", " B = (I - x_j x^*_j) (A - R(x_j) I) (I - x_j x^*_j),\n", "$$\n", "where $x_j$ is the approximation to the eigenvector on the $j$-th iteration.
**Note** that sometimes an approximation to $(A-R(x_j) I)^{-1}$ is not a good preconditioner.\n", "\n", "\n", "- It additionally adds to the subspace $V$ solutions from previous iterations (**subspace acceleration**)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### JD derivation\n", "\n", "- The Jacobi-Davidson method has a nice manifold optimization interpretation. \n", "- It is a **Riemannian Newton** method on a sphere and $P = I - x_j x^*_j$ is a projection onto the tangent space of the sphere at $x_j$.\n", "\n", "But we will derive it similarly to the original paper." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Jacobi correction equation\n", "\n", "Jacobi not only presented the way to solve the eigenvalue problem by Jacobi rotations, but also proposed an iterative procedure. Let $x_j$ be the current approximation with $\\|x_j\\| = 1$, and $t$ the correction:\n", "\n", "$$A(x_j + t) = \\lambda (x_j + t),$$\n", "\n", "and we look for the correction $t \\perp x_j$ (new orthogonal vector).\n", "\n", "Then, the parallel part has the form\n", "\n", "$$x_j x^*_j A (x_j + t) = \\lambda x_j,$$\n", "\n", "which simplifies to \n", "\n", "$$R(x_j) + x^* _j A t = \\lambda.$$\n", "\n", "The orthogonal component is \n", "\n", "$$( I - x_j x^*_j) A (x_j + t) = (I - x_j x^*_j) \\lambda (x_j + t),$$\n", "\n", "which is equivalent to \n", "\n", "$$\n", " (I - x_j x^*_j) (A - \\lambda I) t = (I - x_j x^*_j) (- A x_j + \\lambda x_j) = - (I - x_j x^*_j) A x_j = - (A - R(x_j) I) x_j = -r_j.\n", "$$\n", "\n", "Here $r_j$ is the residual.\n", "\n", "Since $(I - x_j x^*_j) t = t$, we can rewrite this equation in the symmetric form" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$ (I - x_j x^*_j) (A - \\lambda I) (I - x_j x^*_j) t = -r_j.$$\n", "\n", "Now we replace $\\lambda$ by $R(x_j)$, and get the **Jacobi correction equation**:\n", "\n", "$$\n", " (I - 
x_j x^*_j) (A - R(x_j) I) (I - x_j x^*_j) t = -r_j.\n", "$$\n", "\n", "Since $r_j \\perp x_j$ this equation is consistent, if $(A - R(x_j) I)$ is non-singular." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Solving Jacobi correction equation\n", "\n", "Typically the Jacobi equation is solved inexactly by an appropriate Krylov method.\n", "\n", "Even an inexact solution of the Jacobi equation ensures (why?) that the correction $t$ is orthogonal to $x_j$, which is good for computations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Connection to the Rayleigh quotient iteration\n", "\n", "If this equation is solved exactly, we will get Rayleigh quotient iteration! Let us show that.\n", "\n", "$$ (I - x_j x^*_j) (A - R(x_j) I) (I - x_j x^*_j) t = -r_j.$$\n", "\n", "$$ (I - x_j x^*_j) (A - R(x_j) I) t = -r_j.$$\n", "\n", "$$ (A - R(x_j) I) t - \\alpha x_j = -r_j, \\quad \\alpha = x^*_j (A - R(x_j) I) t$$\n", "\n", "$$ t = \\alpha (A - R(x_j) I)^{-1}x_j - (A - R(x_j) I)^{-1}r_j,$$\n", "\n", "Thus, since $(A - R(x_j) I)^{-1}r_j = (A - R(x_j) I)^{-1}(A - R(x_j) I)x_j = x_j$ we get\n", "\n", "$$x_{j+1} = x_j + t = \\alpha (A - R(x_j) I)^{-1}x_j,$$\n", "\n", "which is Rayleigh quotient iteration up to normalization." 
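, "\n", "\n", "This equivalence is easy to check numerically. In the sketch below the correction equation is written as a bordered linear system for the pair t and alpha (the third equation of the derivation together with the orthogonality constraint), solved exactly, and compared with one Rayleigh quotient iteration step. The random symmetric test matrix and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
G = rng.standard_normal((n, n))
A = (G + G.T) / 2                      # symmetric test matrix (illustrative assumption)
x = rng.standard_normal(n)
x = x / np.linalg.norm(x)
theta = x.dot(A.dot(x))                # Rayleigh quotient R(x)
M = A - theta * np.eye(n)
r = M.dot(x)                           # residual, orthogonal to x

# Bordered form of the correction equation: M t - alpha x = -r, x^T t = 0
K = np.block([[M, -x[:, None]], [x[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-r, [0.0]]))
t, alpha = sol[:n], sol[n]

# One Rayleigh quotient iteration step before normalization
y = np.linalg.solve(M, x)

print(abs(x.dot(t)))                       # t is orthogonal to x
print(np.linalg.norm(x + t - alpha * y))   # x + t coincides with alpha * M^{-1} x
```

In practice the correction equation is solved only approximately, so this identity holds only approximately as well."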
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Preconditioning of Jacobi equation\n", "\n", "A popular preconditioner for solving the Jacobi equation by a Krylov method has the form\n", "\n", "$$\n", "\\widetilde K = (I - x_j x^*_j) K (I - x_j x^*_j)\n", "$$\n", "\n", "where $K$ is an easily inverted approximation of $(A - R(x_j) I)$.\n", "\n", "We need to derive how to solve a system with $\\widetilde K$ in terms of solving a system with $K$.\n", "\n", "We already showed that the equation\n", "\n", "$$ (I - x_j x^*_j) K (I - x_j x^*_j) \\tilde t = f $$\n", "\n", "is equivalent to \n", "\n", "$$ \\tilde t = \\alpha K^{-1}x_j + K^{-1}f $$\n", "\n", "The trick now is to forget about the value of $\\alpha$ and find it from $\\tilde t\\perp x_j$ to maintain orthogonality:\n", "\n", "$$\n", " \\alpha = -\\frac{x_j^*K^{-1}f}{x_j^* K^{-1}x_j}\n", "$$\n", "Thus for each outer iteration we calculate $K^{-1}x_j$ once and then update only $K^{-1}f$ on each internal Krylov iteration." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Subspace acceleration in JD\n", "\n", "On each iteration of the method we expand the basis with the new $t$.\n", "\n", "Namely, $V_j = [v_1,\\dots,v_{j-1},v_j]$, where $v_j$ is the vector $t$ orthogonalized to $V_{j-1}$.\n", "\n", "Then the standard Rayleigh-Ritz procedure is used.\n", "\n", "**Historical fact:** Initially subspace acceleration was used in the **Davidson method**.
\n", "\n", "However, instead of the Jacobi equation, equation $(\\mathrm{diag}(A) - R(x_j)I)t = -r_j$ was used.
\n", "Davidson method was very popular in quantum chemistry computations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The block case of JD\n", "\n", "If we want many eigenvectors, we just compute **partial Schur decomposition:**\n", "\n", "$$A Q_k = Q_k T_k, $$\n", "\n", "and then want to update $Q_k$ by one vector added to $Q_k$. We just use instead of $A$ the matrix $(I - Q_k Q^*_k) A (I - Q_k Q^*_k)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Jacobi-Davidson: summary\n", "\n", "The correction equation can be solved only roughly, and JD method is often the fastest." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Software\n", "\n", "- The [ARPack](http://www.caam.rice.edu/software/ARPACK/) is the most widely used (it powers scipy sparse eigensolver). Includes versions of Lanczos and Arnoldi algorithms.\n", "- The [PRIMME](https://github.com/primme/primme) is the best from my experience (it employs dynamic switching between different methods including LOBPCG and JD)\n", "- [PROPACK](http://sun.stanford.edu/~rmunk/PROPACK/) works well for the SVD." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Take-home message\n", "\n", "- Arnoldi and Lanczos methods. Shift-and-invert strategy is very expensive since inversion must be done very accurately.\n", "- Preconditioned iterative methods (PINVIT, LOBPCG, JD). Good for inexact inversions. 
\n", "- Software implementing them is available\n", "- There are many technical issues hidden (restarts, spurious eigenvalues, stability)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Next lecture\n", "\n", "\n", "- Fast Fourier transform\n", "\n", "- Structured matrices (Toeplitz, Circulants)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"./styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": false, "sideBar": true, "threshold": 6, "toc_cell": true, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }