{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "5c33e8d6", "metadata": {}, "source": [ "# Householder Reflections and Givens Rotations\n", "\n", "
\n", "\n", "Using rotations and reflections to introduce zeros in a matrix is a powerful paradigm in numerical linear algebra. \n", "\n", "
\n", "\n", "Reflections and rotations are orthogonal transformations. They preserve the Euclidean lengths of real vectors. \n", "\n", "
\n", "\n", "They don't shrink and they don't stretch too much - these transformations have the perfect condition number, $\\kappa=1$ (at least, in the 2-norm and other unitarily invariant norms like the Frobenius norm).\n", "\n", "
\n", "\n", "This notebook walks through the basic construction and application of two important variants. \n", "\n", "* **Householder reflections:** roughly speaking, these are used for dense problems, introducing zeros column by column.\n", "\n", "* **Givens rotations:** These are often used for data-sparse problems, for instance, introducing zeros one entry at a time in a sparse matrix." ] }, { "cell_type": "code", "execution_count": null, "id": "a275e098", "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra\n", "using SparseArrays" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c2782277", "metadata": {}, "source": [ "## Householder Reflections\n", "\n", "The Householder reflector $F_v$, constructed from a real $k$-dimensional vector $v$, is a rank-one modification of the identity: \n", "\n", "$$F_v = I - 2\\alpha xx^*, \\quad\\text{where}\\quad x = {\\rm sign}(v_1)\\|v\\|e_1+v \\quad\\text{and}\\quad \\alpha = \\|x\\|^2.$$\n", "\n", "It looks like the orthogonal projection onto subspace orthogonal to $x$, but it is actually a _reflection across_ the subspace orthogonal to $x$. 
The orthogonal projection has rank $k-1$, but the reflection has full rank $k$, with $F_v^{-1}=F_v^T=F_v$.\n", "\n", "The vector $x$ is chosen so that the reflection takes the vector $v$ to the first coordinate axis:\n", "\n", "$$ F_v\\begin{pmatrix}v_1 \\\\ v_2 \\\\ \\vdots \\\\ v_k \\end{pmatrix} = \\begin{pmatrix}-{\\rm sign}(v_1)\\|v\\| \\\\ 0 \\\\ \\vdots \\\\ 0\\end{pmatrix}.$$" ] }, { "cell_type": "code", "execution_count": null, "id": "a3fdac43", "metadata": {}, "outputs": [], "source": [ "# compute the Householder reflector\n", "function hhr(v)\n", "\n", " x = copy(v) # copy v to x\n", " x[1] = x[1] + sign(x[1])*norm(x) # modify first entry of x (this sign choice avoids cancellation)\n", " return x\n", "end\n", "\n", "\n", "v = randn(6)\n", "v = v / norm(v)\n", "x = hhr(v)\n", "Fv = I-2*x*x' / (x'*x)\n", "\n", "display(Fv) # here's the full reflection matrix\n", "display(norm(I - Fv'*Fv)) # orthogonal transformations have transpose = inverse\n", "display(Fv*v) # x is chosen so that Fv*v = -sign(v1)*||v|| * e1 (on the computer, remember that rounding errors occur)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5b9eef81", "metadata": {}, "source": [ "In practice, we never form the matrix $F_v$ explicitly. We _apply_ it to vectors as a linear transformation by computing\n", "\n", "$$ F_vw = w - 2\\alpha x (x^*w). $$\n", "\n", "The arithmetic cost is a dot product, a scalar-vector multiplication, and a vector subtraction - much better than building the whole matrix and multiplying by a vector.\n", "\n", "We also only need to store the reflection vector $x$ (you could also store the scalar to avoid calculating $\\alpha = 1/\\|x\\|^2$ again if you want)."
] }, { "cell_type": "code", "execution_count": null, "id": "6de72dfb", "metadata": {}, "outputs": [], "source": [ "function apply_hhr(x, w)\n", " \n", " return w - 2 * x * (x' * w) / (x' * x)\n", "end\n", "\n", "w = randn(6)\n", "Fvw = apply_hhr(x, w) # we can use the computed reflector x to apply Fv to any vector without forming the full reflection matrix\n", "display(Fvw) # vectors other than v get reflected across the same subspace, but don't usually end up along a coordinate axis (mostly nonzeros)\n", "display(norm(Fvw)-norm(w)) # but, reflection means norm is preserved for _any_ vector" ] }, { "attachments": {}, "cell_type": "markdown", "id": "33046caa", "metadata": {}, "source": [ "Householder reflections really come into their strength when we use them to introduce zeros and factor matrices. Here's a toy implementation for a familiar example: $A=QR$." ] }, { "cell_type": "code", "execution_count": null, "id": "d6b49962", "metadata": {}, "outputs": [], "source": [ "function hqr(A, k)\n", " # Householder triangularization on the first k columns of A\n", " \n", " R = copy(A)\n", " n = size(A,2)\n", " X = zeros(size(A))\n", " for j in 1:k\n", " X[j:end,j] = hhr(R[j:end,j]') # get Householder reflector\n", " R[j:end,j:end] = apply_hhr(X[j:end,j],R[j:end,j:end]) # introduce zeros in n-j x n-j lower right submatrix\n", " end\n", " return X, R # return reflectors (for orthogonal Q) and upper triangular R\n", "end\n", "\n", "A = randn(8,5)\n", "F = hqr(A,5)\n", "display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal\n", "display(abs.(F[1]).>1e-14) # F[1] contains Householder vectors of decreasing size" ] }, { "attachments": {}, "cell_type": "markdown", "id": "22d1b2fe", "metadata": {}, "source": [ "In practice, the Householder reflectors are usually stored in a compact blocked format that enjoys better storage properties and enables faster matrix-matrix multiplication implementations. 
\n", "\n", "The point is, we store the reflectors, and not the full reflection matrix!\n", "\n", "Next week, Householder reflectors will play a starring role in the \"crown jewel\" of numerical analysis: the QR algorithm for computing eigenvalues (not to be mistaken with the QR factorization)." ] }, { "attachments": {}, "cell_type": "markdown", "id": "6dca2d2e", "metadata": {}, "source": [ "## Givens rotations\n", "\n", "Householder reflections naturally operate on columns of $A$. But what if most column entries in $A$ are nonzero? We can introduce zeros one entry at a time with Givens rotations.\n", "\n", "
\n", "\n", "You can see the idea clearest in two dimensions first, where the vector $x = (x_1,\\,\\,x_2)^T$ is rotated counterclockwise by an angle $\\theta$ into the vector $y = (y_1,\\,\\,y_2)^T$ by\n", "\n", "$$\n", "\\begin{pmatrix} \n", "y_1 \\\\ y_2\n", "\\end{pmatrix}\n", "=\n", "\\begin{pmatrix}\n", "\\cos(\\theta) & -\\sin(\\theta) \\\\\n", "\\sin(\\theta) & \\cos(\\theta)\n", "\\end{pmatrix}\n", "\\begin{pmatrix} \n", "x_1 \\\\ x_2\n", "\\end{pmatrix}.\n", "$$\n", "\n", "Given $x$, how should we chose $\\theta$ so that $y_2 = 0$? We need to rotate $x$ counterclockwise so that it lies along the $e_1=(1,\\,\\,0)^T$ coordinate axis! If we choose $\\cos(\\theta) = x_1/\\|x\\|$ and $\\sin(\\theta) = - x_2/\\|x\\|$, then\n", "\n", "$$\n", "\\begin{pmatrix} \n", "\\|x\\| \\\\ 0\n", "\\end{pmatrix}\n", "=\n", "\\frac{1}{\\|x\\|}\\begin{pmatrix}\n", "x_1 & x_2 \\\\\n", "-x_2 & x_1\n", "\\end{pmatrix}\n", "\\begin{pmatrix} \n", "x_1 \\\\ x_2\n", "\\end{pmatrix}.\n", "$$\n", "\n", "The matrix we constructed is a rotation - an orthogonal transformation - that zeros out the second entry of the special vector $x$. A Givens rotation is the $2\\times 2$ rotation analogue of a Housholder reflection!\n", "\n", "
\n", "\n", "In numerical linear algebra, we are usually concerned with more than just two dimensions. But we can use Givens rotations to introduce one zero at a time by, e.g., mixing two rows at a time. Conceptually, this means embedding the Givens rotation matrix into a larger identity matrix. For example if we want to use the first entry of a $5$-dimensional vector to zero out its last entry, we could write\n", "\n", "$$\n", "\\begin{pmatrix} \n", "\\sqrt{x_1^2+x_5^2} \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ 0\n", "\\end{pmatrix}\n", "=\n", "\\begin{pmatrix}\n", "c & 0 & 0 & 0 & s \\\\\n", "0 & 1 & 0 & 0 & 0 \\\\\n", "0 & 0 & 1 & 0 & 0 \\\\\n", "0 & 0 & 0 & 1 & 0 \\\\\n", "-s & 0 & 0 & 0 & c\n", "\\end{pmatrix}\n", "\\begin{pmatrix} \n", "x_1 \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ x_5\n", "\\end{pmatrix}, \n", "\\qquad\\text{where}\\qquad\n", "c = \\frac{x_1}{\\sqrt{x_1^2+x_5^2}}\n", "\\qquad\\text{and}\\qquad\n", "s = \\frac{x_5}{\\sqrt{x_1^2+x_5^2}}.\n", "$$\n", "\n", "Just as with Householder reflections, we never form the full rotation matrix on the computer. We store $c$ and $s$ and apply the Givens rotation as a linear transformation that combines two rows (when applied from the left)." 
] }, { "cell_type": "code", "execution_count": null, "id": "eb47426d", "metadata": {}, "outputs": [], "source": [ "function toy_givens(x, j, k)\n", " \n", " r = sqrt(x[j]^2 + x[k]^2)\n", " c = x[j] / r\n", " s = x[k] / r\n", "\n", " return c, s\n", "end\n", "\n", "function apply_givens(v, c, s, j, k)\n", "\n", " w = copy(v)\n", "\n", " w[j,:] = c*v[j,:] + s*v[k,:]\n", " w[k,:] = -s*v[j,:] + c*v[k,:]\n", "\n", " return w\n", "end\n", "\n", "N = 10\n", "A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))\n", "g = toy_givens(A[:,1], 1, 2) # compute Givens rotation to zero out first subdiagonal entry\n", "B = apply_givens(A, g[1], g[2], 1, 2) # apply Givens rotation to mix first two rows of A\n", "display(sparse(A)) # display matrix before Givens\n", "display(sparse(B)) # display matrix after Givens\n", "display( norm(A[:,2]) - norm(B[:,2]) ) # column norm is preserved" ] }, { "attachments": {}, "cell_type": "markdown", "id": "effffdd1", "metadata": {}, "source": [ "There are a number of subtle points to get right when doing Givens on the computer. Luckily for us, Julia has a great ready-to-use function for computing and working with Givens rotations. Let's test it out on the same example we used for our \"toy\" Givens rotation code." ] }, { "cell_type": "code", "execution_count": null, "id": "b6844880", "metadata": {}, "outputs": [], "source": [ "N = 10\n", "A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))\n", "G = givens(A, 1, 2, 1)\n", "B = G[1]*A\n", "display(sparse(A))\n", "display(sparse(B))\n", "display( norm(A[:,2]) - norm(B[:,2]) )" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8359af8a", "metadata": {}, "source": [ "It looks like our toy code did okay on this simple example. Now, let's string a few more Givens rotations together to triangularize the tridiagonal matrix $A$ above. 
We'll also let triQR apply the same Givens rotations to a second matrix $B$, which is useful if we want to express the orthogonal factor $Q$ as a matrix, or if we want to compute $Q^Tb$ for least-squares problems." ] }, { "cell_type": "code", "execution_count": null, "id": "1ce4a736", "metadata": {}, "outputs": [], "source": [ "function triQR(A,B)\n", " # compute the QR decomposition of a tridiagonal matrix using Givens rotations\n", " \n", " R = copy(A)\n", " QTB = copy(B)\n", " n = size(R,2)\n", " for j in 1:n-1\n", " G = givens(R, j, j + 1, j)\n", " R = G[1]*R\n", " QTB = G[1]*QTB\n", " end\n", "\n", " return R, QTB\n", "end\n", "\n", "F = triQR(A,diagm(0 => ones(N)))\n", "display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - it is banded, with two nonzero superdiagonals\n", "display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8a864530", "metadata": {}, "source": [ "Banded matrices are very special. What happens for other types of sparse matrices? Let's replace the first row of the difference matrix with a row of ones and see how it affects the $QR$ factorization.\n", "\n", " $$\n", " A = \\begin{pmatrix}\n", " 1 & 1 & 1 & 1 & \\cdots & 1 \\\\\n", " -1 & 2 & -1 & 0 & \\cdots & 0 \\\\\n", " 0 & -1 & 2 & -1 & \\cdots & 0 \\\\\n", " \\vdots & & \\ddots & \\ddots & \\ddots & \\vdots \\\\\n", " 0 & \\cdots & & 0 & -1 & 2\n", " \\end{pmatrix}\n", " $$" ] }, { "cell_type": "code", "execution_count": null, "id": "ad523721", "metadata": {}, "outputs": [], "source": [ "A[1,:] = ones(1,N)\n", "F = triQR(A,diagm(0 => ones(N)))\n", "display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - is it still banded?\n", "display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor" ] }, { "attachments": {}, "cell_type": "markdown", "id": "56ff1f07", "metadata": {}, "source": [ "The upper triangular factor is now completely dense... what happened?\n", "\n", "
\n", "\n", "We can explore by running Householder QR on $A$ one column at a time, stopping to visualize how the upper triangular factor fills in." ] }, { "cell_type": "code", "execution_count": null, "id": "d379f452", "metadata": {}, "outputs": [], "source": [ "F = hqr(A, 0)\n", "display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0f93d48e", "metadata": {}, "source": [ "Just as we saw with Gaussian elimination and $A=LU$, the factors of $A$ can suffer from _fill-in_: many more nonzero entries than in the original matrix." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.6.3", "language": "julia", "name": "julia-1.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.6.3" } }, "nbformat": 4, "nbformat_minor": 5 }