{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "5c33e8d6",
"metadata": {},
"source": [
"# Householder Reflections and Givens Rotations\n",
"\n",
"
\n",
"\n",
"Using rotations and reflections to introduce zeros in a matrix is a powerful paradigm in numerical linear algebra. \n",
"\n",
"
\n",
"\n",
"Reflections and rotations are orthogonal transformations. They preserve the Euclidean lengths of real vectors. \n",
"\n",
"
\n",
"\n",
"They don't shrink and they don't stretch too much - these transformations have the perfect condition number, $\\kappa=1$ (at least, in the 2-norm and other unitarily invariant norms like the Frobenius norm).\n",
"\n",
"
\n",
"\n",
"This notebook walks through the basic construction and application of two important variants. \n",
"\n",
"* **Householder reflections:** roughly speaking, these are used for dense problems, introducing zeros column by column.\n",
"\n",
"* **Givens rotations:** These are often used for data-sparse problems, for instance, introducing zeros one entry at a time in a sparse matrix."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a275e098",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"using SparseArrays"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c2782277",
"metadata": {},
"source": [
"## Householder Reflections\n",
"\n",
"The Householder reflector $F_v$, constructed from a real $k$-dimensional vector $v$, is a rank-one modification of the identity: \n",
"\n",
"$$F_v = I - 2\\alpha xx^*, \\quad\\text{where}\\quad x = {\\rm sign}(v_1)\\|v\\|e_1+v \\quad\\text{and}\\quad \\alpha = \\|x\\|^2.$$\n",
"\n",
"It looks like the orthogonal projection onto subspace orthogonal to $x$, but it is actually a _reflection across_ the subspace orthogonal to $x$. The orthogonal projection has rank $k-1$, but the reflection has rank $k$ and $F_v^{-1}=F_v^T=F_v$.\n",
"\n",
"The vector $x$ is chosen so that the reflection takes the vector $v$ to the first coordinate axis:\n",
"\n",
"$$ F_v\\begin{pmatrix}v_1 \\\\ v_2 \\\\ \\vdots \\\\ v_k \\end{pmatrix} = \\begin{pmatrix}\\|v\\| \\\\ 0 \\\\ \\vdots \\\\ 0\\end{pmatrix}.$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3fdac43",
"metadata": {},
"outputs": [],
"source": [
"# compute the Householder reflector\n",
"function hhr(v)\n",
"\n",
" x = copy(v) # copy v to x\n",
" x[1] = x[1] + sign(x[1])*norm(x) # modify first entry of x\n",
" return x\n",
"end\n",
"\n",
"\n",
"v = randn(6)\n",
"v = v / norm(v)\n",
"x = hhr(v)\n",
"Fv = I-2*x*x' / (x'*x)\n",
"\n",
"display(Fv) # here's the full reflection matrix\n",
"display(norm(I - Fv'*Fv)) # orthogonal transformations have transpose = inverse\n",
"display(Fv*v) # x is chosen so that Fv*v = ||v|| * e1 (on the computer, remember that rounding errors occur)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5b9eef81",
"metadata": {},
"source": [
"In practice, we never form the matrix $F_v$ explicitly. We _apply_ it to vectors as a linear transformation by computing\n",
"\n",
"$$ F_vw = w - 2\\alpha x (x^*w). $$\n",
"\n",
"The arithmetic cost is a dot product, a vector addition, and a multiplication. Much better than building the whole matrix and multiplying by a vector.\n",
"\n",
"We also only need to store the reflection vector $x$ (you could also store the scalar to avoid calculating $\\alpha = \\|x\\|^2$ again if you want)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6de72dfb",
"metadata": {},
"outputs": [],
"source": [
"function apply_hhr(x, w)\n",
" \n",
" return w - 2 * x * (x' * w) / (x' * x)\n",
"end\n",
"\n",
"w = randn(6)\n",
"Fvw = apply_hhr(x, w) # we can use the computed reflector x to apply Fv to any vector without forming the full reflection matrix\n",
"display(Fvw) # vectors other than v get reflected across the same subspace, but don't usually end up along a coordinate axis (mostly nonzeros)\n",
"display(norm(Fvw)-norm(w)) # but, reflection means norm is preserved for _any_ vector"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "33046caa",
"metadata": {},
"source": [
"Householder reflections really come into their strength when we use them to introduce zeros and factor matrices. Here's a toy implementation for a familiar example: $A=QR$."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6b49962",
"metadata": {},
"outputs": [],
"source": [
"function hqr(A, k)\n",
" # Householder triangularization on the first k columns of A\n",
" \n",
" R = copy(A)\n",
" n = size(A,2)\n",
" X = zeros(size(A))\n",
" for j in 1:k\n",
" X[j:end,j] = hhr(R[j:end,j]') # get Householder reflector\n",
" R[j:end,j:end] = apply_hhr(X[j:end,j],R[j:end,j:end]) # introduce zeros in n-j x n-j lower right submatrix\n",
" end\n",
" return X, R # return reflectors (for orthogonal Q) and upper triangular R\n",
"end\n",
"\n",
"A = randn(8,5)\n",
"F = hqr(A,5)\n",
"display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal\n",
"display(abs.(F[1]).>1e-14) # F[1] contains Householder vectors of decreasing size"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "22d1b2fe",
"metadata": {},
"source": [
"In practice, the Householder reflectors are usually stored in a compact blocked format that enjoys better storage properties and enables faster matrix-matrix multiplication implementations. \n",
"\n",
"The point is, we store the reflectors, and not the full reflection matrix!\n",
"\n",
"Next week, Householder reflectors will play a starring role in the \"crown jewel\" of numerical analysis: the QR algorithm for computing eigenvalues (not to be mistaken with the QR factorization)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6dca2d2e",
"metadata": {},
"source": [
"## Givens rotations\n",
"\n",
"Householder reflections naturally operate on columns of $A$. But what if most column entries in $A$ are nonzero? We can introduce zeros one entry at a time with Givens rotations.\n",
"\n",
"
\n",
"\n",
"You can see the idea clearest in two dimensions first, where the vector $x = (x_1,\\,\\,x_2)^T$ is rotated counterclockwise by an angle $\\theta$ into the vector $y = (y_1,\\,\\,y_2)^T$ by\n",
"\n",
"$$\n",
"\\begin{pmatrix} \n",
"y_1 \\\\ y_2\n",
"\\end{pmatrix}\n",
"=\n",
"\\begin{pmatrix}\n",
"\\cos(\\theta) & -\\sin(\\theta) \\\\\n",
"\\sin(\\theta) & \\cos(\\theta)\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix} \n",
"x_1 \\\\ x_2\n",
"\\end{pmatrix}.\n",
"$$\n",
"\n",
"Given $x$, how should we chose $\\theta$ so that $y_2 = 0$? We need to rotate $x$ counterclockwise so that it lies along the $e_1=(1,\\,\\,0)^T$ coordinate axis! If we choose $\\cos(\\theta) = x_1/\\|x\\|$ and $\\sin(\\theta) = - x_2/\\|x\\|$, then\n",
"\n",
"$$\n",
"\\begin{pmatrix} \n",
"\\|x\\| \\\\ 0\n",
"\\end{pmatrix}\n",
"=\n",
"\\frac{1}{\\|x\\|}\\begin{pmatrix}\n",
"x_1 & x_2 \\\\\n",
"-x_2 & x_1\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix} \n",
"x_1 \\\\ x_2\n",
"\\end{pmatrix}.\n",
"$$\n",
"\n",
"The matrix we constructed is a rotation - an orthogonal transformation - that zeros out the second entry of the special vector $x$. A Givens rotation is the $2\\times 2$ rotation analogue of a Housholder reflection!\n",
"\n",
"
\n",
"\n",
"In numerical linear algebra, we are usually concerned with more than just two dimensions. But we can use Givens rotations to introduce one zero at a time by, e.g., mixing two rows at a time. Conceptually, this means embedding the Givens rotation matrix into a larger identity matrix. For example if we want to use the first entry of a $5$-dimensional vector to zero out its last entry, we could write\n",
"\n",
"$$\n",
"\\begin{pmatrix} \n",
"\\sqrt{x_1^2+x_5^2} \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ 0\n",
"\\end{pmatrix}\n",
"=\n",
"\\begin{pmatrix}\n",
"c & 0 & 0 & 0 & s \\\\\n",
"0 & 1 & 0 & 0 & 0 \\\\\n",
"0 & 0 & 1 & 0 & 0 \\\\\n",
"0 & 0 & 0 & 1 & 0 \\\\\n",
"-s & 0 & 0 & 0 & c\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix} \n",
"x_1 \\\\ x_2 \\\\ x_3 \\\\ x_4 \\\\ x_5\n",
"\\end{pmatrix}, \n",
"\\qquad\\text{where}\\qquad\n",
"c = \\frac{x_1}{\\sqrt{x_1^2+x_5^2}}\n",
"\\qquad\\text{and}\\qquad\n",
"s = \\frac{x_5}{\\sqrt{x_1^2+x_5^2}}.\n",
"$$\n",
"\n",
"Just as with Householder reflections, we never form the full rotation matrix on the computer. We store $c$ and $s$ and apply the Givens rotation as a linear transformation that combines two rows (when applied from the left)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb47426d",
"metadata": {},
"outputs": [],
"source": [
"function toy_givens(x, j, k)\n",
" \n",
" r = sqrt(x[j]^2 + x[k]^2)\n",
" c = x[j] / r\n",
" s = x[k] / r\n",
"\n",
" return c, s\n",
"end\n",
"\n",
"function apply_givens(v, c, s, j, k)\n",
"\n",
" w = copy(v)\n",
"\n",
" w[j,:] = c*v[j,:] + s*v[k,:]\n",
" w[k,:] = -s*v[j,:] + c*v[k,:]\n",
"\n",
" return w\n",
"end\n",
"\n",
"N = 10\n",
"A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))\n",
"g = toy_givens(A[:,1], 1, 2) # compute Givens rotation to zero out first subdiagonal entry\n",
"B = apply_givens(A, g[1], g[2], 1, 2) # apply Givens rotation to mix first two rows of A\n",
"display(sparse(A)) # display matrix before Givens\n",
"display(sparse(B)) # display matrix after Givens\n",
"display( norm(A[:,2]) - norm(B[:,2]) ) # column norm is preserved"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "effffdd1",
"metadata": {},
"source": [
"There are a number of subtle points to get right when doing Givens on the computer. Luckily for us, Julia has a great ready-to-use function for computing and working with Givens rotations. Let's test it out on the same example we used for our \"toy\" Givens rotation code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6844880",
"metadata": {},
"outputs": [],
"source": [
"N = 10\n",
"A = diagm(-1 => -ones(N-1), 0 => 2*ones(N), 1 => -ones(N-1))\n",
"G = givens(A, 1, 2, 1)\n",
"B = G[1]*A\n",
"display(sparse(A))\n",
"display(sparse(B))\n",
"display( norm(A[:,2]) - norm(B[:,2]) )"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8359af8a",
"metadata": {},
"source": [
"It looks like our toy code did okay on this simple example. Now, let's string a few more Givens rotations together to triangularize the tridiagonal matrix $A$ above. We'll also allow it to accumulate the same Givens rotations applied to another matrix B, which is useful if we want to express the orthogonal factor $Q$ as a matrix, or if we want to compute $Q^Tb$ for least-squares problems."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ce4a736",
"metadata": {},
"outputs": [],
"source": [
"function triQR(A,B)\n",
" # compute the QR decomposition of a tridiagonal matrix using Givens rotations\n",
" \n",
" R = copy(A)\n",
" QTB = copy(B)\n",
" n = size(R,2)\n",
" for j in 1:n-1\n",
" G = givens(R, j, j + 1, j)\n",
" R = G[1]*R\n",
" QTB = G[1]*QTB\n",
" end\n",
"\n",
" return R, QTB\n",
"end\n",
"\n",
"F = triQR(A,diagm(0 => ones(N)))\n",
"display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - it is banded with upper bandwidth = 3\n",
"display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "8a864530",
"metadata": {},
"source": [
"Banded matrices are very special. What happens for other types of sparse matrices? Let's add a row of ones to the first row of the difference matrix and see how it effects the $QR$ factorization.\n",
"\n",
" $$\n",
" A = \\begin{pmatrix}\n",
" 1 & 1 & 1 & 1 & \\cdots & 1 \\\\\n",
" -1 & 2 & -1 & 0 & \\cdots & 0 \\\\\n",
" 0 & -1 & -2 & 1 & \\cdots & 0 \\\\\n",
" \\vdots & & \\ddots & \\ddots & \\ddots & \\vdots \\\\\n",
" 0 & \\cdots & & 0 & -1 & 2\n",
" \\end{pmatrix}\n",
" $$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ad523721",
"metadata": {},
"outputs": [],
"source": [
"A[1,:] = ones(1,N)\n",
"F = triQR(A,diagm(0 => ones(N)))\n",
"display(abs.(F[1]).>1e-14) # F[1] is the triangular factor - it is banded with upper bandwidth = 3\n",
"display(norm(F[2]'*F[2]-I)) # F[2] is the transpose of the orthogonal factor"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "56ff1f07",
"metadata": {},
"source": [
"The upper triangular factor is now completely dense... what happened?\n",
"\n",
"
\n",
"\n",
"We can explore by running Householder QR on $A$ one column at a time, stopping to visualize how the upper triangular factor fills in."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d379f452",
"metadata": {},
"outputs": [],
"source": [
"F = hqr(A, 0)\n",
"display(abs.(F[2]).>1e-14) # R is now (numerically) zero below the diagonal"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "0f93d48e",
"metadata": {},
"source": [
"Just as we saw with Gaussian elimination and $A=LU$, the factors of $A$ can suffer from _fill-in_: many more nonzero entries than in the original matrix."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.6.3",
"language": "julia",
"name": "julia-1.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}