{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Transpose, Permutations, and Orthogonality\n", "\n", "One special type of matrix for which we can solve problems much more quickly is a permutation matrix, introduced as $PA=LU$ factorization." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "permutation_matrix (generic function with 1 method)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# construct a permutation matrix P from the permutation vector p\n", "function permutation_matrix(p)\n", " P = zeros(Int, length(p),length(p))\n", " for i = 1:length(p)\n", " P[i,p[i]] = 1\n", " end\n", " return P\n", "end" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 0 1 0 0 0\n", " 0 0 0 1 0\n", " 1 0 0 0 0\n", " 0 0 0 0 1\n", " 0 0 1 0 0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P =\n", "permutation_matrix([2,4,1,5,3])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"ONE-HOT VECTOR\"" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"ONE-HOT VECTOR\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 0.0 0.0 0.0 0.0\n", " 0.0 1.0 0.0 0.0 0.0\n", " 0.0 0.0 1.0 0.0 0.0\n", " 0.0 0.0 0.0 1.0 0.0\n", " 0.0 0.0 0.0 0.0 1.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "I₅ = eye(5)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.0 1.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 1.0 0.0\n", " 1.0 0.0 0.0 0.0 0.0\n", " 0.0 0.0 0.0 0.0 1.0\n", " 0.0 0.0 1.0 0.0 0.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P * I₅" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The inverse of any permutation matrix $P$ turns out to be its [transpose](https://en.wikipedia.org/wiki/Transpose) $P^T$: we just swap rows and columns. In Julia, this is denoted `P'` (technically, this is the conjugate transpose, and `P.'` is the transpose, but the two are the same for real-number matrices where complex conjugation does nothing)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 0 1 0 0 0\n", " 0 0 0 1 0\n", " 1 0 0 0 0\n", " 0 0 0 0 1\n", " 0 0 1 0 0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 1 0 0 0 0\n", " 0 1 0 0 0\n", " 0 0 1 0 0\n", " 0 0 0 1 0\n", " 0 0 0 0 1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P'*P" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 1 0 0 0 0\n", " 0 1 0 0 0\n", " 0 0 1 0 0\n", " 0 0 0 1 0\n", " 0 0 0 0 1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P*P'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason this works is that $P^T P$ computes the dot products of *all the columns* of $P$ with *all of the columns*, and the columns of $P$ are [orthonormal](https://en.wikipedia.org/wiki/Orthonormality) (orthogonal with length 1). We say that $P$ is an example of an [\"orthogonal\" matrix or a \"unitary\" matrix](https://en.wikipedia.org/wiki/Unitary_matrix). We will have much to say about such matrices later in 18.06." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×5 Array{Float64,2}:\n", " 0.656093 0.0785021 0.0874409 0.741967 0.401168 \n", " 0.0552685 0.0664353 0.369016 0.527357 0.0764656\n", " 0.133683 0.416092 0.52589 0.838358 0.628186 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A =rand(3,5)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×3 Array{Float64,2}:\n", " 0.656093 0.0552685 0.133683\n", " 0.0785021 0.0664353 0.416092\n", " 0.0874409 0.369016 0.52589 \n", " 0.741967 0.527357 0.838358\n", " 0.401168 0.0764656 0.628186" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A'" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "my_transpose (generic function with 1 method)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function my_transpose(A)\n", " m,n=size(A)\n", " B = zeros(n,m)\n", " for i=1:m, j=1:n\n", " B[j,i]=A[i,j]\n", " end\n", " B\n", "end" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×5 Array{Float64,2}:\n", " 0.656093 0.0785021 0.0874409 0.741967 0.401168 \n", " 0.0552685 0.0664353 0.369016 0.527357 0.0764656\n", " 0.133683 0.416092 0.52589 0.838358 0.628186 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×3 Array{Float64,2}:\n", " 0.656093 0.0552685 0.133683\n", " 0.0785021 0.0664353 0.416092\n", " 0.0874409 0.369016 0.52589 \n", " 0.741967 0.527357 0.838358\n", " 0.401168 0.0764656 0.628186" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_transpose(A)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " 0.0657944 0.133402 0.10374 \n", " 0.974459 0.506739 0.981765\n", " 0.591419 0.472668 1.34456 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = rand(3,3)\n", "B = rand(3,3)\n", "(A*B)'" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " 0.0657944 0.133402 0.10374 \n", " 0.974459 0.506739 0.981765\n", " 0.591419 0.472668 1.34456 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B' * A'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transposes and products\n", "\n", "Transposes are important in linear algebra because they have a special relationship to matrix and vector products:\n", "$$\n", "(AB)^T = B^T A^T\n", "$$\n", "and hence for a dot product (inner product) $x^T y$\n", "$$\n", "x \\cdot (Ay) = x \\mbox{ \"dot\" } (Ay) = x^T (Ay) = (A^T x)^T y = (A^T x) \\mbox{ \"dot\" } y = (A^T x) \\cdot y\n", "$$\n", "We can even turn the second step around and use this as the *definition* of a transpose: a transpose is *what \"moves\" a matrix from one side to the other of a dot product.*" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.289551\n", " 0.257849\n", " 0.976208" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = rand(3)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.284907\n", " 0.180932\n", " 0.28499 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = rand(3)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6837202268882089" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x ⋅ (A*y) # The transpose of A is the unique matrix such that\n", "# x ⋅ (A*y) = (A'x)⋅y for all x and y" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6837202268882088" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(A'x)⋅y" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "true" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C = rand(-9:9, 4,4)\n", "D = rand(-9:9, 4,4)\n", "(C*D)' == D'*C'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transposes and inverses\n", "\n", "From the above property, we have:\n", "$$\n", "(A A^{-1})^T = (A^{-1})^T A^T = I^T = I\n", "$$\n", "and it follows that:\n", "$$\n", "(A^{-1})^T = (A^T)^{-1}\n", "$$\n", "The *transpose of the inverse* is the *inverse of the transpose*." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 4 -2 -7 -4 -8\n", " 9 -6 -6 -1 -5\n", " -2 -9 3 -5 2\n", " 9 7 -9 5 -8\n", " -1 6 -3 9 6" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [4 -2 -7 -4 -8\n", " 9 -6 -6 -1 -5\n", " -2 -9 3 -5 2\n", " 9 7 -9 5 -8\n", " -1 6 -3 9 6]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.0109991 0.131989 -0.235564 -0.301558 0.2044 \n", " 0.529789 0.35747 -0.179652 -0.69172 0.678582 \n", " -0.908341 -0.900092 0.370302 1.48701 -1.29667 \n", " -0.635197 -0.622365 0.353804 1.16499 -1.05408 \n", " -0.0879927 -0.055912 -0.11549 0.0791323 0.0314696" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inv(A')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.0109991 0.131989 -0.235564 -0.301558 0.2044 \n", " 0.529789 0.35747 -0.179652 -0.69172 0.678582 \n", " -0.908341 -0.900092 0.370302 1.48701 -1.29667 \n", " -0.635197 -0.622365 0.353804 1.16499 -1.05408 \n", " -0.0879927 -0.055912 -0.11549 0.0791323 0.0314696" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inv(A)'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, they match!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transposes and LU factors\n", "\n", "If $A = LU$, then $A^T = U^T L^T$. Note that $U^T$ is *lower* triangular, and $L^T$ is *upper* trangular. That means, that once we have the LU factorization of $A$, we immediately have a similar factorization of $A^T$." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([1.0 0.0 … 0.0 0.0; 1.0 1.0 … 0.0 0.0; … ; -0.111111 0.410256 … 1.0 0.0; -0.222222 -0.794872 … 0.0242696 1.0], [9.0 -6.0 … -1.0 -5.0; 0.0 13.0 … 6.0 -3.0; … ; 0.0 0.0 … 8.67894 9.95297; 0.0 0.0 … 0.0 -0.771206], [2, 4, 1, 5, 3])" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L,U,p = lu(A)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Int64,1}:\n", " 2\n", " 4\n", " 1\n", " 5\n", " 3" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 1.0 0.444444 -0.111111 -0.222222 \n", " 0.0 1.0 0.0512821 0.410256 -0.794872 \n", " 0.0 0.0 1.0 0.582822 0.171779 \n", " 0.0 0.0 0.0 1.0 0.0242696\n", " 0.0 0.0 0.0 0.0 1.0 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L'" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 9.0 0.0 0.0 0.0 0.0 \n", " -6.0 13.0 0.0 0.0 0.0 \n", " -6.0 -3.0 -4.17949 0.0 0.0 \n", " -1.0 6.0 -3.86325 8.67894 0.0 \n", " -5.0 -3.0 -5.62393 9.95297 -0.771206" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In particular, suppose we know the $PA = LU$ factorization for $A$, but we want to solve $A^T x = b$. We can:\n", "\n", "* Write $A = P^T L U \\implies A^T = U^T L^T P$\n", "* Substitute this in to $A^T x = b$ to obtain $U^T L^T P x = b$\n", "* Parenthesize and solve from the \"outside in\": $U^T (L^T (P x)) = b$:\n", " - First solve $U^T c = b$ for $c$ by forward-substitution\n", " - Then solve $L^T d = c$ by backsubstitution\n", " - Then solve $P x = d$ for $x = P^T d$ (i.e. just reversing the permutation)\n", "\n", "Let's try it:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 1.28873 \n", " 6.07363 \n", " -11.9273 \n", " -8.92392 \n", " -0.643141" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "b = [4,2,1,-2,3] # \"randomly\" chosen right-hand side\n", "A' \\ b # correct solution to Aᵀx = b" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 1.28873 \n", " 6.07363 \n", " -11.9273 \n", " -8.92392 \n", " -0.643141" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = U' \\ b # forward-substitution\n", "d = L' \\ c # backsubstitution\n", "permutation_matrix(p)' * d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, the `lufact(A)` object (which encapsulates $L$, $U$, and $P$) does all this for you (in a more efficient way because it makes sure to take advantage of the special structure of these matrices, which we didn't above):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 1.28873 \n", " 6.07363 \n", " -11.9273 \n", " -8.92392 \n", " -0.643141" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LU = lufact(A)\n", "LU' \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Symmetric matrices\n", "\n", "A *very* important type of matrix that arises frequently in real problems (we will have *much more* to say about this later in the course, after exam 2) is a [symmetric matrix](https://en.wikipedia.org/wiki/Symmetric_matrix): a matrix $S$ that is equal to its transpose $S = S^T$.\n", "\n", "Given any matrix $A$, we can make a symmetric matrix out of it very easily in two ways:\n", "* $A + A^T$ (or often we write the \"symmetric part\" of $A$ as $\\frac{A + A^T}{2}$). (For square matrices only.)\n", "* $A^T A$ or $A A^T$. (This even works for *non-square* matrix.)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 4 -2 -7 -4 -8\n", " 9 -6 -6 -1 -5\n", " -2 -9 3 -5 2\n", " 9 7 -9 5 -8\n", " -1 6 -3 9 6" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 8 7 -9 5 -9\n", " 7 -12 -15 6 1\n", " -9 -15 6 -14 -1\n", " 5 6 -14 10 1\n", " -9 1 -1 1 12" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S = A' + A" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "true" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S' == (A')'+ A' == A +A' == A' + A" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 183 13 -166 21 -159\n", " 13 206 -58 148 8\n", " -166 -58 184 -53 146\n", " 21 148 -53 148 41\n", " -159 8 146 41 193" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S = A' * A" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "true" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S' == (A' *A )' == A' * A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ordinary LU factorization of a symmetric $S$, however, seems to have nothing to do with the symmetry of $S$. Is there any special relationship between $L$ and $U$ in this case?" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([1.0 0.0 … 0.0 0.0; 0.0710383 1.0 … 0.0 0.0; … ; 0.114754 0.714408 … 1.0 0.0; -0.868852 0.0940872 … 1.11804 1.0], [183.0 13.0 … 21.0 -159.0; 0.0 205.077 … 146.508 19.2951; … ; 0.0 0.0 … 40.8852 45.7112; 0.0 0.0 … 0.0 0.303428], [1, 2, 3, 4, 5])" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L, U = lu(S, Val{false}) # LU without pivoting" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 0.0 0.0 0.0 0.0\n", " 0.0710383 1.0 0.0 0.0 0.0\n", " -0.907104 -0.225319 1.0 0.0 0.0\n", " 0.114754 0.714408 -0.0408412 1.0 0.0\n", " -0.868852 0.0940872 0.265894 1.11804 1.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 183.0 13.0 -166.0 21.0 -159.0 \n", " 0.0 205.077 -46.2077 146.508 19.2951 \n", " 0.0 0.0 23.0093 -0.939727 6.11804 \n", " 0.0 0.0 0.0 40.8852 45.7112 \n", " 0.0 0.0 0.0 0.0 0.303428" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "U" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$U$ and $L$ *seem* quite different because $L$ has 1's along the diagonal, but $U$ has some other numbers (the pivots). We can extract these with `diag(U)` in Julia:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5-element Array{Float64,1}:\n", " 183.0 \n", " 205.077 \n", " 23.0093 \n", " 40.8852 \n", " 0.303428" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "diag(U)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could make $U$ look more like $L$ if we *divided each row* of $U$ by these pivots. That corresponds to multiplying $D^{-1} U$, where $D$ is the *diagonal matrix* of the pivots:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 183.0 0.0 0.0 0.0 0.0 \n", " 0.0 205.077 0.0 0.0 0.0 \n", " 0.0 0.0 23.0093 0.0 0.0 \n", " 0.0 0.0 0.0 40.8852 0.0 \n", " 0.0 0.0 0.0 0.0 0.303428" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "D = diagm(diag(U)) # diagm makes a diagonal matrix from a 1d array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since a diagonal matrix just multiplies each row by a single number, the inverse of a diagonal matrix simply *divides* each row by the *reciprocal* of that number:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.00546448 0.0 0.0 0.0 0.0 \n", " 0.0 0.00487623 0.0 0.0 0.0 \n", " 0.0 0.0 0.0434607 0.0 0.0 \n", " 0.0 0.0 0.0 0.0244587 0.0 \n", " 0.0 0.0 0.0 0.0 3.29568" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inv(D)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 0.0710383 -0.907104 0.114754 -0.868852 \n", " 0.0 1.0 -0.225319 0.714408 0.0940872\n", " 0.0 0.0 1.0 -0.0408412 0.265894 \n", " 0.0 0.0 0.0 1.0 1.11804 \n", " 0.0 0.0 0.0 0.0 1.0 " ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inv(D) * U" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait a minute, now the entries look *exactly* like those of $L$, except above the diagonal rather than below. In fact, this is precisely the *transpose* of $L$:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 0.0710383 -0.907104 0.114754 -0.868852 \n", " 0.0 1.0 -0.225319 0.714408 0.0940872\n", " 0.0 0.0 1.0 -0.0408412 0.265894 \n", " 0.0 0.0 0.0 1.0 1.11804 \n", " 0.0 0.0 0.0 0.0 1.0 " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L'" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Int64,2}:\n", " 183 13 -166 21 -159\n", " 13 206 -58 148 8\n", " -166 -58 184 -53 146\n", " 21 148 -53 148 41\n", " -159 8 146 41 193" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 1.0 0.0 0.0 0.0 0.0\n", " 0.0710383 1.0 0.0 0.0 0.0\n", " -0.907104 -0.225319 1.0 0.0 0.0\n", " 0.114754 0.714408 -0.0408412 1.0 0.0\n", " -0.868852 0.0940872 0.265894 1.11804 1.0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L\n", "D = abs.(D)\n", "S = L*abs.(D)*L'\n", "U = L'\n", "L" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since $D^{-1} U = L^T$, we have $U = D L^T$, and hence $S = LU = L D L^T$.\n", "\n", "This fact is so important that it has its own name: we have constructed the **LDLᵀ factorization** of our symmetric matrix $S$. This factorization is useful for two reasons:\n", "\n", "* It preserves the special structure of a symmetric matrix, which is important if we are to do subsequent algebraic manipulations: $(LDL^T)^T = LDL^T$.\n", "\n", "* Clever implementations can save roughly a factor of two in the number of operations by exploiting the symmetry." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Cholesky factorization\n", "\n", "Finally, we should mention another very important variation on this theme.\n", "\n", "Suppose that we have a symmetric matrix $S$ in which **all the pivots are positive**. This is called a [positive-definite matrix](https://en.wikipedia.org/wiki/Positive-definite_matrix), and turns out to be the case *whenever* you construct $S$ from $A^T A$ or $A A^T$ (for real $A$), as above. We will have much more to say about such matrices later in the course.\n", "\n", "In that case, we can take the *square roots* of the pivots to write $D = KK$ where $K$ is a diagonal matrix of the square roots of the pivots:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 13.5277 0.0 0.0 0.0 0.0 \n", " 0.960988 14.3205 0.0 0.0 0.0 \n", " -12.2711 -3.22668 4.7968 0.0 0.0 \n", " 1.55236 10.2307 -0.195907 6.39416 0.0 \n", " -11.7536 1.34738 1.27544 7.14891 0.550843" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "K = L* diagm(sqrt.(diag(D)))" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.0 0.0 0.0 0.0 0.0 \n", " 0.0 0.0 7.10543e-15 0.0 -1.77636e-15\n", " 0.0 0.0 2.84217e-14 7.10543e-15 0.0 \n", " 0.0 0.0 0.0 0.0 0.0 \n", " 0.0 0.0 -2.84217e-14 0.0 2.84217e-14" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "K*K' - S # kind of a matrix sqrt: cholesky factorization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can write $S = LDL^T = LKKL^T = (LK)(LK)^T$. The matrix $\\hat{L} = LK$ is also a lower-triangular matrix, it is $L$ with the *columns* scaled by $K$. So, we can write *any* symmetric positive-definite (SPD) matrix as:\n", "\n", "$$\n", "S = \\hat{L} \\hat{L}^T\n", "$$\n", "\n", "This is called the [Cholesky factorization](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $S$, and it usually the most efficient way to solve SPD systems (half the operations, and often half the storage, compared to LU). In Julia, it is computed by `chol` (which returns $\\hat{L}^T$) or `cholfact`:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling chol(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mArgumentError: matrix is not symmetric/Hermitian. This error can be avoided by calling chol(Hermitian(A)) which will ignore either the upper or lower triangle of the matrix.\u001b[39m", "", "Stacktrace:", " [1] \u001b[1mchol\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::Array{Float64,2}\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m./linalg/cholesky.jl:185\u001b[22m\u001b[22m", " [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m./loading.jl:522\u001b[22m\u001b[22m", " [3] \u001b[1mexecute_request\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::ZMQ.Socket, ::IJulia.Msg\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m/Users/stevenj/.julia/v0.6/IJulia/src/execute_request.jl:193\u001b[22m\u001b[22m", " [4] \u001b[1m(::Compat.#inner#14{Array{Any,1},IJulia.#execute_request,Tuple{ZMQ.Socket,IJulia.Msg}})\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m/Users/stevenj/.julia/v0.6/Compat/src/Compat.jl:332\u001b[22m\u001b[22m", " [5] \u001b[1meventloop\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::ZMQ.Socket\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m/Users/stevenj/.julia/v0.6/IJulia/src/eventloop.jl:8\u001b[22m\u001b[22m", " [6] \u001b[1m(::IJulia.##13#16)\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m./task.jl:335\u001b[22m\u001b[22m" ] } ], "source": [ "chol(S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whoops, the problem was that $S = A^T A$ is supposed to be symmetric, but it is not *exactly* symmetric due to roundoff errors:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 0.0 0.0 0.0 0.0 0.0 \n", " 0.0 0.0 -7.10543e-15 0.0 1.77636e-15\n", " 0.0 7.10543e-15 0.0 -7.10543e-15 -2.84217e-14\n", " 0.0 0.0 7.10543e-15 0.0 0.0 \n", " 0.0 -1.77636e-15 2.84217e-14 0.0 0.0 " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "S - S'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have to tell Julia explicitly that it is symmetric, so that it should ignore these tiny differences between the upper and lower triangles:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 UpperTriangular{Float64,Array{Float64,2}}:\n", " 13.5277 0.960988 -12.2711 1.55236 -11.7536 \n", " ⋅ 14.3205 -3.22668 10.2307 1.34738 \n", " ⋅ ⋅ 4.7968 -0.195907 1.27544 \n", " ⋅ ⋅ ⋅ 6.39416 7.14891 \n", " ⋅ ⋅ ⋅ ⋅ 0.550843" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chol(Symmetric(S))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the same as our $K^T$ matrix from above:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5×5 Array{Float64,2}:\n", " 13.5277 0.960988 -12.2711 1.55236 -11.7536 \n", " 0.0 14.3205 -3.22668 10.2307 1.34738 \n", " 0.0 0.0 4.7968 -0.195907 1.27544 \n", " 0.0 0.0 0.0 6.39416 7.14891 \n", " 0.0 0.0 0.0 0.0 0.550843" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "K'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One interesting fact about Cholesky factorization of SPD matrices is that **row swaps are never required**, even when concerns about roundoff errors are included, so there is no $P$ matrix.\n", "\n", "We won't do much with Cholesky factorizations in 18.06, but they are extremely useful in many applications of linear algebra. If your are solving a system of equations involving $A^TA$, and hence SPD, then you gain a factor of 2 or more by telling the computer to use Cholesky factorization instead of LU, and there are many other uses of Cholesky factorization as well." ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.3", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }