{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gauss–Jordan and computing A⁻¹\n", "\n", "The Gauss–Jordan algorithm is a technique for hand-calculation of the inverse. Nowadays, you should hardly ever compute a matrix inverse, even on a computer, but Gauss–Jordan is still useful to go over:\n", "\n", "* It helps us to understand when and why an inverse matrix exists.\n", "\n", "* It gives us yet another example to help us understand the *structure* of elimination operations" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra # as usual, we'll load this package" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Review: Inverses\n", "\n", "The inverse of a linear operator $A$ is the operat that \"undoes\" the action of $A$:\n", "\n", "$$\n", "\\boxed{A^{-1}(Ax) = x} .\n", "$$\n", "\n", "for *any* $x$. Equivalently, $\\boxed{Ax=b \\implies x = A^{-1} b}$. This means that\n", "\n", "* **A⁻¹ only exists for (m×m) square matrices with m (nonzero) pivots**\n", "\n", "since for non-square matrices or matrices with one or more \"zero pivots\" we can't always solve $Ax=b$ (we'd divide by zero during backsubstitution). It is also easy to see that $\\boxed{(A^{-1})^{-1} = A}$, i.e. that $A$ undoes the action of $A^{-1}$.\n", "\n", "Equivalently,\n", "$$\n", "\\boxed{AA^{-1} = A^{-1} A = I}\n", "$$\n", "where $I$ is the m×m identity matrix — in linear algebra, we typically *infer* the size of $I$ from context, but if it is ambiguous we might write $I_m$.\n", "\n", "### Inverses of products: (AB)⁻¹ = B⁻¹A⁻¹\n", "\n", "It is easy to see that the inverse of a product $BA$ is the product of the inverses in *reverse order*: $\\boxed{(AB)^{-1} = B^{-1} A^{-1}}$. Intuitively, when you reverse a sequence of operations, you always need to retrace your steps in backwards order. Explicitly:\n", "$$\n", "(AB)^{-1} AB = B^{-1} \\underbrace{A^{-1} A}_I B = B^{-1} B = I \\, .\n", "$$\n", "\n", "For example, we saw that Gaussian elimination corresponded to the factorization $A = LU$, where $U$ is the result of elimination and $L$ is simply a record of the elimination steps. Then\n", "$$\n", "Ax = b \\implies x = A^{-1} b = (LU)^{-1} b = \\underbrace{U^{-1} \\underbrace{ L^{-1} b }_\\mbox{forward substitution}}_\\mbox{backsubstitution} \\, .\n", "$$\n", "\n", "### Rarely compute inverses!\n", "\n", "In general **rarely if ever** compute inverses explicitly:\n", "\n", "* **Read \"x = A⁻¹b\" as \"solve Ax=b for x\" the best way you can**, and invariably there are better ways to solve for x than inverting a matrix.\n", "\n", "More on this below. Instead, **inverses are mostly a *conceptual* tool** to move operators/matrices around in equations. Once we have the equations in the form that we want, we then carry out the computations in some other way.\n", "\n", "### Notation:\n", "\n", "Inverses allow us to \"divide by matrices\", but we always have to be clear about whether we are dividing **on the left or on the right**. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Inverses by linear equations\n", "\n", "The equation $A A^{-1} = I$ actually gives us the algorithm to compute $A^{-1}$.\n", "\n", "Suppose we denote the *columns* of $A^{-1} = \\begin{pmatrix} x_1 & x_2 & \\cdots & x_m \\end{pmatrix}$, and the columns of $I = \\begin{pmatrix} e_1 & e_2 & \\cdots & e_m \\end{pmatrix}$.\n", "\n", "Then\n", "$$\n", "A \\underbrace{\\begin{pmatrix} x_1 & x_2 & \\cdots & x_m \\end{pmatrix}}_{A^{-1}} =\n", "\\begin{pmatrix} A x_1 & A x_2 & \\cdots & A x_m \\end{pmatrix} = \\underbrace{\\begin{pmatrix} e_1 & e_2 & \\cdots & e_m \\end{pmatrix}}_I.\n", "$$\n", "(The key fact here is that **multiplying A by a matrix on the right** is equivalent to **multiplying A by each column of that matrix**, which you can easily see by writing out the computation.)\n", "\n", "Consequently, $A x_k = e_k$, which is a **linear equation for the k-th column of A⁻¹**. Equivalently, to find A⁻¹ for an m×m matrix A, we must **solve Ax=b for m right-hand sides** equal to the columns of I.\n", "\n", "* Put another way, for *any* matrix $B$, $Be_k = k\\mbox{-th column of }B$. So the k-th column of $A^{-1}$ is $x_k = A^{-1} e_k$, i.e. the solution to $Ax_k = e_k$.\n", "\n", "* Ideally, we do Gaussian elimination $A=LU$ *once*, then compute $x_k = U^{-1} L^{-1} e_k$ by forward+back-substitution for each column of $I$. (This is essentially what the computer does; see the sketch below.)" ] },
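{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is a minimal sketch of that procedure: factor $A = LU$ *once* with `lu`, then solve for each column of $I$. (The matrix here is the 3×3 example used for Gauss–Jordan later in this notebook; the explicit loop is a hand-rolled illustration of the idea, not how `inv` is implemented internally.)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = [1 4 1\n", "     1 2 -1\n", "     3 14 6]\n", "F = lu(A)                    # Gaussian elimination A = LU, done once\n", "m = size(A, 1)\n", "Ainv = zeros(m, m)\n", "for k = 1:m\n", "    e_k = zeros(m); e_k[k] = 1   # k-th column of the identity\n", "    Ainv[:, k] = F \\\\ e_k        # forward + back substitution for x_k\n", "end\n", "Ainv ≈ inv(A)                # should be true" ] },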
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Example: computing L⁻¹ = E\n", "\n", "For example, how might we compute the inverse of the L matrix we got from Gaussian elimination in the last lecture, which should give us $L^{-1} = E$? We solve\n", "\n", "$$\n", "\\underbrace{\\begin{pmatrix} 1 & & \\\\ 1 & 1 & \\\\ 3 & -1 & 1 \\end{pmatrix}}_L x_k = e_k\n", "$$\n", "\n", "for $e_1,e_2,e_3$ (the columns of the 3×3 identity I).\n", "\n", "Let's do it for $e_1$, to find the **first column** $x_1$ of $L^{-1} = E$:\n", "$$\n", "\\underbrace{\\begin{pmatrix} 1 & & \\\\ 1 & 1 & \\\\ 3 & -1 & 1 \\end{pmatrix}}_L \\underbrace{\\begin{pmatrix} a \\\\ b \\\\ c \\end{pmatrix}}_{x_1} = \\underbrace{\\begin{pmatrix} 1 \\\\ 0 \\\\ 0 \\end{pmatrix}}_{e_1}\n", "$$\n", "By forward substitution (from top to bottom), we get $a = 1$, $1a + 1b = 0 \\implies b = -1$, $3a - 1b + 1c = 0 \\implies c = -4$, so $\\boxed{x_1 = [1, -1, -4]}$. Let's check:" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Matrix{Int64}:\n", " 1 0 0\n", " 1 1 0\n", " 3 -1 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L = [1 0 0\n", " 1 1 0\n", " 3 -1 1]" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Matrix{Float64}:\n", " 1.0 0.0 0.0\n", " -1.0 1.0 0.0\n", " -4.0 1.0 1.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E = L^-1" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Vector{Float64}:\n", " 1.0\n", " -1.0\n", " -4.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "E[:,1] # first column" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Yup, the first column is `[1, -1, -4]`. We could easily get the other two columns as well (left as an exercise).\n", "\n", "**Important note**: there is **no simple formula** for the inverse of a triangular matrix like L or U! You can invert *individual* elimination steps $E_k$ by flipping signs, but the *product* of the elimination steps is not so easy to invert.\n", "\n", "(A lot of students get confused by this because Strang's lectures and textbook start by inverting individual elimination steps, which is easier.)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Another way to write this is `L \\\\ I`, which *conceptually* means \"multiply $I$ by $L^{-1}$ on the *left*\", but which Julia *actually* computes without inverting any matrix explicitly, by instead solving with 3 right-hand sides:" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Matrix{Float64}:\n", " 1.0 0.0 0.0\n", " -1.0 1.0 0.0\n", " -4.0 1.0 1.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "L \\\\ I" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Note that `I` is a special object defined by Julia's `LinearAlgebra` package which essentially means **an identity matrix whose size is inferred from context**.\n", "\n", "If we want an $m \\times m$ identity matrix, we can use `I(m)`:" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Diagonal{Bool, Vector{Bool}}:\n", " 1 ⋅ ⋅\n", " ⋅ 1 ⋅\n", " ⋅ ⋅ 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "I(3)" ] },
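{ "cell_type": "markdown", "metadata": {}, "source": [ "Since we just solved $Lx_1 = e_1$ by hand, here is a minimal forward-substitution sketch in code. (This is an illustration of the idea, not a library routine; it assumes a square lower-triangular matrix with nonzero diagonal entries.)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solve Lx = b for lower-triangular L by forward substitution, going\n", "# from the top row down. Illustrative sketch: assumes nonzero diagonals.\n", "function forwardsub(L, b)\n", "    m = length(b)\n", "    x = zeros(m)\n", "    for i = 1:m\n", "        x[i] = (b[i] - dot(L[i, 1:i-1], x[1:i-1])) / L[i, i]\n", "    end\n", "    return x\n", "end\n", "\n", "forwardsub(L, [1, 0, 0])   # should match the first column [1, -1, -4]" ] },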
{ "cell_type": "markdown", "metadata": {}, "source": [ "## The Gauss–Jordan algorithm\n", "\n", "Gauss–Jordan could be viewed as just a trick (primarily for hand calculation) to organize solving $A x_k = e_k$. But it's also nice to think about algebraically — it is a nice application of our \"matrix viewpoint\" of Gaussian elimination.\n", "\n", "The Gauss–Jordan idea, in a nutshell, is: **if we do some row operations on A to obtain I, then doing the *same* row operations on I gives A⁻¹**. Why?\n", "\n", "* Row operations correspond to multiplying $A$ by some matrix $E=\\cdots E_2 E_1$ on the *left*.\n", "\n", "* So, doing row operations that turn $A$ into $I$ means that $EA = I$, and hence $E = A^{-1}$.\n", "\n", "* Doing the *same* row operations on $I$ is equivalent to multiplying $I$ on the *left* by the *same* matrix $E$, giving $EI$. But $EI = E$, and $E = A^{-1}$, so this gives $A^{-1}$!\n", "\n", "As usual for Gaussian elimination, to do the *same* row operations on both $A$ and $I$ we **augment A** with $I$. That is, we do:\n", "\n", "$$\n", "\\boxed{\n", "\\left(\\begin{array}{c|c}A & I\\end{array}\\right) \\underset{\\mbox{row ops}}{\\longrightarrow} \\left(\\begin{array}{c|c}I & A^{-1}\\end{array}\\right)\n", "}\n", "$$\n", "\n", "### Elimination $A \\to I$\n", "\n", "How do we do row operations to turn $A$ into $I$? Simple:\n", "\n", "1. First, do ordinary Gaussian elimination \"downwards\" to turn $A$ into $U$ (an **upper-triangular** matrix).\n", "\n", "2. Then, do Gaussian elimination \"upwards\" on $U$ to eliminate entries *above* the diagonal, turning $U$ into a **diagonal** matrix $D$.\n", "\n", "3. Finally, divide each row of $D$ by the diagonal entry to turn it into $I$.\n", "\n", "(A short code sketch of this whole procedure appears below, followed by a worked example.)" ] },
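{ "cell_type": "markdown", "metadata": {}, "source": [ "Here is a bare-bones code sketch of this three-step procedure, applied to the augmented matrix $(A | I)$. (My own illustration, not a library routine: it assumes all pivots are nonzero and does no row swaps. It uses the matrix `A` defined in an earlier cell.)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Gauss–Jordan sketch: eliminate down, eliminate up, normalize the rows.\n", "# Illustrative only: assumes nonzero pivots (no row swaps).\n", "function gaussjordan(A)\n", "    m = size(A, 1)\n", "    AI = [float(A) I(m)]               # augmented matrix (A | I)\n", "    for j = 1:m, i = j+1:m             # eliminate below each pivot\n", "        AI[i, :] -= AI[i, j] / AI[j, j] * AI[j, :]\n", "    end\n", "    for j = m:-1:1, i = 1:j-1          # eliminate above each pivot\n", "        AI[i, :] -= AI[i, j] / AI[j, j] * AI[j, :]\n", "    end\n", "    for j = 1:m                        # divide each row by its pivot\n", "        AI[j, :] /= AI[j, j]\n", "    end\n", "    return AI[:, m+1:2m]               # right half is now A⁻¹\n", "end\n", "\n", "gaussjordan(A) ≈ inv(A)   # should be true" ] },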
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Gauss–Jordan example\n", "\n", "Let's perform these $A \\to I$ elimination steps on a $3 \\times 3$ matrix $A$: first eliminate down to make $U$, then eliminate up to make $D$, then divide by the diagonals to make $I$:\n", "\n", "$$\n", "\\underbrace{\\begin{pmatrix} \\boxed{1} & 4 & 1 \\\\ 1 & 2 & -1 \\\\ 3 & 14 & 6 \\end{pmatrix}}_A\n", "\\longrightarrow\n", "\\begin{pmatrix} \\boxed{1} & 4 & 1 \\\\ 0 & \\boxed{-2} & -2 \\\\ 0 & 2 & 3 \\end{pmatrix}\n", "\\longrightarrow\n", "\\underbrace{\\begin{pmatrix} \\boxed{1} & 4 & 1 \\\\ 0 & \\boxed{-2} & -2 \\\\ 0 & 0 & \\boxed{1} \\end{pmatrix}}_U\n", "\\\\\n", "\\longrightarrow\n", "\\begin{pmatrix} 1 & 0 & -3 \\\\ 0 & \\boxed{-2} & -2 \\\\ 0 & 0 & 1 \\end{pmatrix}\n", "\\longrightarrow\n", "\\underbrace{\\begin{pmatrix} 1 & 0 & 0 \\\\ 0 & -2 & 0 \\\\ 0 & 0 & \\boxed{1} \\end{pmatrix}}_D\n", "\\longrightarrow\n", "\\underbrace{\\begin{pmatrix} 1 & 0 & 0 \\\\ 0 & 1 & 0 \\\\ 0 & 0 & 1 \\end{pmatrix}}_I\n", "$$\n", "\n", "No problem! It is easy to see that this will work **whenever A has all of its pivots** (i.e. it is non-singular)." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "To get the inverse, we need to augment $A$ with $I$ so that we perform the same elimination steps on both.\n", "\n", "$$\n", "\\left(\\begin{array}{rrr|rrr}\n", "  \\boxed{1} & 4 & 1 & 1 & 0 & 0 \\\\\n", "  1 & 2 & -1 & 0 & 1 & 0 \\\\\n", "  3 & 14 & 6 & 0 & 0 & 1 \\end{array}\\right)\n", "\\longrightarrow\n", "\\left(\\begin{array}{rrr|rrr}\n", "  \\boxed{1} & 4 & 1 & 1 & 0 & 0 \\\\\n", "  0 & \\boxed{-2} & -2 & -1 & 1 & 0 \\\\\n", "  0 & 2 & 3 & -3 & 0 & 1 \\end{array}\\right) \\\\\n", "\\longrightarrow\n", "\\left(\\begin{array}{rrr|rrr}\n", "  \\boxed{1} & 4 & 1 & 1 & 0 & 0 \\\\\n", "  0 & \\boxed{-2} & -2 & -1 & 1 & 0 \\\\\n", "  0 & 0 & \\boxed{1} & -4 & 1 & 1 \\end{array}\\right)\n", "\\longrightarrow\n", "\\left(\\begin{array}{rrr|rrr}\n", "  1 & 0 & -3 & -1 & 2 & 0 \\\\\n", "  0 & \\boxed{-2} & -2 & -1 & 1 & 0 \\\\\n", "  0 & 0 & 1 & -4 & 1 & 1 \\end{array}\\right) \\\\\n", "\\longrightarrow\n", "\\left(\\begin{array}{rrr|rrr}\n", "  1 & 0 & 0 & -13 & 5 & 3 \\\\\n", "  0 & -2 & 0 & -9 & 3 & 2 \\\\\n", "  0 & 0 & \\boxed{1} & -4 & 1 & 1 \\end{array}\\right)\n", "\\longrightarrow\n", "\\left(\\begin{array}{rrr|rrr}\n", "  1 & 0 & 0 & -13 & 5 & 3 \\\\\n", "  0 & 1 & 0 & 4.5 & -1.5 & -1 \\\\\n", "  0 & 0 & 1 & -4 & 1 & 1 \\end{array}\\right)\n", "$$\n", "\n", "Whew, this was a lot of work! Did we get the right answer?" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Matrix{Int64}:\n", " 1 4 1\n", " 1 2 -1\n", " 3 14 6" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [1 4 1\n", " 1 2 -1\n", " 3 14 6]" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Matrix{Float64}:\n", " -13.0 5.0 3.0\n", " 4.5 -1.5 -1.0\n", " -4.0 1.0 1.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A^-1" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Hooray!\n", "\n", "(It is *really* easy to make a mistake during this process.)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# (Almost) Never Compute Inverses!\n", "\n", "Matrix inverses lead a double life, however:\n", "\n", "* Inverse matrices are very convenient in *analytical* manipulations, because they allow you to move matrices from one side of an equation to the other easily.\n", "\n", "* Inverse matrices are **almost never computed** in \"serious\" numerical calculations. Whenever you see $A^{-1} B$ (or $A^{-1} b$), when you go to *implement* it on a computer you should *read* $A^{-1} B$ as \"solve $AX = B$ by some method\", e.g. solve it by `A \\\\ B` or by first computing the LU factorization of $A$ and then using it to solve $AX = B$.\n", "\n", "One reason that you don't usually compute inverse matrices is that it is wasteful: once you have $A=LU$ (later we will generalize this to \"$PA = LU$\"), you can solve $AX=B$ directly without bothering to find $A^{-1}$, and computing $A^{-1}$ (which amounts to solving with all $m$ columns of $I$) requires much more work than solving with only a few right-hand sides.\n", "\n", "Another reason is that for many special matrices, there are ways to solve $AX=B$ *much* more quickly than you can find $A^{-1}$. For example, many large matrices in practice are [sparse](https://en.wikipedia.org/wiki/Sparse_matrix) (mostly zero), and often for sparse matrices you can arrange for $L$ and $U$ to be sparse too. Sparse matrices are much more efficient to work with than general \"dense\" matrices because you don't have to multiply (or even store) the zeros. Even if $A$ is sparse, however, $A^{-1}$ is usually non-sparse, so you lose the special efficiency of sparsity if you compute the inverse matrix.\n", "\n", "For example:\n", "\n", "* If you see $U^{-1} b$ where $U$ is *upper* triangular, don't compute $U^{-1}$ explicitly! Just solve $Ux = b$ by *back-substitution* (from the bottom row up).\n", "\n", "* If you see $L^{-1} b$ where $L$ is *lower* triangular, don't compute $L^{-1}$ explicitly! Just solve $Lx = b$ by *forward-substitution* (from the top row down, as in the `forwardsub` sketch above)." ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.7.1", "language": "julia", "name": "julia-1.7" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }