---
title: Linear equations and matrix inverses (BV Chapter 11)
subtitle: Biostat 216
author: Dr. Hua Zhou @ UCLA
date: today
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
jupyter:
  jupytext:
    formats: 'ipynb,qmd'
    text_representation:
      extension: .qmd
      format_name: quarto
      format_version: '1.0'
      jupytext_version: 1.15.2
  kernelspec:
    display_name: Julia 1.9.3
    language: julia
    name: julia-1.9
---

```{julia}
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
```

```{julia}
using LinearAlgebra
```

## Generalized inverse and Moore-Penrose inverse

- Let $\mathbf{A} \in \mathbb{R}^{m \times n}$. A matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ is called the **Moore-Penrose inverse** or **MP inverse** of $\mathbf{A}$ if it satisfies the following four conditions:

    (1) $\mathbf{A} \mathbf{G} \mathbf{A} = \mathbf{A}$. $\quad \quad$ (_generalized inverse_, $g_1$ inverse, or _inner pseudo-inverse_).
    (2) $\mathbf{G} \mathbf{A} \mathbf{G} = \mathbf{G}$. $\quad \quad$ (_outer pseudo-inverse_).
    (3) $(\mathbf{A} \mathbf{G})' = \mathbf{A} \mathbf{G}$.
    (4) $(\mathbf{G} \mathbf{A})' = \mathbf{G} \mathbf{A}$.

    We denote the **Moore-Penrose inverse** of $\mathbf{A}$ by $\mathbf{A}^+$.

- Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1) is called a **generalized inverse**, **$g_1$ inverse**, or **inner pseudo-inverse** of $\mathbf{A}$, denoted by $\mathbf{A}^-$ or $\mathbf{A}^g$. A generalized inverse is not unique in general.

- Definition (1) for generalized inverses is motivated by the following result: if $\mathbf{A} \mathbf{x} = \mathbf{b}$ has solution(s), then $\mathbf{A}^- \mathbf{b}$ is a solution for any generalized inverse $\mathbf{A}^-$.

    Proof: Let $\mathbf{x}$ be any solution. Then $\mathbf{A} (\mathbf{A}^- \mathbf{b}) = \mathbf{A} (\mathbf{A}^- \mathbf{A} \mathbf{x}) = (\mathbf{A} \mathbf{A}^- \mathbf{A}) \mathbf{x} = \mathbf{A} \mathbf{x} = \mathbf{b}$.

- Any matrix $\mathbf{G} \in \mathbb{R}^{n \times m}$ that satisfies (1)+(2) is called a **reflexive generalized inverse**, **$g_2$ inverse**, or **outer pseudo-inverse**, denoted by $\mathbf{A}^*$.

- The **Moore-Penrose inverse** of any matrix $\mathbf{A}$ exists and is unique.

    Proof of uniqueness (optional): Let $\mathbf{G}_1, \mathbf{G}_2 \in \mathbb{R}^{n \times m}$ be two matrices satisfying properties (1)-(4). Then
    $$
    \mathbf{A} \mathbf{G}_1 = (\mathbf{A} \mathbf{G}_1)' = \mathbf{G}_1' \mathbf{A}' = \mathbf{G}_1' (\mathbf{A} \mathbf{G}_2 \mathbf{A})' = \mathbf{G}_1' \mathbf{A}' \mathbf{G}_2' \mathbf{A}' = (\mathbf{A} \mathbf{G}_1)' (\mathbf{A} \mathbf{G}_2)' = \mathbf{A} \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{A} \mathbf{G}_2.
    $$
    Similarly,
    $$
    \mathbf{G}_1 \mathbf{A} = (\mathbf{G}_1 \mathbf{A})' = \mathbf{A}' \mathbf{G}_1' = (\mathbf{A} \mathbf{G}_2 \mathbf{A})' \mathbf{G}_1' = \mathbf{A}' \mathbf{G}_2' \mathbf{A}' \mathbf{G}_1' = (\mathbf{G}_2 \mathbf{A})' (\mathbf{G}_1 \mathbf{A})' = \mathbf{G}_2 \mathbf{A} \mathbf{G}_1 \mathbf{A} = \mathbf{G}_2 \mathbf{A}.
    $$
    Hence,
    $$
    \mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_1 = \mathbf{G}_1 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2 \mathbf{A} \mathbf{G}_2 = \mathbf{G}_2.
    $$

    Proof of existence: deferred. Once we have the singular value decomposition (SVD) of $\mathbf{A}$, we will construct a matrix that satisfies (1)-(4) explicitly.

- The following facts are true (a numerical check appears after this list):

    1. $(\mathbf{A}^-)'$ is a generalized inverse of $\mathbf{A}'$.
    2. $(\mathbf{A}')^+ = (\mathbf{A}^+)'$.
    3. For any nonzero $k$, $(1/k) \mathbf{A}^-$ is a generalized inverse of $k \mathbf{A}$.
    4. $(k \mathbf{A})^+ = (1/k) \mathbf{A}^+$.

    Proof: Check properties (1)-(4).
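As a numerical sanity check (a sketch on a random example of mine, not from BV), `pinv` returns a matrix satisfying properties (1)-(4), and $(\mathbf{A}')^+ = (\mathbf{A}^+)'$:

```{julia}
# numerical check of MP properties (1)-(4) on a random rank-deficient matrix
A = randn(5, 2) * randn(2, 4)   # a 5x4 matrix with rank 2 (almost surely)
G = pinv(A)
@show A * G * A ≈ A             # (1)
@show G * A * G ≈ G             # (2)
@show (A * G)' ≈ A * G          # (3)
@show (G * A)' ≈ G * A          # (4)
@show pinv(A') ≈ G'             # fact 2 above
```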
- **Multiplication by a generalized inverse does not change rank.**

    1. $\mathcal{C}(\mathbf{A}) = \mathcal{C}(\mathbf{A} \mathbf{A}^-)$ and $\mathcal{C}(\mathbf{A}') = \mathcal{C}((\mathbf{A}^- \mathbf{A})')$.
    2. $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A} \mathbf{A}^-) = \text{rank}(\mathbf{A}^- \mathbf{A})$.

    Proof of 1: We already know $\mathcal{C}(\mathbf{A}) \supseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$. Since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, we also have $\mathcal{C}(\mathbf{A}) \subseteq \mathcal{C}(\mathbf{A} \mathbf{A}^-)$. The second identity follows by applying the first to $\mathbf{A}'$, using the fact that $(\mathbf{A}^-)'$ is a generalized inverse of $\mathbf{A}'$.

    Proof of 2: Since $\mathbf{A} = \mathbf{A} \mathbf{A}^- \mathbf{A}$, $\text{rank}(\mathbf{A}) \le \text{rank}(\mathbf{A} \mathbf{A}^-) \le \text{rank}(\mathbf{A})$; the same argument applies to $\mathbf{A}^- \mathbf{A}$.

- **A generalized inverse has equal or larger rank.** $\text{rank}(\mathbf{A}^-) \ge \text{rank}(\mathbf{A})$.

    Proof: $\text{rank}(\mathbf{A}) = \text{rank}(\mathbf{A}\mathbf{A}^- \mathbf{A}) \le \text{rank}(\mathbf{A}\mathbf{A}^-) \le \text{rank}(\mathbf{A}^-)$.

- Let $\mathbf{A} \in \mathbb{R}^{m \times n}$.

    1. If $\mathbf{A}$ has full column rank, then the MP inverse is $\mathbf{A}^+ = (\mathbf{A}'\mathbf{A})^{-1} \mathbf{A}'$. Given the QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{R}^{-1} \mathbf{Q}'$.
    2. If $\mathbf{A}$ has full row rank, then the MP inverse is $\mathbf{A}^+ = \mathbf{A}'(\mathbf{A} \mathbf{A}')^{-1}$. Given the QR factorization $\mathbf{A}' = \mathbf{Q} \mathbf{R}$, $\mathbf{A}^+ = \mathbf{Q} (\mathbf{R}')^{-1}$.

```{julia}
# a 3x2 matrix (full column rank)
A = [-3 -4; 4 6; 1 1]
```

```{julia}
# compute MP inverse using SVD
pinv(A)
```

```{julia}
# compute MP inverse via the full column rank formula (A'A)^{-1} A'
(A'A) \ A'
```

```{julia}
# compute MP inverse by QR
qra = qr(A)
```

```{julia}
# R^{-1} Q1', where Q1 holds the first two columns of Q
qra.R \ qra.Q[:, 1:2]'
```

## Linear equations

- Solving the linear system
    \begin{eqnarray*}
    a_{11} x_1 + \cdots + a_{1n} x_n &=& b_1 \\
    a_{21} x_1 + \cdots + a_{2n} x_n &=& b_2 \\
    &\vdots& \\
    a_{m1} x_1 + \cdots + a_{mn} x_n &=& b_m
    \end{eqnarray*}
    or
    $$
    \mathbf{A} \mathbf{x} = \mathbf{b},
    $$
    where $\mathbf{A} \in \mathbb{R}^{m \times n}$ (coefficient matrix), $\mathbf{x} \in \mathbb{R}^n$ (solution vector), and $\mathbf{b} \in \mathbb{R}^m$ (right-hand side), is a central theme in linear algebra.

### When is there a solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$?

- Theorem: The following statements are equivalent.

    1. The linear system $\mathbf{A} \mathbf{x} = \mathbf{b}$ has a solution (is **consistent**).
    2. $\mathbf{b} \in {\cal C}(\mathbf{A})$.
    3. $\text{rank}((\mathbf{A} \,\, \mathbf{b})) = \text{rank}(\mathbf{A})$.
    4. $\mathbf{A} \mathbf{A}^- \mathbf{b} = \mathbf{b}$.

    Proof: The equivalence of 1, 2, and 3 is immediate: $\mathbf{b} \in {\cal C}(\mathbf{A})$ exactly when appending $\mathbf{b}$ as an extra column does not increase the rank. 4 implies 1: $\tilde{\mathbf{x}} = \mathbf{A}^- \mathbf{b}$ is then a solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$. 1 implies 4: if $\tilde{\mathbf{x}}$ is a solution, then $\mathbf{b} = \mathbf{A} \tilde{\mathbf{x}} = \mathbf{A} \mathbf{A}^- \mathbf{A} \tilde{\mathbf{x}} = \mathbf{A} \mathbf{A}^- \mathbf{b}$.

    The last equivalence gives some intuition for why $\mathbf{A}^-$ is called an inverse. The code below illustrates criteria 3 and 4.
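As a small numerical illustration (a toy example of mine), criteria 3 and 4 flag whether a right-hand side lies in $\mathcal{C}(\mathbf{A})$ for a rank-deficient $\mathbf{A}$:

```{julia}
# criteria 3 and 4: b in C(A) iff rank([A b]) == rank(A) iff A * G * b == b
A = [1.0 2.0; 2.0 4.0; 3.0 6.0]   # rank 1
G = pinv(A)                       # one particular generalized inverse
b_good = [1.0, 2.0, 3.0]          # lies in C(A) (first column of A)
b_bad  = [1.0, 0.0, 0.0]          # does not lie in C(A)
for b in (b_good, b_bad)
    @show rank([A b]) == rank(A), A * G * b ≈ b
end
```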
### How to characterize all solutions to $\mathbf{A} \mathbf{x} = \mathbf{b}$?

- Let's first study the homogeneous case $\mathbf{A} \mathbf{x} = \mathbf{0}$, which is always consistent (why?).

    Theorem: $\tilde{\mathbf{x}}$ is a solution to $\mathbf{A} \mathbf{x} = \mathbf{0}$ if and only if $\tilde{\mathbf{x}} = (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}$ for some $\mathbf{q} \in \mathbb{R}^n$.

    Proof: _If_: $(\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}$ is a solution regardless of the value of $\mathbf{q}$, since $\mathbf{A} (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) = \mathbf{A} - \mathbf{A} = \mathbf{0}$. _Only if_: if $\tilde{\mathbf{x}}$ is a solution, then $\mathbf{A} \tilde{\mathbf{x}} = \mathbf{0}$ implies $(\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \tilde{\mathbf{x}} = \tilde{\mathbf{x}} - \mathbf{A}^- (\mathbf{A} \tilde{\mathbf{x}}) = \tilde{\mathbf{x}}$, so we may take $\mathbf{q} = \tilde{\mathbf{x}}$.

- Rephrasing the above result: $\mathcal{N}(\mathbf{A}) = \mathcal{C}(\mathbf{I}_n - \mathbf{A}^- \mathbf{A})$.

- Remarks (optional):

    1. The matrix $\mathbf{I}_n - \mathbf{A}^- \mathbf{A}$ is idempotent and is an oblique projection onto $\mathcal{N}(\mathbf{A})$.
    2. The matrix $\mathbf{I}_n - \mathbf{A}^+ \mathbf{A}$ is symmetric and idempotent and is an orthogonal projection onto $\mathcal{N}(\mathbf{A})$.
    3. The matrix $\mathbf{A} \mathbf{A}^-$ is idempotent and is an oblique projection onto $\mathcal{C}(\mathbf{A})$.
    4. The matrix $\mathbf{A} \mathbf{A}^+$ is symmetric and idempotent and is an orthogonal projection onto $\mathcal{C}(\mathbf{A})$.

- Now consider the inhomogeneous case $\mathbf{A} \mathbf{x} = \mathbf{b}$.

    Theorem: If $\mathbf{A} \mathbf{x} = \mathbf{b}$ is consistent, then $\tilde{\mathbf{x}}$ is a solution if and only if
    $$
    \tilde{\mathbf{x}} = \mathbf{A}^- \mathbf{b} + (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}
    $$
    for some $\mathbf{q} \in \mathbb{R}^n$.

    Interpretation: **a specific solution** + **a vector in $\mathcal{N}(\mathbf{A})$**.

    Proof:
    \begin{eqnarray*}
    & & \mathbf{A} \mathbf{x} = \mathbf{b} \\
    &\Leftrightarrow& \mathbf{A} \mathbf{x} = \mathbf{A} \mathbf{A}^- \mathbf{b} \\
    &\Leftrightarrow& \mathbf{A} (\mathbf{x} - \mathbf{A}^- \mathbf{b}) = \mathbf{0} \\
    &\Leftrightarrow& \mathbf{x} - \mathbf{A}^- \mathbf{b} = (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q} \text{ for some } \mathbf{q} \in \mathbb{R}^n \\
    &\Leftrightarrow& \mathbf{x} = \mathbf{A}^- \mathbf{b} + (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q} \text{ for some } \mathbf{q} \in \mathbb{R}^n.
    \end{eqnarray*}

- Corollary: $\mathbf{A} \mathbf{x} = \mathbf{b}$ is consistent for _all_ $\mathbf{b}$ if and only if $\mathbf{A}$ has full row rank.

- Corollary: If a system is consistent, its solution is unique if and only if $\mathbf{A}$ has full column rank. The sketch below illustrates the general-solution formula.
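Here is a sketch (again a toy example of mine) of the general-solution formula: for a full-row-rank $\mathbf{A}$, every choice of $\mathbf{q}$ yields a valid solution $\mathbf{A}^- \mathbf{b} + (\mathbf{I}_n - \mathbf{A}^- \mathbf{A}) \mathbf{q}$:

```{julia}
# general solution: x = G * b + (I - G * A) * q, a specific solution plus N(A)
A = [1.0 1.0 0.0; 0.0 1.0 1.0]   # full row rank, so A x = b is consistent for every b
b = [2.0, 3.0]
G = pinv(A)                      # use the MP inverse as one generalized inverse
x0 = G * b                       # a specific solution
P = I - G * A                    # orthogonal projection onto N(A), since G is the MP inverse
for q in (zeros(3), randn(3), randn(3))
    @show A * (x0 + P * q) ≈ b
end
```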
## Invertible matrix

Assume throughout this section that the square matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ has full row and column rank.

- Consider the left inverse $\mathbf{X}$ that satisfies $\mathbf{X} \mathbf{A} = \mathbf{I}$ and the right inverse $\mathbf{Y}$ that satisfies $\mathbf{A} \mathbf{Y} = \mathbf{I}$. Both left and right inverses exist (why?) and they are equal:
    $$
    \mathbf{X} = \mathbf{X} \mathbf{I} = \mathbf{X} \mathbf{A} \mathbf{Y} = \mathbf{I} \mathbf{Y} = \mathbf{Y}.
    $$
    Uniqueness of the left and right inverse follows from the uniqueness of basis expansions.

- We say that $\mathbf{A}$ is **invertible** (or **nonsingular**) and denote the (left and right) **inverse** by $\mathbf{A}^{-1}$.

- For an invertible $\mathbf{A}$, the generalized inverse $\mathbf{A}^-$ is unique and equals $\mathbf{A}^{-1}$.

    Proof: Let $\mathbf{G}_1$ and $\mathbf{G}_2$ be two generalized inverses of $\mathbf{A}$. Then $\mathbf{A} \mathbf{G}_1 \mathbf{A} = \mathbf{A}$ and $\mathbf{A} \mathbf{G}_2 \mathbf{A} = \mathbf{A}$. Multiplying both equations by $\mathbf{A}^{-1}$ on the left and on the right, we obtain $\mathbf{G}_1 = \mathbf{A}^{-1} = \mathbf{G}_2$.

- For any $\mathbf{b}$, the unique solution to $\mathbf{A} \mathbf{x} = \mathbf{b}$ is $\mathbf{A}^{-1} \mathbf{b}$.

### Examples of matrix inverse

- The identity matrix $\mathbf{I}_n$ is invertible, with inverse $\mathbf{I}_n$.

- A diagonal matrix is invertible if and only if all its diagonal entries are nonzero, in which case
    $$
    \text{diag}(a_{11}, \ldots, a_{nn})^{-1} = \text{diag}(a_{11}^{-1}, \ldots, a_{nn}^{-1}).
    $$

- A $2 \times 2$ matrix is invertible if and only if $a_{11} a_{22} \ne a_{12} a_{21}$, in which case
    $$
    \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.
    $$

- Inverse of an orthogonal matrix. If $\mathbf{A}$ is square and has orthonormal columns, then
    $$
    \mathbf{A}^{-1} = \mathbf{A}'.
    $$

- Inverse of the matrix transpose. If $\mathbf{A}$ is invertible, then
    $$
    (\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'.
    $$

- Inverse of a matrix product. If $\mathbf{A}$ and $\mathbf{B}$ are invertible, then
    $$
    (\mathbf{A} \mathbf{B})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}.
    $$

## Solving linear equations - triangular systems

- To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is **lower triangular**,
    $$
    \begin{pmatrix}
    a_{11} & 0 & \cdots & 0 \\
    a_{21} & a_{22} & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    a_{n1} & a_{n2} & \cdots & a_{nn}
    \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
    $$
    the **forward substitution** algorithm computes
    $$
    \begin{eqnarray*}
    x_1 &=& b_1 / a_{11} \\
    x_2 &=& (b_2 - a_{21} x_1) / a_{22} \\
    x_3 &=& (b_3 - a_{31} x_1 - a_{32} x_2) / a_{33} \\
    &\vdots& \\
    x_n &=& (b_n - a_{n1} x_1 - a_{n2} x_2 - \cdots - a_{n,n-1} x_{n-1}) / a_{nn}
    \end{eqnarray*}
    $$
    in $1 + 3 + 5 + \cdots + (2n-1) = n^2$ flops.

- To solve $\mathbf{A} \mathbf{x} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{R}^{n \times n}$ is upper triangular,
    $$
    \begin{pmatrix}
    a_{11} & \cdots & a_{1,n-1} & a_{1n} \\
    \vdots & \ddots & \vdots & \vdots \\
    0 & \cdots & a_{n-1,n-1} & a_{n-1,n} \\
    0 & \cdots & 0 & a_{nn}
    \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ \vdots \\ b_{n-1} \\ b_n \end{pmatrix},
    $$
    the **back substitution** algorithm computes
    $$
    \begin{eqnarray*}
    x_n &=& b_n / a_{nn} \\
    x_{n-1} &=& (b_{n-1} - a_{n-1,n} x_n) / a_{n-1,n-1} \\
    x_{n-2} &=& (b_{n-2} - a_{n-2,n-1} x_{n-1} - a_{n-2,n} x_n) / a_{n-2,n-2} \\
    &\vdots& \\
    x_1 &=& (b_1 - a_{12} x_2 - a_{13} x_3 - \cdots - a_{1,n} x_{n}) / a_{11},
    \end{eqnarray*}
    $$
    again in $n^2$ flops. A short Julia implementation of both algorithms follows.
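As a sketch, the two substitution algorithms translate directly into Julia (the function names `forward_sub` and `back_sub` are mine, not from a library); we verify them against Julia's built-in triangular solves:

```{julia}
# forward substitution: solve L x = b for lower triangular L in n^2 flops
function forward_sub(L::AbstractMatrix, b::AbstractVector)
    n = length(b)
    x = similar(b, Float64)
    for i in 1:n
        x[i] = (b[i] - dot(L[i, 1:i-1], x[1:i-1])) / L[i, i]
    end
    return x
end

# back substitution: solve U x = b for upper triangular U in n^2 flops
function back_sub(U::AbstractMatrix, b::AbstractVector)
    n = length(b)
    x = similar(b, Float64)
    for i in n:-1:1
        x[i] = (b[i] - dot(U[i, i+1:n], x[i+1:n])) / U[i, i]
    end
    return x
end

Ltri = [2.0 0.0; 1.0 3.0]
Utri = [2.0 1.0; 0.0 3.0]
btri = [4.0, 5.0]
@show forward_sub(Ltri, btri) ≈ LowerTriangular(Ltri) \ btri
@show back_sub(Utri, btri) ≈ UpperTriangular(Utri) \ btri
```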
## Solving linear equations by QR

- For an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ and $\mathbf{b} \in \mathbb{R}^n$, we can solve the linear equation $\mathbf{A} \mathbf{x} = \mathbf{b}$ by the following steps.

    - Step 1: Compute the QR factorization, e.g., by [Gram–Schmidt](https://ucla-biostat-216.github.io/2023fall/slides/02-vector/02-vector.html#how-to-orthonormalize-a-set-of-vectors-gram-schmidt-algorithm) or the [Householder algorithm](https://ucla-biostat-216.github.io/2023fall/hw/hw4/hw4.html#q3-household-reflections) (HW4): $\mathbf{A} = \mathbf{Q} \mathbf{R}$. $2n^3$ flops.
    - Step 2: Compute $\mathbf{Q}' \mathbf{b}$. $2n^2$ flops.
    - Step 3: Solve the triangular equation $\mathbf{R} \mathbf{x} = \mathbf{Q}' \mathbf{b}$ by back substitution. $n^2$ flops.

    The total number of flops is $2n^3 + 3n^2 \approx 2n^3$.

- Factor-solve method with multiple right-hand sides: for multiple right-hand sides $\mathbf{b}$, we only need to do Step 1 once.

- To compute the inverse, $\mathbf{B}$, of an invertible matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$:

    - Step 1. QR factorization $\mathbf{A} = \mathbf{Q} \mathbf{R}$.
    - Step 2. For $i=1,\ldots,n$, solve the triangular equation $\mathbf{R} \mathbf{b}_i = \mathbf{s}_i$, where $\mathbf{s}_i$ is the $i$-th column of $\mathbf{Q}'$ (the transposed $i$-th row of $\mathbf{Q}$); the $\mathbf{b}_i$ are the columns of $\mathbf{B} = \mathbf{R}^{-1} \mathbf{Q}'$.

    The total computational cost is $2n^3 + n^3 = 3n^3$ flops.

- Question: two ways to solve a linear system, `qr(A) \ b` vs `inv(A) * b`. Which way is better?

## Solving linear equations by Gaussian elimination and LU factorization

In this section, we (1) review Gaussian elimination (GE) for solving linear equations $\mathbf{A} \mathbf{x} = \mathbf{b}$ and (2) show that GE leads to a useful decomposition
$$
\mathbf{A} = \mathbf{L} \mathbf{U}.
$$

Remark: In practice, the LU decomposition is used much less frequently than the Cholesky decomposition in statistics and machine learning, because $\mathbf{A}$ is almost always symmetric and positive semidefinite in applications.

### Gaussian elimination (GE)

Let's review (from any undergraduate linear algebra textbook) how to solve a linear system
$$
\begin{pmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -11 \\ -3 \end{pmatrix}.
$$

**Stage 1**: Let's eliminate variable $x_1$ from equations (2) and (3). Multiplying equation (1) by $\ell_{21} = 3/2$ and adding it to equation (2) leads to
$$
\begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ -2 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ -3 \end{pmatrix}.
$$
Multiplying equation (1) by $\ell_{31} = 1$ and adding it to equation (3) leads to
$$
\begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 5 \end{pmatrix}.
$$

**Stage 2**: Let's eliminate variable $x_2$ from equation (3). Multiplying equation (2) by $\ell_{32} = -4$ and adding it to equation (3) leads to
$$
\begin{pmatrix} 2 & 1 & -1 \\ 0 & 1/2 & 1/2 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1 \\ 1 \end{pmatrix}.
$$

**Stage 3**: Now we can collect the results by **back-solve** or **back-substitution**. From equation (3), $x_3 = -1$. Substituting $x_3 = -1$ into equation (2) and solving gives $x_2 = 3$. Substituting $x_2 = 3$ and $x_3 = -1$ into equation (1) and solving gives $x_1 = 2$.

This is essentially how a computer solves linear equations:

```{julia}
using LinearAlgebra
A = [2.0 1.0 -1.0; -3.0 -1.0 2.0; -2.0 1.0 2.0]
```

```{julia}
b = [8.0, -11.0, -3.0]
```

```{julia}
# Julia way to solve linear equation
# equivalent to `solve(A, b)` in R
A \ b
```

### LU decomposition

Let's collect the negated multipliers $-\ell_{ij}$ into a unit lower triangular matrix $\mathbf{L}$

```{julia}
L = [1.0 0.0 0.0; -1.5 1.0 0.0; -1.0 4.0 1.0]
```

and save the resulting upper triangular matrix from GE into $\mathbf{U}$

```{julia}
U = [2.0 1.0 -1.0; 0.0 0.5 0.5; 0.0 0.0 -1.0]
```

Surprisingly, we find that $\mathbf{A} = \mathbf{L} \mathbf{U}$!

```{julia}
A ≈ L * U
```

- Theorem: If $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and all of its leading principal submatrices are nonsingular, then there exist a unique unit lower triangular matrix $\mathbf{L}$ and a unique upper triangular matrix $\mathbf{U}$ such that
    $$
    \mathbf{A} = \mathbf{L} \mathbf{U}.
    $$
    A general nonsingular $\mathbf{A}$ may first require row permutations; see the pivoted factorization below.
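To connect the hand computation with software (a sketch; `NoPivot` is available in Julia ≥ 1.7), we can ask `lu` to skip pivoting and recover exactly the factors computed above:

```{julia}
# LU without pivoting reproduces the hand-computed L and U
Alu0 = lu(A, NoPivot())
Alu0.L ≈ L, Alu0.U ≈ U
```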
- Given the LU decomposition $\mathbf{A} = \mathbf{L} \mathbf{U}$, for a new right-hand side $\mathbf{b}$, the solution to $\mathbf{A} \mathbf{x} = \mathbf{L} \mathbf{U} \mathbf{x} = \mathbf{b}$ is readily given by two triangular solves:
    \begin{eqnarray*}
    \mathbf{L} \mathbf{y} &=& \mathbf{b} \\
    \mathbf{U} \mathbf{x} &=& \mathbf{y}.
    \end{eqnarray*}

- In practice, the computer performs GE/LU on a row-permuted version of $\mathbf{A}$ (**pivoting**) for numerical stability. That is,
    $$
    \mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U},
    $$
    where $\mathbf{P}$ is a permutation matrix. All multipliers $\ell_{ij}$ in $\mathbf{L}$ then have magnitude $\le 1$.

- The total computational cost of the LU decomposition is $(2/3) n^3$ flops.

- Remark: the total computational cost of the Cholesky decomposition is $(1/3) n^3$ flops.

```{julia}
A
```

```{julia}
Alu = lu(A)
```

```{julia}
# the permutation vector
Alu.p
```

```{julia}
A[Alu.p, :] ≈ Alu.L * Alu.U
```
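Finally, a sketch tying the pieces together using the objects defined above: with the pivoted factorization $\mathbf{P} \mathbf{A} = \mathbf{L} \mathbf{U}$, solving $\mathbf{A} \mathbf{x} = \mathbf{b}$ is a permutation of $\mathbf{b}$ followed by the two triangular solves. Calling `Alu \ b` performs these same steps internally.

```{julia}
# solve A x = b using P A = L U: permute b, forward solve, then back solve
y = LowerTriangular(Alu.L) \ b[Alu.p]
x = UpperTriangular(Alu.U) \ y
x ≈ A \ b
```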