---
title: Condition Number
subtitle: Biostat/Biomath M257
author: Dr. Hua Zhou @ UCLA
date: today
format:
  html:
    theme: cosmo
    embed-resources: true
    number-sections: true
    toc: true
    toc-depth: 4
    toc-location: left
    code-fold: false
jupyter:
  jupytext:
    formats: 'ipynb,qmd'
    text_representation:
      extension: .qmd
      format_name: quarto
      format_version: '1.0'
      jupytext_version: 1.14.5
  kernelspec:
    display_name: Julia (8 threads) 1.8.5
    language: julia
    name: julia-_8-threads_-1.8
---

System information (for reproducibility):

```{julia}
versioninfo()
```

Load packages:

```{julia}
using Pkg

Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
```

* Assume $\mathbf{A} \in \mathbb{R}^{n \times n}$ is nonsingular and consider the system of linear equation 
$$
\mathbf{A} \mathbf{x} = \mathbf{b}.
$$
The solution is 
$$
\mathbf{x} = \mathbf{A}^{-1}  \mathbf{b}.
$$
We want to know how the solution changes with a small perturbation of the input $\mathbf{b}$ (or $\mathbf{A}$).

* Let 
$$
\mathbf{b}_{\text{new}} = \mathbf{b} + \Delta \mathbf{b}.
$$
Then 
$$
\mathbf{x}_{\text{new}} = \mathbf{A}^{-1} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{x} + \Delta \mathbf{x}.
$$
Thus 
$$
\|\Delta \mathbf{x}\| = \|\mathbf{A}^{-1} \Delta \mathbf{b}\| \le \|\mathbf{A}^{-1}\| \|\Delta \mathbf{b}\|.
$$
Because $\mathbf{b} = \mathbf{A} \mathbf{x}$, 
$$
\frac{1}{\|\mathbf{x}\|} \le \|\mathbf{A}\| \frac{1}{\|\mathbf{b}\|}.
$$
This results 
$$
\frac{ \|\Delta \mathbf{x}\|}{\|\mathbf{x}\|} \le \|\mathbf{A}\|\|\mathbf{A}^{-1}\| \frac{\|\Delta \mathbf{b}\|}{\|\mathbf{b}\|}.
$$

* $\kappa(\mathbf{A}) = \|\mathbf{A}\| \|\mathbf{A}^{-1}\|$ is called the **condition number for linear equation**. It depends on the matrix norm being used.  
    * $\kappa_p$ means condition number defined from matrix-$p$ norm.

* Large condition number means "bad".

* Some facts:  
$$
\begin{eqnarray*}
\kappa(\mathbf{A}) &=& \kappa(\mathbf{A}^{-1})	\\
\kappa(c\mathbf{A}) &=& \kappa(\mathbf{A})	\\
\kappa(\mathbf{A}) &\ge& 1	\\
%\kappa_1(\mathbf{A}) &=& \kappa_\infty (\mathbf{A}^T)	\\
\kappa_2 (\mathbf{A}) &=& \kappa_2 (\mathbf{A}^T) = \frac{\sigma_1(\mathbf{A})}{\sigma_n(\mathbf{A})}	\\
\kappa_2(\mathbf{A}^T \mathbf{A}) &=& \frac{\lambda_1(\mathbf{A}^T \mathbf{A})}{\lambda_n(\mathbf{A}^T \mathbf{A})} =  \kappa_2^2(\mathbf{A}) \ge \kappa_2(\mathbf{A}).
\end{eqnarray*}
$$
The last fact says that the condition number of $\mathbf{A}^T \mathbf{A}$ can be much larger than that of $\mathbf{A}$.

* The smallest singular value $\sigma_n$ indicates the _distance to the trouble_.

* **Condition number for the least squares problem** is more complicated. Roughly speaking, 
    - the method based on normal equation (Cholesky, sweep) has a condition depending on $[\kappa_2(\mathbf{X})]^2$ 
    - QR for a _small residuals_ least squares problem has a condition depending on $\kappa_2(\mathbf{X})$
    
* Consider the simple case
$$
\begin{eqnarray*}
	\mathbf{X} = \begin{pmatrix}
	1 & 1 \\
	10^{-3} & 0 \\
	0 & 10^{-3}
	\end{pmatrix}.
\end{eqnarray*}
$$
Forming normal equation yields a singular Gramian matrix
$$
\begin{eqnarray*}
	\mathbf{X}^T \mathbf{X} = \begin{pmatrix}
	1 & 1 \\
	1 & 1
	\end{pmatrix}
\end{eqnarray*}
$$
if executed with a precision of 6 decimal digits.

## Implementation

* Julia function [`cond`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.cond) computes $\kappa_p$ for $p=1$, 2 (default), or $\infty$.

* Julia function [`condskeel`](https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.condskeel) computes the Skeel condition number.

## Longley example

The [Longley (1967)](https://www.jstor.org/stable/2283673?seq=1#page_scan_tab_contents) macroeconomic data set is a famous test example for numerical software in early days.

```{julia}
using DelimitedFiles, LinearAlgebra, StatsBase, StatsPlots

longley = readdlm("longley.txt")
```

```{julia}
gr()

corrplot(longley, 
    label = ["Employ" "Price" "GNP" "Jobless" "Military" "PopSize" "Year"])
```

```{julia}
# response: Employment
y = longley[:, 1]
# predictor matrix
X = [ones(length(y)) longley[:, 2:end]]
```

```{julia}
# Julia function for obtaining condition number
cond(X)
```

```{julia}
# we see the smallest singular value (aka trouble number) is very small
xsvals = svdvals(X)
```

```{julia}
# condition number of the design matrix
xcond = maximum(xsvals) / minimum(xsvals)
```

```{julia}
# X is full rank from SVD
xrksvd = rank(X)
```

```{julia}
# least squares from QR
X \ y
```

```{julia}
# Gram matrix
G = X'X
```

```{julia}
# rank of Gram matrix from SVD
# rank deficient! We lost precision when forming Gram matrix
rank(G)
```

```{julia}
svdvals(G)
```

```{julia}
# rank of Gram matrix from cholesky
# full!
gchol = cholesky(Symmetric(G), Val(true))
rank(gchol)
```

```{julia}
# least squares solution from Cholesky matches those from QR
gchol \ (X'y)
```

* Now let us re-run above example using **single precision**. (Pretend we are in the 60s-70s.)

```{julia}
Xsp = Float32.(X)
ysp = Float32.(y)

# least squares by QR: dramatically different from Float64 solution
Xsp \ ysp
```

```{julia}
# least squares by Cholesky: failed
G = Xsp'Xsp
gchol = cholesky(Symmetric(G), Val(true), check=false)
gchol \ (Xsp'ysp)
```

```{julia}
rank(Xsp)
```

```{julia}
# rank of Gram matrix by Cholesky
rank(gchol)
```

```{julia}
# rank of Gram matrix by SVD
rank(G)
```

* **Standardizing the predictors** improves the condition.

```{julia}
# center and standardize matrix along dimension 1
Xcs = zscore(X[:, 2:end], 1)
Xcs = [ones(length(y)) Xcs]
```

```{julia}
cond(Xcs)
```

```{julia}
rank(Xcs)
```

```{julia}
rank(Xcs'Xcs)
```

## Further reading

* Chapter 6 of [Numerical Analysis for Statisticians](http://ucla.worldcat.org/title/numerical-analysis-for-statisticians/oclc/793808354&referer=brief_results) of Kenneth Lange (2010).

* Section 2.6 of [Matrix Computation](http://catalog.library.ucla.edu/vwebv/holdingsInfo?bibId=7122088) by Gene Golub and Charles Van Loan (2013).