{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R version 4.2.2 (2022-10-31)\n",
       "Platform: x86_64-apple-darwin17.0 (64-bit)\n",
       "Running under: macOS Catalina 10.15.7\n",
       "\n",
       "Matrix products: default\n",
       "BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib\n",
       "LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib\n",
       "\n",
       "locale:\n",
       "[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8\n",
       "\n",
       "attached base packages:\n",
       "[1] stats     graphics  grDevices utils     datasets  methods   base     \n",
       "\n",
       "loaded via a namespace (and not attached):\n",
       " [1] fansi_1.0.4     crayon_1.5.2    digest_0.6.31   utf8_1.2.2     \n",
       " [5] IRdisplay_1.1   repr_1.1.5      lifecycle_1.0.3 jsonlite_1.8.4 \n",
       " [9] evaluate_0.20   pillar_1.8.1    rlang_1.0.6     cli_3.6.0      \n",
       "[13] uuid_1.1-0      vctrs_0.5.2     IRkernel_1.3.2  tools_4.2.2    \n",
       "[17] glue_1.6.2      fastmap_1.1.0   compiler_4.2.2  base64enc_0.1-3\n",
       "[21] pbdZMQ_0.3-9    htmltools_0.5.4"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sessionInfo()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Required R packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "library(Matrix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# QR Decomposition\n",
    "\n",
    "* We learned Cholesky decomposition as **one** approach for solving linear regression.\n",
    "\n",
    "* Another approach for linear regression uses the QR decomposition.  \n",
    "    **This is how the `lm()` function in R does linear regression.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 5 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td> 0.3769721</td><td> 0.7205735</td><td>-0.8531228</td></tr>\n",
       "\t<tr><td> 0.3015484</td><td> 0.9391210</td><td> 0.9092592</td></tr>\n",
       "\t<tr><td>-1.0980232</td><td>-0.2293777</td><td> 1.1963730</td></tr>\n",
       "\t<tr><td>-1.1304059</td><td> 1.7591313</td><td>-0.3715839</td></tr>\n",
       "\t<tr><td>-2.7965343</td><td> 0.1173668</td><td>-0.1232602</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 5 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t  0.3769721 &  0.7205735 & -0.8531228\\\\\n",
       "\t  0.3015484 &  0.9391210 &  0.9092592\\\\\n",
       "\t -1.0980232 & -0.2293777 &  1.1963730\\\\\n",
       "\t -1.1304059 &  1.7591313 & -0.3715839\\\\\n",
       "\t -2.7965343 &  0.1173668 & -0.1232602\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 5 × 3 of type dbl\n",
       "\n",
       "|  0.3769721 |  0.7205735 | -0.8531228 |\n",
       "|  0.3015484 |  0.9391210 |  0.9092592 |\n",
       "| -1.0980232 | -0.2293777 |  1.1963730 |\n",
       "| -1.1304059 |  1.7591313 | -0.3715839 |\n",
       "| -2.7965343 |  0.1173668 | -0.1232602 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]       [,2]       [,3]      \n",
       "[1,]  0.3769721  0.7205735 -0.8531228\n",
       "[2,]  0.3015484  0.9391210  0.9092592\n",
       "[3,] -1.0980232 -0.2293777  1.1963730\n",
       "[4,] -1.1304059  1.7591313 -0.3715839\n",
       "[5,] -2.7965343  0.1173668 -0.1232602"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>1.80004311672545</li><li>1.70399587729432</li><li>-3.03876460529759</li><li>-2.28897494991878</li><li>0.0583034949929225</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 1.80004311672545\n",
       "\\item 1.70399587729432\n",
       "\\item -3.03876460529759\n",
       "\\item -2.28897494991878\n",
       "\\item 0.0583034949929225\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 1.80004311672545\n",
       "2. 1.70399587729432\n",
       "3. -3.03876460529759\n",
       "4. -2.28897494991878\n",
       "5. 0.0583034949929225\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1]  1.80004312  1.70399588 -3.03876461 -2.28897495  0.05830349"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "\n",
       "Call:\n",
       "lm(formula = y ~ X - 1)\n",
       "\n",
       "Coefficients:\n",
       "       X1         X2         X3  \n",
       " 0.615118  -0.008382  -0.770116  \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "set.seed(2020) # seed\n",
    "\n",
    "n <- 5\n",
    "p <- 3\n",
    "(X <- matrix(rnorm(n * p), nrow=n)) # predictor matrix\n",
    "(y <- rnorm(n))    # response vector\n",
    "\n",
    "# find the (minimum L2 norm) least squares solution\n",
    "lm(y ~ X - 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to understand what is QR and how it is used for solving least squares problem."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Definitions\n",
    "\n",
    "* Assume $\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$ has full column rank. Necessarilly $n \\ge p$.\n",
    "\n",
    "* **Full QR decomposition**:  \n",
    "$$\n",
    "    \\mathbf{X} = \\mathbf{Q} \\mathbf{R},  \n",
    "$$\n",
    "where  \n",
    "    - $\\mathbf{Q} \\in \\mathbb{R}^{n \\times n}$, $\\mathbf{Q}^T \\mathbf{Q} = \\mathbf{Q}\\mathbf{Q}^T = \\mathbf{I}_n$. In other words, $\\mathbf{Q}$ is an orthogonal matrix.  \n",
    "        - First $p$ columns of $\\mathbf{Q}$ form an orthonormal basis of ${\\cal R}(\\mathbf{X})$ (**range** or column space of $\\mathbf{X}$)      \n",
    "        - Last $n-p$ columns of $\\mathbf{Q}$ form an orthonormal basis of ${\\cal N}(\\mathbf{X}^T)$ (**null space** of $\\mathbf{X}^T$)\n",
    "        - Recall that $\\mathcal{N}(\\mathbf{X}^T)=\\mathcal{R}(\\mathbf{X})^{\\perp}$ and $\\mathcal{R}(\\mathbf{X}) \\oplus \\mathcal{N}(\\mathbf{X}^T) = \\mathbb{R}^n$.\n",
    "    - $\\mathbf{R} \\in \\mathbb{R}^{n \\times p}$  is upper triangular with positive diagonal entries. \n",
    "        - The lower $(n-p)\\times p$ block of $\\mathbf{R}$ is $\\mathbf{0}$ (why?)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " * **Reduced QR decomposition**:\n",
    "$$\n",
    "    \\mathbf{X} = \\mathbf{Q}_1 \\mathbf{R}_1,\n",
    "$$\n",
    "where\n",
    "    - $\\mathbf{Q}_1 \\in \\mathbb{R}^{n \\times p}$, $\\mathbf{Q}_1^T \\mathbf{Q}_1 = \\mathbf{I}_p$. In other words, $\\mathbf{Q}_1$ is a partially orthogonal matrix. Note $\\mathbf{Q}_1\\mathbf{Q}_1^T \\neq \\mathbf{I}_n$.\n",
    "    - $\\mathbf{R}_1 \\in \\mathbb{R}^{p \\times p}$  is an upper triangular matrix with positive diagonal entries.  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Given QR decomposition $\\mathbf{X} = \\mathbf{Q} \\mathbf{R}$,\n",
    "    $$\n",
    "    \\mathbf{X}^T \\mathbf{X} = \\mathbf{R}^T \\mathbf{Q}^T \\mathbf{Q} \\mathbf{R} = \\mathbf{R}^T \\mathbf{R} = \\mathbf{R}_1^T \\mathbf{R}_1.\n",
    "    $$\n",
    "    - Once we have a (reduced) QR decomposition of $\\mathbf{X}$, we automatically have the Cholesky decomposition of the *Gram matrix*  $\\mathbf{X}^T \\mathbf{X}$.\n",
    "    \n",
    "### Least squares\n",
    "\n",
    "* Normal equation\n",
    "$$\n",
    "    \\mathbf{X}^T\\mathbf{X}\\beta = \\mathbf{X}^T\\mathbf{y}\n",
    "$$\n",
    "is equivalently written with reduced QR as\n",
    "$$\n",
    "    \\mathbf{R}_1^T\\mathbf{R}_1\\beta = \\mathbf{R}_1^T\\mathbf{Q}_1^T\\mathbf{y}\n",
    "$$\n",
    "\n",
    "* Since $\\mathbf{R}_1$ is invertible, we only need to solve the triangluar system\n",
    "$$\n",
    "    \\mathbf{R}_1\\beta = \\mathbf{Q}_1^T\\mathbf{y}\n",
    "$$\n",
    "Multiplication $\\mathbf{Q}_1^T \\mathbf{y}$ is done implicitly (see below).\n",
    "\n",
    "* This method is numerically more stable than directly solving the normal equation, since $\\kappa(\\mathbf{X}^T\\mathbf{X}) = \\kappa(\\mathbf{X})^2$!\n",
    "\n",
    "* In case we need standard errors, compute inverse of $\\mathbf{R}_1^T \\mathbf{R}_1$. This involves triangular solves."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gram-Schmidt procedure\n",
    "\n",
    "* Wait! Does $\\mathbf{X}$ always have a QR decomposition?\n",
    "    - Yes. It is equivalent to the [Gram-Schmidt procedure](https://en.wikipedia.org/wiki/Gram–Schmidt_process) for basis orthonormalization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Assume $\\mathbf{X} = [\\mathbf{x}_1 | \\dotsb | \\mathbf{x}_p] \\in \\mathbb{R}^{n \\times p}$ has full column rank. That is,  $\\mathbf{x}_1,\\ldots,\\mathbf{x}_p$ are *linearly independent*.\n",
    "\n",
    "* Gram-Schmidt (GS) procedure produces nested orthonormal basis vectors $\\{\\mathbf{q}_1, \\dotsc, \\mathbf{q}_p\\}$ that spans $\\mathcal{R}(\\mathbf{X})$, i.e.,\n",
    "$$\n",
    "\\begin{split}\n",
    "    \\text{span}(\\{\\mathbf{x}_1\\}) &= \\text{span}(\\{\\mathbf{q}_1\\}) \\\\\n",
    "    \\text{span}(\\{\\mathbf{x}_1, \\mathbf{x}_2\\}) &= \\text{span}(\\{\\mathbf{q}_1, \\mathbf{q}_2\\}) \\\\\n",
    "    & \\vdots \\\\\n",
    "    \\text{span}(\\{\\mathbf{x}_1, \\mathbf{x}_2, \\dotsc, \\mathbf{x}_p\\}) &= \\text{span}(\\{\\mathbf{q}_1, \\mathbf{q}_2, \\dotsc, \\mathbf{q}_p\\}) \n",
    "\\end{split}\n",
    "$$\n",
    "and $\\langle \\mathbf{q}_i, \\mathbf{q}_j \\rangle = \\delta_{ij}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* The algorithm:\n",
    "    0. Initialize $\\mathbf{q}_1 = \\mathbf{x}_1 / \\|\\mathbf{x}_1\\|_2$\n",
    "    0. For $k=2, \\ldots, p$, \n",
    "$$\n",
    "\\begin{eqnarray*}\n",
    "\t\\mathbf{v}_k &=& \\mathbf{x}_k - P_{\\text{span}(\\{\\mathbf{q}_1,\\ldots,\\mathbf{q}_{k-1}\\})}(\\mathbf{x}_k) = \\mathbf{x}_k -  \\sum_{j=1}^{k-1} \\langle \\mathbf{q}_j, \\mathbf{x}_k \\rangle \\cdot \\mathbf{q}_j \\\\\n",
    "\t\\mathbf{q}_k &=& \\mathbf{v}_k / \\|\\mathbf{v}_k\\|_2\n",
    "\\end{eqnarray*}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### GS conducts reduced QR\n",
    "\n",
    "* $\\mathbf{Q} = [\\mathbf{q}_1 | \\dotsb | \\mathbf{q}_p]$. Obviously $\\mathbf{Q}^T \\mathbf{Q} = \\mathbf{I}_p$.\n",
    "\n",
    "* Where is $\\mathbf{R}$? \n",
    "    - Let $r_{jk} = \\langle \\mathbf{q}_j, \\mathbf{x}_k \\rangle$ for $j < k$, and $r_{kk} = \\|\\mathbf{v}_k\\|_2$.\n",
    "    - Re-write the above expression:\n",
    "$$\n",
    "    r_{kk} \\mathbf{q}_k = \\mathbf{x}_k -  \\sum_{j=1}^{k-1} r_{jk} \\cdot \\mathbf{q}_j\n",
    "$$\n",
    "        or\n",
    "$$\n",
    "    \\mathbf{x}_k = r_{kk} \\mathbf{q}_k +  \\sum_{j=1}^{k-1} r_{jk} \\cdot \\mathbf{q}_j\n",
    "    .\n",
    "$$    \n",
    "    - If we let $r_{jk} = 0$ for $j > k$, then $\\mathbf{R}=(r_{jk})$ is upper triangular and\n",
    "$$\n",
    "    \\mathbf{X} = \\mathbf{Q}\\mathbf{R}\n",
    "    .\n",
    "$$\n",
    " \n",
    "<img src=\"https://dsc-spidal.github.io/harp/img/daalAlgosNew/HarpIllustrations_QR.png\" width=\"800\" align=\"center\"/> \n",
    "  \n",
    "Source: <https://dsc-spidal.github.io/harp/docs/harpdaal/algorithms/>  \n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Classical Gram-Schmidt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "cgs <- function(X) {  # not in-place\n",
    "    n <- dim(X)[1]\n",
    "    p <- dim(X)[2]    \n",
    "    Q <- Matrix(0, nrow=n, ncol=p, sparse=FALSE)\n",
    "    R <- Matrix(0, nrow=p, ncol=p, sparse=FALSE)\n",
    "    for (k in seq_len(p)) {\n",
    "        Q[, k] <- X[, k]\n",
    "        for (j in seq_len(k-1)) {\n",
    "            R[j, k] = sum(Q[, j] * X[, k])  # dot product\n",
    "            Q[, k] <- Q[, k] - R[j, k] * Q[, j]\n",
    "        }\n",
    "        R[k, k] <- Matrix::norm(Q[, k, drop=FALSE], \"F\")\n",
    "        Q[, k] <- Q[, k] / R[k, k]\n",
    "    }\n",
    "    list(Q=Q, R=R)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* CGS is *unstable* (we lose orthogonality due to roundoff errors) when columns of $\\mathbf{X}$ are almost collinear."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4 x 3 Matrix of class \"dgeMatrix\"\n",
       "             [,1]         [,2]         [,3]\n",
       "[1,] 1.000000e+00 1.000000e+00 1.000000e+00\n",
       "[2,] 2.220446e-16 0.000000e+00 0.000000e+00\n",
       "[3,] 0.000000e+00 2.220446e-16 0.000000e+00\n",
       "[4,] 0.000000e+00 0.000000e+00 2.220446e-16"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "e <- .Machine$double.eps\n",
    "(A <- t(Matrix(c(1, 1, 1, e, 0, 0, 0, e, 0, 0, 0, e), nrow=3)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4 x 3 Matrix of class \"dgeMatrix\"\n",
       "             [,1]       [,2]       [,3]\n",
       "[1,] 1.000000e+00  0.0000000  0.0000000\n",
       "[2,] 2.220446e-16 -0.7071068 -0.7071068\n",
       "[3,] 0.000000e+00  0.7071068  0.0000000\n",
       "[4,] 0.000000e+00  0.0000000  0.7071068"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "res <- cgs(A)\n",
    "res$Q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3 x 3 Matrix of class \"dgeMatrix\"\n",
       "              [,1]          [,2]          [,3]\n",
       "[1,]  1.000000e+00 -1.570092e-16 -1.570092e-16\n",
       "[2,] -1.570092e-16  1.000000e+00  5.000000e-01\n",
       "[3,] -1.570092e-16  5.000000e-01  1.000000e+00"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with(res, t(Q) %*% Q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `Q` is hardly orthogonal.\n",
    "* Where exactly does the problem occur? (HW)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modified Gram-Schmidt\n",
    "\n",
    "* The algorithm:\n",
    "    0. Initialize $\\mathbf{q}_1 = \\mathbf{x}_1 / \\|\\mathbf{x}_1\\|_2$\n",
    "    0. For $k=2, \\ldots, p$, \n",
    "$$\n",
    "\\begin{eqnarray*}\n",
    "\t\\mathbf{v}_k &=& \\mathbf{x}_k - P_{\\text{span}(\\{\\mathbf{q}_1,\\ldots,\\mathbf{q}_{k-1}\\})}(\\mathbf{x}_k) = \\mathbf{x}_k -  \\sum_{j=1}^{k-1} \\langle \\mathbf{q}_j, \\mathbf{x}_k \\rangle \\cdot \\mathbf{q}_j \\\\\n",
    "    &=&  \\mathbf{x}_k -  \\sum_{j=1}^{k-1} \\left\\langle \\mathbf{q}_j, \\mathbf{x}_k - \\sum_{l=1}^{j-1}\\langle \\mathbf{q}_l, \\mathbf{x}_k \\rangle \\mathbf{q}_l \\right\\rangle \\cdot \\mathbf{q}_j \\\\\n",
    "\t\\mathbf{q}_k &=& \\mathbf{v}_k / \\|\\mathbf{v}_k\\|_2\n",
    "\\end{eqnarray*}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "mgs <- function(X) { #  in-place\n",
    "    n <- dim(X)[1]\n",
    "    p <- dim(X)[2]\n",
    "    R <- Matrix(0, nrow=p, ncol=p)\n",
    "    for (k in seq_len(p)) {\n",
    "        for (j in seq_len(k-1)) {\n",
    "            R[j, k] <- sum(X[, j] * X[, k])  # dot product\n",
    "            X[, k] <- X[, k] - R[j, k] * X[, j]\n",
    "        }\n",
    "        R[k, k] <- Matrix::norm(X[, k, drop=FALSE], \"F\")\n",
    "        X[, k] <- X[, k] / R[k, k]\n",
    "    }\n",
    "    list(Q=X, R=R)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* $\\mathbf{X}$ is overwritten by $\\mathbf{Q}$ and $\\mathbf{R}$ is stored in a separate array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4 x 3 Matrix of class \"dgeMatrix\"\n",
       "             [,1]       [,2]       [,3]\n",
       "[1,] 1.000000e+00  0.0000000  0.0000000\n",
       "[2,] 2.220446e-16 -0.7071068 -0.4082483\n",
       "[3,] 0.000000e+00  0.7071068 -0.4082483\n",
       "[4,] 0.000000e+00  0.0000000  0.8164966"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "mres <- mgs(A)\n",
    "mres$Q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3 x 3 Matrix of class \"dgeMatrix\"\n",
       "              [,1]          [,2]          [,3]\n",
       "[1,]  1.000000e+00 -1.570092e-16 -9.064933e-17\n",
       "[2,] -1.570092e-16  1.000000e+00  1.110223e-16\n",
       "[3,] -9.064933e-17  1.110223e-16  1.000000e+00"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with(mres, t(Q) %*% Q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* So MGS is more stable than CGS. However, even MGS is not completely immune to instability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2 x 2 Matrix of class \"dgeMatrix\"\n",
       "     [,1]      [,2]\n",
       "[1,]  0.7 0.7071068\n",
       "[2,]  0.7 0.7071068"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(B <- t(Matrix(c(0.7, 1/sqrt(2), 0.7 + e, 1/sqrt(2)), nrow=2)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2 x 2 Matrix of class \"dgeMatrix\"\n",
       "          [,1] [,2]\n",
       "[1,] 0.7071068    1\n",
       "[2,] 0.7071068    0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "mres2 <- mgs(B)\n",
    "mres2$Q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2 x 2 Matrix of class \"dgeMatrix\"\n",
       "          [,1]      [,2]\n",
       "[1,] 1.0000000 0.7071068\n",
       "[2,] 0.7071068 1.0000000"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "with(mres2, t(Q) %*% Q)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* `Q` is hardly orthogonal.\n",
    "* Where exactly does the problem occur? (HW)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Computational cost of CGS and MGS is $\\sum_{k=1}^p 4n(k-1) \\approx 2np^2$.\n",
    "\n",
    "* There are 3 algorithms to compute QR: (modified) Gram-Schmidt, Householder transform, (fast) Givens transform.\n",
    "\n",
    "    In particular, the **Householder transform** for QR is implemented in LAPACK and thus used in R, Matlab, and Julia."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QR by Householder transform\n",
    "\n",
    "<img src=\"http://www-history.mcs.st-andrews.ac.uk/BigPictures/Householder_2.jpeg\" width=\"200\" align=\"center\"/>\n",
    "\n",
    "[Alston Scott Householder (1904-1993)](https://en.wikipedia.org/wiki/Alston_Scott_Householder)\n",
    "\n",
    "* **This is the algorithm for solving linear regression in R**.\n",
    "\n",
    "* Assume again $\\mathbf{X} = [\\mathbf{x}_1 | \\dotsb | \\mathbf{x}_p] \\in \\mathbb{R}^{n \\times p}$ has full column rank.\n",
    "\n",
    "* Gram-Schmidt can be understood as:\n",
    "$$\n",
    "    \\mathbf{X}\\mathbf{R}_{1} \\mathbf{R}_2 \\cdots  \\mathbf{R}_n = \\mathbf{Q}_1\n",
    "$$\n",
    "where $\\mathbf{R}_j$ are a sequence of upper triangular matrices."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Householder QR does\n",
    "$$\n",
    "    \\mathbf{H}_{p} \\cdots \\mathbf{H}_2 \\mathbf{H}_1 \\mathbf{X} = \\begin{pmatrix} \\mathbf{R}_1 \\\\ \\mathbf{0} \\end{pmatrix},\n",
    "$$\n",
    "where $\\mathbf{H}_j \\in \\mathbf{R}^{n \\times n}$ are a sequence of Householder transformation matrices.\n",
    "\n",
    "    It yields the **full QR** where $\\mathbf{Q} = \\mathbf{H}_1 \\cdots \\mathbf{H}_p \\in \\mathbb{R}^{n \\times n}$. Recall that CGS/MGS only produces the **reduced QR** decomposition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* For arbitrary $\\mathbf{x}, \\mathbf{w} \\in \\mathbb{R}^{n}$ with $\\|\\mathbf{x}\\|_2 = \\|\\mathbf{w}\\|_2$, we can construct a **Householder matrix** (or **Householder reflector**)\n",
    "$$\n",
    "    \\mathbf{H} = \\mathbf{I}_n - 2 \\mathbf{u} \\mathbf{u}^T, \\quad \\mathbf{u} = - \\frac{1}{\\|\\mathbf{x} - \\mathbf{w}\\|_2} (\\mathbf{x} - \\mathbf{w}),\n",
    "$$\n",
    "that transforms $\\mathbf{x}$ to $\\mathbf{w}$:\n",
    "$$\n",
    "\t\\mathbf{H} \\mathbf{x} = \\mathbf{w}.\n",
    "$$\n",
    "To see this, note $\\mathbf{u}^T\\mathbf{x} = - (\\mathbf{x}^T\\mathbf{x} - \\mathbf{w}^T\\mathbf{x})/\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2$. Then\n",
    "$$\n",
    "\\begin{split}\n",
    "    \\mathbf{H}\\mathbf{x} &= \\mathbf{x} + \\frac{2(\\mathbf{x}^T\\mathbf{x} - \\mathbf{w}^T\\mathbf{x})}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2}\\mathbf{u}\n",
    "    = \\mathbf{x} - \\frac{2(\\mathbf{x}^T\\mathbf{x} - \\mathbf{w}^T\\mathbf{x})}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2}\\frac{\\mathbf{x} - \\mathbf{w}}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2}\n",
    "    \\\\\n",
    "    &=\n",
    "    \\frac{(\\Vert\\mathbf{x} - \\mathbf{w}\\Vert_2^2 - 2\\mathbf{x}^T\\mathbf{x} + 2\\mathbf{w}^T\\mathbf{x})\\mathbf{x} + 2(\\mathbf{x}^T\\mathbf{x} - \\mathbf{w}^T\\mathbf{x})\\mathbf{w}}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2^2}\n",
    "    \\\\\n",
    "    &=\n",
    "    \\frac{(\\mathbf{x}^T\\mathbf{x} + \\mathbf{w}^T\\mathbf{w} - 2\\mathbf{w}^T\\mathbf{x} - 2\\mathbf{x}^T\\mathbf{x} + 2\\mathbf{w}^T\\mathbf{x})\\mathbf{x} + 2(\\mathbf{x}^T\\mathbf{x} - \\mathbf{w}^T\\mathbf{x})\\mathbf{w}}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2^2}    \n",
    "    \\\\\n",
    "    &=\n",
    "    \\frac{(\\mathbf{x}^T\\mathbf{x} + \\mathbf{w}^T\\mathbf{w} - 2\\mathbf{w}^T\\mathbf{x})\\mathbf{w}}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2^2}    \n",
    "    \\\\\n",
    "    &= \\frac{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2^2\\mathbf{w}}{\\Vert \\mathbf{x} - \\mathbf{w} \\Vert_2^2}    \n",
    "    = \\mathbf{w}\n",
    "\\end{split}\n",
    "$$\n",
    "using $\\mathbf{x}^T\\mathbf{x} = \\mathbf{w}^T\\mathbf{w}$.\n",
    "\n",
    "$\\mathbf{H}$ is symmetric and orthogonal. Calculation of Householder vector $\\mathbf{u}$ costs $3n$ flops for general $\\mathbf{u}$ and $\\mathbf{w}$.\n",
    "\n",
    "<img src=\"https://www.cs.utexas.edu/users/flame/laff/alaff/images/Chapter03/reflector.png\" width=\"400\" align=\"center\"/>\n",
    "\n",
    "Source: https://www.cs.utexas.edu/users/flame/laff/alaff/images/Chapter03/reflector.png"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Now choose $\\mathbf{H}_1$ so that\n",
    "$$\n",
    "\t\\mathbf{H}_1 \\mathbf{x}_1 = \\begin{pmatrix} \\pm\\|\\mathbf{x}_{1}\\|_2 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{pmatrix}.\n",
    "$$\n",
    "That is, $\\mathbf{x} = \\mathbf{x}_1$ and $\\mathbf{w} = \\pm\\|\\mathbf{x}_1\\|_2\\mathbf{e}_1$.\n",
    "\n",
    "* Left-multiplying $\\mathbf{H}_1$ zeros out the first column of $\\mathbf{X}$ below (1, 1)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Take $\\mathbf{H}_2$ to zero the second column below diagonal:\n",
    "$$\n",
    "\\mathbf{H}_2\\mathbf{H}_1\\mathbf{X} = \n",
    "\\begin{bmatrix} \n",
    "\\times & \\times & \\times & \\times \\\\ \n",
    "0 & \\boldsymbol{\\times} & \\boldsymbol{\\times} & \\boldsymbol{\\times}  \\\\\n",
    "0 & \\mathbf{0} & \\boldsymbol{\\times} & \\boldsymbol{\\times}  \\\\\n",
    "0 & \\mathbf{0} & \\boldsymbol{\\times} & \\boldsymbol{\\times}  \\\\\n",
    "0 & \\mathbf{0} & \\boldsymbol{\\times} & \\boldsymbol{\\times} \n",
    "\\end{bmatrix} \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* In general, choose the $j$-th Householder transform $\\mathbf{H}_j = \\mathbf{I}_n - 2 \\mathbf{u}_j \\mathbf{u}_j^T$, where \n",
    "$$\n",
    "     \\mathbf{u}_j = \\begin{bmatrix} \\mathbf{0}_{j-1} \\\\ {\\tilde u}_j \\end{bmatrix}, \\quad {\\tilde u}_j \\in \\mathbb{R}^{n-j+1},\n",
    "$$\n",
    "to zero the $j$-th column below diagonal. $\\mathbf{H}_j$ takes the form\n",
    "$$\n",
    "\t\\mathbf{H}_j = \\begin{bmatrix}\n",
    "\t\\mathbf{I}_{j-1} & \\\\\n",
    "\t& \\mathbf{I}_{n-j+1} - 2 {\\tilde u}_j {\\tilde u}_j^T\n",
    "\t\\end{bmatrix} = \\begin{bmatrix}\n",
    "\t\\mathbf{I}_{j-1} & \\\\\n",
    "\t& {\\tilde H}_{j}\n",
    "\t\\end{bmatrix}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Applying a Householder transform $\\mathbf{H} = \\mathbf{I} - 2 \\mathbf{u} \\mathbf{u}^T$ to a matrix $\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$\n",
    "$$\n",
    "\t\\mathbf{H} \\mathbf{X} = \\mathbf{X} - 2 \\mathbf{u} (\\mathbf{u}^T \\mathbf{X})\n",
    "$$\n",
    "costs $4np$ flops. **Householder updates never entails explicit formation of the Householder matrices.**\n",
    "\n",
    "* Note applying ${\\tilde H}_j$ to $\\mathbf{X}$ only needs $4(n-j+1)(p-j+1)$ flops."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Algorithm\n",
    "\n",
    "```r\n",
    "for (j in seq_len(p)) {  # in-place\n",
    "    u <- Householer(X[j:n, j])\n",
    "    for (i in j:p)\n",
    "        X[j:n, i] <- X[j:n, i] - 2 * u * sum(u * X[j:n, i])\n",
    "    }\n",
    "}\n",
    "```\n",
    "\n",
    "* The process is done in place. Upper triangular part of $\\mathbf{X}$ is overwritten by $\\mathbf{R}_1$ and the essential Householder vectors ($\\tilde u_{j1}$ is normalized to 1) are stored in $\\mathbf{X}[j:n,j]$. Can ensure $u_{j1} \\neq 0$ by a clever choice of the sign in determining vector $\\mathbf{w}$ above (HW).\n",
    "\n",
    "* At $j$-th stage\n",
    "     0. computing the Householder vector ${\\tilde u}_j$ costs $3(n-j+1)$ flops\n",
    "     0. applying the Householder transform ${\\tilde H}_j$ to the $\\mathbf{X}[j:n, j:p]$ block costs $4(n-j+1)(p-j+1)$ flops  \n",
    "     \n",
    "* In total we need $\\sum_{j=1}^p [3(n-j+1) + 4(n-j+1)(p-j+1)] \\approx 2np^2 - \\frac 23 p^3$ flops."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Where is $\\mathbf{Q}$? \n",
    "    - $\\mathbf{Q} = \\mathbf{H}_1 \\cdots \\mathbf{H}_p$. In some applications, it's necessary to form the orthogonal matrix $\\mathbf{Q}$. \n",
    "\n",
    "    Accumulating $\\mathbf{Q}$ costs another $2np^2 - \\frac 23 p^3$ flops.\n",
    "\n",
    "* When computing $\\mathbf{Q}^T \\mathbf{v}$ or $\\mathbf{Q} \\mathbf{v}$ as in some applications (e.g., solve linear equation using QR), no need to form $\\mathbf{Q}$. Simply apply Householder transforms successively to the vector $\\mathbf{v}$. (HW)\n",
    "\n",
    "* Computational cost of Householder QR for linear regression: $2n p^2 - \\frac 23 p^3$ (regression coefficients and $\\hat \\sigma^2$) or more (fitted values, s.e., ...)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Implementation\n",
    "\n",
    "* R function: [`qr`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/qr.html)\n",
    "\n",
    "* Wraps LAPACK routine [`dgeqp3`](http://www.netlib.org/lapack/explore-html/dd/d9a/group__double_g_ecomputational_ga1b0500f49e03d2771b797c6e88adabbb.html)  (with `LAPACK=TRUE`; default uses LINPACK, an ancient version of LAPACK)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 5 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td> 0.3769721</td><td> 0.7205735</td><td>-0.8531228</td></tr>\n",
       "\t<tr><td> 0.3015484</td><td> 0.9391210</td><td> 0.9092592</td></tr>\n",
       "\t<tr><td>-1.0980232</td><td>-0.2293777</td><td> 1.1963730</td></tr>\n",
       "\t<tr><td>-1.1304059</td><td> 1.7591313</td><td>-0.3715839</td></tr>\n",
       "\t<tr><td>-2.7965343</td><td> 0.1173668</td><td>-0.1232602</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 5 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t  0.3769721 &  0.7205735 & -0.8531228\\\\\n",
       "\t  0.3015484 &  0.9391210 &  0.9092592\\\\\n",
       "\t -1.0980232 & -0.2293777 &  1.1963730\\\\\n",
       "\t -1.1304059 &  1.7591313 & -0.3715839\\\\\n",
       "\t -2.7965343 &  0.1173668 & -0.1232602\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 5 × 3 of type dbl\n",
       "\n",
       "|  0.3769721 |  0.7205735 | -0.8531228 |\n",
       "|  0.3015484 |  0.9391210 |  0.9092592 |\n",
       "| -1.0980232 | -0.2293777 |  1.1963730 |\n",
       "| -1.1304059 |  1.7591313 | -0.3715839 |\n",
       "| -2.7965343 |  0.1173668 | -0.1232602 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]       [,2]       [,3]      \n",
       "[1,]  0.3769721  0.7205735 -0.8531228\n",
       "[2,]  0.3015484  0.9391210  0.9092592\n",
       "[3,] -1.0980232 -0.2293777  1.1963730\n",
       "[4,] -1.1304059  1.7591313 -0.3715839\n",
       "[5,] -2.7965343  0.1173668 -0.1232602"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>1.80004311672545</li><li>1.70399587729432</li><li>-3.03876460529759</li><li>-2.28897494991878</li><li>0.0583034949929225</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 1.80004311672545\n",
       "\\item 1.70399587729432\n",
       "\\item -3.03876460529759\n",
       "\\item -2.28897494991878\n",
       "\\item 0.0583034949929225\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 1.80004311672545\n",
       "2. 1.70399587729432\n",
       "3. -3.03876460529759\n",
       "4. -2.28897494991878\n",
       "5. 0.0583034949929225\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1]  1.80004312  1.70399588 -3.03876461 -2.28897495  0.05830349"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "X  # to recall\n",
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".dl-inline {width: auto; margin:0; padding: 0}\n",
       ".dl-inline>dt, .dl-inline>dd {float: none; width: auto; display: inline-block}\n",
       ".dl-inline>dt::after {content: \":\\0020\"; padding-right: .5ex}\n",
       ".dl-inline>dt:not(:first-of-type) {padding-left: .5ex}\n",
       "</style><dl class=dl-inline><dt>X1</dt><dd>0.615117663816443</dd><dt>X2</dt><dd>-0.00838211603686755</dd><dt>X3</dt><dd>-0.770116370364976</dd></dl>\n"
      ],
      "text/latex": [
       "\\begin{description*}\n",
       "\\item[X1] 0.615117663816443\n",
       "\\item[X2] -0.00838211603686755\n",
       "\\item[X3] -0.770116370364976\n",
       "\\end{description*}\n"
      ],
      "text/markdown": [
       "X1\n",
       ":   0.615117663816443X2\n",
       ":   -0.00838211603686755X3\n",
       ":   -0.770116370364976\n",
       "\n"
      ],
      "text/plain": [
       "          X1           X2           X3 \n",
       " 0.615117664 -0.008382116 -0.770116370 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "coef(lm(y ~ X - 1)) # least squares solution by QR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>0.615117663816443</li><li>-0.00838211603686755</li><li>-0.770116370364976</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 0.615117663816443\n",
       "\\item -0.00838211603686755\n",
       "\\item -0.770116370364976\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 0.615117663816443\n",
       "2. -0.00838211603686755\n",
       "3. -0.770116370364976\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1]  0.615117664 -0.008382116 -0.770116370"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# same as\n",
    "qr.solve(X, y)  # or solve(qr(X), y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "$qr\n",
       "           [,1]        [,2]       [,3]\n",
       "[1,] -3.2460924  0.46519442  0.1837043\n",
       "[2,]  0.0832302 -2.08463446  0.3784090\n",
       "[3,] -0.3030648 -0.05061826 -1.7211061\n",
       "[4,] -0.3120027  0.61242637 -0.4073016\n",
       "[5,] -0.7718699  0.10474144 -0.3750994\n",
       "\n",
       "$rank\n",
       "[1] 3\n",
       "\n",
       "$qraux\n",
       "[1] 1.116131 1.440301 1.530697\n",
       "\n",
       "$pivot\n",
       "[1] 1 2 3\n",
       "\n",
       "attr(,\"useLAPACK\")\n",
       "[1] TRUE\n",
       "attr(,\"class\")\n",
       "[1] \"qr\""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "(qrobj <- qr(X, LAPACK=TRUE))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `$qr`: a matrix of same size as input matrix \n",
    "- `$rank`: rank of the input matrix\n",
    "- `$pivot`: pivot vector (for rank-deficient `X`)\n",
    "- `$aux`: normalizing constants of Householder vectors\n",
    "\n",
    "The upper triangular part of `qrobj$qr` contains the $\\mathbf{R}$ of the decomposition and the lower triangular part contains information on the $\\mathbf{Q}$ of the decomposition (stored in compact form using $\\mathbf{Q} = \\mathbf{H}_1 \\cdots \\mathbf{H}_p$, HW)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 5 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-0.11613105</td><td>-0.3715745</td><td> 0.40159171</td></tr>\n",
       "\t<tr><td>-0.09289581</td><td>-0.4712268</td><td>-0.64182039</td></tr>\n",
       "\t<tr><td> 0.33825998</td><td> 0.1855167</td><td>-0.61822569</td></tr>\n",
       "\t<tr><td> 0.34823590</td><td>-0.7661458</td><td> 0.08461993</td></tr>\n",
       "\t<tr><td> 0.86150792</td><td> 0.1359480</td><td> 0.19346097</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 5 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t -0.11613105 & -0.3715745 &  0.40159171\\\\\n",
       "\t -0.09289581 & -0.4712268 & -0.64182039\\\\\n",
       "\t  0.33825998 &  0.1855167 & -0.61822569\\\\\n",
       "\t  0.34823590 & -0.7661458 &  0.08461993\\\\\n",
       "\t  0.86150792 &  0.1359480 &  0.19346097\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 5 × 3 of type dbl\n",
       "\n",
       "| -0.11613105 | -0.3715745 |  0.40159171 |\n",
       "| -0.09289581 | -0.4712268 | -0.64182039 |\n",
       "|  0.33825998 |  0.1855167 | -0.61822569 |\n",
       "|  0.34823590 | -0.7661458 |  0.08461993 |\n",
       "|  0.86150792 |  0.1359480 |  0.19346097 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]        [,2]       [,3]       \n",
       "[1,] -0.11613105 -0.3715745  0.40159171\n",
       "[2,] -0.09289581 -0.4712268 -0.64182039\n",
       "[3,]  0.33825998  0.1855167 -0.61822569\n",
       "[4,]  0.34823590 -0.7661458  0.08461993\n",
       "[5,]  0.86150792  0.1359480  0.19346097"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qr.Q(qrobj)  # extract Q"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 3 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-3.246092</td><td> 0.4651944</td><td> 0.1837043</td></tr>\n",
       "\t<tr><td> 0.000000</td><td>-2.0846345</td><td> 0.3784090</td></tr>\n",
       "\t<tr><td> 0.000000</td><td> 0.0000000</td><td>-1.7211061</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 3 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t -3.246092 &  0.4651944 &  0.1837043\\\\\n",
       "\t  0.000000 & -2.0846345 &  0.3784090\\\\\n",
       "\t  0.000000 &  0.0000000 & -1.7211061\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 3 × 3 of type dbl\n",
       "\n",
       "| -3.246092 |  0.4651944 |  0.1837043 |\n",
       "|  0.000000 | -2.0846345 |  0.3784090 |\n",
       "|  0.000000 |  0.0000000 | -1.7211061 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]      [,2]       [,3]      \n",
       "[1,] -3.246092  0.4651944  0.1837043\n",
       "[2,]  0.000000 -2.0846345  0.3784090\n",
       "[3,]  0.000000  0.0000000 -1.7211061"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qr.R(qrobj) # extract R"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 5 × 1 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-2.1421018</td></tr>\n",
       "\t<tr><td>-0.2739453</td></tr>\n",
       "\t<tr><td> 1.3254520</td></tr>\n",
       "\t<tr><td>-3.3263425</td></tr>\n",
       "\t<tr><td> 1.7707709</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 5 × 1 of type dbl\n",
       "\\begin{tabular}{l}\n",
       "\t -2.1421018\\\\\n",
       "\t -0.2739453\\\\\n",
       "\t  1.3254520\\\\\n",
       "\t -3.3263425\\\\\n",
       "\t  1.7707709\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 5 × 1 of type dbl\n",
       "\n",
       "| -2.1421018 |\n",
       "| -0.2739453 |\n",
       "|  1.3254520 |\n",
       "| -3.3263425 |\n",
       "|  1.7707709 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]      \n",
       "[1,] -2.1421018\n",
       "[2,] -0.2739453\n",
       "[3,]  1.3254520\n",
       "[4,] -3.3263425\n",
       "[5,]  1.7707709"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qr.qty(qrobj, y)  # this uses the \"compact form\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 3 × 1 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-2.1421018</td></tr>\n",
       "\t<tr><td>-0.2739453</td></tr>\n",
       "\t<tr><td> 1.3254520</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 3 × 1 of type dbl\n",
       "\\begin{tabular}{l}\n",
       "\t -2.1421018\\\\\n",
       "\t -0.2739453\\\\\n",
       "\t  1.3254520\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 3 × 1 of type dbl\n",
       "\n",
       "| -2.1421018 |\n",
       "| -0.2739453 |\n",
       "|  1.3254520 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]      \n",
       "[1,] -2.1421018\n",
       "[2,] -0.2739453\n",
       "[3,]  1.3254520"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "t(qr.Q(qrobj)) %*% y  # waste of resource"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".list-inline {list-style: none; margin:0; padding: 0}\n",
       ".list-inline>li {display: inline-block}\n",
       ".list-inline>li:not(:last-child)::after {content: \"\\00b7\"; padding: 0 .5ex}\n",
       "</style>\n",
       "<ol class=list-inline><li>0.615117663816443</li><li>-0.0083821160368676</li><li>-0.770116370364976</li></ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 0.615117663816443\n",
       "\\item -0.0083821160368676\n",
       "\\item -0.770116370364976\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 0.615117663816443\n",
       "2. -0.0083821160368676\n",
       "3. -0.770116370364976\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1]  0.615117664 -0.008382116 -0.770116370"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "solve(qr.R(qrobj), qr.qty(qrobj, y)[1:p])  # solves R * beta = Q' * y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 3 × 1 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td> 0.615117664</td></tr>\n",
       "\t<tr><td>-0.008382116</td></tr>\n",
       "\t<tr><td>-0.770116370</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 3 × 1 of type dbl\n",
       "\\begin{tabular}{l}\n",
       "\t  0.615117664\\\\\n",
       "\t -0.008382116\\\\\n",
       "\t -0.770116370\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 3 × 1 of type dbl\n",
       "\n",
       "|  0.615117664 |\n",
       "| -0.008382116 |\n",
       "| -0.770116370 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]        \n",
       "[1,]  0.615117664\n",
       "[2,] -0.008382116\n",
       "[3,] -0.770116370"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "Xchol <- Matrix::chol(Matrix::symmpart(t(X) %*% X))  # solution using Cholesky of X'X\n",
    "solve(Xchol, solve(t(Xchol), t(X) %*% y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets get back to the odd `A` and `B` matrices that fooled Gram-Schmidt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 4 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-1.000000e+00</td><td> 1.570092e-16</td><td> 9.064933e-17</td></tr>\n",
       "\t<tr><td>-2.220446e-16</td><td>-7.071068e-01</td><td>-4.082483e-01</td></tr>\n",
       "\t<tr><td> 0.000000e+00</td><td> 7.071068e-01</td><td>-4.082483e-01</td></tr>\n",
       "\t<tr><td> 0.000000e+00</td><td> 0.000000e+00</td><td> 8.164966e-01</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 4 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t -1.000000e+00 &  1.570092e-16 &  9.064933e-17\\\\\n",
       "\t -2.220446e-16 & -7.071068e-01 & -4.082483e-01\\\\\n",
       "\t  0.000000e+00 &  7.071068e-01 & -4.082483e-01\\\\\n",
       "\t  0.000000e+00 &  0.000000e+00 &  8.164966e-01\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 4 × 3 of type dbl\n",
       "\n",
       "| -1.000000e+00 |  1.570092e-16 |  9.064933e-17 |\n",
       "| -2.220446e-16 | -7.071068e-01 | -4.082483e-01 |\n",
       "|  0.000000e+00 |  7.071068e-01 | -4.082483e-01 |\n",
       "|  0.000000e+00 |  0.000000e+00 |  8.164966e-01 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]          [,2]          [,3]         \n",
       "[1,] -1.000000e+00  1.570092e-16  9.064933e-17\n",
       "[2,] -2.220446e-16 -7.071068e-01 -4.082483e-01\n",
       "[3,]  0.000000e+00  7.071068e-01 -4.082483e-01\n",
       "[4,]  0.000000e+00  0.000000e+00  8.164966e-01"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qrA <- qr(A, LAPACK=TRUE)\n",
    "qr.Q(qrA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 3 × 3 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>1</td><td> 0.000000e+00</td><td> 0.000000e+00</td></tr>\n",
       "\t<tr><td>0</td><td> 1.000000e+00</td><td>-1.110223e-16</td></tr>\n",
       "\t<tr><td>0</td><td>-1.110223e-16</td><td> 1.000000e+00</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 3 × 3 of type dbl\n",
       "\\begin{tabular}{lll}\n",
       "\t 1 &  0.000000e+00 &  0.000000e+00\\\\\n",
       "\t 0 &  1.000000e+00 & -1.110223e-16\\\\\n",
       "\t 0 & -1.110223e-16 &  1.000000e+00\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 3 × 3 of type dbl\n",
       "\n",
       "| 1 |  0.000000e+00 |  0.000000e+00 |\n",
       "| 0 |  1.000000e+00 | -1.110223e-16 |\n",
       "| 0 | -1.110223e-16 |  1.000000e+00 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1] [,2]          [,3]         \n",
       "[1,] 1     0.000000e+00  0.000000e+00\n",
       "[2,] 0     1.000000e+00 -1.110223e-16\n",
       "[3,] 0    -1.110223e-16  1.000000e+00"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "t(qr.Q(qrA)) %*% qr.Q(qrA)   # orthogonality preserved"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 2 × 2 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td>-0.7071068</td><td>-0.7071068</td></tr>\n",
       "\t<tr><td>-0.7071068</td><td> 0.7071068</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 2 × 2 of type dbl\n",
       "\\begin{tabular}{ll}\n",
       "\t -0.7071068 & -0.7071068\\\\\n",
       "\t -0.7071068 &  0.7071068\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 2 × 2 of type dbl\n",
       "\n",
       "| -0.7071068 | -0.7071068 |\n",
       "| -0.7071068 |  0.7071068 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]       [,2]      \n",
       "[1,] -0.7071068 -0.7071068\n",
       "[2,] -0.7071068  0.7071068"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "qrB <- qr(B, LAPACK=TRUE)\n",
    "qr.Q(qrB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"dataframe\">\n",
       "<caption>A matrix: 2 × 2 of type dbl</caption>\n",
       "<tbody>\n",
       "\t<tr><td> 1.000000e+00</td><td>-1.110223e-16</td></tr>\n",
       "\t<tr><td>-1.110223e-16</td><td> 1.000000e+00</td></tr>\n",
       "</tbody>\n",
       "</table>\n"
      ],
      "text/latex": [
       "A matrix: 2 × 2 of type dbl\n",
       "\\begin{tabular}{ll}\n",
       "\t  1.000000e+00 & -1.110223e-16\\\\\n",
       "\t -1.110223e-16 &  1.000000e+00\\\\\n",
       "\\end{tabular}\n"
      ],
      "text/markdown": [
       "\n",
       "A matrix: 2 × 2 of type dbl\n",
       "\n",
       "|  1.000000e+00 | -1.110223e-16 |\n",
       "| -1.110223e-16 |  1.000000e+00 |\n",
       "\n"
      ],
      "text/plain": [
       "     [,1]          [,2]         \n",
       "[1,]  1.000000e+00 -1.110223e-16\n",
       "[2,] -1.110223e-16  1.000000e+00"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "t(qr.Q(qrB)) %*% qr.Q(qrB)  # orthogonality preserved"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## QR by Givens rotation\n",
    "\n",
    "* Householder transform $\\mathbf{H}_j$ introduces batch of zeros into a vector.\n",
    "\n",
    "* **Givens transform** (aka **Givens rotation**, **Jacobi rotation**, **plane rotation**) selectively zeros one element of a vector.\n",
    "\n",
    "* Overall QR by Givens rotation is less efficient than the Householder method, but is better suited for matrices with structured patterns of nonzero elements.\n",
    "\n",
    "* **Givens/Jacobi rotations**: \n",
    "$$\n",
    "\t\\mathbf{G}(i,k,\\theta) = \\begin{bmatrix} \n",
    "\t1 & & 0 & & 0 & & 0 \\\\\n",
    "\t\\vdots & \\ddots & \\vdots & & \\vdots & & \\vdots \\\\\n",
    "\t0 & & c & & s & & 0 \\\\ \n",
    "\t\\vdots & & \\vdots & \\ddots & \\vdots & & \\vdots \\\\\n",
    "\t0 & & - s & & c & & 0 \\\\\n",
    "\t\\vdots & & \\vdots & & \\vdots & \\ddots & \\vdots \\\\\n",
    "\t0 & & 0 & & 0 & & 1 \\end{bmatrix},\n",
    "$$\n",
    "where $c = \\cos(\\theta)$ and $s = \\sin(\\theta)$. $\\mathbf{G}(i,k,\\theta)$ is orthogonal."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Pre-multiplication by $\\mathbf{G}(i,k,\\theta)^T$ rotates counterclockwise $\\theta$ radians in the $(i,k)$ coordinate plane. If $\\mathbf{x} \\in \\mathbb{R}^n$ and $\\mathbf{y} = \\mathbf{G}(i,k,\\theta)^T \\mathbf{x}$, then\n",
    "$$\n",
    "\ty_j = \\begin{cases}\n",
    "\tcx_i - s x_k & j = i \\\\\n",
    "\tsx_i + cx_k & j = k \\\\\n",
    "\tx_j & j \\ne i, k\n",
    "\t\\end{cases}.\n",
    "$$\n",
    "Apparently if we choose $\\tan(\\theta) = -x_k / x_i$, or equivalently,\n",
    "$$\n",
    "\\begin{eqnarray*}\n",
    "\tc = \\frac{x_i}{\\sqrt{x_i^2 + x_k^2}}, \\quad s = \\frac{-x_k}{\\sqrt{x_i^2 + x_k^2}},\n",
    "\\end{eqnarray*}\n",
    "$$\n",
    "then $y_k=0$.\n",
    "\n",
    "* Pre-applying Givens transform $\\mathbf{G}(i,k,\\theta)^T \\in \\mathbb{R}^{n \\times n}$ to a matrix $\\mathbf{A} \\in \\mathbb{R}^{n \\times m}$ only effects two rows of $\\mathbf{\n",
    "A}$:\n",
    "$$\n",
    "\t\\mathbf{A}([i, k], :) \\gets \\begin{bmatrix} c & s \\\\ -s & c \\end{bmatrix}^T \\mathbf{A}([i, k], :),\n",
    "$$\n",
    "costing $6m$ flops.\n",
    "\n",
    "* Post-applying Givens transform $\\mathbf{G}(i,k,\\theta) \\in \\mathbb{R}^{m \\times m}$ to a matrix $\\mathbf{A} \\in \\mathbb{R}^{n \\times m}$ only effects two columns of $\\mathbf{A}$:\n",
    "$$\n",
    "\t\\mathbf{A}(:, [i,k]) \\gets \\mathbf{A}(:, [i,k]) \\begin{bmatrix} c & s \\\\ -s & c \\end{bmatrix},\n",
    "$$\n",
    "costing $6n$ flops.\n",
    "\n",
    "* QR by Givens: $\\mathbf{G}_t^T \\cdots \\mathbf{G}_1^T \\mathbf{X} =  \\begin{bmatrix} \\mathbf{R}_1 \\\\ \\mathbf{0} \\end{bmatrix}$.\n",
    "\n",
    "<img src=\"QR_by_Givens.png\" width=\"600\" align=\"center\"/>\n",
    "\n",
    "* Zeros in $\\mathbf{X}$ can also be introduced row-by-row.\n",
    "\n",
    "* If $\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$, the total cost is $3np^2 - p^3$ flops and $O(np)$ square roots.\n",
    "\n",
    "* Note each Givens transform can be summarized by a single number, which is stored in the zeroed entry of $\\mathbf{X}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conditioning of linear regression by least squares\n",
    "\n",
    "* Recall that for nonsingular linear equation solve, i.e. $\\mathbf{A}\\mathbf{x} = \\mathbf{b}$, the condition number  of the problem with $\\mathbf{b}$ held fixed is equal to the condition number of matrix $\\mathbf{A}$\n",
    "$$\n",
    "    \\kappa(\\mathbf{A}) = \\sigma_{\\max}(\\mathbf{A})/\\sigma_{\\min}(\\mathbf{A}) = \\sigma_{\\max}(\\mathbf{A})\\sigma_{\\max}(\\mathbf{A}^{-1}). % \\|\\mathbf{A}\\|\\|\\mathbf{A}^{-1}\\|.\n",
    "$$\n",
    "\n",
    "* If $\\mathbf{A}$ is not square, then its condition number is\n",
    "$$\n",
    "    \\kappa(\\mathbf{A}) = \\sigma_{\\max}(\\mathbf{A})\\sigma_{\\max}(\\mathbf{A}^{\\dagger})=\\sigma_{\\max}(\\mathbf{A})/\\sigma_{\\min}(\\mathbf{A}), %\\|\\mathbf{A}\\|\\|\\mathbf{A}^{\\dagger}\\|,\n",
    "$$\n",
    "where $\\mathbf{A}^{\\dagger}$ is Moore-Penrose pseudoinverse of $\\mathbf{A}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* **Condition number for the least squares problem** $\\mathbf{y} \\approx \\mathbf{X}\\boldsymbol{\\beta}$ is more complicated and depends on the residual. It can be shown that the condition number is\n",
    "$$\n",
    "    \\kappa(\\mathbf{X}) + \\frac{\\kappa(\\mathbf{X})^2\\tan\\theta}{\\eta},\n",
    "$$\n",
    "where\n",
    "\n",
    "$$\n",
    "    \\theta = \\cos^{-1}\\frac{\\|\\mathbf{X}\\boldsymbol{\\beta}\\|}{\\|\\boldsymbol{\\beta}\\|},\n",
    "    \\quad\n",
    "    \\eta = \\frac{\\|\\mathbf{X}\\|\\|\\boldsymbol{\\beta}\\|}{\\|\\mathbf{X}\\boldsymbol{\\beta}\\|}\n",
    "    .\n",
    "$$\n",
    "\n",
    "* So if $\\kappa(\\mathbf{X})$ is large ($\\mathbf{X}$ is close to collinear), the condition number of the least squares problem is dominated by $\\kappa(\\mathbf{X})^2$, unless the residuals are small.\n",
    "    - In this case, *stable* algorithm is preferred."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* This is because the problem is equivalent to the normal equation\n",
    "$$\n",
    "    \\mathbf{X}^T\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}^T\\mathbf{y}.\n",
    "$$\n",
    "    \n",
    "* Consider the simple case\n",
    "$$\n",
    "\\begin{eqnarray*}\n",
    "\t\\mathbf{X} = \\begin{bmatrix}\n",
    "\t1 & 1 \\\\\n",
    "\t10^{-3} & 0 \\\\\n",
    "\t0 & 10^{-3}\n",
    "\t\\end{bmatrix}.\n",
    "\\end{eqnarray*}\n",
    "$$\n",
    "Forming normal equation yields a singular Gramian matrix\n",
    "$$\n",
    "\\begin{eqnarray*}\n",
    "\t\\mathbf{X}^T \\mathbf{X} = \\begin{bmatrix}\n",
    "\t1 & 1 \\\\\n",
    "\t1 & 1\n",
    "\t\\end{bmatrix}\n",
    "\\end{eqnarray*}\n",
    "$$\n",
    "if executed with a precision of 6 decimal digits. This has the condition number $\\kappa(\\mathbf{X}^T\\mathbf{X})=\\infty$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Summary of linear regression\n",
    "\n",
    "Methods for solving linear regression $\\widehat \\beta = (\\mathbf{X}^T \\mathbf{X})^{-1} \\mathbf{X}^T \\mathbf{y}$:\n",
    "\n",
    "| Method            | Flops                  | Remarks                 | Software | Stability   |\n",
    "| :---------------: | :--------------------: | :---------------------: | :------: | :---------: |\n",
    "| Cholesky          | $np^2 + p^3/3$         |                         |          | unstable |\n",
    "| QR by Householder | $2np^2 - (2/3)p^3$     |                         | R, Julia, MATLAB     | stable      |\n",
    "| QR by MGS         | $2np^2$                | $Q_1$ available         |          | stable      | \n",
    "| QR by SVD         | $4n^2p + 8np^2 + 9p^3$ | $X = UDV^T$             |          | most stable |  \n",
    "\n",
    "Remarks:\n",
    "\n",
    "1. When $n \\gg p$, Cholesky is twice faster than QR and need less space.  \n",
    "2. Cholesky is based on the **Gram matrix** $\\mathbf{X}^T \\mathbf{X}$, which can be dynamically updated with incoming data. They can handle huge $n$, moderate $p$ data sets that cannot fit into memory.  \n",
    "3. QR methods are more stable and produce numerically more accurate solution.  \n",
    "4. MGS appears slower than Householder, but it yields $\\mathbf{Q}_1$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **There is simply no such thing as a universal 'gold standard' when it comes to algorithms.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Underdetermined least squares\n",
    "\n",
    "* When the Gram matrix $\\mathbf{X}^T\\mathbf{X}$ is singular (this happens if $\\text{rank}(\\mathbf{X})<p$, in particular $n < p$), the normal equation $\\mathbf{X}^T\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}^T\\mathbf{y}$ is *underdetermined*.\n",
    "\n",
    "* It has *many* solutions unless an additional condition is added.\n",
    "\n",
    "* Typically we require solution $\\mathbf{x}$ has a smallest norm; the solution depends on the choice of the norm."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Example: minimum $\\ell_2$ norm solution\n",
    "$$\n",
    "    \\text{minimize}~\\|\\boldsymbol{\\beta}\\|_2 ~\\text{subject to}~\\mathbf{X}^T\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}^T\\mathbf{y}\n",
    "$$\n",
    "    - Solution is closed form: $\\hat{\\boldsymbol{\\beta}} = \\mathbf{X}^{\\dagger}\\mathbf{y}$, where $\\mathbf{X}^{\\dagger}$ is the Moore-Penrose pseudoinverse of $\\mathbf{X}$: $\\mathbf{X}^{\\dagger} = \\mathbf{X}^T(\\mathbf{X}\\mathbf{X}^T)^{-1}$ if $n<p$ and $\\mathbf{X}$ has full row rank.\n",
    "    - In fact $\\hat{\\boldsymbol{\\beta}}$ is a limit of the ridge regression: $\\hat{\\boldsymbol{\\beta}} = \\lim_{\\lambda\\to 0}(\\mathbf{X}^T\\mathbf{X}+\\lambda\\mathbf{I})^{-1}\\mathbf{X}^T\\mathbf{y}$.\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Example: minimum $\\ell_0$ \"norm\" solution\n",
    "$$\n",
    "    \\text{minimize}~\\|\\boldsymbol{\\beta}\\|_0 ~\\text{subject to}~\\mathbf{X}^T\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}^T\\mathbf{y}\n",
    "$$\n",
    "    - $\\|\\mathbf{x}\\|_0 \\triangleq \\sum_{i=1}^p \\mathbf{1}_{\\{x_i\\neq 0\\}}$, the \"sparsest\" solution.\n",
    "    - Model selection problem. Combinatorial (intractable if $p$ large).\n",
    "    \n",
    "* Compromise: minimum $\\ell_1$ norm solution    \n",
    "$$\n",
    "    \\text{minimize}~\\|\\boldsymbol{\\beta}\\|_1 ~\\text{subject to}~\\mathbf{X}^T\\mathbf{X}\\boldsymbol{\\beta} = \\mathbf{X}^T\\mathbf{y}\n",
    "$$\n",
    "    - LASSO.\n",
    "    - Convex optimization problem (linear programming). Tractable.\n",
    "    - Under certain conditions recovers the minimum $\\ell_0$ norm solution."
   ]
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.2.2"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "65px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "threshold": 4,
   "toc_cell": true,
   "toc_section_display": "block",
   "toc_window_display": true,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}