{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 1.1.0\n", "Commit 80516ca202 (2019-01-21 21:24 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-6.0.1 (ORCJIT, skylake)\n", "Environment:\n", " JULIA_EDITOR = code\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# QR Decomposition\n", "\n", "* We learnt Cholesky decomposition as **one** approach for solving linear regression.\n", "\n", "* A second approach for linear regression uses QR decomposition. \n", " **This is how the `lm()` function in R does linear regression.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 0.3795466676698624\n", " 0.6508866456093488\n", " 0.392250419565355 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Random\n", "\n", "Random.seed!(280) # seed\n", "\n", "n, p = 5, 3\n", "X = randn(n, p) # predictor matrix\n", "y = randn(n) # response vector\n", "\n", "# finds the least squares solution\n", "X \\ y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to understand what is QR and how it is used for solving least squares problem." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Definition\n", "\n", "* Assume $\\mathbf{X} \\in \\mathbb{R}^{n \\times p}$ has full column rank. \n", "\n", "* **Full QR decomposition**: \n", "$$\n", " \\mathbf{X} = \\mathbf{Q} \\mathbf{R}, \n", "$$\n", "where \n", " - $\\mathbf{Q} \\in \\mathbb{R}^{n \\times n}$, $\\mathbf{Q}^T \\mathbf{Q} = \\mathbf{I}_n$. In other words, $\\mathbf{Q}$ is an orthogonal matrix. \n", " - First $p$ columns of $\\mathbf{Q}$ form an orthonormal basis of ${\\cal C}(\\mathbf{X})$ (**column space** of $\\mathbf{X}$) \n", " - Last $n-p$ columns of $\\mathbf{Q}$ form an orthonormal basis of ${\\cal N}(\\mathbf{X}^T)$ (**null space** of $\\mathbf{X}^T$)\n", " - $\\mathbf{R} \\in \\mathbb{R}^{n \\times p}$ is upper triangular with positive diagonal entries. \n", " \n", "* **Skinny (or thin) QR decomposition**:\n", "$$\n", " \\mathbf{X} = \\mathbf{Q}_1 \\mathbf{R}_1,\n", "$$\n", "where\n", " - $\\mathbf{Q}_1 \\in \\mathbb{R}^{n \\times p}$, $\\mathbf{Q}_1^T \\mathbf{Q}_1 = \\mathbf{I}_p$. In other words, $\\mathbf{Q}_1$ has orthonormal columns. \n", " - $\\mathbf{R}_1 \\in \\mathbb{R}^{p \\times p}$ is an upper triangular matrix with positive diagonal entries. \n", " \n", "* Given QR decompositin $\\mathbf{X} = \\mathbf{Q} \\mathbf{R}$,\n", "$$\n", " \\mathbf{X}^T \\mathbf{X} = \\mathbf{R}^T \\mathbf{Q}^T \\mathbf{Q} \\mathbf{R} = \\mathbf{R}^T \\mathbf{R}.\n", "$$\n", "Therefore $\\mathbf{R}$ is the same as the upper triangular Choleksy factor of the **Gram matrix** $\\mathbf{X}^T \\mathbf{X}$.\n", "\n", "* There are 3 algorithms to compute QR: (modified) Gram-Schmidt, Householder transform, (fast) Givens transform.\n", "\n", " In particular, the **Householder transform** for QR is implemented in LAPACK and thus used in R and Julia." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Algorithms\n", "\n", "### QR by Gram-Schmidt\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "* Assume $\\mathbf{X} = (\\mathbf{x}_1, \\ldots, \\mathbf{x}_p) \\in \\mathbb{R}^{n \\times p}$ has full column rank. \n", "\n", "* Gram-Schmidt (GS) algorithm produces the **skinny QR**: \n", "$$\n", " \\mathbf{X} = \\mathbf{Q} \\mathbf{R},\n", "$$\n", "where $\\mathbf{Q} \\in \\mathbb{R}^{n \\times p}$ has orthonormal columns and $\\mathbf{R} \\in \\mathbb{R}^{p \\times p}$ is an upper triangular matrix.\n", "\n", "* **Gram-Schmidt algorithm** orthonormalizes a set of non-zero, *linearly independent* vectors $\\mathbf{x}_1,\\ldots,\\mathbf{x}_p$. \n", "\n", " 0. Initialize $\\mathbf{q}_1 = \\mathbf{x}_1 / \\|\\mathbf{x}_1\\|_2$\n", " 0. For $k=2, \\ldots, p$, \n", "$$\n", "\\begin{eqnarray*}\n", "\t\\mathbf{v}_k &=& \\mathbf{x}_k - \\mathbf{P}_{{\\cal C}\\{\\mathbf{q}_1,\\ldots,\\mathbf{q}_{k-1}\\}} \\mathbf{x}_k = \\mathbf{x}_k - \\sum_{j=1}^{k-1} \\langle \\mathbf{x}_k, \\mathbf{q}_j \\rangle \\cdot \\mathbf{q}_j \\\\\n", "\t\\mathbf{q}_k &=& \\mathbf{v}_k / \\|\\mathbf{v}_k\\|_2\n", "\\end{eqnarray*}\n", "$$\n", "\n", "* Collectively we have $\\mathbf{X} = \\mathbf{Q} \\mathbf{R}$. \n", " - $\\mathbf{Q} \\in \\mathbb{R}^{n \\times p}$ has orthonormal columns $\\mathbf{q}_k$ thus $\\mathbf{Q}^T \\mathbf{Q} = \\mathbf{I}_p$ \n", " - Where is $\\mathbf{R}$? $\\mathbf{R} = \\mathbf{Q}^T \\mathbf{X}$ has entries $r_{jk} = \\langle \\mathbf{q}_j, \\mathbf{x}_k \\rangle$, which are computed during the Gram-Schmidt process. Note $r_{jk} = 0$ for $j > k$, so $\\mathbf{R}$ is upper triangular.\n", " \n", "* In GS algorithm, $\\mathbf{X}$ is over-written by $\\mathbf{Q}$ and $\\mathbf{R}$ is stored in a separate array.\n", "\n", "* The regular Gram-Schmidt is *unstable* (we loose orthogonality due to roundoff errors) when columns of $\\mathbf{X}$ are collinear.\n", "\n", "* **Modified Gram-Schmidt** (MGS): after each normalization step of $\\mathbf{v}_k$, we replace columns $\\mathbf{x}_j$, $j>k$, by its residual.\n", "\n", "* Why MGS is better than GS? Read