{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 1.1.0\n", "Commit 80516ca202 (2019-01-21 21:24 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-6.0.1 (ORCJIT, skylake)\n", "Environment:\n", " JULIA_EDITOR = code\n" ] } ], "source": [ "versioninfo()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Summary of linear regression\n", "\n", "Methods for solving linear regression $\\widehat \\beta = (\\mathbf{X}^T \\mathbf{X})^{-1} \\mathbf{X}^T \\mathbf{y}$:\n", "\n", "| Method | Flops | Remarks | Software | Stability |\n", "| :---------------: | :--------------------: | :---------------------: | :------: | :---------: |\n", "| Sweep | $np^2 + p^3$ | $(X^TX)^{-1}$ available | SAS | less stable |\n", "| Cholesky | $np^2 + p^3/3$ | | | less stable |\n", "| QR by Householder | $2np^2 - (2/3)p^3$ | | R | stable |\n", "| QR by MGS | $2np^2$ | $Q_1$ available | | stable | \n", "| QR by SVD | $4n^2p + 8np^2 + 9p^3$ | $X = UDV^T$ | | most stable | \n", "\n", "Remarks:\n", "\n", "1. When $n \\gg p$, sweep and Cholesky are twice faster than QR and need less space. \n", "2. Sweep and Cholesky are based on the **Gram matrix** $\\mathbf{X}^T \\mathbf{X}$, which can be dynamically updated with incoming data. They can handle huge $n$, moderate $p$ data sets that cannot fit into memory. \n", "3. QR methods are more stable and produce numerically more accurate solution. \n", "4. Although sweep is slower than Cholesky, it yields standard errors and so on. \n", "5. MGS appears slower than Householder, but it yields $\\mathbf{Q}_1$." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "> **There is simply no such thing as a universal 'gold standard' when it comes to algorithms.**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "linreg_svd (generic function with 1 method)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using SweepOperator, BenchmarkTools, LinearAlgebra\n", "\n", "linreg_cholesky(y::Vector, X::Matrix) = cholesky!(X'X) \\ (X'y)\n", "\n", "linreg_qr(y::Vector, X::Matrix) = X \\ y\n", "\n", "function linreg_sweep(y::Vector, X::Matrix)\n", " p = size(X, 2)\n", " xy = [X y]\n", " tableau = xy'xy\n", " sweep!(tableau, 1:p)\n", " return tableau[1:p, end]\n", "end\n", "\n", "function linreg_svd(y::Vector, X::Matrix)\n", " xsvd = svd(X)\n", " return xsvd.V * ((xsvd.U'y) ./ xsvd.S)\n", "end" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "linreg_cholesky(y, X) = [0.390365, 0.262759, 0.149047]\n", "linreg_qr(y, X) = [0.390365, 0.262759, 0.149047]\n", "linreg_sweep(y, X) = [0.390365, 0.262759, 0.149047]\n", "linreg_svd(y, X) = [0.390365, 0.262759, 0.149047]\n" ] } ], "source": [ "using Random\n", "\n", "Random.seed!(280) # seed\n", "\n", "n, p = 10, 3\n", "X = randn(n, p)\n", "y = randn(n)\n", "\n", "# check these methods give same answer\n", "@show linreg_cholesky(y, X)\n", "@show linreg_qr(y, X)\n", "@show linreg_sweep(y, X)\n", "@show linreg_svd(y, X);" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 708.31 KiB\n", " allocs estimate: 8\n", " --------------\n", " minimum time: 1.590 ms (0.00% GC)\n", " median time: 1.756 ms (0.00% GC)\n", " mean time: 1.821 ms (2.78% GC)\n", " maximum time: 39.979 ms (95.16% GC)\n", " --------------\n", " samples: 2733\n", " evals/sample: 1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n, p = 1000, 300\n", "X = randn(n, p)\n", "y = randn(n)\n", "\n", "@benchmark linreg_cholesky(y, X)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 2.99 MiB\n", " allocs estimate: 7\n", " --------------\n", " minimum time: 5.200 ms (0.00% GC)\n", " median time: 7.225 ms (0.00% GC)\n", " mean time: 7.300 ms (2.58% GC)\n", " maximum time: 48.544 ms (82.00% GC)\n", " --------------\n", " samples: 684\n", " evals/sample: 1" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark linreg_sweep(y, X)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 4.04 MiB\n", " allocs estimate: 2444\n", " --------------\n", " minimum time: 10.474 ms (0.00% GC)\n", " median time: 11.760 ms (0.00% GC)\n", " mean time: 12.302 ms (3.93% GC)\n", " maximum time: 52.806 ms (76.80% GC)\n", " --------------\n", " samples: 406\n", " evals/sample: 1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark linreg_qr(y, X)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 8.06 MiB\n", " allocs estimate: 16\n", " --------------\n", " minimum time: 31.354 ms (0.00% GC)\n", " median time: 32.280 ms (0.00% GC)\n", " mean time: 33.278 ms (3.04% GC)\n", " maximum time: 84.534 ms (59.10% GC)\n", " --------------\n", " samples: 151\n", " evals/sample: 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark linreg_svd(y, X)" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "31px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "skip_h1_title": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "452.3333435058594px", "left": "0px", "right": "895.3333129882813px", "top": "96.66666412353516px", "width": "136px" }, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }