{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A list of \"easy\" linear systems\n", "\n", "Consider $\\mathbf{A} \\mathbf{x} = \\mathbf{b}$, $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$. Or, consider matrix inverse (if you want). $\\mathbf{A}$ can be huge. Keep massive data in mind: 1000 Genome Project, NetFlix, Google PageRank, finance, spatial statistics, ... We should be alert to many easy linear systems. \n", "\n", "Don't blindly use `A \\ b` and `inv` in Julia or `solve` function in R. **Don't waste computing resources by bad choices of algorithms!**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diagonal matrix\n", "\n", "Diagonal $\\mathbf{A}$: $n$ flops. Use `Diagonal` type of Julia." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) at linalg/generic.jl:820" ], "text/plain": [ "\\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in Base.LinAlg at linalg/generic.jl:820" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools\n", "\n", "srand(280)\n", "n = 1000\n", "A = diagm(randn(n)) # a full matrix\n", "b = randn(n)\n", "\n", "@which A \\ b" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 15.95 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 729.697 μs (0.00% GC)\n", " median time: 743.364 μs (0.00% GC)\n", " mean time: 778.562 μs (0.06% GC)\n", " maximum time: 2.414 ms (62.58% GC)\n", " --------------\n", " samples: 6392\n", " evals/sample: 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check `istril(A)` and `istriu(A)`, then call `Diagonal(A) \\ b`\n", "@benchmark A \\ b" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 15.95 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 3.791 μs (0.00% GC)\n", " median time: 4.701 μs (0.00% GC)\n", " mean time: 5.935 μs (14.60% GC)\n", " maximum time: 267.824 μs (95.83% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 8" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# O(n) computation, no extra array allocation\n", "@benchmark Diagonal(A) \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bidiagonal, tridiagonal, and banded matrices\n", "\n", "Bidiagonal, tridiagonal, or banded $\\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops. \n", "* Use [`Bidiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.Bidiagonal), [`Tridiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.Tridiagonal), [`SymTridiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.SymTridiagonal) types of Julia." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 SymTridiagonal{Float64}:\n", " 0.126238 0.618244 ⋅ … ⋅ ⋅ ⋅ \n", " 0.618244 -2.34688 1.10206 ⋅ ⋅ ⋅ \n", " ⋅ 1.10206 1.91661 ⋅ ⋅ ⋅ \n", " ⋅ ⋅ -0.447244 ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋮ ⋱ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ 0.709662 ⋅ ⋅ \n", " ⋅ ⋅ ⋅ 0.542815 -0.363838 ⋅ \n", " ⋅ ⋅ ⋅ -0.363838 -1.04034 0.321148\n", " ⋅ ⋅ ⋅ ⋅ 0.321148 -0.276537" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280) \n", "\n", "n = 1000\n", "dv = randn(n)\n", "ev = randn(n - 1)\n", "b = randn(n)\n", "# symmetric tridiagonal matrix\n", "A = SymTridiagonal(dv, ev)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.65 MiB\n", " allocs estimate: 8\n", " --------------\n", " minimum time: 12.552 ms (0.00% GC)\n", " median time: 13.446 ms (0.00% GC)\n", " mean time: 13.404 ms (4.16% GC)\n", " maximum time: 16.647 ms (0.00% GC)\n", " --------------\n", " samples: 373\n", " evals/sample: 1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert to a full matrix\n", "Afull = full(A)\n", "\n", "# LU decomposition (2/3) n^3 flops!\n", "@benchmark Afull \\ b" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 23.89 KiB\n", " allocs estimate: 6\n", " --------------\n", " minimum time: 12.511 μs (0.00% GC)\n", " median time: 14.714 μs (0.00% GC)\n", " mean time: 17.326 μs (8.28% GC)\n", " maximum time: 2.081 ms (97.27% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specialized algorithm for tridiagonal matrix, O(n) flops\n", "@benchmark A \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Triangular matrix\n", "\n", "Triangular $\\mathbf{A}$: $n^2$ flops to solve linear system." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.95 KiB\n", " allocs estimate: 2\n", " --------------\n", " minimum time: 762.721 μs (0.00% GC)\n", " median time: 946.558 μs (0.00% GC)\n", " mean time: 1.051 ms (0.02% GC)\n", " maximum time: 7.309 ms (0.00% GC)\n", " --------------\n", " samples: 4721\n", " evals/sample: 1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "n = 1000\n", "A = tril(randn(n, n))\n", "b = randn(n)\n", "\n", "# check istril() then triangular solve\n", "@benchmark A \\ b" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.97 KiB\n", " allocs estimate: 3\n", " --------------\n", " minimum time: 320.558 μs (0.00% GC)\n", " median time: 380.211 μs (0.00% GC)\n", " mean time: 430.062 μs (0.07% GC)\n", " maximum time: 4.834 ms (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# triangular solve directly; save the cost of istril()\n", "@benchmark LowerTriangular(A) \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Block diagonal matrix\n", "\n", "Block diagonal: Suppose $n = \\sum_i n_i$. $(\\sum_i n_i)^3$ vs $\\sum_i n_i^3$. \n", "\n", "Julia has a [`blkdiag`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.SparseArrays.blkdiag) function that generates a **sparse** matrix. Anyone interested writing a `BlockDiagonal.jl` package?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 SparseMatrixCSC{Float64,Int64} with 969 stored entries:\n", " [31 , 1] = 2.07834\n", " [53 , 1] = -1.11883\n", " [58 , 1] = -0.66448\n", " [14 , 4] = 1.11793\n", " [96 , 5] = 1.22813\n", " [81 , 8] = -0.919643\n", " [48 , 9] = 1.0185\n", " [49 , 9] = -0.996332\n", " [15 , 10] = 1.30841\n", " [28 , 10] = -0.818757\n", " ⋮\n", " [956 , 987] = -0.900804\n", " [967 , 987] = -0.438788\n", " [971 , 991] = 0.176756\n", " [929 , 992] = -1.17384\n", " [974 , 993] = 1.59235\n", " [967 , 994] = 0.542169\n", " [994 , 995] = 0.627832\n", " [998 , 997] = 0.60382\n", " [935 , 998] = 0.342675\n", " [947 , 998] = 0.482228\n", " [975 , 1000] = 0.991598" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "B = 10 # number of blocks\n", "ni = 100\n", "A = blkdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Kronecker product\n", "\n", "Use\n", "$$\n", "\\begin{eqnarray*}\n", " (\\mathbf{A} \\otimes \\mathbf{B})^{-1} &=& \\mathbf{A}^{-1} \\otimes \\mathbf{B}^{-1} \\\\\n", " (\\mathbf{C}^T \\otimes \\mathbf{A}) \\text{vec}(\\mathbf{B}) &=& \\text{vec}(\\mathbf{A} \\mathbf{B} \\mathbf{C}).\n", "\\end{eqnarray*} \n", "$$\n", "to avoid forming and doing computation on the potentially huge Kronecker $\\mathbf{A} \\otimes \\mathbf{B}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sparse matrix\n", "\n", "Sparsity: sparse matrix decomposition or iterative method. \n", "* The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.005096" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "n = 1000\n", "# a sparse pd matrix, about 0.5% non-zero entries\n", "A = sprand(n, n, 0.002)\n", "A = A + A' + (2n) * I\n", "b = randn(n)\n", "Afull = full(A)\n", "countnz(A) / length(A) # sparsity" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 100.168 μs (0.00% GC)\n", " median time: 186.690 μs (0.00% GC)\n", " mean time: 201.362 μs (0.00% GC)\n", " maximum time: 2.580 ms (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense matrix-vector multiplication\n", "@benchmark Afull * b" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 6.943 μs (0.00% GC)\n", " median time: 7.599 μs (0.00% GC)\n", " mean time: 8.241 μs (2.32% GC)\n", " maximum time: 483.364 μs (95.82% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 5" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse matrix-vector multiplication\n", "@benchmark A * b" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.63 MiB\n", " allocs estimate: 8\n", " --------------\n", " minimum time: 9.110 ms (0.00% GC)\n", " median time: 11.609 ms (0.00% GC)\n", " mean time: 11.827 ms (8.51% GC)\n", " maximum time: 68.000 ms (85.06% GC)\n", " --------------\n", " samples: 422\n", " evals/sample: 1" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense Cholesky decomposition\n", "@benchmark cholfact(Afull)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 1.33 MiB\n", " allocs estimate: 53\n", " --------------\n", " minimum time: 3.326 ms (0.00% GC)\n", " median time: 3.434 ms (0.00% GC)\n", " mean time: 3.631 ms (0.99% GC)\n", " maximum time: 7.742 ms (0.00% GC)\n", " --------------\n", " samples: 1375\n", " evals/sample: 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse Cholesky decomposition\n", "@benchmark cholfact(A)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.64 MiB\n", " allocs estimate: 10\n", " --------------\n", " minimum time: 10.132 ms (0.00% GC)\n", " median time: 11.326 ms (0.00% GC)\n", " mean time: 11.478 ms (5.48% GC)\n", " maximum time: 16.560 ms (0.00% GC)\n", " --------------\n", " samples: 435\n", " evals/sample: 1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via dense Cholesky\n", "xchol = cholfact(Afull) \\ b\n", "@benchmark cholfact(Afull) \\ b" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.020546956752686e-18" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via sparse Cholesky\n", "xcholsp = cholfact(A) \\ b\n", "vecnorm(xchol - xcholsp)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 1.36 MiB\n", " allocs estimate: 64\n", " --------------\n", " minimum time: 3.825 ms (0.00% GC)\n", " median time: 3.928 ms (0.00% GC)\n", " mean time: 4.094 ms (0.92% GC)\n", " maximum time: 6.442 ms (0.00% GC)\n", " --------------\n", " samples: 1220\n", " evals/sample: 1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark cholfact(A) \\ b" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mRecompiling stale cache file /Users/huazhou/.julia/lib/v0.6/IterativeSolvers.ji for module IterativeSolvers.\n", "\u001b[39m" ] }, { "data": { "text/plain": [ "0.022750177100322466" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse solve via conjugate gradient\n", "using IterativeSolvers\n", "\n", "xcg, = cg(A, b)\n", "vecnorm(xcg - xchol)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 33.30 KiB\n", " allocs estimate: 27\n", " --------------\n", " minimum time: 31.181 μs (0.00% GC)\n", " median time: 34.950 μs (0.00% GC)\n", " mean time: 40.390 μs (8.18% GC)\n", " maximum time: 2.927 ms (94.92% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark cg(A, b)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Easy plus low rank\n", "\n", "Easy plus low rank: $\\mathbf{U} \\in \\mathbb{R}^{n \\times r}$, $\\mathbf{V} \\in \\mathbb{R}^{r \\times n}$, $r \\ll n$. Woodbury formula\n", "$$\n", "\t(\\mathbf{A} + \\mathbf{U} \\mathbf{V}^T)^{-1} = \\mathbf{A}^{-1} - \\mathbf{A}^{-1} \\mathbf{U} (\\mathbf{I}_r + \\mathbf{V}^T \\mathbf{A}^{-1} \\mathbf{U})^{-1} \\mathbf{V}^T \\mathbf{A}^{-1},\n", "$$\n", "\n", "* Keep HW2 Q2 in mind. \n", "* [`WoodburyMatrices.jl`](https://github.com/timholy/WoodburyMatrices.jl) package can be useful." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 Symmetric{Float64,Array{Float64,2}}:\n", " 1.8571 0.513107 0.872146 … 0.764278 -0.241331 0.54921 \n", " 0.513107 4.57505 -0.636972 -1.86465 -1.92237 -1.72569 \n", " 0.872146 -0.636972 4.81387 1.99357 1.99337 3.66327 \n", " -0.516414 -0.996711 -0.0919924 0.262832 0.612402 0.621834\n", " 0.193686 1.68244 -0.770028 -0.723437 -1.4868 -1.32247 \n", " 1.6567 0.0634435 -0.901968 … -0.241872 -0.0356772 -0.39826 \n", " 0.553996 -0.274515 2.21265 0.219437 2.20382 2.60902 \n", " 0.402356 1.89288 -1.13032 -0.771441 -1.96862 -1.93483 \n", " -1.07744 -1.63881 1.78016 0.96551 1.7292 1.91326 \n", " -2.21617 -2.90695 -2.55971 -0.47867 0.855389 -0.933916\n", " 1.29975 0.779828 4.12459 … 1.87358 0.737112 2.84136 \n", " -0.80833 1.44882 1.67581 -0.139063 -0.107873 0.818132\n", " -2.32469 -4.83109 -2.31796 -0.0346402 2.65564 0.591075\n", " ⋮ ⋱ \n", " 0.334995 0.0914829 1.60026 0.0937123 1.40804 1.92919 \n", " 0.390722 2.36157 -1.58383 -1.92435 -1.3291 -1.88114 \n", " 1.19218 0.845478 1.9362 … -0.0890571 1.36046 2.01013 \n", " 0.915526 0.889395 -0.606443 -0.428052 -0.817555 -1.2017 \n", " -0.778472 2.1444 1.50696 0.00689644 -1.28104 -0.141234\n", " 0.275366 -0.25866 0.416593 0.481534 0.0874531 0.316543\n", " -0.289749 -1.16146 0.550825 0.698152 0.789054 0.949917\n", " 0.439213 -0.379216 1.0951 … 0.626399 0.624574 0.8568 \n", " -0.592534 -0.235847 -1.11586 -0.601497 -0.0651787 -0.573737\n", " 0.764278 -1.86465 1.99357 3.21548 0.932773 1.97505 \n", " -0.241331 -1.92237 1.99337 0.932773 3.43991 2.8539 \n", " 0.54921 -1.72569 3.66327 1.97505 2.8539 4.85234 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools, WoodburyMatrices\n", "\n", "srand(280)\n", "n = 1000\n", "r = 5\n", "\n", "A = Diagonal(rand(n))\n", "B = randn(n, r)\n", "D = Diagonal(rand(r))\n", "b = randn(n)\n", "# W = A + B * D * B'\n", "W = SymWoodbury(A, B, D)\n", "Wfull = Symmetric(full(W))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.64 MiB\n", " allocs estimate: 9\n", " --------------\n", " minimum time: 8.913 ms (0.00% GC)\n", " median time: 9.119 ms (0.00% GC)\n", " mean time: 9.485 ms (6.82% GC)\n", " maximum time: 12.387 ms (15.54% GC)\n", " --------------\n", " samples: 527\n", " evals/sample: 1" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via Cholesky\n", "@benchmark cholfact(Wfull) \\ b" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 76.16 KiB\n", " allocs estimate: 26\n", " --------------\n", " minimum time: 19.164 μs (0.00% GC)\n", " median time: 24.648 μs (0.00% GC)\n", " mean time: 34.932 μs (24.29% GC)\n", " maximum time: 4.617 ms (98.44% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve using Woodbury formula\n", "@benchmark W \\ b" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 193.856 μs (0.00% GC)\n", " median time: 203.398 μs (0.00% GC)\n", " mean time: 205.489 μs (0.00% GC)\n", " maximum time: 531.061 μs (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiplication without using Woodbury structure\n", "@benchmark Wfull * b" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 24.06 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 4.910 μs (0.00% GC)\n", " median time: 6.397 μs (0.00% GC)\n", " mean time: 8.787 μs (20.79% GC)\n", " maximum time: 374.038 μs (96.19% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 7" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiplication using Woodbury structure\n", "@benchmark W * b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Easy plus border\n", "\n", "Easy plus border: For $\\mathbf{A}$ pd and $\\mathbf{V}$ full row rank,\n", "$$\n", "\t\\begin{pmatrix}\n", "\t\\mathbf{A} & \\mathbf{V}^T \\\\\n", "\t\\mathbf{V} & \\mathbf{0}\n", "\t\\end{pmatrix}^{-1} = \\begin{pmatrix}\n", "\t\\mathbf{A}^{-1} - \\mathbf{A}^{-1} \\mathbf{V}^T (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\mathbf{V} \\mathbf{A}^{-1} & \\mathbf{A}^{-1} \\mathbf{V}^T (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\\\\n", "\t(\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\mathbf{V} \\mathbf{A}^{-1} & - (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1}\n", "\t\\end{pmatrix}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Orthogonal matrix\n", "\n", "Orthogonal $\\mathbf{A}$: $n^2$ flops **at most**. Permutation matrix, Householder matrix, Jacobi matrix, ... take less." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toeplitz matrix\n", "\n", "Toeplitz systems:\n", "$$\n", "\t\\mathbf{T} = \\begin{pmatrix}\n", "\tr_0 & r_1 & r_2 & r_3 \\\\\n", "\tr_{-1} & r_0 & r_1 & r_2 \\\\\n", "\tr_{-2} & r_{-1} & r_0 & r_1 \\\\\n", "\tr_{-3} & r_{-2} & r_{-1} & r_0\n", "\t\\end{pmatrix}.\n", "$$\n", "$\\mathbf{T} \\mathbf{x} = \\mathbf{b}$, where $\\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops. Durbin algorithm (Yule-Walker equation), Levinson algorithm (general $\\mathbf{b}$), Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics.\n", "\n", "* [`ToeplitzMatrices.jl`](https://github.com/JuliaMatrices/ToeplitzMatrices.jl) package can be useful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Circulant matrix\n", "\n", "Circulant systems: Toeplitz matrix with wraparound\n", "$$\n", "\tC(\\mathbf{z}) = \\begin{pmatrix}\n", "\tz_0 & z_4 & z_3 & z_2 & z_1 \\\\\n", "\tz_1 & z_0 & z_4 & z_3 & z_2 \\\\\n", "\tz_2 & z_1 & z_0 & z_4 & z_3 \\\\\n", "\tz_3 & z_2 & z_1 & z_0 & z_4 \\\\\n", "\tz_4 & z_3 & z_2 & z_1 & z_0\n", "\t\\end{pmatrix},\n", "$$\n", "FFT type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vandermonde matrix\n", "\n", "Vandermonde matrix: such as in interpolation and approximation problems\n", "$$\n", "\t\\mathbf{V}(x_0,\\ldots,x_n) = \\begin{pmatrix}\n", "\t1 & 1 & \\cdots & 1 \\\\\n", "\tx_0 & x_1 & \\cdots & x_n \\\\\n", "\t\\vdots & \\vdots & & \\vdots \\\\\n", "\tx_0^n & x_1^n & \\cdots & x_n^n\n", "\t\\end{pmatrix}.\n", "$$\n", "$\\mathbf{V} \\mathbf{x} = \\mathbf{b}$ or $\\mathbf{V}^T \\mathbf{x} = \\mathbf{b}$ can be solved in $O(n^2)$ flops." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cauchy-like matrix\n", "\n", "Cauchy-like matrices:\n", "$$\n", "\t\\Omega \\mathbf{A} - \\mathbf{A} \\Lambda = \\mathbf{R} \\mathbf{S}^T,\n", "$$\n", "where $\\Omega = \\text{diag}(\\omega_1,\\ldots,\\omega_n)$ and $\\Lambda = \\text{diag}(\\lambda_1,\\ldots, \\lambda_n)$. $O(n)$ flops for LU and QR." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structured-rank matrix\n", "\n", "Structured-rank problems: semiseparable matrices (LU and QR takes $O(n)$ flops), quasiseparable matrices, ..." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 0.6.2\n", "Commit d386e40c17 (2017-12-13 18:08 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n", " LAPACK: libopenblas64_\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-3.9.1 (ORCJIT, skylake)\n" ] } ], "source": [ "versioninfo()" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "30px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }