{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
* 1  A list of \"easy\" linear systems\n",
"    * 1.1  Diagonal matrix\n",
"    * 1.2  Bidiagonal, tridiagonal, and banded matrices\n",
"    * 1.3  Triangular matrix\n",
"    * 1.4  Block diagonal matrix\n",
"    * 1.5  Kronecker product\n",
"    * 1.6  Sparse matrix\n",
"    * 1.7  Easy plus low rank\n",
"    * 1.8  Easy plus border\n",
"    * 1.9  Orthogonal matrix\n",
"    * 1.10  Toeplitz matrix\n",
"    * 1.11  Circulant matrix\n",
"    * 1.12  Vandermonde matrix\n",
"    * 1.13  Cauchy-like matrix\n",
"    * 1.14  Structured-rank matrix
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A list of \"easy\" linear systems\n", "\n", "Consider $\\mathbf{A} \\mathbf{x} = \\mathbf{b}$, $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$. Or, consider matrix inverse (if you want). $\\mathbf{A}$ can be huge. Keep massive data in mind: 1000 Genome Project, NetFlix, Google PageRank, finance, spatial statistics, ... We should be alert to many easy linear systems. \n", "\n", "Don't blindly use `A \\ b` and `inv` in Julia or `solve` function in R. **Don't waste computing resources by bad choices of algorithms!**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Diagonal matrix\n", "\n", "Diagonal $\\mathbf{A}$: $n$ flops. Use `Diagonal` type of Julia." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) at linalg/generic.jl:820" ], "text/plain": [ "\\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in Base.LinAlg at linalg/generic.jl:820" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools\n", "\n", "srand(280)\n", "n = 1000\n", "A = diagm(randn(n)) # a full matrix\n", "b = randn(n)\n", "\n", "@which A \\ b" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 15.95 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 729.697 μs (0.00% GC)\n", " median time: 743.364 μs (0.00% GC)\n", " mean time: 778.562 μs (0.06% GC)\n", " maximum time: 2.414 ms (62.58% GC)\n", " --------------\n", " samples: 6392\n", " evals/sample: 1" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check `istril(A)` and `istriu(A)`, then call `Diagonal(A) \\ b`\n", "@benchmark A \\ b" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 15.95 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 3.791 μs (0.00% GC)\n", " median time: 4.701 μs (0.00% GC)\n", " mean time: 5.935 μs (14.60% GC)\n", " maximum time: 267.824 μs (95.83% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 8" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# O(n) computation, no extra array allocation\n", "@benchmark Diagonal(A) \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bidiagonal, tridiagonal, and banded matrices\n", "\n", "Bidiagonal, tridiagonal, or banded $\\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops. \n", "* Use [`Bidiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.Bidiagonal), [`Tridiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.Tridiagonal), [`SymTridiagonal`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.LinAlg.SymTridiagonal) types of Julia." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 SymTridiagonal{Float64}:\n", " 0.126238 0.618244 ⋅ … ⋅ ⋅ ⋅ \n", " 0.618244 -2.34688 1.10206 ⋅ ⋅ ⋅ \n", " ⋅ 1.10206 1.91661 ⋅ ⋅ ⋅ \n", " ⋅ ⋅ -0.447244 ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋮ ⋱ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n", " ⋅ ⋅ ⋅ 0.709662 ⋅ ⋅ \n", " ⋅ ⋅ ⋅ 0.542815 -0.363838 ⋅ \n", " ⋅ ⋅ ⋅ -0.363838 -1.04034 0.321148\n", " ⋅ ⋅ ⋅ ⋅ 0.321148 -0.276537" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280) \n", "\n", "n = 1000\n", "dv = randn(n)\n", "ev = randn(n - 1)\n", "b = randn(n)\n", "# symmetric tridiagonal matrix\n", "A = SymTridiagonal(dv, ev)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.65 MiB\n", " allocs estimate: 8\n", " --------------\n", " minimum time: 12.552 ms (0.00% GC)\n", " median time: 13.446 ms (0.00% GC)\n", " mean time: 13.404 ms (4.16% GC)\n", " maximum time: 16.647 ms (0.00% GC)\n", " --------------\n", " samples: 373\n", " evals/sample: 1" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert to a full matrix\n", "Afull = full(A)\n", "\n", "# LU decomposition (2/3) n^3 flops!\n", "@benchmark Afull \\ b" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 23.89 KiB\n", " allocs estimate: 6\n", " --------------\n", " minimum time: 12.511 μs (0.00% GC)\n", " median time: 14.714 μs (0.00% GC)\n", " mean time: 17.326 μs (8.28% GC)\n", " maximum time: 2.081 ms (97.27% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# specialized algorithm for tridiagonal matrix, O(n) flops\n", "@benchmark A \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Triangular matrix\n", "\n", "Triangular $\\mathbf{A}$: $n^2$ flops to solve linear system." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.95 KiB\n", " allocs estimate: 2\n", " --------------\n", " minimum time: 762.721 μs (0.00% GC)\n", " median time: 946.558 μs (0.00% GC)\n", " mean time: 1.051 ms (0.02% GC)\n", " maximum time: 7.309 ms (0.00% GC)\n", " --------------\n", " samples: 4721\n", " evals/sample: 1" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "n = 1000\n", "A = tril(randn(n, n))\n", "b = randn(n)\n", "\n", "# check istril() then triangular solve\n", "@benchmark A \\ b" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.97 KiB\n", " allocs estimate: 3\n", " --------------\n", " minimum time: 320.558 μs (0.00% GC)\n", " median time: 380.211 μs (0.00% GC)\n", " mean time: 430.062 μs (0.07% GC)\n", " maximum time: 4.834 ms (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# triangular solve directly; save the cost of istril()\n", "@benchmark LowerTriangular(A) \\ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Block diagonal matrix\n", "\n", "Block diagonal: Suppose $n = \\sum_i n_i$. $(\\sum_i n_i)^3$ vs $\\sum_i n_i^3$. \n", "\n", "Julia has a [`blkdiag`](https://docs.julialang.org/en/stable/stdlib/linalg/#Base.SparseArrays.blkdiag) function that generates a **sparse** matrix. Anyone interested writing a `BlockDiagonal.jl` package?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 SparseMatrixCSC{Float64,Int64} with 969 stored entries:\n", " [31 , 1] = 2.07834\n", " [53 , 1] = -1.11883\n", " [58 , 1] = -0.66448\n", " [14 , 4] = 1.11793\n", " [96 , 5] = 1.22813\n", " [81 , 8] = -0.919643\n", " [48 , 9] = 1.0185\n", " [49 , 9] = -0.996332\n", " [15 , 10] = 1.30841\n", " [28 , 10] = -0.818757\n", " ⋮\n", " [956 , 987] = -0.900804\n", " [967 , 987] = -0.438788\n", " [971 , 991] = 0.176756\n", " [929 , 992] = -1.17384\n", " [974 , 993] = 1.59235\n", " [967 , 994] = 0.542169\n", " [994 , 995] = 0.627832\n", " [998 , 997] = 0.60382\n", " [935 , 998] = 0.342675\n", " [947 , 998] = 0.482228\n", " [975 , 1000] = 0.991598" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "B = 10 # number of blocks\n", "ni = 100\n", "A = blkdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Kronecker product\n", "\n", "Use\n", "$$\n", "\\begin{eqnarray*}\n", " (\\mathbf{A} \\otimes \\mathbf{B})^{-1} &=& \\mathbf{A}^{-1} \\otimes \\mathbf{B}^{-1} \\\\\n", " (\\mathbf{C}^T \\otimes \\mathbf{A}) \\text{vec}(\\mathbf{B}) &=& \\text{vec}(\\mathbf{A} \\mathbf{B} \\mathbf{C}).\n", "\\end{eqnarray*} \n", "$$\n", "to avoid forming and doing computation on the potentially huge Kronecker $\\mathbf{A} \\otimes \\mathbf{B}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sparse matrix\n", "\n", "Sparsity: sparse matrix decomposition or iterative method. \n", "* The easiest recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (`Matrix` package), MKL (sparse BLAS), ... as much as possible." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.005096" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "srand(280)\n", "\n", "n = 1000\n", "# a sparse pd matrix, about 0.5% non-zero entries\n", "A = sprand(n, n, 0.002)\n", "A = A + A' + (2n) * I\n", "b = randn(n)\n", "Afull = full(A)\n", "countnz(A) / length(A) # sparsity" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 100.168 μs (0.00% GC)\n", " median time: 186.690 μs (0.00% GC)\n", " mean time: 201.362 μs (0.00% GC)\n", " maximum time: 2.580 ms (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense matrix-vector multiplication\n", "@benchmark Afull * b" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 6.943 μs (0.00% GC)\n", " median time: 7.599 μs (0.00% GC)\n", " mean time: 8.241 μs (2.32% GC)\n", " maximum time: 483.364 μs (95.82% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 5" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse matrix-vector multiplication\n", "@benchmark A * b" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.63 MiB\n", " allocs estimate: 8\n", " --------------\n", " minimum time: 9.110 ms (0.00% GC)\n", " median time: 11.609 ms (0.00% GC)\n", " mean time: 11.827 ms (8.51% GC)\n", " maximum time: 68.000 ms (85.06% GC)\n", " --------------\n", " samples: 422\n", " evals/sample: 1" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dense Cholesky decomposition\n", "@benchmark cholfact(Afull)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 1.33 MiB\n", " allocs estimate: 53\n", " --------------\n", " minimum time: 3.326 ms (0.00% GC)\n", " median time: 3.434 ms (0.00% GC)\n", " mean time: 3.631 ms (0.99% GC)\n", " maximum time: 7.742 ms (0.00% GC)\n", " --------------\n", " samples: 1375\n", " evals/sample: 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse Cholesky decomposition\n", "@benchmark cholfact(A)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.64 MiB\n", " allocs estimate: 10\n", " --------------\n", " minimum time: 10.132 ms (0.00% GC)\n", " median time: 11.326 ms (0.00% GC)\n", " mean time: 11.478 ms (5.48% GC)\n", " maximum time: 16.560 ms (0.00% GC)\n", " --------------\n", " samples: 435\n", " evals/sample: 1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via dense Cholesky\n", "xchol = cholfact(Afull) \\ b\n", "@benchmark cholfact(Afull) \\ b" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.020546956752686e-18" 
] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via sparse Cholesky\n", "xcholsp = cholfact(A) \\ b\n", "vecnorm(xchol - xcholsp)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 1.36 MiB\n", " allocs estimate: 64\n", " --------------\n", " minimum time: 3.825 ms (0.00% GC)\n", " median time: 3.928 ms (0.00% GC)\n", " mean time: 4.094 ms (0.92% GC)\n", " maximum time: 6.442 ms (0.00% GC)\n", " --------------\n", " samples: 1220\n", " evals/sample: 1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark cholfact(A) \\ b" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\u001b[1m\u001b[36mINFO: \u001b[39m\u001b[22m\u001b[36mRecompiling stale cache file /Users/huazhou/.julia/lib/v0.6/IterativeSolvers.ji for module IterativeSolvers.\n", "\u001b[39m" ] }, { "data": { "text/plain": [ "0.022750177100322466" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sparse solve via conjugate gradient\n", "using IterativeSolvers\n", "\n", "xcg, = cg(A, b)\n", "vecnorm(xcg - xchol)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 33.30 KiB\n", " allocs estimate: 27\n", " --------------\n", " minimum time: 31.181 μs (0.00% GC)\n", " median time: 34.950 μs (0.00% GC)\n", " mean time: 40.390 μs (8.18% GC)\n", " maximum time: 2.927 ms (94.92% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@benchmark cg(A, b)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Easy plus low rank\n", "\n", "Easy plus low rank: $\\mathbf{U} \\in \\mathbb{R}^{n \\times r}$, $\\mathbf{V} \\in \\mathbb{R}^{r \\times n}$, $r \\ll n$. Woodbury formula\n", "$$\n", "\t(\\mathbf{A} + \\mathbf{U} \\mathbf{V}^T)^{-1} = \\mathbf{A}^{-1} - \\mathbf{A}^{-1} \\mathbf{U} (\\mathbf{I}_r + \\mathbf{V}^T \\mathbf{A}^{-1} \\mathbf{U})^{-1} \\mathbf{V}^T \\mathbf{A}^{-1},\n", "$$\n", "\n", "* Keep HW2 Q2 in mind. \n", "* [`WoodburyMatrices.jl`](https://github.com/timholy/WoodburyMatrices.jl) package can be useful." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1000×1000 Symmetric{Float64,Array{Float64,2}}:\n", " 1.8571 0.513107 0.872146 … 0.764278 -0.241331 0.54921 \n", " 0.513107 4.57505 -0.636972 -1.86465 -1.92237 -1.72569 \n", " 0.872146 -0.636972 4.81387 1.99357 1.99337 3.66327 \n", " -0.516414 -0.996711 -0.0919924 0.262832 0.612402 0.621834\n", " 0.193686 1.68244 -0.770028 -0.723437 -1.4868 -1.32247 \n", " 1.6567 0.0634435 -0.901968 … -0.241872 -0.0356772 -0.39826 \n", " 0.553996 -0.274515 2.21265 0.219437 2.20382 2.60902 \n", " 0.402356 1.89288 -1.13032 -0.771441 -1.96862 -1.93483 \n", " -1.07744 -1.63881 1.78016 0.96551 1.7292 1.91326 \n", " -2.21617 -2.90695 -2.55971 -0.47867 0.855389 -0.933916\n", " 1.29975 0.779828 4.12459 … 1.87358 0.737112 2.84136 \n", " -0.80833 1.44882 1.67581 -0.139063 -0.107873 0.818132\n", " -2.32469 -4.83109 -2.31796 -0.0346402 2.65564 0.591075\n", " ⋮ ⋱ \n", " 0.334995 0.0914829 1.60026 0.0937123 1.40804 1.92919 \n", " 0.390722 2.36157 -1.58383 -1.92435 -1.3291 -1.88114 \n", " 1.19218 0.845478 1.9362 … -0.0890571 1.36046 2.01013 \n", " 0.915526 0.889395 -0.606443 -0.428052 -0.817555 -1.2017 \n", " -0.778472 2.1444 1.50696 0.00689644 -1.28104 -0.141234\n", " 0.275366 -0.25866 0.416593 0.481534 0.0874531 0.316543\n", " -0.289749 -1.16146 0.550825 0.698152 0.789054 0.949917\n", " 0.439213 -0.379216 1.0951 … 0.626399 0.624574 0.8568 \n", " -0.592534 -0.235847 -1.11586 -0.601497 -0.0651787 -0.573737\n", " 0.764278 -1.86465 1.99357 3.21548 0.932773 1.97505 \n", " -0.241331 -1.92237 1.99337 0.932773 3.43991 2.8539 \n", " 0.54921 -1.72569 3.66327 1.97505 2.8539 4.85234 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using BenchmarkTools, WoodburyMatrices\n", "\n", "srand(280)\n", "n = 1000\n", "r = 5\n", "\n", "A = Diagonal(rand(n))\n", "B = randn(n, r)\n", "D = Diagonal(rand(r))\n", "b = randn(n)\n", "# W = A + B * D * B'\n", "W = SymWoodbury(A, B, D)\n", "Wfull = Symmetric(full(W))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.64 MiB\n", " allocs estimate: 9\n", " --------------\n", " minimum time: 8.913 ms (0.00% GC)\n", " median time: 9.119 ms (0.00% GC)\n", " mean time: 9.485 ms (6.82% GC)\n", " maximum time: 12.387 ms (15.54% GC)\n", " --------------\n", " samples: 527\n", " evals/sample: 1" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve via Cholesky\n", "@benchmark cholfact(Wfull) \\ b" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 76.16 KiB\n", " allocs estimate: 26\n", " --------------\n", " minimum time: 19.164 μs (0.00% GC)\n", " median time: 24.648 μs (0.00% GC)\n", " mean time: 34.932 μs (24.29% GC)\n", " maximum time: 4.617 ms (98.44% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# solve using Woodbury formula\n", "@benchmark W \\ b" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 7.94 KiB\n", " allocs estimate: 1\n", " --------------\n", " minimum time: 193.856 μs (0.00% GC)\n", " median time: 203.398 μs (0.00% GC)\n", " mean time: 205.489 μs (0.00% GC)\n", " 
maximum time: 531.061 μs (0.00% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 1" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiplication without using Woodbury structure\n", "@benchmark Wfull * b" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "BenchmarkTools.Trial: \n", " memory estimate: 24.06 KiB\n", " allocs estimate: 5\n", " --------------\n", " minimum time: 4.910 μs (0.00% GC)\n", " median time: 6.397 μs (0.00% GC)\n", " mean time: 8.787 μs (20.79% GC)\n", " maximum time: 374.038 μs (96.19% GC)\n", " --------------\n", " samples: 10000\n", " evals/sample: 7" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# multiplication using Woodbury structure\n", "@benchmark W * b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Easy plus border\n", "\n", "Easy plus border: For $\\mathbf{A}$ pd and $\\mathbf{V}$ full row rank,\n", "$$\n", "\t\\begin{pmatrix}\n", "\t\\mathbf{A} & \\mathbf{V}^T \\\\\n", "\t\\mathbf{V} & \\mathbf{0}\n", "\t\\end{pmatrix}^{-1} = \\begin{pmatrix}\n", "\t\\mathbf{A}^{-1} - \\mathbf{A}^{-1} \\mathbf{V}^T (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\mathbf{V} \\mathbf{A}^{-1} & \\mathbf{A}^{-1} \\mathbf{V}^T (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\\\\n", "\t(\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1} \\mathbf{V} \\mathbf{A}^{-1} & - (\\mathbf{V} \\mathbf{A}^{-1} \\mathbf{V}^T)^{-1}\n", "\t\\end{pmatrix}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Orthogonal matrix\n", "\n", "Orthogonal $\\mathbf{A}$: $n^2$ flops **at most**. Permutation matrix, Householder matrix, Jacobi matrix, ... take less." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Toeplitz matrix\n", "\n", "Toeplitz systems:\n", "$$\n", "\t\\mathbf{T} = \\begin{pmatrix}\n", "\tr_0 & r_1 & r_2 & r_3 \\\\\n", "\tr_{-1} & r_0 & r_1 & r_2 \\\\\n", "\tr_{-2} & r_{-1} & r_0 & r_1 \\\\\n", "\tr_{-3} & r_{-2} & r_{-1} & r_0\n", "\t\\end{pmatrix}.\n", "$$\n", "$\\mathbf{T} \\mathbf{x} = \\mathbf{b}$, where $\\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops. Durbin algorithm (Yule-Walker equation), Levinson algorithm (general $\\mathbf{b}$), Trench algorithm (inverse). These matrices occur in auto-regressive models and econometrics.\n", "\n", "* [`ToeplitzMatrices.jl`](https://github.com/JuliaMatrices/ToeplitzMatrices.jl) package can be useful." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Circulant matrix\n", "\n", "Circulant systems: Toeplitz matrix with wraparound\n", "$$\n", "\tC(\\mathbf{z}) = \\begin{pmatrix}\n", "\tz_0 & z_4 & z_3 & z_2 & z_1 \\\\\n", "\tz_1 & z_0 & z_4 & z_3 & z_2 \\\\\n", "\tz_2 & z_1 & z_0 & z_4 & z_3 \\\\\n", "\tz_3 & z_2 & z_1 & z_0 & z_4 \\\\\n", "\tz_4 & z_3 & z_2 & z_1 & z_0\n", "\t\\end{pmatrix},\n", "$$\n", "FFT type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Vandermonde matrix\n", "\n", "Vandermonde matrix: such as in interpolation and approximation problems\n", "$$\n", "\t\\mathbf{V}(x_0,\\ldots,x_n) = \\begin{pmatrix}\n", "\t1 & 1 & \\cdots & 1 \\\\\n", "\tx_0 & x_1 & \\cdots & x_n \\\\\n", "\t\\vdots & \\vdots & & \\vdots \\\\\n", "\tx_0^n & x_1^n & \\cdots & x_n^n\n", "\t\\end{pmatrix}.\n", "$$\n", "$\\mathbf{V} \\mathbf{x} = \\mathbf{b}$ or $\\mathbf{V}^T \\mathbf{x} = \\mathbf{b}$ can be solved in $O(n^2)$ flops." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cauchy-like matrix\n", "\n", "Cauchy-like matrices:\n", "$$\n", "\t\\Omega \\mathbf{A} - \\mathbf{A} \\Lambda = \\mathbf{R} \\mathbf{S}^T,\n", "$$\n", "where $\\Omega = \\text{diag}(\\omega_1,\\ldots,\\omega_n)$ and $\\Lambda = \\text{diag}(\\lambda_1,\\ldots, \\lambda_n)$. $O(n)$ flops for LU and QR." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structured-rank matrix\n", "\n", "Structured-rank problems: semiseparable matrices (LU and QR takes $O(n)$ flops), quasiseparable matrices, ..." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Julia Version 0.6.2\n", "Commit d386e40c17 (2017-12-13 18:08 UTC)\n", "Platform Info:\n", " OS: macOS (x86_64-apple-darwin14.5.0)\n", " CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz\n", " WORD_SIZE: 64\n", " BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)\n", " LAPACK: libopenblas64_\n", " LIBM: libopenlibm\n", " LLVM: libLLVM-3.9.1 (ORCJIT, skylake)\n" ] } ], "source": [ "versioninfo()" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.6.2", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.2" }, "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "30px", "width": "252px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_section_display": "block", "toc_window_display": true, "widenNotebook": false } }, "nbformat": 4, "nbformat_minor": 2 }