{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Julia Version 1.6.2\n",
"Commit 1b93d53fc4 (2021-07-14 15:36 UTC)\n",
"Platform Info:\n",
" OS: macOS (x86_64-apple-darwin18.7.0)\n",
" CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz\n",
" WORD_SIZE: 64\n",
" LIBM: libopenlibm\n",
" LLVM: libLLVM-11.0.1 (ORCJIT, skylake)\n"
]
}
],
"source": [
"versioninfo()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[32m\u001b[1m Activating\u001b[22m\u001b[39m environment at `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[32m\u001b[1m Status\u001b[22m\u001b[39m `~/Dropbox/class/M1399.000200/2021/M1399_000200-2021fall/Project.toml`\n",
" \u001b[90m [7d9fca2a] \u001b[39mArpack v0.5.3\n",
" \u001b[90m [6e4b80f9] \u001b[39mBenchmarkTools v1.1.4\n",
" \u001b[90m [1e616198] \u001b[39mCOSMO v0.8.1\n",
" \u001b[90m [f65535da] \u001b[39mConvex v0.14.13\n",
" \u001b[90m [a93c6f00] \u001b[39mDataFrames v1.2.2\n",
" \u001b[90m [31a5f54b] \u001b[39mDebugger v0.6.8\n",
" \u001b[90m [31c24e10] \u001b[39mDistributions v0.24.18\n",
" \u001b[90m [e2685f51] \u001b[39mECOS v0.12.3\n",
" \u001b[90m [f6369f11] \u001b[39mForwardDiff v0.10.19\n",
" \u001b[90m [28b8d3ca] \u001b[39mGR v0.58.1\n",
" \u001b[90m [c91e804a] \u001b[39mGadfly v1.3.3\n",
" \u001b[90m [bd48cda9] \u001b[39mGraphRecipes v0.5.7\n",
" \u001b[90m [82e4d734] \u001b[39mImageIO v0.5.8\n",
" \u001b[90m [6218d12a] \u001b[39mImageMagick v1.2.1\n",
" \u001b[90m [916415d5] \u001b[39mImages v0.24.1\n",
" \u001b[90m [b6b21f68] \u001b[39mIpopt v0.7.0\n",
" \u001b[90m [42fd0dbc] \u001b[39mIterativeSolvers v0.9.1\n",
" \u001b[90m [4076af6c] \u001b[39mJuMP v0.21.9\n",
" \u001b[90m [b51810bb] \u001b[39mMatrixDepot v1.0.4\n",
" \u001b[90m [1ec41992] \u001b[39mMosekTools v0.9.4\n",
" \u001b[90m [76087f3c] \u001b[39mNLopt v0.6.3\n",
" \u001b[90m [47be7bcc] \u001b[39mORCA v0.5.0\n",
" \u001b[90m [a03496cd] \u001b[39mPlotlyBase v0.4.3\n",
" \u001b[90m [f0f68f2c] \u001b[39mPlotlyJS v0.15.0\n",
" \u001b[90m [91a5bcdd] \u001b[39mPlots v1.21.2\n",
" \u001b[90m [438e738f] \u001b[39mPyCall v1.92.3\n",
" \u001b[90m [d330b81b] \u001b[39mPyPlot v2.9.0\n",
" \u001b[90m [dca85d43] \u001b[39mQuartzImageIO v0.7.3\n",
" \u001b[90m [6f49c342] \u001b[39mRCall v0.13.12\n",
" \u001b[90m [ce6b1742] \u001b[39mRDatasets v0.7.5\n",
" \u001b[90m [c946c3f1] \u001b[39mSCS v0.7.1\n",
" \u001b[90m [276daf66] \u001b[39mSpecialFunctions v1.6.1\n",
" \u001b[90m [2913bbd2] \u001b[39mStatsBase v0.33.10\n",
" \u001b[90m [b8865327] \u001b[39mUnicodePlots v2.0.1\n",
" \u001b[90m [0f1e0344] \u001b[39mWebIO v0.8.15\n",
" \u001b[90m [8f399da3] \u001b[39mLibdl\n",
" \u001b[90m [2f01184e] \u001b[39mSparseArrays\n",
" \u001b[90m [10745b16] \u001b[39mStatistics\n"
]
}
],
"source": [
"using Pkg\n",
"Pkg.activate(\"../..\")\n",
"Pkg.status()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Numerical linear algebra: introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Topics in numerical linear algebra: \n",
" - BLAS \n",
" - solve linear equations $\\mathbf{A} \\mathbf{x} = \\mathbf{b}$\n",
" - regression computations $\\mathbf{X}^T \\mathbf{X} \\beta = \\mathbf{X}^T \\mathbf{y}$ \n",
" - eigen-problems $\\mathbf{A} \\mathbf{x} = \\lambda \\mathbf{x}$ \n",
" - generalized eigen-problems $\\mathbf{A} \\mathbf{x} = \\lambda \\mathbf{B} \\mathbf{x}$ \n",
" - singular value decompositions $\\mathbf{A} = \\mathbf{U} \\Sigma \\mathbf{V}^T$ \n",
" - iterative methods for numerical linear algebra \n",
"\n",
"* Except for the iterative methods, most of these numerical linear algebra tasks are implemented in the BLAS and LAPACK libraries. They form the **building blocks** of most statistical computing tasks (optimization, MCMC).\n",
"\n",
"* Our major **goal** (or learning objectives) is to \n",
" 1. know the complexity (flop count) of each task\n",
" 2. be familiar with the BLAS and LAPACK functions (what they do) \n",
" 3. do **not** re-invent the wheel by implementing these dense linear algebra subroutines by yourself \n",
" 4. understand the need for iterative methods \n",
" 5. apply appropriate numerical algebra tools to various statistical problems \n",
"\n",
"* All high-level languages (R, Matlab, Julia) call BLAS and LAPACK for numerical linear algebra. \n",
" - Julia offers more flexibility by exposing interfaces to many BLAS/LAPACK subroutines directly. See [documentation](https://docs.julialang.org/en/v1.1/stdlib/LinearAlgebra/#BLAS-Functions-1)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BLAS\n",
"\n",
"* BLAS stands for _basic linear algebra subprograms_. \n",
"\n",
"* See [netlib](http://www.netlib.org/blas/) for a complete list of standardized BLAS functions.\n",
"\n",
"* There are many implementations of BLAS. \n",
" - [Netlib](http://www.netlib.org/blas/) provides a reference implementation. \n",
" - Matlab uses Intel's [Math Kernel Library (MKL)](https://software.intel.com/en-us/node/520724). **MKL implementation is the gold standard on market.** It is not open source but the compiled library is free for Linux and MacOS. \n",
" - Julia uses [OpenBLAS](http://www.openblas.net). OpenBLAS is a fast, actively maintained fork of [GotoBLAS](https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2), which has an [interesting history](https://www.nytimes.com/2005/11/28/technology/writing-the-fastest-code-by-hand-for-fun-a-human-computer-keeps.html).\n",
" - Julia can be compiled to use the MKL from the source.\n",
"\n",
"* There are 3 levels of BLAS functions.\n",
" - [Level 1](http://www.netlib.org/blas/#_level_1): vector-vector operation\n",
" - [Level 2](http://www.netlib.org/blas/#_level_2): matrix-vector operation\n",
" - [Level 3](http://www.netlib.org/blas/#_level_3): matrix-matrix operation\n",
"\n",
"| Level | Example Operation | Name | Dimension | Flops | \n",
"|-------|----------------------------------------|-------------|-------------------------------------------|-------|\n",
"| 1 | $\\alpha \\gets \\mathbf{x}^T \\mathbf{y}$ | dot product | $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$ | $2n$ | \n",
"| 1 | $\\mathbf{y} \\gets \\mathbf{y} + \\alpha \\mathbf{x}$ | axpy | $\\alpha \\in \\mathbb{R}$, $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$ | $2n$ | \n",
"| 2 | $\\mathbf{y} \\gets \\mathbf{y} + \\mathbf{A} \\mathbf{x}$ | gaxpy | $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$, $\\mathbf{x} \\in \\mathbb{R}^n$, $\\mathbf{y} \\in \\mathbb{R}^m$ | $2mn$ |\n",
"| 2 | $\\mathbf{A} \\gets \\mathbf{A} + \\mathbf{y} \\mathbf{x}^T$ | rank one update | $\\mathbf{A} \\in \\mathbb{R}^{m \\times n}$, $\\mathbf{x} \\in \\mathbb{R}^n$, $\\mathbf{y} \\in \\mathbb{R}^m$ | $2mn$ |\n",
"| 3 | $\\mathbf{C} \\gets \\mathbf{C} + \\mathbf{A} \\mathbf{B}$ | matrix multiplication | $\\mathbf{A} \\in \\mathbb{R}^{m \\times p}$, $\\mathbf{B} \\in \\mathbb{R}^{p \\times n}$, $\\mathbf{C} \\in \\mathbb{R}^{m \\times n}$ | $2mnp$ |\n",
"\n",
"* Typical BLAS functions support single precision (S), double precision (D), complex (C), and double complex (Z). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"> **The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.**\n",
"\\\n",
" -- James Gentle, *Matrix Algebra*, Springer, New York (2007).\n",
"\n",
"Some operations _appear_ as level-3 but indeed are level-2. \n",
"\n",
"### Example 1\n",
"\n",
"A common operation in statistics is column scaling or row scaling\n",
"$$\n",
"\\begin{eqnarray*}\n",
" \\mathbf{A} &=& \\mathbf{A} \\mathbf{D} \\quad \\text{(column scaling)} \\\\\n",
" \\mathbf{A} &=& \\mathbf{D} \\mathbf{A} \\quad \\text{(row scaling)},\n",
"\\end{eqnarray*}\n",
"$$\n",
"where $\\mathbf{D}$ is diagonal. For example, in generalized linear models (GLMs), the Fisher information matrix takes the form \n",
"$$\n",
"\\mathbf{X}^T \\mathbf{W} \\mathbf{X},\n",
"$$\n",
"where $\\mathbf{W}$ is a diagonal matrix with observation weights on diagonal. \n",
"\n",
" Column and row scalings are essentially level-2 operations!"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2000×2000 Diagonal{Float64, Vector{Float64}}:\n",
" 0.140972 ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n",
" ⋅ 0.143596 ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ 0.612494 ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ 0.0480573 ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋮ ⋱ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ 0.882415 ⋅ ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ 0.450904 ⋅ \n",
" ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.814614"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using BenchmarkTools, LinearAlgebra, Random\n",
"\n",
"Random.seed!(123) # seed\n",
"n = 2000\n",
"A = rand(n, n) # n-by-n matrix\n",
"d = rand(n) # n vector\n",
"D = Diagonal(d) # diagonal matrix with d as diagonal"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2000×2000 Matrix{Float64}:\n",
" 0.140972 0.0 0.0 0.0 … 0.0 0.0 0.0\n",
" 0.0 0.143596 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.612494 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0480573 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" ⋮ ⋱ \n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.882415 0.0 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.450904 0.0\n",
" 0.0 0.0 0.0 0.0 0.0 0.0 0.814614"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Dfull = convert(Matrix, D) # convert to full matrix"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 39 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m 99.756 ms\u001b[22m\u001b[39m … \u001b[35m183.883 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 4.41%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m121.387 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m131.600 ms\u001b[22m\u001b[39m ± \u001b[32m 25.298 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m1.90% ± 2.80%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▁\u001b[39m▄\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[34m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▆\u001b[39m▆\u001b[39m█\u001b[39m▆\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[34m▆\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m \u001b[39m▁\n",
" 99.8 ms\u001b[90m Histogram: frequency by time\u001b[39m 184 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m30.52 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m2\u001b[39m."
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# this is calling BLAS routine for matrix multiplication: O(n^3) flops\n",
"# this is SLOW!\n",
"@benchmark $A * $Dfull"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 497 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m 6.324 ms\u001b[22m\u001b[39m … \u001b[35m18.680 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m 0.00% … 41.11%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m 8.060 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m10.049 ms\u001b[22m\u001b[39m ± \u001b[32m 3.227 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m20.63% ± 20.55%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▄\u001b[39m█\u001b[39m▄\u001b[39m▂\u001b[34m▁\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m▅\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[32m▃\u001b[39m\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▅\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m \u001b[39m▃\n",
" 6.32 ms\u001b[90m Histogram: frequency by time\u001b[39m 17.2 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m30.52 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m4\u001b[39m."
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# dispatch to special method for diagonal matrix multiplication.\n",
"# columnwise scaling: O(n^2) flops\n",
"@benchmark $A * $D"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 372 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m 6.869 ms\u001b[22m\u001b[39m … \u001b[35m21.703 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m13.772 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m13.450 ms\u001b[22m\u001b[39m ± \u001b[32m 4.026 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▆\u001b[39m▅\u001b[39m▂\u001b[39m \u001b[39m▅\u001b[39m \u001b[39m▄\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m▂\u001b[39m▂\u001b[39m \u001b[39m \u001b[39m▂\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[34m \u001b[39m\u001b[39m \u001b[39m▂\u001b[39m▃\u001b[39m \u001b[39m▃\u001b[39m█\u001b[39m▅\u001b[39m▃\u001b[39m \u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▃\u001b[39m▅\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▆\u001b[39m█\u001b[39m▄\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m█\u001b[39m▅\u001b[39m█\u001b[39m▇\u001b[39m▅\u001b[39m█\u001b[39m█\u001b[39m▂\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m▃\u001b[39m▄\u001b[32m▂\u001b[39m\u001b[34m▄\u001b[39m\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▅\u001b[39m▄\u001b[39m▇\u001b[39m▆\u001b[39m▃\u001b[39m▆\u001b[39m▃\u001b[39m▅\u001b[39m▂\u001b[39m▅\u001b[39m▄\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m▃\u001b[39m▆\u001b[39m▄\u001b[39m \u001b[39m▄\n",
" 6.87 ms\u001b[90m Histogram: frequency by time\u001b[39m 20.9 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m96 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m2\u001b[39m."
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# in-place: avoid allocating space for result\n",
"# rmul!: compute matrix-matrix product AB, overwriting A, and return the result.\n",
"@benchmark rmul!(A, D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** In R, `diag(d)` will create a full (dense) matrix. Be cautious using `diag` function: do we really need a full diagonal matrix?"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RObject{RealSxp}\n",
" [,1] [,2] [,3] [,4] [,5]\n",
"[1,] 0.9028847 0.0000000 0.0000000 0.0000000 0.00000000\n",
"[2,] 0.0000000 0.2051186 0.0000000 0.0000000 0.00000000\n",
"[3,] 0.0000000 0.0000000 0.8915648 0.0000000 0.00000000\n",
"[4,] 0.0000000 0.0000000 0.0000000 0.4390209 0.00000000\n",
"[5,] 0.0000000 0.0000000 0.0000000 0.0000000 0.05749282\n"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using RCall\n",
"\n",
"R\"\"\"\n",
"d <- runif(5)\n",
"diag(d)\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 2\n",
"\n",
"Inner product between two matrices $\\mathbf{A}, \\mathbf{B} \\in \\mathbb{R}^{m \\times n}$ is often written as \n",
"\n",
"$$\n",
" \\langle \\mathbf{A}, \\mathbf{B} \\rangle \n",
" = \\sum_{i,j} \\mathbf{A}_{ij}\\mathbf{B}_{ij}\n",
" = \\text{trace}(\\mathbf{A}^T \\mathbf{B}) \n",
" = \\text{trace}(\\mathbf{B} \\mathbf{A}^T) \n",
" = \\text{trace}(\\mathbf{A} \\mathbf{B}^T)\n",
" = \\text{trace}(\\mathbf{B}^T \\mathbf{A}).\n",
"$$\n",
"\n",
"They appear as level-3 operation (matrix multiplication with $O(m^2n)$ or $O(mn^2)$ flops)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 32 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m104.761 ms\u001b[22m\u001b[39m … \u001b[35m210.264 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 6.70%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m174.214 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m158.338 ms\u001b[22m\u001b[39m ± \u001b[32m 36.217 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m1.80% ± 3.03%\n",
"\n",
" \u001b[39m▄\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[34m \u001b[39m\u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[34m▆\u001b[39m\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m \u001b[39m▁\n",
" 105 ms\u001b[90m Histogram: frequency by time\u001b[39m 210 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m30.52 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m2\u001b[39m."
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Random.seed!(123)\n",
"n = 2000\n",
"A, B = randn(n, n), randn(n, n)\n",
"\n",
"# slow way to evaluate this thing\n",
"@benchmark tr(transpose($A) * $B)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But $\\langle \\mathbf{A}, \\mathbf{B} \\rangle = \\langle \\text{vec}(\\mathbf{A}), \\text{vec}(\\mathbf{B}) \\rangle$. The latter is level-2 operation with $O(mn)$ flops."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 3891 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m1.213 ms\u001b[22m\u001b[39m … \u001b[35m 3.049 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m1.239 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m1.271 ms\u001b[22m\u001b[39m ± \u001b[32m78.007 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m▅\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[34m▅\u001b[39m\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[32m▄\u001b[39m\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▂\n",
" \u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m \u001b[39m█\n",
" 1.21 ms\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 1.52 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark dot($A, $B)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example 3\n",
"\n",
"Similarly $\\text{diag}(\\mathbf{A}^T \\mathbf{B})=\\text{diag}(\\sum_{j}\\mathbf{A}_{ij}\\mathbf{B}_{ij})$ can be calculated in $O(mn)$ flops."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 38 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m105.100 ms\u001b[22m\u001b[39m … \u001b[35m205.494 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 6.87%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m127.099 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m133.460 ms\u001b[22m\u001b[39m ± \u001b[32m 26.864 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m2.20% ± 3.58%\n",
"\n",
" \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m▄\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[34m \u001b[39m\u001b[39m \u001b[39m▄\u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[34m▆\u001b[39m\u001b[39m▆\u001b[39m█\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m \u001b[39m▁\n",
" 105 ms\u001b[90m Histogram: frequency by time\u001b[39m 205 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m30.53 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m3\u001b[39m."
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# slow way to evaluate this thing\n",
"@benchmark diag(transpose($A) * $B)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 399 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m 7.757 ms\u001b[22m\u001b[39m … \u001b[35m24.467 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m 0.00% … 49.34%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m10.987 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m12.525 ms\u001b[22m\u001b[39m ± \u001b[32m 4.776 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m19.39% ± 21.20%\n",
"\n",
" \u001b[39m \u001b[39m█\u001b[39m▄\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[34m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▅\u001b[39m█\u001b[39m█\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[34m▄\u001b[39m\u001b[39m▃\u001b[39m▃\u001b[39m▄\u001b[39m▂\u001b[39m▄\u001b[32m▆\u001b[39m\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▂\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▁\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m \u001b[39m▃\n",
" 7.76 ms\u001b[90m Histogram: frequency by time\u001b[39m 23.7 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m30.53 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m5\u001b[39m."
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# smarter\n",
"@benchmark Diagonal(vec(sum($A .* $B, dims=1)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get rid of allocation of intermediate array at all, we can just write a double loop or use `dot` function."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 1892 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m2.146 ms\u001b[22m\u001b[39m … \u001b[35m 5.498 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m2.453 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m2.629 ms\u001b[22m\u001b[39m ± \u001b[32m447.989 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m▁\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m█\u001b[39m█\u001b[34m▆\u001b[39m\u001b[39m▅\u001b[39m▂\u001b[39m▂\u001b[32m▂\u001b[39m\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m█\u001b[39m▆\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▃\u001b[39m▄\u001b[39m▆\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m▃\u001b[39m▅\u001b[39m▃\u001b[39m▄\u001b[39m▆\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▅\u001b[39m \u001b[39m█\n",
" 2.15 ms\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 4.5 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m15.75 KiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m1\u001b[39m."
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"function diag_matmul(A, B)\n",
" m, n = size(A)\n",
" @assert size(B) == (m, n) \"A and B should have same size\"\n",
" # @views: Convert every array-slicing operation in the given expression to return a view.\n",
" # See below for avoiding memory allocation\n",
" @views d = [dot(A[:, j], B[:, j]) for j in 1:n]\n",
"# d = zeros(eltype(A), n)\n",
"# for j in 1:n, i in 1:m\n",
"# d[j] += A[i, j] * B[i, j]\n",
"# end\n",
" Diagonal(d)\n",
"end\n",
"\n",
"@benchmark diag_matmul($A, $B)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Memory hierarchy and level-3 fraction\n",
"\n",
"> **Key to high performance is effective use of memory hierarchy. True on all architectures.**\n",
"\n",
"* Flop count is not the sole determinant of algorithm efficiency. Another important factor is data movement through the memory hierarchy.\n",
"\n",
"\n",
"\n",
" \n",
"\n",
"\n",
"\n",
"* Numbers everyone should know\n",
"\n",
"| Operation | Time |\n",
"|-------------------------------------|----------------|\n",
"| L1 cache reference | 0.5 ns |\n",
"| L2 cache reference | 7 ns |\n",
"| Main memory reference | 100 ns |\n",
"| Read 1 MB sequentially from memory | 250,000 ns |\n",
"| Read 1 MB sequentially from SSD | 1,000,000 ns | \n",
"| Read 1 MB sequentially from disk | 20,000,000 ns |\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" Source: \n",
"\n",
"* For example, Xeon X5650 CPU has a theoretical throughput of 128 DP GFLOPS but a max memory bandwidth of 32GB/s. \n",
"\n",
"* Can we keep CPU cores busy with enough deliveries of matrix data and ship the results to memory fast enough to avoid backlog? \n",
"Answer: use **high-level BLAS** as much as possible.\n",
"\n",
"| BLAS | Dimension | Mem. Refs. | Flops | Ratio |\n",
"|--------------------------------|------------------------------------------------------------|------------|--------|-------|\n",
"| Level 1: $\\mathbf{y} \\gets \\mathbf{y} + \\alpha \\mathbf{x}$ | $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$ | $3n$ | $2n$ | 3:2 |\n",
"| Level 2: $\\mathbf{y} \\gets \\mathbf{y} + \\mathbf{A} \\mathbf{x}$ | $\\mathbf{x}, \\mathbf{y} \\in \\mathbb{R}^n$, $\\mathbf{A} \\in \\mathbb{R}^{n \\times n}$ | $n^2$ | $2n^2$ | 1:2 |\n",
"| Level 3: $\\mathbf{C} \\gets \\mathbf{C} + \\mathbf{A} \\mathbf{B}$ | $\\mathbf{A}, \\mathbf{B}, \\mathbf{C} \\in\\mathbb{R}^{n \\times n}$ | $4n^2$ | $2n^3$ | 2:n | \n",
"\n",
"* Higher level BLAS (3 or 2) make more effective use of arithmetic logic units (ALU) by keeping them busy. \n",
"See [Dongarra slides](https://www.samsi.info/wp-content/uploads/2017/02/SAMSI-0217_Dongarra.pdf).\n",
"\n",
"\n",
"\n",
"* A distinction between LAPACK and LINPACK (older version of R uses LINPACK) is that LAPACK makes use of higher level BLAS as much as possible (usually by smart partitioning) to increase the so-called **level-3 fraction**.\n",
"\n",
"* To appreciate the efforts in an optimized BLAS implementation such as OpenBLAS (evolved from GotoBLAS), see this [video](https://youtu.be/JzNpKDW07rw). The bottom line is \n",
"\n",
"> **Get familiar with (good implementations of) BLAS/LAPACK and use them as much as possible.**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Effect of data layout\n",
"\n",
"* Data layout in memory affects algorithmic efficiency too. It is much faster to move chunks of data in memory than retrieving/writing scattered data.\n",
"\n",
"* Storage mode: **column-major** (Fortran, Matlab, R, Julia) vs **row-major** (C/C++).\n",
"\n",
"* **Cache line** is the minimum amount of cache which can be loaded and stored to memory.\n",
" - x86 CPUs: 64 bytes \n",
" - ARM CPUs: 32 bytes\n",
"\n",
"\n",
"\n",
"Source: \n",
"\n",
"* Accessing *column-major* stored matrix by *rows* ($ij$ looping) causes lots of **cache misses**.\n",
"\n",
"* Take matrix multiplication as an example \n",
"$$ \n",
"\\mathbf{C} \\gets \\mathbf{C} + \\mathbf{A} \\mathbf{B}, \\quad \\mathbf{A} \\in \\mathbb{R}^{m \\times p}, \\mathbf{B} \\in \\mathbb{R}^{p \\times n}, \\mathbf{C} \\in \\mathbb{R}^{m \\times n}.\n",
"$$\n",
"Assume the storage is column-major, such as in Julia. There are 6 variants of the algorithms according to the order in the triple loops. \n",
" - `jki` or `kji` looping:\n",
" ```julia\n",
" # inner most loop\n",
" for i = 1:m\n",
" C[i, j] = C[i, j] + A[i, k] * B[k, j]\n",
" end\n",
" ``` \n",
" - `ikj` or `kij` looping:\n",
" ```julia\n",
" # inner most loop \n",
" for j = 1:n\n",
" C[i, j] = C[i, j] + A[i, k] * B[k, j]\n",
" end\n",
" ``` \n",
" - `ijk` or `jik` looping:\n",
" ```julia\n",
" # inner most loop \n",
" for k = 1:p\n",
" C[i, j] = C[i, j] + A[i, k] * B[k, j]\n",
" end\n",
" ```\n",
"* We pay attention to the innermost loop, where the vector calculation occurs. The associated **stride** when accessing the three matrices in memory (assuming column-major storage) is \n",
"\n",
"| Variant | A Stride | B Stride | C Stride |\n",
"|----------------|----------|----------|----------|\n",
"| `jki` or `kji` | Unit | 0 | Unit |\n",
"| `ikj` or `kij` | 0 | Non-Unit | Non-Unit |\n",
"| `ijk` or `jik` | Non-Unit | Unit | 0 | \n",
"\n",
"* Apparently the variants `jki` or `kji` are preferred."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
" matmul_by_loop!(A, B, C, order)\n",
"\n",
"Overwrite `C` by `A * B`. `order` indicates the looping order for triple loop.\n",
"\"\"\"\n",
"function matmul_by_loop!(A::Matrix, B::Matrix, C::Matrix, order::String)\n",
" \n",
" m = size(A, 1)\n",
" p = size(A, 2)\n",
" n = size(B, 2)\n",
" fill!(C, 0)\n",
" \n",
" if order == \"jki\"\n",
" for j = 1:n, k = 1:p, i = 1:m\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
"\n",
" if order == \"kji\"\n",
" for k = 1:p, j = 1:n, i = 1:m\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
" \n",
" if order == \"ikj\"\n",
" for i = 1:m, k = 1:p, j = 1:n\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
"\n",
" if order == \"kij\"\n",
" for k = 1:p, i = 1:m, j = 1:n\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
" \n",
" if order == \"ijk\"\n",
" for i = 1:m, j = 1:n, k = 1:p\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
" \n",
" if order == \"jik\"\n",
" for j = 1:n, i = 1:m, k = 1:p\n",
" C[i, j] += A[i, k] * B[k, j]\n",
" end\n",
" end\n",
" \n",
"end\n",
"\n",
"using Random\n",
"\n",
"Random.seed!(123)\n",
"m, p, n = 2000, 100, 2000\n",
"A = rand(m, p)\n",
"B = rand(p, n)\n",
"C = zeros(m, n);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `jki` and `kji` looping:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 12 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m409.931 ms\u001b[22m\u001b[39m … \u001b[35m548.997 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m425.141 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m444.185 ms\u001b[22m\u001b[39m ± \u001b[32m 46.687 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[34m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[34m▁\u001b[39m\u001b[39m▆\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▆\u001b[39m \u001b[39m▁\n",
" 410 ms\u001b[90m Histogram: frequency by time\u001b[39m 549 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using BenchmarkTools\n",
"\n",
"@benchmark matmul_by_loop!($A, $B, $C, \"jki\")"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 8 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m674.248 ms\u001b[22m\u001b[39m … \u001b[35m729.114 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m714.277 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m712.280 ms\u001b[22m\u001b[39m ± \u001b[32m 16.689 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[32m▁\u001b[39m\u001b[39m \u001b[34m█\u001b[39m\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[32m█\u001b[39m\u001b[39m▁\u001b[34m█\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m \u001b[39m▁\n",
" 674 ms\u001b[90m Histogram: frequency by time\u001b[39m 729 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark matmul_by_loop!($A, $B, $C, \"kji\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `ikj` and `kij` looping:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 3 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m2.321 s\u001b[22m\u001b[39m … \u001b[35m 2.386 s\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m2.375 s \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m2.361 s\u001b[22m\u001b[39m ± \u001b[32m34.822 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[34m█\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \n",
" \u001b[34m█\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m \u001b[39m▁\n",
" 2.32 s\u001b[90m Histogram: frequency by time\u001b[39m 2.39 s \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark matmul_by_loop!($A, $B, $C, \"ikj\")"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 3 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m2.333 s\u001b[22m\u001b[39m … \u001b[35m 2.348 s\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m2.344 s \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m2.342 s\u001b[22m\u001b[39m ± \u001b[32m7.773 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[34m█\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \n",
" \u001b[34m█\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m \u001b[39m▁\n",
" 2.33 s\u001b[90m Histogram: frequency by time\u001b[39m 2.35 s \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark matmul_by_loop!($A, $B, $C, \"kij\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* `ijk` and `jik` looping:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 6 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m889.913 ms\u001b[22m\u001b[39m … \u001b[35m 1.017 s\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m930.412 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m940.485 ms\u001b[22m\u001b[39m ± \u001b[32m42.680 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[34m█\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[34m█\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m \u001b[39m▁\n",
" 890 ms\u001b[90m Histogram: frequency by time\u001b[39m 1.02 s \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark matmul_by_loop!($A, $B, $C, \"ijk\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 6 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m886.608 ms\u001b[22m\u001b[39m … \u001b[35m922.833 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m903.909 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m903.385 ms\u001b[22m\u001b[39m ± \u001b[32m 13.468 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[34m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m█\u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[34m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m█\u001b[39m \u001b[39m▁\n",
" 887 ms\u001b[90m Histogram: frequency by time\u001b[39m 923 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark matmul_by_loop!($A, $B, $C, \"ijk\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Julia wraps BLAS library for matrix multiplication. We see BLAS library wins the `jki` and `kji` looping by a large margin (multi-threading, Strassen algorithm, higher level-3 fraction by block outer product)."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 793 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m4.850 ms\u001b[22m\u001b[39m … \u001b[35m11.389 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m6.007 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m6.291 ms\u001b[22m\u001b[39m ± \u001b[32m 1.009 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▂\u001b[39m▇\u001b[39m█\u001b[39m▆\u001b[39m▅\u001b[39m▄\u001b[34m▂\u001b[39m\u001b[39m▆\u001b[39m▇\u001b[39m▅\u001b[32m▁\u001b[39m\u001b[39m▃\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▄\u001b[39m▇\u001b[39m▄\u001b[39m▄\u001b[39m▇\u001b[39m▅\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▄\u001b[39m▇\u001b[39m▅\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▄\u001b[39m▅\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▁\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▆\u001b[39m▄\u001b[39m▆\u001b[39m \u001b[39m█\n",
" 4.85 ms\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 9.9 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark mul!($C, $A, $B)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 790 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m5.715 ms\u001b[22m\u001b[39m … \u001b[35m 12.214 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m5.976 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m6.312 ms\u001b[22m\u001b[39m ± \u001b[32m952.165 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m█\u001b[34m▆\u001b[39m\u001b[39m▄\u001b[39m▂\u001b[39m▃\u001b[32m▂\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▄\u001b[39m▇\u001b[39m▅\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▅\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▆\u001b[39m▄\u001b[39m▄\u001b[39m▁\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▇\u001b[39m▄\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▇\u001b[39m▄\u001b[39m▄\u001b[39m▆\u001b[39m▄\u001b[39m▇\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m▁\u001b[39m▁\u001b[39m▄\u001b[39m \u001b[39m█\n",
" 5.71 ms\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 10.1 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# direct call of BLAS wrapper function\n",
"@benchmark LinearAlgebra.BLAS.gemm!('N', 'N', 1.0, $A, $B, 0.0, $C)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## BLAS in R\n",
"\n",
"* **Tip for R usesr**. Standard R distribution from CRAN uses a very out-dated BLAS/LAPACK library."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RObject{VecSxp}\n",
"Unit: milliseconds\n",
" expr min lq mean median uq max neval\n",
" `#JL`$A %*% `#JL`$B 909.2172 916.2797 940.2293 931.8024 947.0514 1230.23 100\n"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using RCall\n",
"\n",
"R\"\"\"\n",
"library(\"microbenchmark\")\n",
"microbenchmark($A %*% $B)\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Re-building R from scratch using OpenBLAS or MKL will immediately boost linear algebra performance in R. Google `build R using MKL` to get started. Similarly we can build Julia using MKL.\n",
"\n",
"* Matlab uses MKL. Usually it's very hard to beat Matlab in terms of linear algebra (and it's expensive!)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Avoid memory allocation: some examples\n",
"\n",
"1. **Transposing** matrix is an expensive memory operation. \n",
" - In R, the command \n",
" ```R\n",
" t(A) %*% x\n",
" ```\n",
" will first transpose `A` then perform matrix multiplication, causing unnecessary memory allocation\n",
" - Julia is smart to avoid transposing matrix if possible."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"using Random, LinearAlgebra, BenchmarkTools\n",
"Random.seed!(123)\n",
"\n",
"n = 1000\n",
"A = rand(n, n)\n",
"x = rand(n);"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Transpose{Float64, Matrix{Float64}}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"typeof(transpose(A))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(:parent,)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fieldnames(typeof(transpose(A)))"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Ptr{Float64} @0x00007fe7a7800000, Ptr{Float64} @0x00007fe7a7800000)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# same data in tranpose(A) and original matrix A\n",
"pointer(transpose(A).parent), pointer(A)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m71.486 μs\u001b[22m\u001b[39m … \u001b[35m692.121 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m87.752 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m92.042 μs\u001b[22m\u001b[39m ± \u001b[32m 17.339 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▃\u001b[39m▅\u001b[39m▇\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[34m▆\u001b[39m\u001b[39m▅\u001b[39m▂\u001b[32m▁\u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▄\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m▇\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m▃\n",
" 71.5 μs\u001b[90m Histogram: frequency by time\u001b[39m 147 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m7.94 KiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m1\u001b[39m."
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# dispatch to BLAS\n",
"# does *not* actually transpose the matrix\n",
"@benchmark transpose($A) * $x"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m69.739 μs\u001b[22m\u001b[39m … \u001b[35m475.971 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m83.509 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m87.852 μs\u001b[22m\u001b[39m ± \u001b[32m 14.373 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▃\u001b[39m▅\u001b[39m█\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[34m▅\u001b[39m\u001b[39m▄\u001b[39m▂\u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▄\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m▆\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m \u001b[39m▃\n",
" 69.7 μs\u001b[90m Histogram: frequency by time\u001b[39m 140 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# pre-allocate result\n",
"out = zeros(size(A, 2))\n",
"@benchmark mul!($out, transpose($A), $x)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m68.930 μs\u001b[22m\u001b[39m … \u001b[35m459.098 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m85.314 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m92.348 μs\u001b[22m\u001b[39m ± \u001b[32m 18.167 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▃\u001b[39m▆\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▅\u001b[34m▃\u001b[39m\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▃\u001b[39m▄\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[32m▃\u001b[39m\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m▃\n",
" 68.9 μs\u001b[90m Histogram: frequency by time\u001b[39m 141 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# or call BLAS wrapper directly\n",
"@benchmark LinearAlgebra.BLAS.gemv!('T', 1.0, $A, $x, 0.0, $out)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. [Broadcasting](https://docs.julialang.org/en/v1/base/arrays/#Broadcast-and-vectorization-1) in Julia achieves vectorized code without creating intermediate arrays.\n",
"\n",
" Suppose we want to calculate elementsize maximum of absolute values of two large arrays. In R or Matlab, the command\n",
"```r\n",
"max(abs(X), abs(Y))\n",
"```\n",
"will create two intermediate arrays and then one result array."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RObject{VecSxp}\n",
"Unit: milliseconds\n",
" expr min lq mean median uq\n",
" max(abs(`#JL`$X), abs(`#JL`$Y)) 5.418381 6.065276 9.559118 7.078557 8.298674\n",
" max neval\n",
" 83.00264 100\n"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"using RCall\n",
"Random.seed!(123)\n",
"X, Y = rand(1000, 1000), rand(1000, 1000)\n",
"\n",
"R\"\"\"\n",
"library(microbenchmark)\n",
"microbenchmark(max(abs($X), abs($Y)))\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Julia, dot operations are fused so no intermediate arrays are created."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 2582 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m892.227 μs\u001b[22m\u001b[39m … \u001b[35m13.614 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m 0.00% … 66.26%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m 1.299 ms \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m 1.928 ms\u001b[22m\u001b[39m ± \u001b[32m 2.262 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m21.39% ± 15.21%\n",
"\n",
" \u001b[39m▃\u001b[39m▇\u001b[34m█\u001b[39m\u001b[39m▆\u001b[39m▃\u001b[39m▂\u001b[32m \u001b[39m\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \n",
" \u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m▇\u001b[39m\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▃\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▃\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▇\u001b[39m \u001b[39m█\n",
" 892 μs\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 11.5 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m7.63 MiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m2\u001b[39m."
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# no intermediate arrays created, only result array created\n",
"@benchmark max.(abs.($X), abs.($Y))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pre-allocating result array gets rid of memory allocation at all."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 6054 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m740.775 μs\u001b[22m\u001b[39m … \u001b[35m 2.007 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m771.994 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m816.781 μs\u001b[22m\u001b[39m ± \u001b[32m102.622 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[39m▅\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[34m▅\u001b[39m\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[32m▃\u001b[39m\u001b[39m▃\u001b[39m▄\u001b[39m▄\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▃\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m▁\u001b[39m \u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▂\n",
" \u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▅\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▄\u001b[39m▅\u001b[39m▅\u001b[39m \u001b[39m█\n",
" 741 μs\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 1.23 ms \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# no memory allocation at all!\n",
"Z = zeros(size(X)) # zero matrix of same size as X\n",
"@benchmark $Z .= max.(abs.($X), abs.($Y)) # .= (vs =) is important!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. [View](https://docs.julialang.org/en/v1/base/arrays/#Views-(SubArrays-and-other-view-types)-1) avoids creating extra copy of matrix data."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m 79.961 μs\u001b[22m\u001b[39m … \u001b[35m 9.766 ms\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m 0.00% … 96.49%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m268.438 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m296.359 μs\u001b[22m\u001b[39m ± \u001b[32m507.203 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m10.42% ± 5.92%\n",
"\n",
" \u001b[39m▆\u001b[39m▃\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\u001b[39m▅\u001b[39m▇\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[32m▃\u001b[39m\u001b[39m▃\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▂\u001b[39m▁\u001b[39m▂\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▃\n",
" \u001b[39m█\u001b[39m█\u001b[39m▅\u001b[39m█\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m█\u001b[39m▇\u001b[39m▄\u001b[39m▃\u001b[39m▃\u001b[39m▃\u001b[39m▅\u001b[39m▁\u001b[39m▃\u001b[39m▄\u001b[39m▅\u001b[39m▁\u001b[39m▄\u001b[39m▃\u001b[39m▁\u001b[39m▁\u001b[39m▁\u001b[39m▃\u001b[39m▇\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m \u001b[39m█\n",
" 80 μs\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 440 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m488.39 KiB\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m2\u001b[39m."
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Random.seed!(123)\n",
"A = randn(1000, 1000)\n",
"\n",
"# sum entries in a sub-matrix\n",
"@benchmark sum($A[1:2:500, 1:2:500])"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m102.922 μs\u001b[22m\u001b[39m … \u001b[35m303.586 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m103.595 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m110.998 μs\u001b[22m\u001b[39m ± \u001b[32m 20.398 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[34m█\u001b[39m\u001b[39m▆\u001b[39m▄\u001b[39m▅\u001b[39m▃\u001b[32m▂\u001b[39m\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\n",
" \u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▇\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▅\u001b[39m \u001b[39m█\n",
" 103 μs\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 209 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# view avoids creating a separate sub-matrix\n",
"@benchmark sum(@view $A[1:2:500, 1:2:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [`@views`](https://docs.julialang.org/en/v1/base/arrays/#Base.@views) macro, which can be useful in [some operations](https://discourse.julialang.org/t/why-is-a-manual-in-place-addition-so-much-faster-than-and-on-range-indexed-arrays/3302). (See also Example 3 above.)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"BenchmarkTools.Trial: 10000 samples with 1 evaluation.\n",
" Range \u001b[90m(\u001b[39m\u001b[36m\u001b[1mmin\u001b[22m\u001b[39m … \u001b[35mmax\u001b[39m\u001b[90m): \u001b[39m\u001b[36m\u001b[1m102.900 μs\u001b[22m\u001b[39m … \u001b[35m305.959 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmin … max\u001b[90m): \u001b[39m0.00% … 0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[34m\u001b[1mmedian\u001b[22m\u001b[39m\u001b[90m): \u001b[39m\u001b[34m\u001b[1m103.132 μs \u001b[22m\u001b[39m\u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmedian\u001b[90m): \u001b[39m0.00%\n",
" Time \u001b[90m(\u001b[39m\u001b[32m\u001b[1mmean\u001b[22m\u001b[39m ± \u001b[32mσ\u001b[39m\u001b[90m): \u001b[39m\u001b[32m\u001b[1m108.654 μs\u001b[22m\u001b[39m ± \u001b[32m 18.127 μs\u001b[39m \u001b[90m┊\u001b[39m GC \u001b[90m(\u001b[39mmean ± σ\u001b[90m): \u001b[39m0.00% ± 0.00%\n",
"\n",
" \u001b[34m█\u001b[39m\u001b[39m▄\u001b[39m▃\u001b[39m▂\u001b[32m▁\u001b[39m\u001b[39m▁\u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m \u001b[39m▁\n",
" \u001b[34m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[32m█\u001b[39m\u001b[39m█\u001b[39m█\u001b[39m█\u001b[39m▇\u001b[39m▇\u001b[39m▆\u001b[39m▇\u001b[39m█\u001b[39m▇\u001b[39m▆\u001b[39m▅\u001b[39m▆\u001b[39m▆\u001b[39m▅\u001b[39m▄\u001b[39m▄\u001b[39m▅\u001b[39m▆\u001b[39m▇\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m▆\u001b[39m▄\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▆\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▅\u001b[39m▄\u001b[39m▅\u001b[39m▄\u001b[39m \u001b[39m█\n",
" 103 μs\u001b[90m \u001b[39m\u001b[90mHistogram: \u001b[39m\u001b[90m\u001b[1mlog(\u001b[22m\u001b[39m\u001b[90mfrequency\u001b[39m\u001b[90m\u001b[1m)\u001b[22m\u001b[39m\u001b[90m by time\u001b[39m 197 μs \u001b[0m\u001b[1m<\u001b[22m\n",
"\n",
" Memory estimate\u001b[90m: \u001b[39m\u001b[33m0 bytes\u001b[39m, allocs estimate\u001b[90m: \u001b[39m\u001b[33m0\u001b[39m."
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"@benchmark @views sum($A[1:2:500, 1:2:500])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Acknowledgment\n",
"\n",
"This lecture note has evolved from [Dr. Hua Zhou](http://hua-zhou.github.io)'s 2019 Spring Statistical Computing course notes available at ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"@webio": {
"lastCommId": null,
"lastKernelId": null
},
"kernelspec": {
"display_name": "Julia 1.6.2",
"language": "julia",
"name": "julia-1.6"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.6.2"
},
"toc": {
"colors": {
"hover_highlight": "#DAA520",
"running_highlight": "#FF0000",
"selected_highlight": "#FFD700"
},
"moveMenuLeft": true,
"nav_menu": {
"height": "103px",
"width": "252px"
},
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"skip_h1_title": true,
"threshold": 4,
"toc_cell": true,
"toc_position": {
"height": "689.8912963867188px",
"left": "0px",
"right": "619.4000244140625px",
"top": "76.10054779052734px",
"width": "224.36141967773438px"
},
"toc_section_display": "block",
"toc_window_display": true,
"widenNotebook": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}