{ "cells": [ { "cell_type": "markdown", "source": [ "# Matrix-vector product\n", "\n", "This example illustrates different ways of computing\n", "matrix-vector products\n", "using the Julia language." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Setup\n", "Add the Julia packages used in this demo.\n", "Change `false` to `true` in the following code block\n", "if you are using any of the following packages for the first time." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "if false\n", " import Pkg\n", " Pkg.add([\n", " \"BenchmarkTools\"\n", " \"InteractiveUtils\"\n", " ])\n", "end" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "Tell Julia to use the following packages.\n", "Run `Pkg.add()` in the preceding code block first, if needed." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "using BenchmarkTools: @benchmark\n", "using InteractiveUtils: versioninfo" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Overview of matrix-vector multiplication\n", "\n", "The product between a matrix and a compatible vector\n", "is such a basic method in linear algebra\n", "that, of course,\n", "Julia has a function `*` built-in for it.\n", "\n", "In practice one should simply call that method via\n", "`A * x`\n", "or possibly `*(A, x)`.\n", "\n", "This demo explores other ways of coding the product,\n", "to explore techniques for writing efficient code.\n", "\n", "We write each method as a function\n", "because the most reliable way\n", "to benchmark different methods\n", "is to use functions." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Built-in `*`\n", "Conventional high-level matrix-vector multiply function:" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul0(A::Matrix, x::Vector)\n", " @boundscheck size(A,2) == length(x) || error(\"DimensionMismatch(A,x)\")\n", " return A * x\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Double loop over `m,n`\n", "This is the textbook version." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_mn(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = similar(x, M)\n", " for m in 1:M\n", " inprod = zero(eltype(x)) # accumulator\n", " for n in 1:N\n", " inprod += A[m,n] * x[n]\n", " end\n", " y[m] = inprod\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "Using `@inbounds`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_mn_inbounds(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = similar(x, M)\n", " for m in 1:M\n", " inprod = zero(x[1]) # accumulator\n", " for n in 1:N\n", " @inbounds inprod += A[m,n] * x[n]\n", " end\n", " @inbounds y[m] = inprod\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Double loop over `n,m`\n", "We expect this way to be faster because of cache access over `m`." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_nm(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " for m in 1:M\n", " y[m] += A[m,n] * x[n]\n", " end\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "With @inbounds" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_nm_inbounds(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " for m in 1:M\n", " @inbounds y[m] += A[m,n] * x[n]\n", " end\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "With `@inbounds` and `@simd`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_nm_inbounds_simd(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " @simd for m in 1:M\n", " @inbounds y[m] += A[m,n] * x[n]\n", " end\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "And with `@views`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_nm_inbounds_simd_views(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " @simd for m in 1:M\n", " @inbounds @views y[m] += A[m,n] * x[n]\n", " end\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Row versions\n", "Loop over `m`." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_row(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = similar(x, M)\n", " for m in 1:M\n", " y[m] = transpose(A[m,:]) * x\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "with `@inbounds`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_row_inbounds(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = similar(x, M)\n", " for m in 1:M\n", " @inbounds y[m] = transpose(A[m,:]) * x\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "with `@views`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_row_views(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = similar(x, M)\n", " for m in 1:M\n", " @views y[m] = transpose(A[m,:]) * x\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "with both" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_row_inbounds_views(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " @assert N == length(x)\n", " y = similar(x, M)\n", " for m in 1:M\n", " @inbounds @views y[m] = transpose(A[m,:]) * x\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Col versions\n", "Loop over `n`:" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_col(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " y += A[:,n] * x[n]\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "with broadcast via `@.` to coalesce operations:" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_col_dot(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " @. y += A[:,n] * x[n]\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "and with `@views`" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "function mul_col_dot_views(A::Matrix, x::Vector)\n", " (M,N) = size(A)\n", " y = zeros(eltype(x), M)\n", " for n in 1:N\n", " @views @. y += A[:,n] * x[n]\n", " end\n", " return y\n", "end;" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "## Test and time each version\n", "The results will depend on the computer used, of course." ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "M = 2^11\n", "N = M - 4 # non-square to stress test\n", "A = randn(Float32, M, N)\n", "x = randn(Float32, N);\n", "\n", "flist = (mul0,\n", " mul_mn, mul_mn_inbounds,\n", " mul_nm, mul_nm_inbounds, mul_nm_inbounds_simd, mul_nm_inbounds_simd_views,\n", " mul_row, mul_row_inbounds, mul_row_views, mul_row_inbounds_views,\n", " mul_col, mul_col_dot, mul_col_dot_views,\n", ");\n", "\n", "for f in flist # warm-up and test each version\n", " @assert A * x ≈ f(A, x)\n", "end;\n", "\n", "out = Vector{String}(undef, length(flist))\n", "for (i, f) in enumerate(flist) # benchmark timing for each\n", " b = @benchmark $f($A,$x)\n", " tim = round(minimum(b.times)/10^6, digits=1) # in ms\n", " tim = lpad(tim, 4)\n", " name = rpad(f, 27)\n", "\talloc = lpad(b.allocs, 5)\n", "\tmem = round(b.memory/2^10, digits=1)\n", " tmp = \"$name : $tim ms $alloc alloc $mem KiB\"\n", " out[i] = tmp\n", " isinteractive() && println(tmp)\n", "end\n", "out" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "The following results were for a\n", "2017 iMac with 4.2GHz quad-core Intel Core i7\n", "with macOS Mojave 10.14.6 and Julia 1.6.2.\n", "\n", "As expected, simple `A*x` is the fastest,\n", "but one can come quite close to that using proper double loop order\n", "with `@inbounds` or using \"dots\" and `@views` to coalesce.\n", "Without `@views` the vector versions have huge memory overhead!" ], "metadata": {} }, { "outputs": [], "cell_type": "code", "source": [ "[\n", "\"mul0 : 0.9 ms 1 alloc 16.1 KiB\"\n", "\"mul_mn : 22.5 ms 1 alloc 16.1 KiB\"\n", "\"mul_mn_inbounds : 22.0 ms 1 alloc 16.1 KiB\"\n", "\"mul_nm : 3.1 ms 1 alloc 16.1 KiB\"\n", "\"mul_nm_inbounds : 1.5 ms 1 alloc 16.1 KiB\"\n", "\"mul_nm_inbounds_simd : 1.5 ms 1 alloc 16.1 KiB\"\n", "\"mul_nm_inbounds_simd_views : 1.5 ms 1 alloc 16.1 KiB\"\n", "\"mul_row : 32.8 ms 2049 alloc 33040.1 KiB\"\n", "\"mul_row_inbounds : 32.7 ms 2049 alloc 33040.1 KiB\"\n", "\"mul_row_views : 22.4 ms 1 alloc 16.1 KiB\"\n", "\"mul_row_inbounds_views : 22.4 ms 1 alloc 16.1 KiB\"\n", "\"mul_col : 16.0 ms 6133 alloc 98894.6 KiB\"\n", "\"mul_col_dot : 7.0 ms 2045 alloc 32975.6 KiB\"\n", "\"mul_col_dot_views : 1.5 ms 1 alloc 16.1 KiB\"\n", "];" ], "metadata": {}, "execution_count": null }, { "cell_type": "markdown", "source": [ "---\n", "\n", "*This notebook was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*" ], "metadata": {} } ], "nbformat_minor": 3, "metadata": { "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.10.1" }, "kernelspec": { "name": "julia-1.10", "display_name": "Julia 1.10.1", "language": "julia" } }, "nbformat": 4 }