{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Algebra, Precision and Profiling\n", "\n", "Julias generic way of implementing algorithms often makes it easy to explore different storage schemes, elevated or reduced precision or to try acceleration hardware like a GPU. I want to present a few illustrating examples for this in this workbook, showing how little effort is needed to give these things a try in Julia. I will also show in one example how one can track performance by profiling and understand what should be done to improve an algorithm at hand.\n", "\n", "First we install some packages for linear algebra and to perform some benchmarks ..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import Pkg; Pkg.add([\"GenericLinearAlgebra\", \"SparseArrays\", \"Profile\", \"ProfileView\", \"BenchmarkTools\", \"PProf\"])\n", "using LinearAlgebra\n", "using SparseArrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Algebra\n", "\n", "For dense and sparse arrays, all important linear algebra routines are available in the `LinearAlgebra`. This includes common tasks such as\n", "- `qr` (also pivoted)\n", "- `cholesky` (also pivoted)\n", "- `eigen`, `eigvals`, `eigvecs` (compute eigenpairs, values, vectors)\n", "- `factorize` (for computing matrix factorisations)\n", "\n", "All these methods are both implemented for generic matrices and specialised for specific kinds. For example `factorize` is intended to compute a clever factorisation for solving linear systems. What it does depends on the matrix properties:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Random real matrix -> will do an LU\n", "A = randn(10, 10)\n", "@show typeof(factorize(A))\n", "\n", "# Real-symmetric matrix -> will do a Bunch-Kaufman\n", "Am = Symmetric(A + A')\n", "@show typeof(factorize(Am))\n", "\n", "# Symmetric tridiagonal -> will do a LDLt\n", "Am = SymTridiagonal(A + A')\n", "@show typeof(factorize(Am))\n", "\n", "# Random sparse matrix -> will do sparse LU\n", "S = sprandn(50, 50, 0.3)\n", "@show typeof(factorize(S))\n", "\n", "# ... and so on ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The all share a common interface, such that an algorithm like" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function solve_many(A, xs)\n", " F = factorize(A)\n", " [F \\ rhs for rhs in xs]\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "will automatically work for sparse arrays and dense arrays and is furthermore independent of the floating-point type.\n", "\n", "##### More details\n", "- https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use case: A generic Davidson\n", "\n", "Let's try this in a more realistic algorithm.\n", "If we leave efficiency aside for now, a simple Davidson algorithm can be implemented quite concisely:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra\n", "\n", "function davidson(A, SS::AbstractArray; maxiter=100, prec=I,\n", " tol=20size(A,2)*eps(eltype(A)),\n", " maxsubspace=8size(SS, 2))\n", " m = size(SS, 2)\n", " for i in 1:100\n", " Ass = A * SS\n", "\n", " # Use eigen specialised for Hermitian matrices\n", " rvals, rvecs = eigen(Hermitian(SS' * Ass))\n", " rvals = rvals[1:m]\n", " rvecs = rvecs[:, 1:m]\n", " Ax = Ass * rvecs\n", "\n", " R = Ax - SS * rvecs * Diagonal(rvals)\n", " if norm(R) < tol\n", " return rvals, SS * rvecs\n", " end\n", "\n", " println(i, \" \", size(SS, 2), \" \", norm(R))\n", "\n", " # Use QR to orthogonalise the subspace.\n", " if size(SS, 2) + m > maxsubspace\n", " SS = typeof(R)(qr(hcat(SS * rvecs, prec * R)).Q)\n", " else\n", " SS = typeof(R)(qr(hcat(SS, prec * R)).Q)\n", " end\n", " end\n", " error(\"not converged.\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can easily test:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nev = 2\n", "A = randn(20, 20); A = A + A' + I;\n", "\n", "# Generate two random orthogonal guess vectors\n", "x0 = randn(size(A, 2), nev)\n", "x0 = Array(qr(x0).Q)\n", "\n", "# Run the problem\n", "davidson(A, x0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this a mixed-precsion scheme can be just written down:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using GenericLinearAlgebra # (Slow) generic fallbacks for eigen and qr\n", "\n", "λ, v = davidson(Matrix{Float16}(A), Float16.(x0), tol=1)\n", "println()\n", "λ, v = davidson(Matrix{Float32}(A), v, tol=1e-3)\n", "println()\n", "λ, v = davidson(Matrix{Float64}(A), v, tol=1e-13)\n", "println()\n", "λ, v = davidson(Matrix{BigFloat}(A), v, tol=1e-25)\n", "λ" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use sparse arrays on this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nev = 2\n", "spA = sprandn(100, 100, 0.3); spA = spA + spA' + 2I\n", "spx0 = randn(size(spA, 2), nev)\n", "spx0 = Array(qr(spx0).Q)\n", "\n", "davidson(spA, spx0, tol=1e-6)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or GPU arrays, see also [davidson_cuda.jl](davidson_cuda.jl):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the GPU\n", "using CuArrays\n", "using CuArrays.CUSOLVER\n", "\n", "function LinearAlgebra.eigen(A::Hermitian{T, CuArray{T, 2}}) where {T <: Real}\n", " CUSOLVER.syevd!('V', 'U', A.data)\n", "end\n", "\n", "davidson(cu(A), 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "but actually the performance is overall not that good out of the box, because we're doing a lot of copying and elementwise access in our naive algorithm." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Profiling and suggestions for improvement\n", "\n", "Let's see if we can detect the performance issues and suggest places for improvements. For this we will use Julia's builtin `Profile` package in combination with two grapical viewers, `ProfileView` and `PProf`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using Profile\n", "using ProfileView\n", "using PProf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setup of the problem:\n", "nev = 2\n", "A = randn(20, 20); A = A + A' + I;\n", "x0 = randn(size(A, 2), nev)\n", "x0 = Array(qr(x0).Q)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run once to compile everything ... this should be ignored\n", "@profview davidson(A, x0)\n", "@profview davidson(A, x0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run once to compile everything ... this should be ignored\n", "@pprof davidson(A, x0)\n", "@pprof davidson(A, x0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it seems line 27 in the davidson implementation is where basically all the time is spent. This line is\n", "```julia\n", "SS = typeof(R)(qr(hcat(SS, prec * R)).Q)\n", "```\n", "which basically consists of a QR, the `hcat` and the conversion of the QR result to `typeof(R)` (which is an array in this case). Let us try to get an idea for the time each of these steps need.\n", "\n", "We will use the `BenchmarkTools` package for this. It measures the time one (or multiple) Julia statement take to execute. For this it runs them a *repeated* number of times and collects some statistics." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using BenchmarkTools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our example `SS` is at most 20x16, `prec * R` is 20x2, so let's use that:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "SS = randn(20, 16)\n", "precR = randn(20, 2)\n", "\n", "catted = @btime hcat(SS, precR)\n", "fac = @btime qr(catted)\n", "@btime Array(fac.Q);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In agreement with the benchmark result the QR dominates, but also the unpacking of the Q factor takes quite some time. Basically it shows that the things to impove in the algorithm would be to use a different way to orthogonalise the subspace." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### More details\n", "- https://docs.julialang.org/en/v1/manual/profile/\n", "- https://github.com/timholy/ProfileView.jl\n", "- https://github.com/JuliaCI/BenchmarkTools.jl" ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.3.0", "language": "julia", "name": "julia-1.3" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.3.0" } }, "nbformat": 4, "nbformat_minor": 2 }