{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Is Julia fast?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Julia isn't fast *per se*.\n", "\n", "One can write terribly slow code in any language, including Julia.\n", "\n", "So let's ask a different question." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# *Can* Julia be fast?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " ### Microbenchmarks\n", " \"drawing\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More realistic case: Vandermonde matrix\n", "(modified from [Steve's Julia intro](https://web.mit.edu/18.06/www/Fall17/1806/julia/Julia-intro.pdf))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Vandermonde matrix:](https://en.wikipedia.org/wiki/Vandermonde_matrix)\n", "\\begin{align}V=\\begin{bmatrix}1&\\alpha _{1}&\\alpha _{1}^{2}&\\dots &\\alpha _{1}^{n-1}\\\\1&\\alpha _{2}&\\alpha _{2}^{2}&\\dots &\\alpha _{2}^{n-1}\\\\1&\\alpha _{3}&\\alpha _{3}^{2}&\\dots &\\alpha _{3}^{n-1}\\\\\\vdots &\\vdots &\\vdots &\\ddots &\\vdots \\\\1&\\alpha _{m}&\\alpha _{m}^{2}&\\dots &\\alpha _{m}^{n-1}\\end{bmatrix}\\end{align}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using PyCall" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np = pyimport(\"numpy\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.vander(1:5, increasing=true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The source code for this function is [here](https://github.com/numpy/numpy/blob/v1.16.1/numpy/lib/twodim_base.py#L475-L563). It calls `np.multiply.accumulate` which is implemented in C [here](https://github.com/numpy/numpy/blob/deea4983aedfa96905bbaee64e3d1de84144303f/numpy/core/src/umath/ufunc_object.c#L3678). However, this code doesn't actually perform the computation, it basically only checks types and stuff. The actual kernel that gets called is [here](https://github.com/numpy/numpy/blob/deea4983aedfa96905bbaee64e3d1de84144303f/numpy/core/src/umath/loops.c.src#L1742). This isn't even C code but a template for C code which is used to generate type specific kernels.\n", "\n", "Overall, this setup only supports a limited set of types, like `Float64`, `Float32`, and so forth." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a simple Julia implementation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function vander(x::AbstractVector{T}, n=length(x)) where T\n", " m = length(x)\n", " V = Matrix{T}(undef, m, n)\n", " for j = 1:m\n", " V[j,1] = one(x[j])\n", " end\n", " for i= 2:n\n", " for j = 1:m\n", " V[j,i] = x[j] * V[j,i-1]\n", " end\n", " end\n", " return V\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vander(1:5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A quick speed comparison" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Show Code\n", "
\n", " \n", "```julia\n", "using BenchmarkTools, Plots\n", "ns = exp10.(range(1, 4, length=30));\n", "\n", "tnp = Float64[]\n", "tjl = Float64[]\n", "for n in ns\n", " x = 1:n |> collect\n", " push!(tnp, @belapsed np.vander(\\$x) samples=3 evals=1)\n", " push!(tjl, @belapsed vander(\\$x) samples=3 evals=1)\n", "end\n", "plot(ns, tnp./tjl, m=:circle, xscale=:log10, xlab=\"matrix size\", ylab=\"NumPy time / Julia time\", legend=:false)\n", "```\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \"drawing\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the clean and concise Julia implementation is **beating numpy's C implementation for small matrices** and is **on-par for large matrix sizes**.\n", "\n", "At the same time, the Julia code is *generic* and works for arbitrary types!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vander(Int32[4, 8, 16, 32])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It even works for non-numerical types. The only requirement is that the type has a *one* (identity element) and a multiplication operation defined." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vander([\"this\", \"is\", \"a\", \"test\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `one(String) == \"\"` since the empty string is the identity under multiplication (string concatenation)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How can Julia be fast?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\"drawing\"

\n", " \n", "**AST = Abstract Syntax Tree**\n", "\n", "**SSA = Static Single Assignment**\n", "\n", "**[LLVM](https://de.wikipedia.org/wiki/LLVM) = Low Level Virtual Machine**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specialization and code inspection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Julia specializes on the types of function arguments.**\n", "\n", "When a function is called for the first time, Julia compiles efficient machine code for the given input types.\n", "\n", "If it is called again, the already existing machine code is reused, until we call the function with different input types." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "func(x,y) = x^2 + y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@time func(1,2)\n", "@time func(1,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**First call:** compilation + running the code\n", "\n", "**Second call:** running the code" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@time func(1,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the input types change, Julia compiles a new specialization of the function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@time func(1.3,4.8)\n", "@time func(1.3,4.8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have two efficient codes, one for all `Int64` inputs and another one for all `Float64` arguments, in the cache." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### *But I really want to see what happens!*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can inspect the code at all transformation stages with a bunch of macros:\n", "\n", "* The AST after parsing (**`@macroexpand`**)\n", "* The AST after lowering (**`@code_typed`**, **`@code_warntype`**)\n", "* The AST after type inference and optimization (**`@code_lowered`**)\n", "* The LLVM IR (**`@code_llvm`**)\n", "* The assembly machine code (**`@code_native`**)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_typed func(1,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_lowered func(1,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_llvm func(1,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can remove the comments (lines starting with `;` using `debuginfo=:none`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_llvm debuginfo=:none func(1,2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_native func(1,2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compare this to `Float64` input." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_native func(1.2,2.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How important is code specialization?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to estimate the performance gain by specialization.\n", "\n", "We wrap our numbers into a custom type which internally stores them as `Any` to prevent specialization.\n", "\n", "(This is qualitatively comparable to what Python does.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "struct Anything\n", " value::Any\n", "end\n", "\n", "operation(x::Number) = x^2 + sqrt(x)\n", "operation(x::Anything) = x.value^2 + sqrt(x.value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using BenchmarkTools\n", "\n", "@btime operation(2);\n", "@btime operation(2.0);\n", "\n", "x = Anything(2.0)\n", "@btime operation($x);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**That's about an 40 times slowdown!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_native operation(2.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_native operation(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Make run-time the fun time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In scientific computations, we typically run a piece of code many times over and over again. Think of a Monte Carlo simulation, for example, where we perform the update and the Metropolis check millions of times.\n", "\n", "**Therefore, we want our run-time to be as short as possible.**\n", "\n", "On the other hand, for a given set of input arguments, Julia compiles the piece of code only once, as we have seen above. The time it takes to compile our code is almost always negligible compared to the duration of the full computation.\n", "\n", "A general strategy is therefore to move parts of the computation to compile-time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since Julia specializes on types, at compile-time **only type information is available to the compiler.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f1(x::Int) = x + 1\n", "f2(x::Int) = x + 2\n", "\n", "function f_slow(x::Int, p::Bool)\n", " if p # check depends on the value of p\n", " return f1(x)\n", " else\n", " return f2(x)\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_llvm debuginfo=:none f_slow(1, true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can eliminate the if branch by moving the condition check to the type domain. This way, it **will only be evaluated once at compile-time.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "abstract type Boolean end\n", "struct True <: Boolean end # type domain true\n", "struct False <: Boolean end # type domain false\n", "\n", "function f_fast(x::Int, p::Boolean)\n", " if typeof(p) == True # check solely based on the type of p\n", " return f1(x)\n", " else\n", " return f2(x)\n", " end\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@code_llvm debuginfo=:none f_fast(1, True())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Are explicit type annotations necessary? (like in C or Fortran)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that Julia's type inference is powerful. Specifying types **is not** necessary for best performance!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "function my_function(x)\n", " y = rand()\n", " z = rand()\n", " x+y+z\n", "end\n", "\n", "function my_function_typed(x::Int)::Float64\n", " y::Float64 = rand()\n", " z::Float64 = rand()\n", " x+y+z\n", "end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@btime my_function(10);\n", "@btime my_function_typed(10);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " However, annotating types explicitly can serve a purpose.\n", "\n", "* **Define a user interface/type filter** (will throw error if incompatible type is given)\n", "* Enforce conversions\n", "* Rarely, help the compiler infer types in tricky situations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Core messages of this Notebook\n", "\n", "* Julia **can be fast.**\n", "* **A function is compiled when called for the first time** with a given set of argument types.\n", "* The are **multiple compilation steps** all of which can be inspected through macros like `@code_warntype`.\n", "* **Code specialization** based on the types of all of the input arguments is important for speed.\n", "* Calculations can be moved to compile-time to make run-time faster.\n", "* In virtually all cases, **explicit type annotations are irrelevant for performance**.\n", "* Type annotations in function signatures define a **type filter/user interface**." ] } ], "metadata": { "@webio": { "lastCommId": null, "lastKernelId": null }, "kernelspec": { "display_name": "Julia 1.2.0", "language": "julia", "name": "julia-1.2" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.2.0" } }, "nbformat": 4, "nbformat_minor": 4 }