{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "<a id='types-methods'></a>\n",
    "<div id=\"qe-notebook-header\" style=\"text-align:right;\">\n",
    "        <a href=\"https://quantecon.org/\" title=\"quantecon.org\">\n",
    "                <img style=\"width:250px;display:inline;\" src=\"https://assets.quantecon.org/img/qe-menubar-logo.svg\" alt=\"QuantEcon\">\n",
    "        </a>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Need for Speed"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "- [The Need for Speed](#The-Need-for-Speed)  \n",
    "  - [Overview](#Overview)  \n",
    "  - [Understanding Multiple Dispatch in Julia](#Understanding-Multiple-Dispatch-in-Julia)  \n",
    "  - [Foundations](#Foundations)  \n",
    "  - [JIT Compilation in Julia](#JIT-Compilation-in-Julia)  \n",
    "  - [Fast and Slow Julia Code](#Fast-and-Slow-Julia-Code)  \n",
    "  - [Further Comments](#Further-Comments)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "Computer scientists often classify programming languages according to the following two categories.\n",
    "\n",
    "*High level languages* aim to maximize productivity by\n",
    "\n",
    "- being easy to read, write and debug  \n",
    "- automating standard tasks (e.g., memory management)  \n",
    "- being interactive, etc.  \n",
    "\n",
    "\n",
    "*Low level languages* aim for speed and control, which they achieve by\n",
    "\n",
    "- being closer to the metal (direct access to CPU, memory, etc.)  \n",
    "- requiring a relatively large amount of information from the user (e.g., all data types must be specified)  \n",
    "\n",
    "\n",
    "Traditionally we understand this as a trade off\n",
    "\n",
    "- high productivity or high performance  \n",
    "- optimized for humans or optimized for machines  \n",
    "\n",
    "\n",
    "One of the great strengths of Julia is that it pushes out the curve, achieving\n",
    "both high productivity and high performance with relatively little fuss.\n",
    "\n",
    "The word “relatively” is important here, however…\n",
    "\n",
    "In simple programs, excellent performance is often trivial to achieve.\n",
    "\n",
    "For longer, more sophisticated programs, you need to be aware of potential stumbling blocks.\n",
    "\n",
    "This lecture covers the key points."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Requirements\n",
    "\n",
    "You should read our [earlier lecture](generic_programming.html) on types, methods and multiple dispatch before this one."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "hide-output": true
   },
   "outputs": [],
   "source": [
    "using InstantiateFromURL\n",
    "github_project(\"QuantEcon/quantecon-notebooks-julia\", version = \"0.6.0\")\n",
    "# uncomment to force package installation and precompilation\n",
    "# github_project(\"QuantEcon/quantecon-notebooks-julia\", version=\"0.6.0\", instantiate=true, precompile = true)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "using LinearAlgebra, Statistics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding Multiple Dispatch in Julia\n",
    "\n",
    "This section provides more background on how methods, functions, and types are connected."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Methods and Functions\n",
    "\n",
    "The precise data type is important, for reasons of both efficiency and mathematical correctness.\n",
    "\n",
    "For example consider 1 + 1 vs. 1.0 + 1.0 or [1 0] + [0 1].\n",
    "\n",
    "On a CPU, integer and floating point addition are different things, using a different set of instructions.\n",
    "\n",
    "Julia handles this problem by storing multiple, specialized versions of functions like addition, one for each data type or set of data types.\n",
    "\n",
    "These individual specialized versions are called **methods**.\n",
    "\n",
    "When an operation like addition is requested, the Julia compiler inspects the type of data to be acted on and hands it out to the appropriate method.\n",
    "\n",
    "This process is called **multiple dispatch**.\n",
    "\n",
    "Like all “infix” operators, 1 + 1 has the alternative syntax +(1, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "+(1, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This operator + is itself a function with multiple methods.\n",
    "\n",
    "We can investigate them using the @which macro, which shows the method to which a given call is dispatched"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "+(x::<b>Float64</b>, y::<b>Float64</b>) in Base at <a href=\"https://github.com/JuliaLang/julia/tree/381693d3dfc9b7072707f6d544f82f6637fc5e7c/base/float.jl#L401\" target=\"_blank\">float.jl:401</a>"
      ],
      "text/plain": [
       "+(x::Float64, y::Float64) in Base at float.jl:401"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x, y = 1.0, 1.0\n",
    "@which +(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that the operation is sent to the `+` method that specializes in adding\n",
    "floating point numbers.\n",
    "\n",
    "Here’s the integer case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "+(x::<b>T</b>, y::<b>T</b>)<i> where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}</i> in Base at <a href=\"https://github.com/JuliaLang/julia/tree/381693d3dfc9b7072707f6d544f82f6637fc5e7c/base/int.jl#L53\" target=\"_blank\">int.jl:53</a>"
      ],
      "text/plain": [
       "+(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8} in Base at int.jl:53"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x, y = 1, 1\n",
    "@which +(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This output says that the call has been dispatched to the + method\n",
    "responsible for handling integer values.\n",
    "\n",
    "(We’ll learn more about the details of this syntax below)\n",
    "\n",
    "Here’s another example, with complex numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "+(z::<b>Complex</b>, w::<b>Complex</b>) in Base at <a href=\"https://github.com/JuliaLang/julia/tree/381693d3dfc9b7072707f6d544f82f6637fc5e7c/base/complex.jl#L275\" target=\"_blank\">complex.jl:275</a>"
      ],
      "text/plain": [
       "+(z::Complex, w::Complex) in Base at complex.jl:275"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x, y = 1.0 + 1.0im, 1.0 + 1.0im\n",
    "@which +(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Again, the call has been dispatched to a + method specifically designed for handling the given data type."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Adding Methods\n",
    "\n",
    "It’s straightforward to add methods to existing functions.\n",
    "\n",
    "For example, we can’t at present add an integer and a string in Julia (i.e. `100 + \"100\"` is not valid syntax).\n",
    "\n",
    "This is sensible behavior, but if you want to change it there’s nothing to stop you."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 + \"100\" = 200\n",
      "100 + \"100\" = 200\n"
     ]
    }
   ],
   "source": [
    "import Base: +  # enables adding methods to the + function\n",
    "\n",
    "+(x::Integer, y::String) = x + parse(Int, y)\n",
    "\n",
    "@show +(100, \"100\")\n",
    "@show 100 + \"100\";  # equivalent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Understanding the Compilation Process\n",
    "\n",
    "We can now be a little bit clearer about what happens when you call a function on given types.\n",
    "\n",
    "Suppose we execute the function call `f(a, b)` where `a` and `b`\n",
    "are of concrete types `S` and `T` respectively.\n",
    "\n",
    "The Julia interpreter first queries the types of `a` and `b` to obtain the tuple `(S, T)`.\n",
    "\n",
    "It then parses the list of methods belonging to `f`, searching for a match.\n",
    "\n",
    "If it finds a method matching `(S, T)` it calls that method.\n",
    "\n",
    "If not, it looks to see whether the pair `(S, T)` matches any method defined for *immediate parent types*.\n",
    "\n",
    "For example, if `S` is `Float64` and `T` is `ComplexF32` then the\n",
    "immediate parents are `AbstractFloat` and `Number` respectively"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AbstractFloat"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "supertype(Float64)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Number"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "supertype(ComplexF32)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hence the interpreter looks next for a method of the form `f(x::AbstractFloat, y::Number)`.\n",
    "\n",
    "If the interpreter can’t find a match in immediate parents (supertypes) it proceeds up the tree, looking at the parents of the last type it checked at each iteration.\n",
    "\n",
    "- If it eventually finds a matching method, it invokes that method.  \n",
    "- If not, we get an error.  \n",
    "\n",
    "\n",
    "This is the process that leads to the following error (since we only added the `+` for adding `Integer` and `String` above)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(typeof(100.0) <: Integer) == false = true\n"
     ]
    },
    {
     "ename": "MethodError",
     "evalue": "MethodError: no method matching +(::Float64, ::String)\nClosest candidates are:\n  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:529\n  +(::Float64, !Matched::Float64) at float.jl:401\n  +(::AbstractFloat, !Matched::Bool) at bool.jl:106\n  ...",
     "output_type": "error",
     "traceback": [
      "MethodError: no method matching +(::Float64, ::String)\nClosest candidates are:\n  +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:529\n  +(::Float64, !Matched::Float64) at float.jl:401\n  +(::AbstractFloat, !Matched::Bool) at bool.jl:106\n  ...",
      "",
      "Stacktrace:",
      " [1] top-level scope at In[10]:2"
     ]
    }
   ],
   "source": [
    "@show (typeof(100.0) <: Integer) == false\n",
    "100.0 + \"100\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the dispatch procedure starts from concrete types and works upwards, dispatch always invokes the *most specific method* available.\n",
    "\n",
    "For example, if you have methods for function `f` that handle\n",
    "\n",
    "1. `(Float64, Int64)` pairs  \n",
    "1. `(Number, Number)` pairs  \n",
    "\n",
    "\n",
    "and you call `f` with `f(0.5, 1)` then the first method will be invoked.\n",
    "\n",
    "This makes sense because (hopefully) the first method is optimized for\n",
    "exactly this kind of data.\n",
    "\n",
    "The second method is probably more of a “catch all” method that handles other\n",
    "data in a less optimal way.\n",
    "\n",
    "Here’s another simple example, involving a user-defined function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "q (generic function with 3 methods)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function q(x)  # or q(x::Any)\n",
    "    println(\"Default (Any) method invoked\")\n",
    "end\n",
    "\n",
    "function q(x::Number)\n",
    "    println(\"Number method invoked\")\n",
    "end\n",
    "\n",
    "function q(x::Integer)\n",
    "    println(\"Integer method invoked\")\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s now run this and see how it relates to our discussion of method dispatch\n",
    "above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Integer method invoked\n"
     ]
    }
   ],
   "source": [
    "q(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number method invoked\n"
     ]
    }
   ],
   "source": [
    "q(3.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Default (Any) method invoked\n"
     ]
    }
   ],
   "source": [
    "q(\"foo\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `typeof(3) <: Int64 <: Integer <: Number`, the call `q(3)` proceeds up the tree to `Integer` and invokes `q(x::Integer)`.\n",
    "\n",
    "On the other hand, `3.0` is a `Float64`, which is not a subtype of  `Integer`.\n",
    "\n",
    "Hence the call `q(3.0)` continues up to `q(x::Number)`.\n",
    "\n",
    "Finally, `q(\"foo\")` is handled by the function operating on `Any`, since `String` is not a subtype of `Number` or `Integer`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analyzing Function Return Types\n",
    "\n",
    "For the most part, time spent “optimizing” Julia code to run faster is about ensuring the compiler can correctly deduce types for all functions.\n",
    "\n",
    "The macro `@code_warntype` gives us a hint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variables\n",
      "  #self#"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[36m::Core.Compiler.Const(f, false)\u001b[39m\n",
      "  x\u001b[36m::Array{Int64,1}\u001b[39m\n",
      "\n",
      "Body\u001b[36m::Array{Int64,1}\u001b[39m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[90m1 ─\u001b[39m %1 = ("
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2 * x)\u001b[36m::Array{Int64,1}\u001b[39m\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[90m└──\u001b[39m      return %1\n"
     ]
    }
   ],
   "source": [
    "x = [1, 2, 3]\n",
    "f(x) = 2x\n",
    "@code_warntype f(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `@code_warntype` macro compiles `f(x)` using the type of `x` as an example – i.e., the `[1, 2, 3]` is used as a prototype for analyzing the compilation, rather than simply calculating the value.\n",
    "\n",
    "Here, the `Body::Array{Int64,1}` tells us the type of the return value of the\n",
    "function, when called with types like `[1, 2, 3]`, is always a vector of integers.\n",
    "\n",
    "In contrast, consider a function potentially returning `nothing`, as in [this lecture](../getting_started_julia/fundamental_types.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variables\n",
      "  #self#\u001b[36m::Core.Compiler.Const(f, false)\u001b[39m\n",
      "  x\u001b[36m::Int64\u001b[39m\n",
      "\n",
      "Body\u001b[33m\u001b[1m::Union{Nothing, Int64}\u001b[22m\u001b[39m\n",
      "\u001b[90m1 ─\u001b[39m %1 = (x > 0.0)\u001b[36m::Bool\u001b[39m\n",
      "\u001b[90m└──\u001b[39m      "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "goto #3 if not %1\n",
      "\u001b[90m2 ─\u001b[39m      return x\n",
      "\u001b[90m3 ─\u001b[39m      return Main.nothing\n"
     ]
    }
   ],
   "source": [
    "f(x) = x > 0.0 ? x : nothing\n",
    "@code_warntype f(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This states that the compiler determines the return type when called with an integer (like `1`) could be one of two different types, `Body::Union{Nothing, Int64}`.\n",
    "\n",
    "A final example is a variation on the above, which returns the maximum of `x` and `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variables\n",
      "  #self#\u001b[36m::Core.Compiler.Const(f, false)\u001b[39m\n",
      "  x\u001b[36m::Int64\u001b[39m\n",
      "\n",
      "Body\u001b[91m\u001b[1m::Union{Float64, Int64}\u001b[22m\u001b[39m\n",
      "\u001b[90m1 ─\u001b[39m %1 = (x > 0.0)\u001b[36m::Bool\u001b[39m\n",
      "\u001b[90m└──\u001b[39m      goto #3 if not %1\n",
      "\u001b[90m2 ─\u001b[39m      return x\n",
      "\u001b[90m3 ─\u001b[39m      return 0.0\n"
     ]
    }
   ],
   "source": [
    "f(x) = x > 0.0 ? x : 0.0\n",
    "@code_warntype f(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which shows that, when called with an integer, the type could be that integer or the floating point `0.0`.\n",
    "\n",
    "On the other hand, if we use change the function to return `0` if x <= 0, it is type-unstable with  floating point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Variables\n",
      "  #self#\u001b[36m::Core.Compiler.Const(f, false)\u001b[39m\n",
      "  x\u001b[36m::Float64\u001b[39m\n",
      "\n",
      "Body\u001b[91m\u001b[1m::Union{Float64, Int64}\u001b[22m\u001b[39m\n",
      "\u001b[90m1 ─\u001b[39m %1 = (x > 0.0)\u001b[36m::Bool\u001b[39m\n",
      "\u001b[90m└──\u001b[39m      goto #3 if not %1\n",
      "\u001b[90m2 ─\u001b[39m      return x\n",
      "\u001b[90m3 ─\u001b[39m      return 0\n"
     ]
    }
   ],
   "source": [
    "f(x) = x > 0.0 ? x : 0\n",
    "@code_warntype f(1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The solution is to use the `zero(x)` function which returns the additive identity element of type `x`.\n",
    "\n",
    "On the other hand, if we change the function to return `0` if `x <= 0`, it is type-unstable with  floating point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "zero(2.3) = 0.0\n",
      "zero(4) = 0\n",
      "zero(2.0 + 3im) = 0.0 + 0.0im\n",
      "Variables\n",
      "  #self#\u001b[36m::Core.Compiler.Const(f, false)\u001b[39m\n",
      "  x\u001b[36m::Float64\u001b[39m\n",
      "\n",
      "Body\u001b[36m::Float64\u001b[39m\n",
      "\u001b[90m1 ─\u001b[39m %1 = (x > 0.0)\u001b[36m::Bool\u001b[39m\n",
      "\u001b[90m└──\u001b[39m      goto #3 if not %1\n",
      "\u001b[90m2 ─\u001b[39m      return x\n",
      "\u001b[90m3 ─\u001b[39m %4 = Main"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ".zero(x)\u001b[36m::Core.Compiler.Const(0.0, false)\u001b[39m\n",
      "\u001b[90m└──\u001b[39m      return %4\n"
     ]
    }
   ],
   "source": [
    "@show zero(2.3)\n",
    "@show zero(4)\n",
    "@show zero(2.0 + 3im)\n",
    "\n",
    "f(x) = x > 0.0 ? x : zero(x)\n",
    "@code_warntype f(1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Foundations\n",
    "\n",
    "Let’s think about how quickly code runs, taking as given\n",
    "\n",
    "- hardware configuration  \n",
    "- algorithm (i.e., set of instructions to be executed)  \n",
    "\n",
    "\n",
    "We’ll start by discussing the kinds of instructions that machines understand."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Machine Code\n",
    "\n",
    "All instructions for computers end up as *machine code*.\n",
    "\n",
    "Writing fast code — expressing a given algorithm so that it runs quickly — boils down to producing efficient machine code.\n",
    "\n",
    "You can do this yourself, by hand, if you want to.\n",
    "\n",
    "Typically this is done by writing [assembly](https://en.wikipedia.org/wiki/Assembly_language), which is a symbolic representation of machine code.\n",
    "\n",
    "Here’s some assembly code implementing a function that takes arguments $ a, b $ and returns $ 2a + 8b $"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hide-output": false
   },
   "source": [
    "```asm\n",
    "    pushq   %rbp\n",
    "    movq    %rsp, %rbp\n",
    "    addq    %rdi, %rdi\n",
    "    leaq    (%rdi,%rsi,8), %rax\n",
    "    popq    %rbp\n",
    "    retq\n",
    "    nopl    (%rax)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that this code is specific to one particular piece of hardware that we use — different machines require different machine code.\n",
    "\n",
    "If you ever feel tempted to start rewriting your economic model in assembly, please restrain yourself.\n",
    "\n",
    "It’s far more sensible to give these instructions in a language like Julia,\n",
    "where they can be easily written and understood."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f (generic function with 2 methods)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function f(a, b)\n",
    "    y = 2a + 8b\n",
    "    return y\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hide-output": false
   },
   "source": [
    "```python\n",
    "def f(a, b):\n",
    "    y = 2 * a + 8 * b\n",
    "    return y\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or even C"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "hide-output": false
   },
   "source": [
    "```c\n",
    "int f(int a, int b) {\n",
    "    int y = 2 * a + 8 * b;\n",
    "    return y;\n",
    "}\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In any of these languages we end up with code that is much easier for humans to write, read, share and debug.\n",
    "\n",
    "We leave it up to the machine itself to turn our code into machine code.\n",
    "\n",
    "How exactly does this happen?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating Machine Code\n",
    "\n",
    "The process for turning high level code into machine code differs across\n",
    "languages.\n",
    "\n",
    "Let’s look at some of the options and how they differ from one another."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### AOT Compiled Languages\n",
    "\n",
    "Traditional compiled languages like Fortran, C and C++ are a reasonable option for writing fast code.\n",
    "\n",
    "Indeed, the standard benchmark for performance is still well-written C or Fortran.\n",
    "\n",
    "These languages compile down to efficient machine code because users are forced to provide a lot of detail on data types and how the code will execute.\n",
    "\n",
    "The compiler therefore has ample information for building the corresponding machine code ahead of time (AOT) in a way that\n",
    "\n",
    "- organizes the data optimally in memory and  \n",
    "- implements efficient operations as required for the task in hand  \n",
    "\n",
    "\n",
    "At the same time, the syntax and semantics of C and Fortran are verbose and unwieldy when compared to something like Julia.\n",
    "\n",
    "Moreover, these low level languages lack the interactivity that’s so crucial for scientific work."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interpreted Languages\n",
    "\n",
    "Interpreted languages like Python generate machine code “on the fly”, during program execution.\n",
    "\n",
    "This allows them to be flexible and interactive.\n",
    "\n",
    "Moreover, programmers can leave many tedious details to the runtime environment, such as\n",
    "\n",
    "- specifying variable types  \n",
    "- memory allocation/deallocation, etc.  \n",
    "\n",
    "\n",
    "But all this convenience and flexibility comes at a cost: it’s hard to turn\n",
    "instructions written in these languages into efficient machine code.\n",
    "\n",
    "For example, consider what happens when Python adds a long list of numbers\n",
    "together.\n",
    "\n",
    "Typically the runtime environment has to check the type of these objects one by one before it figures out how to add them.\n",
    "\n",
    "This involves substantial overheads.\n",
    "\n",
    "There are also significant overheads associated with accessing the data values themselves, which might not be stored contiguously in memory.\n",
    "\n",
    "The resulting machine code is often complex and slow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Just-in-time compilation\n",
    "\n",
    "Just-in-time (JIT) compilation is an alternative approach that marries some of\n",
    "the advantages of AOT compilation and interpreted languages.\n",
    "\n",
    "The basic idea is that functions for specific tasks are compiled as requested.\n",
    "\n",
    "As long as the compiler has enough information about what the function does,\n",
    "it can in principle generate efficient machine code.\n",
    "\n",
    "In some instances, all the information is supplied by the programmer.\n",
    "\n",
    "In other cases, the compiler will attempt to infer missing information on the fly based on usage.\n",
    "\n",
    "Through this approach, computing environments built around JIT compilers aim to\n",
    "\n",
    "- provide all the benefits of high level languages discussed above and, at the same time,  \n",
    "- produce efficient instruction sets when functions are compiled down to machine code  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## JIT Compilation in Julia\n",
    "\n",
    "JIT compilation is the approach used by Julia.\n",
    "\n",
    "In an ideal setting, all information necessary to generate efficient native machine code is supplied or inferred.\n",
    "\n",
    "In such a setting, Julia will be on par with machine code from low level languages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### An Example\n",
    "\n",
    "Consider the function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f (generic function with 2 methods)"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function f(a, b)\n",
    "    y = (a + 8b)^2\n",
    "    return 7y\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we call `f` with integer arguments (e.g., `z = f(1, 2)`).\n",
    "\n",
    "The JIT compiler now knows the types of `a` and `b`.\n",
    "\n",
    "Moreover, it can infer types for other variables inside the function\n",
    "\n",
    "- e.g., `y` will also be an integer  \n",
    "\n",
    "\n",
    "It then compiles a specialized version of the function to handle integers and\n",
    "stores it in memory.\n",
    "\n",
    "We can view the corresponding machine code using the @code_native macro"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[21]:2 within `f'\n",
      "; │┌ @ In[21]:2 within `+'\n",
      "\tleaq\t(%rdi,%rsi,8), %rcx\n",
      "; │└\n",
      "; │┌ @ intfuncs.jl:261 within `literal_pow'\n",
      "; ││┌ @ int.jl:54 within `*'\n",
      "\timulq\t%rcx, %rcx\n",
      "; │└└\n",
      "; │ @ In[21]:3 within `f'\n",
      "; │┌ @ int.jl:54 within `*'\n",
      "\tleaq\t(,%rcx,8), %rax\n",
      "\tsubq\t%rcx, %rax\n",
      "; │└\n",
      "\tretq\n",
      "\tnopw\t%cs:(%rax,%rax)\n",
      "\tnop\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native f(1, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we now call `f` again, but this time with floating point arguments, the JIT compiler will once more infer types for the other variables inside the function.\n",
    "\n",
    "- e.g., `y` will also be a float  \n",
    "\n",
    "\n",
    "It then compiles a new version to handle this type of argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[21]:2 within `f'\n",
      "\tmovabsq\t$140360208033048, %rax  # imm = 0x7FA828572518\n",
      "; │┌ @ promotion.jl:312 within `*' @ float.jl:405\n",
      "\tvmulsd\t(%rax), %xmm1, %xmm1\n",
      "; │└\n",
      "; │┌ @ float.jl:401 within `+'\n",
      "\tvaddsd\t%xmm0, %xmm1, %xmm0\n",
      "; │└\n",
      "; │┌ @ intfuncs.jl:261 within `literal_pow'\n",
      "; ││┌ @ float.jl:405 within `*'\n",
      "\tvmulsd\t%xmm0, %xmm0, %xmm0\n",
      "\tmovabsq\t$140360208033056, %rax  # imm = 0x7FA828572520\n",
      "; │└└\n",
      "; │ @ In[21]:3 within `f'\n",
      "; │┌ @ promotion.jl:312 within `*' @ float.jl:405\n",
      "\tvmulsd\t(%rax), %xmm0, %xmm0\n",
      "; │└\n",
      "\tretq\n",
      "\tnopw\t%cs:(%rax,%rax)\n",
      "\tnop\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native f(1.0, 2.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Subsequent calls using either floats or integers are now routed to the appropriate compiled code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Potential Problems\n",
    "\n",
    "In some senses, what we saw above was a best case scenario.\n",
    "\n",
    "Sometimes the JIT compiler produces messy, slow machine code.\n",
    "\n",
    "This happens when type inference fails or the compiler has insufficient information to optimize effectively.\n",
    "\n",
    "The next section looks at situations where these problems arise and how to get around them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fast and Slow Julia Code\n",
    "\n",
    "To summarize what we’ve learned so far, Julia provides a platform for generating highly efficient machine code with relatively little effort by combining\n",
    "\n",
    "1. JIT compilation  \n",
    "1. Optional type declarations and type inference to pin down the types of variables and hence compile efficient code  \n",
    "1. Multiple dispatch to facilitate specialization and optimization of compiled code for different data types  \n",
    "\n",
    "\n",
    "But the process is not flawless, and hiccups can occur.\n",
    "\n",
    "The purpose of this section is to highlight potential issues and show you how\n",
    "to circumvent them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### BenchmarkTools\n",
    "\n",
    "The main Julia package for benchmarking is [BenchmarkTools.jl](https://www.github.com/JuliaCI/BenchmarkTools.jl).\n",
    "\n",
    "Below, we’ll use the `@btime` macro it exports to evaluate the performance of Julia code.\n",
    "\n",
    "As mentioned in an [earlier lecture](testing.html), we can also save benchmark results to a file and guard against performance regressions in code.\n",
    "\n",
    "For more, see the package docs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Global Variables\n",
    "\n",
    "Global variables are names assigned to values outside of any function or type definition.\n",
    "\n",
    "The are convenient and novice programmers typically use them with abandon.\n",
    "\n",
    "But global variables are also dangerous, especially in medium to large size programs, since\n",
    "\n",
    "- they can affect what happens in any part of your program  \n",
    "- they can be changed by any function  \n",
    "\n",
    "\n",
    "This makes it much harder to be certain about what some  small part of a given piece of code actually commands.\n",
    "\n",
    "Here’s a [useful discussion on the topic](http://wiki.c2.com/?GlobalVariablesAreBad).\n",
    "\n",
    "When it comes to JIT compilation, global variables create further problems.\n",
    "\n",
    "The reason is that the compiler can never be sure of the type of the global\n",
    "variable, or even that the type will stay constant while a given function runs.\n",
    "\n",
    "To illustrate, consider this code, where `b` is global"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "g (generic function with 1 method)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = 1.0\n",
    "function g(a)\n",
    "    global b\n",
    "    for i ∈ 1:1_000_000\n",
    "        tmp = a + b\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code executes relatively slowly and uses a huge amount of memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  32.336 ms (2000000 allocations: 30.52 MiB)\n"
     ]
    }
   ],
   "source": [
    "using BenchmarkTools\n",
    "\n",
    "@btime g(1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you look at the corresponding machine code you will see that it’s a mess."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[24]:3 within `g'\n",
      "\tpushq\t%rbp\n",
      "\tmovq\t%rsp, %rbp\n",
      "\tpushq\t%r15\n",
      "\tpushq\t%r14\n",
      "\tpushq\t%r13\n",
      "\tpushq\t%r12\n",
      "\tpushq\t%rbx\n",
      "\tandq\t$-32, %rsp\n",
      "\tsubq\t$128, %rsp\n",
      "\tvmovsd\t%xmm0, 24(%rsp)\n",
      "\tvxorps\t%xmm0, %xmm0, %xmm0\n",
      "\tvmovaps\t%ymm0, 32(%rsp)\n",
      "\tmovq\t%fs:0, %rax\n",
      "\tmovq\t$8, 32(%rsp)\n",
      "\tmovq\t-15712(%rax), %rcx\n",
      "\tmovq\t%rcx, 40(%rsp)\n",
      "\tleaq\t32(%rsp), %rcx\n",
      "\tmovq\t%rcx, -15712(%rax)\n",
      "\tleaq\t-15712(%rax), %r12\n",
      "\tmovl\t$1000000, %ebx          # imm = 0xF4240\n",
      "\tmovabsq\t$jl_system_image_data, %r14\n",
      "\tleaq\t88(%rsp), %r15\n",
      "\tnopl\t(%rax)\n",
      "; │ @ In[24]:5 within `g'\n",
      "L112:\n",
      "\tmovabsq\t$140360047021752, %rax  # imm = 0x7FA81EBE4EB8\n",
      "\tmovq\t(%rax), %r13\n",
      "\tmovq\t%r13, 48(%rsp)\n",
      "\tmovq\t%r12, %rdi\n",
      "\tmovl\t$1400, %esi             # imm = 0x578\n",
      "\tmovl\t$16, %edx\n",
      "\tmovabsq\t$jl_gc_pool_alloc, %rax\n",
      "\tvzeroupper\n",
      "\tcallq\t*%rax\n",
      "\tmovabsq\t$jl_system_image_data, %rcx\n",
      "\tmovq\t%rcx, -8(%rax)\n",
      "\tvmovsd\t24(%rsp), %xmm0         # xmm0 = mem[0],zero\n",
      "\tvmovsd\t%xmm0, (%rax)\n",
      "\tmovq\t%rax, 56(%rsp)\n",
      "\tmovq\t%rax, 88(%rsp)\n",
      "\tmovq\t%r13, 96(%rsp)\n",
      "\tmovq\t%r14, %rdi\n",
      "\tmovq\t%r15, %rsi\n",
      "\tmovl\t$2, %edx\n",
      "\tmovabsq\t$jl_apply_generic, %rax\n",
      "\tcallq\t*%rax\n",
      "; │┌ @ range.jl:597 within `iterate'\n",
      "; ││┌ @ promotion.jl:398 within `=='\n",
      "\taddq\t$-1, %rbx\n",
      "; │└└\n",
      "\tjne\tL112\n",
      "\tmovq\t40(%rsp), %rax\n",
      "\tmovq\t%rax, (%r12)\n",
      "; │ @ In[24]:5 within `g'\n",
      "\tleaq\t-40(%rbp), %rsp\n",
      "\tpopq\t%rbx\n",
      "\tpopq\t%r12\n",
      "\tpopq\t%r13\n",
      "\tpopq\t%r14\n",
      "\tpopq\t%r15\n",
      "\tpopq\t%rbp\n",
      "\tretq\n",
      "\tnopw\t(%rax,%rax)\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native g(1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we eliminate the global variable like so"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "g (generic function with 2 methods)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function g(a, b)\n",
    "    for i ∈ 1:1_000_000\n",
    "        tmp = a + b\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "then execution speed improves dramatically"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1.600 ns (0 allocations: 0 bytes)\n"
     ]
    }
   ],
   "source": [
    "@btime g(1.0, 1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the second run was dramatically faster than the first.\n",
    "\n",
    "That’s because the first call included the time for JIT compilaiton.\n",
    "\n",
    "Notice also how small the memory footprint of the execution is.\n",
    "\n",
    "Also, the machine code is simple and clean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[27]:2 within `g'\n",
      "\tretq\n",
      "\tnopw\t%cs:(%rax,%rax)\n",
      "\tnopl\t(%rax,%rax)\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native g(1.0, 1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the compiler is certain of types throughout execution of the function and\n",
    "hence can optimize accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The `const` keyword\n",
    "\n",
    "Another way to stabilize the code above is to maintain the global variable but\n",
    "prepend it with `const`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "g (generic function with 2 methods)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "const b_const = 1.0\n",
    "function g(a)\n",
    "    global b_const\n",
    "    for i ∈ 1:1_000_000\n",
    "        tmp = a + b_const\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the compiler can again generate efficient machine code.\n",
    "\n",
    "We’ll leave you to experiment with it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Composite Types with Abstract Field Types\n",
    "\n",
    "Another scenario that trips up the JIT compiler is when composite types have\n",
    "fields with abstract types.\n",
    "\n",
    "We met this issue [earlier](generic_programming.html#spec-field-types), when we discussed AR(1) models.\n",
    "\n",
    "Let’s experiment, using, respectively,\n",
    "\n",
    "- an untyped field  \n",
    "- a field with abstract type, and  \n",
    "- parametric typing  \n",
    "\n",
    "\n",
    "As we’ll see, the last of these options gives us the best performance, while still maintaining significant flexibility.\n",
    "\n",
    "Here’s the untyped case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "struct Foo_generic\n",
    "    a\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here’s the case of an abstract type on the field `a`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "struct Foo_abstract\n",
    "    a::Real\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, here’s the parametrically typed case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "struct Foo_concrete{T <: Real}\n",
    "    a::T\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we generate instances"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Foo_concrete{Float64}(1.0)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fg = Foo_generic(1.0)\n",
    "fa = Foo_abstract(1.0)\n",
    "fc = Foo_concrete(1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last case, concrete type information for the fields is embedded in the object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Foo_concrete{Float64}"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "typeof(fc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is significant because such information is detected by the compiler."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Timing\n",
    "\n",
    "Here’s a function that uses the field `a` of our objects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f (generic function with 2 methods)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function f(foo)\n",
    "    for i ∈ 1:1_000_000\n",
    "        tmp = i + foo.a\n",
    "    end\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s try timing our code, starting with the generic case:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  42.075 ms (1999489 allocations: 30.51 MiB)\n"
     ]
    }
   ],
   "source": [
    "@btime f($fg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The timing is not very impressive.\n",
    "\n",
    "Here’s the nasty looking machine code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[36]:2 within `f'\n",
      "\tpushq\t%rbp\n",
      "\tpushq\t%r15\n",
      "\tpushq\t%r14\n",
      "\tpushq\t%r13\n",
      "\tpushq\t%r12\n",
      "\tpushq\t%rbx\n",
      "\tsubq\t$56, %rsp\n",
      "\tvxorps\t%xmm0, %xmm0, %xmm0\n",
      "\tvmovaps\t%xmm0, (%rsp)\n",
      "\tmovq\t$0, 16(%rsp)\n",
      "\tmovq\t%rsi, 48(%rsp)\n",
      "\tmovq\t%fs:0, %rax\n",
      "\tmovq\t$4, (%rsp)\n",
      "\tmovq\t-15712(%rax), %rcx\n",
      "\tmovq\t%rcx, 8(%rsp)\n",
      "\tmovq\t%rsp, %rcx\n",
      "\tmovq\t%rcx, -15712(%rax)\n",
      "\tleaq\t-15712(%rax), %rax\n",
      "\tmovq\t%rax, 24(%rsp)\n",
      "\tmovq\t(%rsi), %r13\n",
      "\tmovl\t$1, %ebx\n",
      "\tmovabsq\t$jl_apply_generic, %r12\n",
      "\tmovabsq\t$jl_system_image_data, %r14\n",
      "\tleaq\t32(%rsp), %r15\n",
      "\tnopl\t(%rax)\n",
      "; │ @ In[36]:3 within `f'\n",
      "; │┌ @ Base.jl:33 within `getproperty'\n",
      "L128:\n",
      "\tmovq\t(%r13), %rbp\n",
      "; │└\n",
      "\tmovq\t%rbx, %rdi\n",
      "\tmovabsq\t$jl_box_int64, %rax\n",
      "\tcallq\t*%rax\n",
      "\tmovq\t%rax, 16(%rsp)\n",
      "\tmovq\t%rax, 32(%rsp)\n",
      "\tmovq\t%rbp, 40(%rsp)\n",
      "\tmovq\t%r14, %rdi\n",
      "\tmovq\t%r15, %rsi\n",
      "\tmovl\t$2, %edx\n",
      "\tcallq\t*%r12\n",
      "; │┌ @ range.jl:597 within `iterate'\n",
      "\taddq\t$1, %rbx\n",
      "; ││┌ @ promotion.jl:398 within `=='\n",
      "\tcmpq\t$1000001, %rbx          # imm = 0xF4241\n",
      "; │└└\n",
      "\tjne\tL128\n",
      "\tmovq\t8(%rsp), %rax\n",
      "\tmovq\t24(%rsp), %rcx\n",
      "\tmovq\t%rax, (%rcx)\n",
      "\tmovabsq\t$jl_system_image_data, %rax\n",
      "; │ @ In[36]:3 within `f'\n",
      "\taddq\t$56, %rsp\n",
      "\tpopq\t%rbx\n",
      "\tpopq\t%r12\n",
      "\tpopq\t%r13\n",
      "\tpopq\t%r14\n",
      "\tpopq\t%r15\n",
      "\tpopq\t%rbp\n",
      "\tretq\n",
      "\tnopw\t%cs:(%rax,%rax)\n",
      "\tnopl\t(%rax)\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native f(fg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The abstract case is similar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  41.603 ms (1999489 allocations: 30.51 MiB)\n"
     ]
    }
   ],
   "source": [
    "@btime f($fa)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the large memory footprint.\n",
    "\n",
    "The machine code is also long and complex, although we omit details.\n",
    "\n",
    "Finally, let’s look at the parametrically typed version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1.600 ns (0 allocations: 0 bytes)\n"
     ]
    }
   ],
   "source": [
    "@btime f($fc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some of this time is JIT compilation, and one more execution gets us down to.\n",
    "\n",
    "Here’s the corresponding machine code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\t.text\n",
      "; ┌ @ In[36]:2 within `f'\n",
      "\tretq\n",
      "\tnopw\t%cs:(%rax,%rax)\n",
      "\tnopl\t(%rax,%rax)\n",
      "; └\n"
     ]
    }
   ],
   "source": [
    "@code_native f(fc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Much nicer…"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Abstract Containers\n",
    "\n",
    "Another way we can run into trouble is with abstract container types.\n",
    "\n",
    "Consider the following function, which essentially does the same job as Julia’s `sum()` function but acts only on floating point data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sum_float_array (generic function with 1 method)"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function sum_float_array(x::AbstractVector{<:Number})\n",
    "    sum = 0.0\n",
    "    for i ∈ eachindex(x)\n",
    "        sum += x[i]\n",
    "    end\n",
    "    return sum\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calls to this function run very quickly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Array{Float64,1}"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = range(0,  1, length = Int(1e6))\n",
    "x = collect(x)\n",
    "typeof(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1.250 ms (0 allocations: 0 bytes)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "499999.9999999796"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@btime sum_float_array($x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When Julia compiles this function, it knows that the data passed in as `x` will be an array of 64 bit floats.\n",
    "\n",
    "Hence it’s known to the compiler that the relevant method for `+` is always addition of floating point numbers.\n",
    "\n",
    "Moreover, the data can be arranged into continuous 64 bit blocks of memory to simplify memory access.\n",
    "\n",
    "Finally, data types are stable — for example, the local variable `sum` starts off as a float and remains a float throughout."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Type Inferences\n",
    "\n",
    "Here’s the same function minus the type annotation in the function signature"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "sum_array (generic function with 1 method)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "function sum_array(x)\n",
    "    sum = 0.0\n",
    "    for i ∈ eachindex(x)\n",
    "        sum += x[i]\n",
    "    end\n",
    "    return sum\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we run it with the same array of floating point numbers it executes at a\n",
    "similar speed as the function with type information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  1.250 ms (0 allocations: 0 bytes)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "499999.9999999796"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@btime sum_array($x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reason is that when `sum_array()` is first called on a vector of a given\n",
    "data type, a newly compiled version of the function is produced to handle that\n",
    "type.\n",
    "\n",
    "In this case, since we’re calling the function on a vector of floats, we get a compiled version of the function with essentially the same internal representation as `sum_float_array()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### An Abstract Container\n",
    "\n",
    "Things get tougher for the interpreter when the data type within the array is imprecise.\n",
    "\n",
    "For example, the following snippet creates an array where the element type is `Any`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "x = Any[ 1/i for i ∈ 1:1e6 ];"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Any"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "eltype(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now summation is much slower and memory management is less efficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "hide-output": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  26.316 ms (1000000 allocations: 15.26 MiB)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "14.392726722864989"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "@btime sum_array($x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further Comments\n",
    "\n",
    "Here are some final comments on performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Explicit Typing\n",
    "\n",
    "Writing fast Julia code amounts to writing Julia from which the compiler can\n",
    "generate efficient machine code.\n",
    "\n",
    "For this, Julia needs to know about the type of data it’s processing as early as possible.\n",
    "\n",
    "We could hard code the type of all variables and function arguments but this comes at a cost.\n",
    "\n",
    "Our code becomes more cumbersome and less generic.\n",
    "\n",
    "We are starting to loose the advantages that drew us to Julia in the first place.\n",
    "\n",
    "Moreover, explicitly typing everything is not necessary for optimal performance.\n",
    "\n",
    "The Julia compiler is smart and can often infer types perfectly well, without\n",
    "any performance cost.\n",
    "\n",
    "What we really want to do is\n",
    "\n",
    "- keep our code simple, elegant and generic  \n",
    "- help the compiler out in situations where it’s liable to get tripped up  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary and Tips\n",
    "\n",
    "Use functions to segregate operations into logically distinct blocks.\n",
    "\n",
    "Data types will be determined at function boundaries.\n",
    "\n",
    "If types are not supplied then they will be inferred.\n",
    "\n",
    "If types are stable and can be inferred effectively your functions will run fast."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Further Reading\n",
    "\n",
    "A good next stop for further reading is the [relevant part](https://docs.julialang.org/en/v1/manual/performance-tips/) of the Julia documentation."
   ]
  }
 ],
 "metadata": {
  "date": 1589496364.9639802,
  "download_nb": 1,
  "download_nb_path": "https://julia.quantecon.org/",
  "filename": "need_for_speed.rst",
  "filename_with_path": "more_julia/need_for_speed",
  "kernelspec": {
   "display_name": "Julia 1.4.1",
   "language": "julia",
   "name": "julia-1.4"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.4.1"
  },
  "title": "The Need for Speed"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}