{
 "metadata": {
  "language": "Julia",
  "name": "",
  "signature": "sha256:76389525a1bce1d1e27a6aa9bf2e0c36050c2ac6b8931decf8d2ed58ebd7ab46"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This notebook shows the usage of *BenchmarkLite.jl*, a lightweight Julia package for performance benchmarking. \n",
      "\n",
      "Suppose we want to compare the performance of several math functions (applied in batch to vectors). We can do this in several steps:\n",
      "- Load the *BenchmarkLite* package\n",
      "- Define the procedures to be tested\n",
      "- Run the benchmark\n",
      "- See the results"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Loading the package\n",
      "\n",
      "Like other packages, one can load a package with either `import` or `using`. Most of the methods in this package are extended from Julia Base. Hence, `import` should be good enough in typical cases. However, if you want to access the types like `Proc` and `BenchmarkTable` more conveniently, you may use `using`.\n",
      "\n",
      "The package is very lightweight. So the package should load very fast."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "using BenchmarkLite"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Defining the procedures"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "All procedures to be benchmarked should be defined as subtypes of `Proc`, which is an abstract type defined in the `BechmarkLite` module. Several methods need to be defined for a procedure. Each procedure can be run under differen configs.\n",
      "\n",
      "- `string(proc)`: \n",
      "\n",
      "    a short name to identify procedure (this will be used when showing the benchmark table)\n",
      "\n",
      "\n",
      "- `length(proc, cfg)`: \n",
      "\n",
      "    the size of the problem under certain configuration. For example, if the procedure is to run computation of some function over `n` elements, then this function is to return `n`.\n",
      "\n",
      "\n",
      "\n",
      "- `isvalid(proc, cfg)`: \n",
      "\n",
      "    whether the procedure can be run under the given configuration `cfg`.\n",
      "\n",
      "\n",
      "\n",
      "- `s = start(proc, cfg)`: \n",
      "\n",
      "    initialize states to support the procedure (e.g. allocating necessary memory, or connecting to a database). This part is *not* counted in the run-time of the procedure.\n",
      "\n",
      "\n",
      "\n",
      "- `run(proc, cfg, s)`: \n",
      "\n",
      "    run the procedure under certain given config (together with initialized states)\n",
      "\n",
      "\n",
      "\n",
      "- `done(proc, cfg, s)`: \n",
      "\n",
      "    de-initialize the run-time states (e.g. closes a file or database connection)\n",
      "\n",
      "\n",
      "**Note:** all these methods are extended from Julia Base. "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we define a `VecMath` subtype to represent the procedures:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type VecMath{Op} <: Proc end"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here, the type parameter `Op` can be `Sqrt`, `Exp` etc, as we defined below, to represent the calculation we want to perform on each scalar. Using types to represent functions, allow specific computation to be inlined without incurring runtime overheads."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "type Sqrt end\n",
      "calc(::Sqrt, x) = sqrt(x)\n",
      "\n",
      "type Exp end\n",
      "calc(::Exp, x) = exp(x)\n",
      "\n",
      "type Log end\n",
      "calc(::Log, x) = log(x);"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Define procedure names:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Base.string{Op}(::VecMath{Op}) = string(\"vec-\", lowercase(\"$Op\"));"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "string(VecMath{Sqrt}())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "\"vec-sqrt\""
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To preclude the memory allocation time from the benchmark, we need to allocate arrays of specific sizes in advance, and store them as the initialized states. Particularly, we need to vectors, one for input, and the other for output. We use `FVecPair` as a shortname to represent such a bi-vector state: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "typealias FVecPair (Vector{Float64},Vector{Float64});"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The configuration is vector length, which can be simply represented by an integer. Then, we can define the procedures as follows:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Base.length(p::VecMath, n::Int) = n\n",
      "\n",
      "Base.isvalid(p::VecMath, n::Int) = (n > 0)\n",
      "\n",
      "Base.start(p::VecMath, n::Int) = (rand(n), zeros(n))\n",
      "\n",
      "function Base.run{Op}(p::VecMath{Op}, n::Int, s::FVecPair)\n",
      "    x, y = s\n",
      "    op = Op()\n",
      "    for i = 1:n\n",
      "        @inbounds y[i] = calc(op, x[i])\n",
      "    end\n",
      "end\n",
      "\n",
      "Base.done(p::VecMath, n, s) = nothing;"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Running the benchmark"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Collect all procedures into a `Proc`-vector, as"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "procs = Proc[ VecMath{Sqrt}(), \n",
      "              VecMath{Exp}(), \n",
      "              VecMath{Log}() ];"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Collect all configurations into an `Int`-vector, as"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cfgs = 2 .^ (4:10)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "7-element Array{Int64,1}:\n",
        "   16\n",
        "   32\n",
        "   64\n",
        "  128\n",
        "  256\n",
        "  512\n",
        " 1024"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, we call `run` to actually run the benchmark. For each procedure under each configuration, there are three stages of running:\n",
      "\n",
      "- *warming up*: \n",
      "\n",
      "    it runs the procedure under the given configuration once, which triggers the pre-compilation of the function.\n",
      "\n",
      "\n",
      "- *probing*: \n",
      "\n",
      "    it runs the procedure again to roughly estimate the time needed to run it once. Then the total number of runs is determined such that the entire duration of measuring takes about 1 second. If you want to change this duration, you may set it using the `duration` keyword argument. For example, `duration = 0.5` means having each procedure under each configuration run for about 0.5 second. \n",
      "    \n",
      "- *measuring*:\n",
      "\n",
      "    it runs the procedure a number of times (the number of times is decided in the *probing* stage), and records the elapsed time.\n",
      "    \n",
      "    "
     ]
    },
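    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The three stages described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not the package's actual implementation: `timed_runs` and `work` are hypothetical names, and the code is written for a current Julia release (1.x) rather than the older syntax used in the cells of this notebook.\n",
      "\n",
      "```julia\n",
      "# Hypothetical helper (not part of BenchmarkLite): run `f` for roughly `duration` seconds.\n",
      "function timed_runs(f; duration = 1.0)\n",
      "    f()                                      # warming up: triggers compilation\n",
      "    t0 = time_ns()\n",
      "    f()                                      # probing: rough cost of a single run\n",
      "    t_once = max((time_ns() - t0) / 1e9, 1e-9)\n",
      "    nruns = max(ceil(Int, duration / t_once), 1)\n",
      "    t0 = time_ns()\n",
      "    for i in 1:nruns                         # measuring: only this loop is timed\n",
      "        f()\n",
      "    end\n",
      "    elapsed = (time_ns() - t0) / 1e9\n",
      "    return nruns, elapsed\n",
      "end\n",
      "\n",
      "x = rand(1024); y = zeros(1024)              # allocated outside the timed region\n",
      "work() = (y .= sqrt.(x); nothing)\n",
      "nruns, elapsed = timed_runs(work, duration = 0.05)\n",
      "println(\"nruns = $nruns, elapsed = $elapsed secs\")\n",
      "```\n",
      "\n",
      "Because the run count is derived from a single probing run, the measured time only approximates the requested duration, which is consistent with the elapsed times in the log below varying around 1 second."
     ]
    },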
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "rtable = run(procs, cfgs);"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Benchmarking vec-sqrt ...\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  vec-sqrt with cfg = 16: nruns = 2816902, elapsed = 0.181171339 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 32: nruns = 2949853, elapsed = 0.383015011 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 64: nruns = 2096437, elapsed = 0.541707007 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 128: nruns = 1394701, elapsed = 0.707816771 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 256: nruns = 800000, elapsed = 0.809421738 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 512: nruns = 431407, elapsed = 0.872099936 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-sqrt with cfg = 1024: nruns = 224568, elapsed = 0.908604125 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Benchmarking vec-exp ...\n",
        "  vec-exp with cfg = 16: nruns = 675220, elapsed = 0.830725251 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 32: nruns = 372440, elapsed = 0.90710495 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 64: nruns = 194402, elapsed = 0.945588553 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 128: nruns = 99286, elapsed = 0.96417331 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 256: nruns = 34761, elapsed = 0.680242924 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 512: nruns = 24515, elapsed = 0.956955982 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-exp with cfg = 1024: nruns = 11338, elapsed = 0.881633733 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Benchmarking vec-log ...\n",
        "  vec-log with cfg = 16: nruns = 670691, elapsed = 0.808676817 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 32: nruns = 378788, elapsed = 0.900356022 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 64: nruns = 198689, elapsed = 0.946116462 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 128: nruns = 100827, elapsed = 0.965859782 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 256: nruns = 49604, elapsed = 0.948375143 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 512: nruns = 24885, elapsed = 0.983452456 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "  vec-log with cfg = 1024: nruns = 12500, elapsed = 0.96543341 secs"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The result is stored in an instance of `BenchmarkTable`, which can be shown in different units. For example, you can show how many milliseconds each procedure takes (under various configuration):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "show(rtable; unit=:msec)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "BenchmarkTable [unit = msec]\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "config |  vec-sqrt  vec-exp  vec-log  \n",
        "--------------------------------------\n",
        "16     |    0.0001   0.0012   0.0012  \n",
        "32     |    0.0001   0.0024   0.0024  \n",
        "64     |    0.0003   0.0049   0.0048  \n",
        "128    |    0.0005   0.0097   0.0096  \n",
        "256    |    0.0010   0.0196   0.0191  \n",
        "512    |    0.0020   0.0390   0.0395  \n",
        "1024   |    0.0040   0.0778   0.0772  \n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If millisecond is not precise enough, you may try showing in terms of microseconds:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "show(rtable; unit=:usec)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "BenchmarkTable [unit = usec]\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "config |  vec-sqrt  vec-exp  vec-log  \n",
        "--------------------------------------\n",
        "16     |    0.0643   1.2303   1.2057  \n",
        "32     |    0.1298   2.4356   2.3769  \n",
        "64     |    0.2584   4.8641   4.7618  \n",
        "128    |    0.5075   9.7111   9.5794  \n",
        "256    |    1.0118  19.5691  19.1189  \n",
        "512    |    2.0215  39.0355  39.5199  \n",
        "1024   |    4.0460  77.7592  77.2347  \n"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Sometimes, you may want to watch the results in terms of speed (*e.g.*, MPS, million numbers per second):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "show(rtable; unit=:mps)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "BenchmarkTable [unit = mps]\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "config |  vec-sqrt  vec-exp  vec-log  \n",
        "--------------------------------------\n",
        "16     |  248.7724  13.0049  13.2699  \n",
        "32     |  246.4533  13.1386  13.4627  \n",
        "64     |  247.6836  13.1577  13.4403  \n",
        "128    |  252.2146  13.1808  13.3620  \n",
        "256    |  253.0201  13.0818  13.3899  \n",
        "512    |  253.2742  13.1163  12.9555  \n",
        "1024   |  253.0889  13.1689  13.2583  \n"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here is a list of supported units:\n",
      "\n",
      "| `unit`  |  description  |\n",
      "| ------- | ------------- |\n",
      "| `:sec`  |  seconds per run |\n",
      "| `:msec` |  milliseconds per run |\n",
      "| `:usec` |  microseconds per run |\n",
      "| `:nsec` |  nanoseconds per run |\n",
      "| `:ups`  |  how many items/numbers per second. *note:* the number of items per run is determined by `length(proc, cfg)` |\n",
      "| `:kps`  |  how many thousand items/numbers per second |\n",
      "| `:mps`  |  how many million items/numbers per second |\n",
      "| `:gps`  |  how many trillion items/numbers per seoncd |\n"
     ]
    },
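    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a worked example of how these units relate, the sketch below recomputes two table entries from the raw numbers in the run log (`vec-sqrt` with `cfg = 16`). The conversion formulas are an assumption inferred from the unit descriptions, not taken from the package source, and the snippet uses current Julia (1.x) syntax.\n",
      "\n",
      "```julia\n",
      "# Numbers copied from the run log: vec-sqrt with cfg = 16\n",
      "nruns   = 2816902\n",
      "elapsed = 0.181171339            # seconds for all runs together\n",
      "n       = 16                     # items per run, i.e. length(proc, cfg)\n",
      "\n",
      "sec_per_run = elapsed / nruns\n",
      "usec = sec_per_run * 1e6         # :usec, microseconds per run\n",
      "mps  = n / sec_per_run / 1e6     # :mps, million items per second\n",
      "\n",
      "println(\"usec = \", round(usec, digits = 4))\n",
      "println(\"mps  = \", round(mps, digits = 2))\n",
      "```\n",
      "\n",
      "Both values agree with the `vec-sqrt` entries for `cfg = 16` in the tables above (0.0643 usec and about 248.77 mps)."
     ]
    },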
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Export\n",
      "\n",
      "You can also export the result table to a CSV file using `writecsv`:\n",
      "\n",
      "```\n",
      "writecsv(io, rtable)\n",
      "```"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}