{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 9 Day 3\n", "\n", "## Objectives\n", "* Discuss calculation performance\n", "* Look at ways to speed up code\n", "\n", "> Reminder: You should all have started your projects. You should be able to give a simple report on what you are doing and planning to do, two weeks from today." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example Numpy calculation\n", "\n", "For the following, we will often use normal Numpy as a baseline. Remember, it's already a factor of 10 to 100 faster than pure Python!\n", "\n", "Let's pretend that we have the following calculation, and it's important to us for some reason:\n", "\n", "$$\n", "y = x + x^2 + x^3 + x^4 + x^5 + x^6 + x^7 + x^8\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy.stats import norm\n", "import matplotlib.pyplot as plt\n", "import numexpr\n", "import numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For all the following examples, we'll use the same 1M element vector:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.random.normal(size=1_000_000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Simple Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f_simple(xarr):\n", "    yarr = np.empty_like(xarr)\n", "    for i in range(len(xarr)):\n", "        x = xarr[i]\n", "        yarr[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8\n", "    return yarr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_simple(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Simple Numpy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f_numpy(x):\n", "    return x + x**2 + x**3 + x**4 + x**5 
+ x**6 + x**7 + x**8" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numpy(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that this is clean and clear. If this is fast enough for your use case, *stop here*! This is the easiest one to maintain, and is the most similar to the original problem statement.\n", "\n", "It is slow, however: we have to make a lot of memory allocations for the intermediate results, and even though that memory is reclaimed later, allocating it takes time. Raising to a power is a bit pricey, too." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Algorithmic rearrangement\n", "\n", "Let's try to build a new version of the expression that is friendlier to in-place calculation. Before we move to in-place calculations, though, let's test the new expression, both against the old one and with a timer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f_nest_numpy(x):\n", "    return x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_nest_numpy(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_nest_numpy(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numpy reduced memory\n", "\n", "Now, let's get rid of all the extra memory allocations. 
We will allocate once, then just keep updating the memory we've already allocated:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f_mem_numpy(x):\n", "    y = x.copy()\n", "    for _ in range(7):\n", "        y += 1\n", "        y *= x\n", "    return y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_mem_numpy(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_mem_numpy(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow! Almost all the time in the previous examples was memory allocation!\n", "\n", "Most of Numpy's functionality can work with pre-allocated memory: many functions take an `out=` keyword argument that writes the result directly into existing memory, for example.\n", "\n", "However, we can't always get rid of memory allocation, depending on the problem, and we are now much further from the original math expression. Let's see if we can use a library to help." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numexpr\n", "\n", "We can't resist a quick look at the numexpr expression compiler. It's an oldie but a goodie for simple use cases. *If* you have long, simple numpy expressions, you can put them inside a string, and numexpr will build an optimized version that avoids the intermediate arrays normal Python evaluation would require, and then evaluate it. 
Let's see:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def f_numexpr(x):\n", "    return numexpr.evaluate('x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_numexpr(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numexpr(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice! The downside: it only supports a small set of operations and functions, like `sin` and `cos`, and it's best used for simple expressions like this one. However, your code stays pretty readable (albeit in a string), and if used carefully, it could be all you'll need. Numexpr must be available, of course, but it usually is." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cython: A new language\n", "\n", "I can't skip Cython. It revolutionized Python in many ways, though as you'll see soon, I don't think it's the best tool for the job anymore. It allows you to write a mix of Python and C to create completely pre-compiled functions. Compared to writing against the Python C API, it's amazing. Compared to pretty much anything else...\n", "\n", "Compiling Cython is often a pain, but it has a Jupyter extension that is excellent, so let's use that."
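, "\n\nOutside of Jupyter, the usual route is a small build script. A minimal sketch (the `mymodule.pyx` filename is just for illustration, not a file from this course):\n",
"\n",
"```python\n",
"# setup.py -- build with: python setup.py build_ext --inplace\n",
"from setuptools import setup\n",
"from Cython.Build import cythonize\n",
"\n",
"setup(ext_modules=cythonize(\"mymodule.pyx\"))\n",
"```"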
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext Cython" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "# cython: boundscheck=False, wraparound=False, nonecheck=False, cdivision=True\n", "\n", "import numpy as np\n", "\n", "def f_cython(double[:] xarr):\n", "    y_arr = np.empty([len(xarr)])\n", "    cdef double[:] y = y_arr\n", "    cdef int i\n", "    cdef double x\n", "\n", "    for i in range(xarr.shape[0]):\n", "        x = xarr[i]\n", "        y[i] = x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))\n", "        #y[i] = x + x**2 + x**3 + x**4 + x**5 + x**6 + x**7 + x**8\n", "\n", "    return y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_cython(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_cython(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Yes, this is overkill. Even if you know Python, C, and C++, this is still a new language with its own quirks. But it does give you lots of power for complex tasks - you can even write classes! You can activate C++ mode and call existing C++ libraries! Yes, compiling gets considerably harder if you do that, but it's possible. Not pretty, just possible. You can use OpenMP to run in multiple threads, again making the compile step more complicated (I'm not showing that on macOS here!).\n", "\n", "Now let's look at the new kid on the scene:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numba\n", "\n", "Numba takes a totally different approach. 
It's built on two ideas:\n", "\n", "* It reads and converts byte code from *existing Python functions*\n", "* It does JIT compilation directly to LLVM (no C or C++ in-between)\n", "\n", "Why is that special or new?\n", "\n", "* 100% Python - no special compiler, special extensions, strings, etc.\n", "* Easy to make optional - just don't run the function through Numba - no Numba requirement\n", "* It's just a library! No new interpreter or compiler step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For now, let's *not even bother to write a new function*! We have perfectly nice existing functions; let's just pass them through Numba:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f_numba_simple = numba.jit(f_numpy)\n", "f_numba_nest = numba.jit(f_nest_numpy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_numba_simple(x))\n", "np.testing.assert_allclose(f_nest_numpy(x), f_numba_nest(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numba_simple(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numba_nest(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow! We didn't do anything at all to our *original*, slow source code. We even reused it! We could have written a check for Numba, and only applied the JIT if Numba was found - free speedup."
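, "\n\nFor example, a minimal sketch of that optional check (the `maybe_jit` helper name is made up here, not something defined elsewhere in the course):\n",
"\n",
"```python\n",
"try:\n",
"    import numba\n",
"    maybe_jit = numba.jit(nopython=True)  # decorator when Numba is available\n",
"except ImportError:\n",
"    def maybe_jit(f):\n",
"        return f  # no Numba installed: leave the function unchanged\n",
"\n",
"@maybe_jit\n",
"def f_poly(x):\n",
"    return x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x*(1 + x)))))))\n",
"```"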
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numba UFunc\n", "\n", "Before we go on, let's play around a bit more and see if we can squeeze a little more out of Numba by creating a parallel UFunc - don't worry about what this is, we'll cover Numba in detail next:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f_numba_simple_ufunc = numba.vectorize([numba.double(numba.double)],\n", "                                       target='parallel')(f_numpy)\n", "f_numba_nest_ufunc = numba.vectorize([numba.double(numba.double)],\n", "                                     target='parallel')(f_nest_numpy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_numba_simple_ufunc(x))\n", "np.testing.assert_allclose(f_numpy(x), f_numba_nest_ufunc(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numba_simple_ufunc(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numba_nest_ufunc(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Numba JIT Parallel\n", "\n", "We could have done that with plain `jit` as well, by asking it to parallelize:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f_numba_parallel = numba.jit(nopython=True, parallel=True)(f_numpy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.testing.assert_allclose(f_numpy(x), f_numba_parallel(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "f_numba_parallel(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numba\n", "\n", "So now that we have seen it, let's properly introduce Numba's features. Numba is developed by Anaconda, and (of course) comes in the Anaconda distribution. 
It's also available via pip." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Decorators\n", "\n", "Reminder: We can write this:\n", "\n", "```python\n", "def f(...):\n", "    ...\n", "\n", "f = numba.jit(f)\n", "```\n", "\n", "As a decorator like this:\n", "\n", "```python\n", "@numba.jit\n", "def f(...):\n", "    ...\n", "```\n", "\n", "If you noticed the rather odd double-call syntax above, that's because `jit` was designed to be a decorator, and it's more natural that way." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* `jit`: Compile a function. Unlike Cython, this supports array-at-a-time syntax. `njit` is `jit(nopython=True)`\n", "* `vectorize`: Make a UFunc. You write a one-element function, and it gets applied to all elements.\n", "* `guvectorize`: Make a Generalized UFunc. You can specify relations between the input/output shapes\n", "* `cfunc`: Make a C-callable function, e.g. for passing as a callback to an external library (it can call jit functions, too)\n", "\n", "Lesser used / more specialized / underdeveloped:\n", "\n", "* `jitclass`: Write a class instead of a function (the early support warning has been there for a long time)\n", "* `stencil`: Neighborhood (stencil) operations\n", "* `overload`: Make a nopython version of an existing function\n", "* GPU versions: You can target CUDA with `cuda.jit`, `vectorize(target='cuda')` (limited feature set)\n", "* You can now target AMD ROC GPUs too\n", "\n", "See the [Numba documentation](https://numba.pydata.org/numba-doc/dev/index.html) for more. You can find a list of supported functions and features - it's far more extensive than numexpr! 
(Note that the CUDA and ROC targets support fewer features)\n", "\n", "Common options (not all are available on all functions):\n", "* A signature or list of signatures: Precompile the function with these expected types\n", "* `nopython=True`: Force compilation to fail if the function can't be compiled as pure Numba code (no Python object fallback)\n", "* `target=`: Can be one of `cpu`, `parallel`, `gpu`, `cuda`, depending on the function\n", "* `fastmath=True`: Can speed up math at some accuracy cost\n", "* `parallel=True`: Only on `jit`; asks Numba to try to parallelize\n", "* `nogil=True`: Release Python's global interpreter lock (GIL) while the compiled code runs\n", "\n", "Numba is under rapid development, with a new release every 2-3 months. Each release adds features from the Intel collaboration, now often a few AMD features, newly supported functions, and lots more." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tips and tricks:\n", "\n", "If you need 32-bit floats, you have to be a bit careful when you create new variables inside your function. You can pass a dictionary of the internal variable types to `locals=` if Numba infers one wrong.\n", "\n", "You can inspect the inferred types, the LLVM IR, or the assembly that gets generated if you wish." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f_numba_simple.inspect_types()\n", "#print(list(f_numba_simple.inspect_llvm().values())[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real code example\n", "\n", "Here is some real code that I've written recently. I had a training sample that took 60 seconds to run an iteration. The original (non-decorated) version took 100+ seconds to run on 10K events. The decorated version, with some small changes to remove list appending, takes <0.1 second and can be run every iteration.\n", "\n", "The code searches for signals that look like bumps in an array of mostly zeros, following a specific set of requirements (`threshold`, `integral_threshold`, `min_width`). 
\n", "\n", "```python\n", "@numba.jit(numba.float32[:](numba.float32[:], numba.float32, numba.float32, numba.int32),\n", "           locals={'integral': numba.float32, 'sum_weights_locs': numba.float32},\n", "           nopython=True)\n", "def pv_locations(targets, threshold, integral_threshold, min_width):\n", "    state = 0\n", "    integral = 0.0\n", "    sum_weights_locs = 0.0\n", "\n", "    # Make an empty array and manually track the size (faster than a Python list)\n", "    items = np.empty(150, np.float32)\n", "    nitems = 0\n", "\n", "    for i in range(len(targets)):\n", "        if targets[i] >= threshold:\n", "            state += 1\n", "            integral += targets[i]\n", "            sum_weights_locs += i * targets[i]  # weight times location\n", "\n", "        if (targets[i] < threshold or i == len(targets)-1) and state > 0:\n", "\n", "            # Record only if the peak is wide enough and has enough integral\n", "            if state >= min_width and integral >= integral_threshold:\n", "                items[nitems] = sum_weights_locs / integral\n", "                nitems += 1\n", "\n", "            # Reset state\n", "            state = 0\n", "            integral = 0.0\n", "            sum_weights_locs = 0.0\n", "\n", "    # Special case for the final item (very rare or never occurring) is\n", "    # handled by the i == len(targets)-1 test above\n", "\n", "    return items[:nitems]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Numba accelerated code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Fractal code from week 3\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "size = (500, 500)\n", "iterations = 50\n", "\n", "def make_fractal(size, iterations):\n", "    x = np.linspace(-2,2,size[0]).reshape(1,-1)\n", "    y = np.linspace(-2,2,size[1]).reshape(-1,1)\n", "    c = x + y*1j\n", "    z = np.zeros_like(c)\n", "    it_matrix = np.zeros(size, dtype=np.int64)\n", "    for n in range(iterations):\n", "        z = z**2 + c\n", "        it_matrix[np.abs(z) < 2] = n\n", "    return it_matrix" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10,10))\n", "plt.imshow(make_fractal(size, iterations));" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "make_fractal(size, iterations)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@numba.jit(nopython=True)\n", "def make_fractal_nb(size, iterations):\n", "    x = np.linspace(-2,2,size[0]).reshape(1,-1)\n", "    y = np.linspace(-2,2,size[1]).reshape(-1,1)\n", "    c = x + y*1j\n", "    z = np.zeros_like(c)\n", "    it_matrix = np.zeros(size, dtype=np.int64)\n", "    for n in range(iterations):\n", "        for i in range(size[0]):\n", "            for j in range(size[1]):\n", "                if np.abs(z[i,j]) < 2:\n", "                    z[i,j] = z[i,j]**2 + c[i,j]\n", "                    it_matrix[i, j] = n\n", "    return it_matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(10,10))\n", "plt.imshow(make_fractal_nb(size, iterations));" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%timeit\n", "make_fractal_nb(size, iterations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MCMC\n", "\n", "This is our sampler for MCMC:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sampler(data, samples=4, mu_init=.5, proposal_width=.5, plot=False, mu_prior_mu=0, mu_prior_sd=1.):\n", "    mu_current = mu_init\n", "    posterior = [mu_current]\n", "    for i in range(samples):\n", "        # suggest new position\n", "        mu_proposal = norm(mu_current, proposal_width).rvs()\n", "\n", "        # Compute likelihood by multiplying probabilities of each data point\n", "        likelihood_current = norm(mu_current, 1).pdf(data).prod()\n", "        likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()\n", "\n", "        # Compute prior probability of current and proposed mu\n", "        prior_current = 
norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)\n", "        prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)\n", "\n", "        p_current = likelihood_current * prior_current\n", "        p_proposal = likelihood_proposal * prior_proposal\n", "\n", "        # Accept proposal?\n", "        p_accept = p_proposal / p_current\n", "\n", "        # Usually would include prior probability, which we neglect here for simplicity\n", "        accept = np.random.rand() < p_accept\n", "\n", "        if plot:\n", "            plot_proposal(mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accept, posterior, i)\n", "\n", "        if accept:\n", "            # Update position\n", "            mu_current = mu_proposal\n", "\n", "        posterior.append(mu_current)\n", "\n", "    return posterior" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "np.random.seed(123)\n", "data = np.random.randn(20)\n", "\n", "posterior = sampler(data, samples=1500, mu_init=1.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's replace the scipy calls, and rerun:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def norm_pdf(loc, scale, x):\n", "    y = (x - loc) / scale\n", "    return np.exp(-y**2/2)/np.sqrt(2*np.pi) / scale" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure it's really the same (up to floating-point rounding):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert np.isclose(norm_pdf(.4, .7, .2), norm(.4, .7).pdf(.2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert np.isclose(norm(.3, 1).pdf(data).prod(), np.prod(norm_pdf(.3,1,data)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also remove the list appending."
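, "\n\nThe change looks like this in miniature (a toy sketch, not the sampler itself):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"n = 5\n",
"\n",
"# Growing a list dynamically: each append can reallocate\n",
"out_list = []\n",
"for i in range(n):\n",
"    out_list.append(i * 0.5)\n",
"\n",
"# Preallocating once, then assigning by index\n",
"out_arr = np.empty(n)\n",
"for i in range(n):\n",
"    out_arr[i] = i * 0.5\n",
"\n",
"assert np.allclose(out_list, out_arr)\n",
"```"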
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def sampler(data, samples=4, mu_init=.5, proposal_width=.5, plot=False, mu_prior_mu=0, mu_prior_sd=1.):\n", "    mu_current = mu_init\n", "\n", "    posterior = np.empty(samples+1)\n", "    posterior[0] = mu_current\n", "\n", "    for i in range(samples):\n", "        # suggest new position\n", "        mu_proposal = np.random.normal(mu_current, proposal_width)\n", "\n", "        # Compute likelihood by multiplying probabilities of each data point\n", "        likelihood_current = np.prod(norm_pdf(mu_current, 1, data))\n", "        likelihood_proposal = np.prod(norm_pdf(mu_proposal, 1, data))\n", "\n", "        # Compute prior probability of current and proposed mu\n", "        prior_current = norm_pdf(mu_prior_mu, mu_prior_sd, mu_current)\n", "        prior_proposal = norm_pdf(mu_prior_mu, mu_prior_sd, mu_proposal)\n", "\n", "        p_current = likelihood_current * prior_current\n", "        p_proposal = likelihood_proposal * prior_proposal\n", "\n", "        # Accept proposal?\n", "        p_accept = p_proposal / p_current\n", "\n", "        # Usually would include prior probability, which we neglect here for simplicity\n", "        accept = np.random.rand() < p_accept\n", "\n", "        if accept:\n", "            # Update position\n", "            mu_current = mu_proposal\n", "\n", "        posterior[i+1] = mu_current\n", "\n", "    return posterior" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "np.random.seed(123)\n", "data = np.random.randn(20)\n", "\n", "posterior = sampler(data, samples=1500, mu_init=1.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, instantiating scipy distributions over and over again is very costly!"
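, "\n\nA quick sketch of where the time was going (the repeat count of 200 is an arbitrary choice), timing repeated frozen-distribution construction against the plain formula:\n",
"\n",
"```python\n",
"import time\n",
"import numpy as np\n",
"from scipy.stats import norm\n",
"\n",
"def plain_pdf(loc, scale, x):  # same formula as norm_pdf above\n",
"    y = (x - loc) / scale\n",
"    return np.exp(-y**2/2) / np.sqrt(2*np.pi) / scale\n",
"\n",
"data = np.random.randn(20)\n",
"n_calls = 200  # arbitrary repeat count\n",
"\n",
"t0 = time.perf_counter()\n",
"for _ in range(n_calls):\n",
"    norm(0.5, 1).pdf(data)  # builds a frozen distribution every call\n",
"t_scipy = time.perf_counter() - t0\n",
"\n",
"t0 = time.perf_counter()\n",
"for _ in range(n_calls):\n",
"    plain_pdf(0.5, 1, data)  # plain numpy, no object construction\n",
"t_plain = time.perf_counter() - t0\n",
"\n",
"print(f\"scipy: {t_scipy:.4f} s, plain: {t_plain:.4f} s\")\n",
"```"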
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@numba.jit\n", "def norm_pdf(loc, scale, x):\n", "    y = (x - loc) / scale\n", "    return np.exp(-y**2/2)/np.sqrt(2*np.pi) / scale" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@numba.jit(nopython=True)\n", "def sampler(data, samples=4, mu_init=.5, proposal_width=.5, plot=False, mu_prior_mu=0, mu_prior_sd=1.):\n", "    mu_current = mu_init\n", "\n", "    posterior = np.empty(samples+1)\n", "    posterior[0] = mu_current\n", "\n", "    for i in range(samples):\n", "        # suggest new position\n", "        mu_proposal = np.random.normal(mu_current, proposal_width)\n", "\n", "        # Compute likelihood by multiplying probabilities of each data point\n", "        likelihood_current = np.prod(norm_pdf(mu_current, 1, data))\n", "        likelihood_proposal = np.prod(norm_pdf(mu_proposal, 1, data))\n", "\n", "        # Compute prior probability of current and proposed mu\n", "        prior_current = norm_pdf(mu_prior_mu, mu_prior_sd, mu_current)\n", "        prior_proposal = norm_pdf(mu_prior_mu, mu_prior_sd, mu_proposal)\n", "\n", "        p_current = likelihood_current * prior_current\n", "        p_proposal = likelihood_proposal * prior_proposal\n", "\n", "        # Accept proposal?\n", "        p_accept = p_proposal / p_current\n", "\n", "        # Usually would include prior probability, which we neglect here for simplicity\n", "        accept = np.random.rand() < p_accept\n", "\n", "        if accept:\n", "            # Update position\n", "            mu_current = mu_proposal\n", "\n", "        posterior[i+1] = mu_current\n", "\n", "    return posterior" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(1,2, figsize=(12,5))\n", "vals = sampler(data, samples=1500, mu_init=1.)\n", "axs[0].plot(vals)\n", "axs[1].hist(vals[500:], bins=np.linspace(-1,1,100))\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", 
"np.random.seed(123)\n", "data = np.random.randn(20)\n", "\n", "posterior = sampler(data, samples=1500, mu_init=1.)" ] } ], "metadata": { "kernelspec": { "display_name": "compclass", "language": "python", "name": "compclass" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }