{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install jax jaxlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fonnesbeck/Bios8366/blob/master/notebooks/Section4_4-Hamiltonian-Monte-Carlo.ipynb)\n", "\n", "# Hamiltonian Monte Carlo\n", "\n", "Having introduced first-generation MCMC methods previously, we will now turn our attention to a more sophisticated class of algorithm, which improve upon random-walk jump algorithms by using information about the posterior distribution to inform candidate transitions: namely, gradient information.\n", "\n", "In order to implement these methods in code, we require more powerful mathematical software tools that allow for automated gradient calculation. There are several open source toolboxes that can support gradient-based Monte Carlo methods, and we will look at one of them in detail." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to Mathematical Expressions with Aesara\n", "\n", "Aesara is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Aesara features:\n", "\n", "* __efficient symbolic differentiation__ – Aesara does your derivatives for function with one or many inputs.\n", "* __speed and stability optimizations__ – Get the right answer for log(1+x) even when x is really tiny.\n", "* __evaluate expressions faster__ - Implements an extensible graph transpilation framework that currently provides compilation via C, JAX, and Numba\n", "* __extensive unit-testing and self-verification__ – Detect and diagnose errors.\n", "* __hackable, pure-Python codebase__ - Extensible graph framework suitable for rapid development of custom operators and symbolic optimizations\n", "\n", "Aesara is based on one of the most widely-used Python tensor libraries: Theano\n", "\n", "After a brief introduction to the Aesara package, we will use it to implement a modern MCMC algorithm, *Hamiltonian Monte Carlo (HMC)*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Two Scalars\n", "\n", "To get us started with Aesara and get a feel of what we're working with, \n", "let's make a simple function: add two numbers together. Here is how you do\n", "it:\n", "\n", "### Step 1 - Declaring Variables" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import aesara.tensor as at\n", "\n", "x = at.scalar(name='x')\n", "y = at.scalar(name='y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Aesara symbols we have created are `TensorVariable` objects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the data type is `float64`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A **tensor** is a generalization of an array to (potentially) multiple dimensions. Thus, everything from a scalar to a multi-dimensional hyper-matrix can be accomodated with the same abstraction. All expressions defined in Aesara are performed by associating tensors with operations and with one another." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2 - Symbolic Expressions\n", "\n", "The second step is to combine *x* and *y* into their sum *z*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = x + y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*z* is yet another *Variable* which represents the addition of\n", "*x* and *y*. You can use the `pp` function to *pretty-print* out the computation associated to *z*.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara.printing import pp\n", "print(pp(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variables can have attributes, such as a meaningful name." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.name = 'x plus y'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have created an *expression* for the addition of `x` and `y` and called it `z`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3 - Compiling a Function\n", "\n", "The last step is to create a function taking *x* and *y* as inputs\n", "and giving *z* as output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara import function\n", "\n", "f = function(inputs=[x, y], outputs=z, mode='NUMBA')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first argument to `function()` is a list of Variables that will be provided as inputs to the function. The second argument is a single Variable *or* a list of Variables. For either case, the second argument is what we want to see as output when we apply the function. *f* may then be used like a normal Python function. \n", "\n", "By default, Aesara functions are compiled to C, but this can be replaced by other backend options with the `mode` argument, which currently accepts `'NUMBA'` or `'JAX'` as alternatives (assuming they are installed on your system).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the graph of our function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara import dprint\n", "\n", "dprint(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can call the function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f(2, 3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f(16.3, 12.1))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are following along and typing into an interpreter, you may have\n", "noticed that there was a slight delay in executing the ``function``\n", "instruction. Behind the scenes, *f* was being **compiled into C code** (or Numba, or Jax)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Internally, Aesara builds a graph structure composed of interconnected `Variable` nodes, `op` nodes and `apply` nodes. \n", "\n", "An `op` node encapsulates a particular mathematical operation, such as an arithmetic operation or a transformation.\n", "\n", "An `apply` node represents the application of an `op` to some variables. It is important to draw the difference between the definition of a computation represented by an `op` and its application to some actual data which is represented by the apply node. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A `Variable` is the main data structure you work with when\n", "using Aesara. By calling `at.scalar` with a string argument, you create a\n", "`Variable` representing a floating-point scalar quantity with the\n", "given name. If you provide no argument, the symbol will be unnamed. Names\n", "are not required, but they can help debugging." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Two Matrices\n", "\n", "If we want to work with matrices instead of scalars, the only change\n", "from the previous example is that you need to instantiate *x* and\n", "*y* using the matrix Types:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = at.dmatrix('x')\n", "y = at.dmatrix('y')\n", "z = x + y\n", "f = function([x, y], z, mode='JAX')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the expression graph corresponding to the addition of `x` and `y`:\n", "\n", "![expression graph](images/expression_graph.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "``matrix`` is the Type for matrices of doubles. Then we can use\n", "our new function on 2D arrays:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f([[1, 2], [3, 4]], [[10, 20], [30, 40]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each node in the expression graph is aware of its parent nodes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.owner" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following types are available:\n", "\n", "* **byte**: ``bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4``\n", "* **16-bit integers**: ``wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4``\n", "* **32-bit integers**: ``iscalar, ivector, imatrix, irow, icol, itensor3, itensor4``\n", "* **64-bit integers**: ``lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4``\n", "* **float**: ``fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4``\n", "* **double**: ``dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4``\n", "* **complex**: ``cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4``" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of a slightly more interesting function is the logistic curve. Let's start with a matrix variable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = at.matrix('x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and apply the logistic transformation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = 1 / (1 + at.exp(-x))\n", "s.name = 'logit-x'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, this is just an `op` node which acts as a placeholder for the operation; we still need to define the `apply` node, as a `function`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "logistic = function([x], s)\n", "print(logistic([[0, 1], [-1, -2]]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aesara supports functions with multiple outputs. 
For example, we can\n", "compute the elementwise difference, absolute difference, and\n", "squared difference between two matrices *a* and *b* at the same time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a, b = at.matrices('a', 'b')\n", "diff = a - b\n", "abs_diff = abs(diff)\n", "diff_squared = diff ** 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we use the function `f`, it returns the three computed results as a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f = function([a, b], [diff, abs_diff, diff_squared])\n", "\n", "f([[1, 1], [1, 1]], [[0, 1], [2, 3]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manipulating the Graph\n", "\n", "After an expression graph has been created, it can be modified as needed. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import aesara\n", "\n", "list(aesara.graph.graph_inputs([z]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, let's add a log transformation to `z` at the end of the graph:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "w = at.log(z)\n", "w.name = 'log of z'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dprint(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now arbitrarily add an exponential transformation before the logarithm, essentially cancelling it out. This involves swapping out the parent node of `w`, which is currently `x + y`, with `exp(x + y)` using the `clone_replace` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "parent_of_w = w.owner.inputs[0]\n", "new_parent_of_w = at.exp(parent_of_w)\n", "new_parent_of_w.name = 'exp(x + y)'\n", "new_w = aesara.clone_replace(output=[w], replace={parent_of_w: new_parent_of_w})[0]\n", "new_w.name = \"log(exp(x + y))\"\n", "dprint(new_w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting a Default Value for an Argument\n", " \n", "Let's say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. In Python, the default value for parameters (keywork argument) achieves this effect.\n", "\n", "In Aesara we make use of the [In](https://aesara.readthedocs.io/en/latest/library/compile/io.html?highlight=In#aesara.compile.io.In) class, which allows you to specify properties of your function's parameters with greater detail. Here we give a default value of 1 for y by creating an In instance with its value field set to 1. Inputs with default values must follow inputs without default values (like Python's functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara import In\n", "\n", "x, y, w = at.dscalars('x', 'y', 'w')\n", "z = (x + y) * w\n", "g = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'g(33) = {g(33)}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'g(33, 0, 1) = {g(33, 0, 1)}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f'g(33, w_by_name=1) = {g(33, w_by_name=1)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Maintaining State with Shared Variables\n", "\n", "It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.\n", "\n", "First let’s define the accumulator function. It adds its argument to the internal state, and returns the old state value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara import shared\n", "\n", "state = shared(0)\n", "inc = at.iscalar('inc')\n", "accumulator = function([inc], state, updates=[(state, state+inc)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This code introduces a few new concepts. The `shared` function constructs so-called shared variables. \n", "\n", " state = shared(0)\n", "\n", "These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions. Shared variables can be used in symbolic expressions but they also have an internal value that defines the value taken by this symbolic variable in all the functions that use it. It is called a shared variable because its value is shared between many functions. The value can be accessed and modified by the `get_value` and `set_value` methods.\n", "\n", "The other new thing in this code is the `updates` parameter of function. \n", "\n", " updates=[(state, state+inc)\n", "\n", "`updates` must be supplied with a list of pairs of the form `(shared-variable, new expression)`. It can also be a dictionary whose keys are shared-variables and values are the new expressions. Here, the accumulator replaces the `state`‘s value with the sum of `state` and the increment amount `inc`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(state.get_value())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(accumulator(1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(state.get_value())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(accumulator(300))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(state.get_value())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to reset the state. 
Just use the `set_value` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "state.set_value(-1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(accumulator(3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(state.get_value())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we mentioned above, you can define more than one function to use the same shared variable. These functions can all update the value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dec = at.iscalar('dec')\n", "decrementor = function([dec], state, updates=[(state, state-dec)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(decrementor(2))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(state.get_value())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might be wondering why the updates mechanism exists. You can always achieve a similar result by returning the new expressions, and working with them in NumPy as usual. While the updates mechanism can be a syntactic convenience, it is mainly there for *efficiency*. Updates to shared variables can sometimes be done more quickly using in-place algorithms (e.g. low-rank matrix updates). Also, Aesara has more control over where and how shared variables are allocated, which is one of the important elements of getting good performance on the GPU." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise: Create and manipulate Aesara objects\n", "\n", "To give you some practice with basic Aesara data structures and functions, try making the operations below work by implementing the functions that are needed." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def make_vector():\n", " \"\"\"\n", " Create and return a new Aesara vector.\n", " \"\"\"\n", "\n", " pass\n", "\n", "def make_matrix():\n", " \"\"\"\n", " Create and return a new Aesara matrix.\n", " \"\"\"\n", "\n", " pass\n", "\n", "def elemwise_mul(a, b):\n", " \"\"\"\n", " a: An Aesara matrix\n", " b: An Aesara matrix\n", " \n", " Calcuate the elementwise product of a and b and return it\n", " \"\"\"\n", "\n", " pass\n", "\n", "def matrix_vector_mul(a, b):\n", " \"\"\"\n", " a: An Aesara matrix\n", " b: An Aesara vector\n", " \n", " Calculate the matrix-vector product of a and b and return it\n", " \"\"\"\n", "\n", " pass\n", "\n", "a = make_vector()\n", "b = make_vector()\n", "c = elemwise_mul(a, b)\n", "d = make_matrix()\n", "e = matrix_vector_mul(d, c)\n", "\n", "f = function([a, b, d], e)\n", "\n", "rng = np.random.RandomState([1, 2, 3])\n", "a_value = rng.randn(5).astype(a.dtype)\n", "b_value = rng.rand(5).astype(b.dtype)\n", "c_value = a_value * b_value\n", "d_value = rng.randn(5, 5).astype(d.dtype)\n", "expected = np.dot(d_value, c_value)\n", "\n", "actual = f(a_value, b_value, d_value)\n", "\n", "assert np.allclose(actual, expected)\n", "print(\"SUCCESS!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Logistic regression\n", "\n", "Here is a non-trivial example, which uses Aesara to estimate the parameters of a logistic regression model using gradient information. 
We will use the bioassay example as a test case:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "dose = np.array([-0.86, -0.3, -0.05, 0.73])\n", "deaths = np.array([0, 1, 3, 5])\n", "training_steps = 1000" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We first declare Aesara symbolic variables:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = at.vector(\"x\")\n", "y = at.vector(\"y\")\n", "w = shared(1., name=\"w\")\n", "b = shared(0., name=\"b\")\n", "\n", "print(\"Initial model:\", w.get_value(), b.get_value())" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "... then construct the expression graph:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Probability that target = 1\n", "p_1 = 1 / (1 + at.exp(-(x*w + b))) \n", "\n", "# Predictions are thresholded at probability 0.5\n", "prediction = p_1 > 0.5 \n", "\n", "# Cross-entropy loss (binomial likelihood, n=5 subjects per dose)\n", "xent = -y * at.log(p_1) - (5-y) * at.log(1-p_1) \n", "\n", "# The cost to minimize\n", "cost = xent.mean() \n", "\n", "# Compute the gradient of the cost\n", "gw, gb = at.grad(cost, [w, b]) " ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compile Aesara functions:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "step = shared(10., name='step')\n", "train = function(\n", "    inputs=[x, y],\n", "    outputs=[prediction, xent],\n", "    updates=((w, w - step * gw), (b, b - step * gb), (step, step * 0.99)),\n", "    mode='JAX')\n", "predict = function(inputs=[x], outputs=prediction, mode='JAX')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Train model:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(training_steps):\n", "    pred, err = train(dose, deaths)\n", "    \n", "w, b = w.get_value(), b.get_value()\n", "\n", "print(\"Final model:\", w, b)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "# Inverse-logit (logistic) function\n", "invlogit = lambda x: 1. / (1 + np.exp(-x))\n", "xvals = np.linspace(-1, 1)\n", "plt.plot(xvals, invlogit(w*xvals + b))\n", "plt.plot(dose, deaths/5., 'ro')" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises: Gradients and functions\n", "\n", "Let's try using the Aesara automatic gradient system to compute derivatives."
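] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Before attempting the exercise, here is a minimal worked example of `at.grad`, which returns the symbolic derivative of a scalar expression with respect to one or more variables (the variable name here is illustrative):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = at.scalar('a')\n", "\n", "# Symbolic derivative of a**2 with respect to a; evaluates to 2*a\n", "da = at.grad(a ** 2, a)\n", "\n", "print(da.eval({a: 3.0}))  # expect 6.0"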
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def grad_sum(x, y, z):\n", " \"\"\"\n", " x: A aesara variable\n", " y: A aesara variable\n", " z: A aesara expression involving x and y\n", " Returns dz / dx + dz / dy\n", " \"\"\"\n", "\n", " pass\n", "\n", "\n", "x = at.scalar()\n", "y = at.scalar()\n", "z = x + y\n", "s = grad_sum(x, y, z)\n", "assert s.eval({x: 0, y: 0}) == 2\n", "print(\"SUCCESS!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now try compiling and running a simple function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def evaluate(x, y, expr, x_value, y_value):\n", " \"\"\"\n", " x: A aesara variable\n", " y: A aesara variable\n", " expr: A aesara expression involving x and y\n", " x_value: A numpy value\n", " y_value: A numpy value\n", " \n", " Returns the value of expr when x_value is substituted for x\n", " and y_value is substituted for y\n", " \"\"\"\n", "\n", " pass\n", "\n", "\n", "x = at.iscalar()\n", "y = at.iscalar()\n", "z = x + y\n", "assert evaluate(x, y, z, 1, 2) == 3\n", "print(\"SUCCESS!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Numbers\n", "\n", "Because in Aesara you first express everything symbolically and afterwards compile this expression to get functions, using pseudo-random numbers is not as straightforward as it is in NumPy.\n", "\n", "The way to think about putting randomness into Aesara’s computations is to put random variables in your graph. Aesara will allocate a NumPy `RandomStream` object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara.tensor.random.utils import RandomStream\n", "from aesara import function\n", "srng = RandomStream(seed=234)\n", "\n", "rv_u = srng.uniform(0, 1, size=(2,2))\n", "rv_n = srng.normal(0, 1, size=(2,2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `rv_u` is a random stream of 2x2 matrices of draws from a uniform distribution, while `rv_n` is a random stream of 2x2 matrices of draws from a normal distribution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f = function([], rv_u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have wrapped our random number stream in a function, we have a random number generator. Each time we call `f()`, it returns a pseudo-random number and the internal state of the stream is updated, so that the next call to `f()` will return a different value. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "f()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g = function([], rv_n, no_default_updates=True) #Not updating rv_n.rng" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the case of the normal example, if we add the argument `no_default_updates=True` to function, then the random number stream state is not affected by calling the returned function (i.e. the state is not updated). 
In this case, calling `g()` multiple times will return the same numbers." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "g()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "It's important to note that a random variable is drawn at most once during any single function execution. So the function below is guaranteed to return an array of zeros (up to rounding error), despite the fact that the expression `rv_u` appears multiple times." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nearly_zeros = function([], rv_u - rv_u - rv_u + rv_u)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nearly_zeros()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Looping in Aesara\n", "\n", "The `scan` function provides the ability to write loops in Aesara. We are not able to use Python `for` loops with Aesara because Aesara needs to be able to build and optimize the expression graph before compiling it into faster code, and be able to use symbolic differentiation for calculating gradients.\n", "\n", "### Simple loop with accumulation\n", "\n", "Assume that, given $k$, you want to get $A^k$ using a loop. More precisely, if $A$ is a tensor you want to compute $A^k$ elementwise. The Python code might look like:\n", "\n", "```python\n", "result = 1\n", "for i in range(k):\n", "    result = result * A\n", "```\n", "\n", "There are three things here that we need to handle: the initial value assigned to `result`, the accumulation of results in `result`, and the unchanging variable `A`. Unchanging variables are passed to `scan` as `non_sequences`. Initialization occurs in `outputs_info`, and the accumulation happens automatically.\n", "\n", "The equivalent Aesara code would be:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from aesara import scan\n", "\n", "k = at.iscalar(\"k\")\n", "A = at.vector(\"A\")\n", "\n", "# Symbolic description of the result\n", "result, updates = scan(\n", "    fn=lambda prior_result, A: prior_result * A,\n", "    outputs_info=at.ones_like(A),\n", "    non_sequences=A,\n", "    n_steps=k\n", ")\n", "\n", "# We only care about A**k, but scan has provided us with A**1 through A**k.\n", "# Discard the values that we don't care about. Scan is smart enough to\n", "# notice this and not waste memory saving them.\n", "final_result = result[-1]\n", "\n", "# compiled function that returns A**k\n", "power = function(inputs=[A, k], outputs=final_result, updates=updates)\n", "\n", "print(power(range(10), 2))\n", "print(power(range(10), 4))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Let us go through the example line by line. What we did is first to **construct a function** (using a lambda expression) that given `prior_result` and `A` returns `prior_result * A`. The order of parameters is fixed by `scan`: the output of the prior call to `fn` is the first parameter, followed by all non-sequences.\n", "\n", "Next we **initialize the output** as a tensor with the same shape and `dtype` as `A`, filled with ones. We give `A` to `scan` as a non-sequence parameter and specify the number of steps `k` to iterate over our `lambda` expression.\n", "\n", "Scan **returns a tuple** containing our result (`result`) and a dictionary of updates (empty in this case)." ] }
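, { "cell_type": "markdown", "metadata": {}, "source": [ "To make this concrete, here is a quick check (reusing `A`, `k`, `result`, and `updates` from above) that compiles a function returning every intermediate result rather than just the last one:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Return every step of the scan: rows are A**1, A**2, ..., A**k\n", "all_powers = function(inputs=[A, k], outputs=result, updates=updates)\n", "\n", "print(all_powers(range(4), 3))" ] }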
, { "cell_type": "markdown", "metadata": {}, "source": [ "As this shows, the result is not the final value alone, but the full sequence: a matrix whose rows contain the value of $A^m$ for each step $m$. We want the last value (after k steps), so we compile a function to return just that. Note that there is an optimization that, at compile time, detects when only the last value of the result is used and ensures that scan does not store all the intermediate values. So do not worry if `A` and `k` are large." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "In addition to looping a fixed number of times, scan can iterate over the leading dimension of tensors (similar to Python’s `for x in a_list`).\n", "\n", "The tensor(s) to be looped over should be provided to `scan` using the `sequences` keyword argument.\n", "\n", "### Example: polynomial calculation\n", "\n", "Here’s an example that builds a symbolic calculation of a polynomial from a list of its coefficients. Note that `scan` iterates up to the shortest of its sequences, so only as many powers as there are coefficients are actually used:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coefficients = at.vector(\"coefficients\")\n", "x = at.scalar(\"x\")\n", "\n", "# Generate the components of the polynomial\n", "components, updates = scan(fn=lambda coefficient, power, val: coefficient * (val ** power),\n", "                           outputs_info=None,\n", "                           sequences=[coefficients, at.arange(1000)],\n", "                           non_sequences=x)\n", "# Sum them up\n", "polynomial = components.sum()\n", "\n", "# Compile a function\n", "calculate_polynomial = function(inputs=[coefficients, x], outputs=polynomial)\n", "\n", "# Test\n", "test_coefficients = np.asarray([1, 0, 2], dtype=np.float32)\n", "test_value = 3\n", "print(calculate_polynomial(test_coefficients, test_value))" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Hamiltonian Monte Carlo\n", "\n", "While flexible and easy to implement, Metropolis-Hastings sampling is a random walk\n", "sampler that might not be statistically efficient for many models. Specifically, for models of high dimension, random walk jumping algorithms do not perform well. It is not enough to simply guess at the next sample location; we need to make each iteration a useful draw from the posterior whenever we can, in order to have an efficient sampler for bigger models.\n", "\n", "Since Bayesian inference is all about calculating expectations over posteriors, what we seek is an algorithm that explores the area of the parameter space that contains most of the non-zero probability. This region is called the **typical set**." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### What's a Typical Set?\n", "\n", "The typical set is where most of the probability density (mass) lies in a particular volume associated with the distribution. As the dimension of a model increases, this set moves progressively further from the mode, and becomes more singular, as the result of concentration of measure.\n", "\n", "The typical set is a product of both the density, which is highest at the mode, and volume (that we integrate over), which increasingly becomes larger away from the mode as dimensionality increases. In fact, at high dimensions, the region around the mode contributes almost nothing to the expectation. We need an algorithm that will find this narrow region and explore it efficiently." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "![from Hoffman and Gelman 2014](http://d.pr/i/RAA+)\n", "\n", "In this context, and when sampling from continuous variables, Hamiltonian (or Hybrid) Monte\n", "Carlo (HMC) can prove to be a powerful tool. 
It avoids\n", "random walk behavior by simulating a physical system governed by\n", "Hamiltonian dynamics, potentially avoiding tricky conditional\n", "distributions in the process.\n", "\n", "In HMC, model samples are obtained by simulating a physical system,\n", "where particles move about a high-dimensional landscape, subject to\n", "potential and kinetic energies. Adapting the notation from [Neal (1993)](http://www.cs.toronto.edu/~radford/review.abstract.html),\n", "particles are characterized by a position vector or state\n", "$s \\in \\mathcal{R}^D$ and velocity vector $\\phi \\in \\mathcal{R}^D$. The\n", "combined state of a particle is denoted as $\\chi=(s,\\phi)$. \n", "\n", "The joint **canonical distribution** of the position and velocity can be expressed as a product of the marginal position (which is of interest) and the conditional distribution of the velocity:\n", "\n", "$$\\pi(s, \\phi) = \\pi(\\phi | s) \\pi(s)$$\n", "\n", "This joint probability can also be written in terms of an invariant **Hamiltonian function**:\n", "\n", "$$\\pi(s, \\phi) \\propto \\exp(-H(s,\\phi))$$\n", "\n", "The Hamiltonian is then defined as the sum of potential energy $E(s)$ and kinetic energy\n", "$K(\\phi)$, as follows:\n", "\n", "$$\\mathcal{H}(s,\\phi) = E(s) + K(\\phi)\n", "= E(s) + \\frac{1}{2} \\sum_i \\phi_i^2$$\n", "\n", "Instead of sampling $p(s)$ directly, HMC operates by sampling from the canonical distribution.\n", "\n", "$$p(s,\\phi) = \\frac{1}{Z} \\exp(-\\mathcal{H}(s,\\phi))=p(s)p(\\phi)$$\n", "\n", "If we choose a momentum that is independent of position, marginalizing over $\\phi$ is\n", "trivial and recovers the original distribution of interest.\n", "\n", "Note that the Hamiltonian $\\mathcal{H}$ is independent of the parameterization of the model, and therefore, captures the geometry of the phase space distribution, including typical set. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hamiltonian Dynamics**\n", "\n", "State $s$ and velocity $\\phi$ are modified such that\n", "$\\mathcal{H}(s,\\phi)$ remains constant throughout the simulation. The\n", "differential equations are given by:\n", "\n", "$$\\begin{aligned}\\frac{ds_i}{dt} &= \\frac{\\partial \\mathcal{H}}{\\partial \\phi_i} = \\phi_i \\\\\n", "\\frac{d\\phi_i}{dt} &= - \\frac{\\partial \\mathcal{H}}{\\partial s_i}\n", "= - \\frac{\\partial E}{\\partial s_i}\n", "\\end{aligned}$$\n", "\n", "As shown in [Neal (1993)](http://www.cs.toronto.edu/~radford/review.abstract.html), \n", "the above transformation preserves volume and is\n", "reversible. The above dynamics can thus be used as transition operators\n", "of a Markov chain and will leave $p(s,\\phi)$ invariant. That chain by\n", "itself is not ergodic however, since simulating the dynamics maintains a\n", "fixed Hamiltonian $\\mathcal{H}(s,\\phi)$. HMC thus alternates Hamiltonian\n", "dynamic steps, with Gibbs sampling of the velocity. Because $p(s)$ and\n", "$p(\\phi)$ are independent, sampling $\\phi_{new} \\sim p(\\phi|s)$ is\n", "trivial since $p(\\phi|s)=p(\\phi)$, where $p(\\phi)$ is often taken to be\n", "the univariate Gaussian.\n", "\n", "**The Leap-Frog Algorithm**\n", "\n", "In practice, we cannot simulate Hamiltonian dynamics exactly because of\n", "the problem of time discretization. There are several ways one can do\n", "this. To maintain invariance of the Markov chain however, care must be\n", "taken to preserve the properties of *volume conservation* and *time\n", "reversibility*. 
The **leap-frog algorithm** maintains these properties\n", "and operates in 3 steps:\n", "\n", "$$\\begin{aligned}\n", "\\phi_i(t + \\epsilon/2) &= \\phi_i(t) - \\frac{\\epsilon}{2} \\frac{\\partial{}}{\\partial s_i} E(s(t)) \\\\\n", "s_i(t + \\epsilon) &= s_i(t) + \\epsilon \\phi_i(t + \\epsilon/2) \\\\\n", "\\phi_i(t + \\epsilon) &= \\phi_i(t + \\epsilon/2) - \\frac{\\epsilon}{2} \\frac{\\partial{}}{\\partial s_i} E(s(t + \\epsilon)) \n", "\\end{aligned}$$\n", "\n", "We thus perform a half-step update of the velocity at time\n", "$t+\\epsilon/2$, which is then used to compute $s(t + \\epsilon)$ and\n", "$\\phi(t + \\epsilon)$.\n", "\n", "**Accept / Reject**\n", "\n", "In practice, using finite stepsizes $\\epsilon$ will not preserve\n", "$\\mathcal{H}(s,\\phi)$ exactly and will introduce bias in the simulation.\n", "Also, rounding errors due to the use of floating point numbers mean\n", "that the above transformation will not be perfectly reversible.\n", "\n", "HMC cancels these effects **exactly** by adding a Metropolis\n", "accept/reject stage, after $n$ leapfrog steps. The new state\n", "$\\chi' = (s',\\phi')$ is accepted with probability $p_{acc}(\\chi,\\chi')$,\n", "defined as:\n", "\n", "$$p_{acc}(\\chi,\\chi') = \\min \\left( 1, \\frac{\\exp(-\\mathcal{H}(s',\\phi'))}{\\exp(-\\mathcal{H}(s,\\phi))} \\right)$$\n", "\n", "**HMC Algorithm**\n", "\n", "We obtain a new HMC sample as follows:\n", "\n", "1. sample a new velocity from a univariate Gaussian distribution\n", "2. perform $n$ leapfrog steps to obtain the new state $\\chi'$\n", "3. perform accept/reject move of $\\chi'$" ] }
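, { "cell_type": "markdown", "metadata": {}, "source": [ "Before turning to the Aesara implementation, it may help to see these pieces on a trivial example. The following plain-NumPy sketch (an illustration only, not part of the sampler built below) applies the leapfrog scheme to the quadratic energy $E(s) = s^2/2$ of a standard Gaussian, and checks how well the Hamiltonian is conserved:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def leapfrog_numpy(s, phi, eps, n_steps):\n", "    \"\"\"Leapfrog integration for E(s) = s**2 / 2, where dE/ds = s.\"\"\"\n", "    phi = phi - 0.5 * eps * s      # initial half-step in velocity\n", "    for _ in range(n_steps - 1):\n", "        s = s + eps * phi          # full position step\n", "        phi = phi - eps * s        # full velocity step\n", "    s = s + eps * phi              # last full position step\n", "    phi = phi - 0.5 * eps * s      # final half-step in velocity\n", "    return s, phi\n", "\n", "def hamiltonian_1d(s, phi):\n", "    # Total energy: potential plus kinetic\n", "    return 0.5 * s ** 2 + 0.5 * phi ** 2\n", "\n", "s0, phi0 = 1.0, 0.5\n", "s1, phi1 = leapfrog_numpy(s0, phi0, eps=0.1, n_steps=20)\n", "print('H before:', hamiltonian_1d(s0, phi0))\n", "print('H after: ', hamiltonian_1d(s1, phi1))" ] }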
In loop form, this\n", "gives:\n", "\n", "$$\\begin{aligned}\\phi_i(t + \\epsilon/2) &= \\phi_i(t) -\n", "\\frac{\\epsilon}{2} \\frac{\\partial{}}{\\partial s_i} E(s(t)) \\\\\n", "s_i(t + \\epsilon) &= s_i(t) + \\epsilon \\phi_i(t + \\epsilon/2) \\\\\n", "\\text{For } m \\in [2,n]\\text{, perform full updates: } \\\\\n", "\\qquad\n", "\\phi_i(t + (m - 1/2)\\epsilon) &= \\phi_i(t + (m-3/2)\\epsilon) -\n", "\\epsilon \\frac{\\partial{}}{\\partial s_i} E(s(t + (m-1)\\epsilon)) \\\\\n", "\\qquad\n", "s_i(t + m\\epsilon) &= s_i(t) + \\epsilon \\phi_i(t + (m-1/2)\\epsilon) \\\\\n", "\\phi_i(t + n\\epsilon) &= \\phi_i(t + (n-1/2)\\epsilon) -\n", "\\frac{\\epsilon}{2} \\frac{\\partial{}}{\\partial s_i} E(s(t + n\\epsilon)) \n", "\\end{aligned}$$\n", "\n", "The inner-loop defined above is implemented by the following\n", "`leapfrog` function, with `pos`, `vel` and `step` replacing\n", "$s,\\phi$ and $\\epsilon$ respectively." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```python\n", "def leapfrog(pos, vel, step):\n", "\n", " # one full velocity step\n", " dE_dpos = tt.grad(energy_fn(pos).sum(), pos)\n", " new_vel = vel - step * dE_dpos\n", " \n", " # one full position step\n", " new_pos = pos + step * new_vel\n", "\n", " return [new_pos, new_vel],{}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `simulate_dynamics` function performs the full algorithm. We start with the initial half-step update of $\\phi$\n", "and full-step of $s$, and then scan over the `leapfrog` method `n_steps-1` times." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def simulate_dynamics(initial_pos, initial_vel, stepsize, n_steps, energy_fn):\n", " \n", "\n", " def leapfrog(pos, vel, step):\n", "\n", " # Gradient calculation\n", " dE_dpos = at.grad(energy_fn(pos).sum(), pos)\n", " \n", " new_vel = vel - step * dE_dpos\n", " new_pos = pos + step * new_vel\n", " return [new_pos, new_vel], {}\n", "\n", " # An initial half-step in velocity\n", " initial_energy = energy_fn(initial_pos)\n", " dE_dpos = at.grad(initial_energy.sum(), initial_pos)\n", " vel_half_step = initial_vel - 0.5 * stepsize * dE_dpos\n", "\n", " # ... followed by one full position step\n", " pos_full_step = initial_pos + stepsize * vel_half_step\n", "\n", " # Perform full velocity-step updates afterwards, using scan\n", " (all_pos, all_vel), scan_updates = scan(leapfrog,\n", " outputs_info=[\n", " dict(initial=pos_full_step),\n", " dict(initial=vel_half_step),\n", " ],\n", " non_sequences=[stepsize],\n", " n_steps=n_steps - 1)\n", " \n", " # Our final position after integrating\n", " final_pos = all_pos[-1]\n", " final_vel = all_vel[-1]\n", " \n", " # One final half-step in velocity to complete the algorithm\n", " energy = energy_fn(final_pos)\n", " final_vel = final_vel - 0.5 * stepsize * at.grad(energy.sum(), final_pos)\n", "\n", " # return new proposal state\n", " return final_pos, final_vel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A final half-step is performed to compute $\\phi(t+n\\epsilon)$, and the\n", "final proposed state $\\chi'$ is returned.\n", "\n", "**hmc_move**\n", "\n", "The `hmc_move` function implements the remaining steps (steps 1 and\n", "3) of an HMC move proposal (while wrapping the `simulate_dynamics`\n", "function). 
Given a matrix of initial states\n", "$s \\in \\mathcal{R}^{N \\times D}$ (`positions`) and energy function\n", "$E(s)$ (`energy_fn`), it defines the symbolic graph for computing\n", "`n_steps` of HMC, using a given `stepsize`. The function prototype\n", "is as follows:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def hmc_move(s_rng, positions, energy_fn, stepsize, n_steps):\n", "\n", "    # sample random velocity using independent normals\n", "    initial_vel = s_rng.normal(size=positions.shape)\n", "\n", "    # perform simulation of particles subject to Hamiltonian dynamics\n", "    final_pos, final_vel = simulate_dynamics(\n", "        initial_pos=positions,\n", "        initial_vel=initial_vel,\n", "        stepsize=stepsize,\n", "        n_steps=n_steps,\n", "        energy_fn=energy_fn)\n", "\n", "    # accept/reject the proposed move based on the joint distribution\n", "    accept = metropolis_hastings_accept(\n", "        energy_prev=hamiltonian(positions, initial_vel, energy_fn),\n", "        energy_next=hamiltonian(final_pos, final_vel, energy_fn),\n", "        s_rng=s_rng)\n", "\n", "    return accept, final_pos" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We start by sampling **random velocities**, using the provided shared\n", "`RandomStream` object. Velocities are sampled independently for each\n", "dimension and for each particle under simulation, yielding an\n", "$N \\times D$ matrix.\n", "\n", "    initial_vel = s_rng.normal(size=positions.shape)\n", "    \n", "Since we now have an initial position and velocity, we can call\n", "`simulate_dynamics` to obtain the proposal for the new state $\\chi'$.\n", "\n", "    final_pos, final_vel = simulate_dynamics(\n", "        initial_pos = positions, \n", "        initial_vel = initial_vel,\n", "        stepsize = stepsize,\n", "        n_steps = n_steps,\n", "        energy_fn = energy_fn)\n", "    \n", "We then **accept/reject** the proposed state based on the Metropolis\n", "algorithm.\n", "\n", "    accept = metropolis_hastings_accept(\n", "        energy_prev=hamiltonian(positions, initial_vel, energy_fn),\n", "        energy_next=hamiltonian(final_pos, final_vel, energy_fn),\n", "        s_rng=s_rng)\n", "    \n", "where `metropolis_hastings_accept` and `hamiltonian` are helper\n", "functions, defined as follows." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def metropolis_hastings_accept(energy_prev, energy_next, s_rng):\n", "    \"\"\"\n", "    Accept each proposed move with probability min(1, exp(energy_prev - energy_next)).\n", "    \"\"\"\n", "    ediff = energy_prev - energy_next\n", "    return (at.exp(ediff) - s_rng.uniform(size=energy_prev.shape)) >= 0\n", "\n", "\n", "def kinetic_energy(vel):\n", "    \"\"\"\n", "    Returns the kinetic energy, 0.5 times the squared norm of each row of vel.\n", "    \"\"\"\n", "    return 0.5 * (vel ** 2).sum(axis=1)\n", "\n", "\n", "def hamiltonian(pos, vel, energy_fn):\n", "    \"\"\"\n", "    Returns the Hamiltonian (sum of potential and kinetic energy) for the given\n", "    velocity and position. Assumes mass is 1.\n", "    \"\"\"\n", "    return energy_fn(pos) + kinetic_energy(vel)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "`hmc_move` finally returns the tuple `(accept, final_pos)`.\n", "`accept` is a symbolic boolean variable indicating whether the\n", "new state `final_pos` should be used.\n", "\n", "**hmc_updates**\n", "\n", "The purpose of `hmc_updates` is to generate the **list of updates** to\n", "perform, whenever our HMC sampling function is called. 
`hmc_updates`\n", "thus receives as parameters, a series of shared variables to update\n", "(`positions`, `stepsize` and `avg_acceptance_rate`), and the\n", "parameters required to compute their new state.\n", "\n", " def hmc_updates(positions, stepsize, avg_acceptance_rate, final_pos, \n", " accept, target_acceptance_rate, stepsize_inc, stepsize_dec, \n", " stepsize_min, stepsize_max, avg_acceptance_slowness):\n", " \n", " \n", " accept_matrix = accept.dimshuffle(0, *(('x',) * (final_pos.ndim - 1)))\n", " \n", " new_positions = tt.switch(accept_matrix, final_pos, positions)\n", " \n", "Using the above code, the dictionary `{positions: new_positions}` can\n", "be used to update the state of the sampler with either (1) the new state\n", "`final_pos` if `accept` is True, or (2) the old state if `accept`\n", "is False. This conditional assignment is performed by Aesara's\n", "[switch](https://aesara.readthedocs.io/en/latest/library/tensor/basic.html#aesara.tensor.switch)\n", "function.\n", "\n", "`switch` expects as its first argument, a boolean mask with the same\n", "broadcastable dimensions as the second and third argument. Since\n", "`accept` is scalar-valued, we must first use\n", "[dimshuffle](https://aesara.readthedocs.io/en/latest/library/tensor/basic.html#aesara.tensor.var._tensor_py_operators.dimshuffle)\n", "to permute the dimensions so that it is of the appropriate\n", "dimensions for broadcasting.\n", "\n", "`hmc_updates` additionally implements an *adaptive* version of HMC. We start by\n", "tracking the average acceptance rate of the HMC move proposals (across\n", "many simulations), using an exponential moving average with time\n", "constant `1 - avg_acceptance_slowness`.\n", "\n", " new_acceptance_rate = tt.add(\n", " avg_acceptance_slowness * avg_acceptance_rate,\n", " (1.0 - avg_acceptance_slowness) * accept.mean())\n", " \n", "If the average acceptance rate is larger than the\n", "`target_acceptance_rate`, we increase the `stepsize` by a factor\n", "of `stepsize_inc` in order to increase the mixing rate of our chain.\n", "If the average acceptance rate is too low however, `stepsize` is\n", "decreased by a factor of `stepsize_dec`, yielding a more conservative\n", "mixing rate.\n", "\n", " _new_stepsize = tt.switch(avg_acceptance_rate > target_acceptance_rate,\n", " stepsize * stepsize_inc, stepsize * stepsize_dec)\n", "\n", " new_stepsize = tt.clip(_new_stepsize, stepsize_min, stepsize_max)\n", "\n", "The\n", "[`clip`](https://aesara.readthedocs.io/en/latest/library/tensor/basic.html#aesara.tensor.clip)\n", "function allows us to maintain the `stepsize` in the range\n", "[`stepsize_min`, `stepsize_max`].\n", "\n", "The final updates list is then returned:\n", "\n", " return [(positions, new_positions),\n", " (stepsize, new_stepsize),\n", " (avg_acceptance_rate, new_acceptance_rate)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import aesara \n", "\n", "def hmc_updates(positions, stepsize, avg_acceptance_rate, final_pos, accept,\n", " target_acceptance_rate, stepsize_inc, stepsize_dec,\n", " stepsize_min, stepsize_max, avg_acceptance_slowness):\n", "\n", " \"\"\"\n", " POSITION UPDATES\n", " \n", " Uses results of the acceptance test to update position\n", " \"\"\"\n", "\n", " accept_matrix = accept.dimshuffle(0, *(('x',) * (final_pos.ndim - 1)))\n", " # if accept is True, update to `final_pos` else stay put\n", " new_positions = at.switch(accept_matrix, final_pos, positions)\n", "\n", " \"\"\"\n", " STEP SIZE UPDATES\n", 
" \n", " If the acceptance rate is two low, reduce the stepsize; if it is too high, \n", " increase the stepsize\n", " \"\"\"\n", " \n", " _new_stepsize = at.switch(avg_acceptance_rate > target_acceptance_rate,\n", " stepsize * stepsize_inc, stepsize * stepsize_dec)\n", " # maintain stepsize in [stepsize_min, stepsize_max]\n", " new_stepsize = at.clip(_new_stepsize, stepsize_min, stepsize_max)\n", "\n", " # Update acceptance rate with exponential moving average\n", " mean_dtype = aesara.scalar.upcast(accept.dtype, avg_acceptance_rate.dtype)\n", " new_acceptance_rate = at.add(\n", " avg_acceptance_slowness * avg_acceptance_rate,\n", " (1.0 - avg_acceptance_slowness) * accept.mean(dtype=mean_dtype))\n", "\n", " return [(positions, new_positions),\n", " (stepsize, new_stepsize),\n", " (avg_acceptance_rate, new_acceptance_rate)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**HMC_sampler**\n", "\n", "We implement the sampler in a Python class, called `HMC_Sampler`. It is a convenience wrapper for performing Hybrid Monte Carlo (HMC). It creates the symbolic graph for performing an HMC simulation (using `hmc_move` and `hmc_updates`). The graph is then compiled into the `simulate` function, an Aesara function which runs the simulation and updates the required shared\n", "variables. Its main attributes are:\n", "\n", "- `new_from_shared_positions`: a constructor method which\n", " allocates various shared variables and strings together the calls to\n", " `hmc_move` and `hmc_updates`. It also builds the Aesara\n", " function `simulate`, whose sole purpose is to execute the updates\n", " generated by `hmc_updates`.\n", "- `draw`: a convenience method which calls the Aesara function\n", " `simulate` and returns a copy of the contents of the shared\n", " variable `self.positions`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sharedX = lambda X, name: \\\n", " shared(np.asarray(X, dtype=aesara.config.floatX), name=name)\n", "\n", "class HMC(object):\n", "\n", " def __init__(self, shared_positions, energy_fn,\n", " initial_stepsize=0.01, target_acceptance_rate=.9, n_steps=20,\n", " stepsize_dec=0.98,\n", " stepsize_min=0.001,\n", " stepsize_max=0.25,\n", " stepsize_inc=1.02,\n", " avg_acceptance_slowness=0.9,\n", " seed=12345):\n", " \n", " self.positions = shared_positions\n", "\n", " # allocate shared variables\n", " self.stepsize = stepsize = sharedX(initial_stepsize, 'hmc_stepsize')\n", " self.avg_acceptance_rate = avg_acceptance_rate = sharedX(target_acceptance_rate,\n", " 'avg_acceptance_rate')\n", " \n", " s_rng = at.random.utils.RandomStream(seed)\n", "\n", " # define graph for an `n_steps` HMC simulation\n", " accept, final_pos = hmc_move(\n", " s_rng,\n", " shared_positions,\n", " energy_fn,\n", " stepsize,\n", " n_steps)\n", "\n", " # define the dictionary of updates, to apply on every `simulate` call\n", " simulate_updates = hmc_updates(\n", " shared_positions,\n", " stepsize,\n", " avg_acceptance_rate,\n", " final_pos=final_pos,\n", " accept=accept,\n", " stepsize_min=stepsize_min,\n", " stepsize_max=stepsize_max,\n", " stepsize_inc=stepsize_inc,\n", " stepsize_dec=stepsize_dec,\n", " target_acceptance_rate=target_acceptance_rate,\n", " avg_acceptance_slowness=avg_acceptance_slowness)\n", "\n", " # compile Aesara function\n", " self.simulate = function([], [], updates=simulate_updates)\n", "\n", "\n", " def draw(self, **kwargs):\n", " \"\"\"\n", " Returns a new position obtained after `n_steps` of HMC simulation.\n", " \"\"\"\n", " self.simulate()\n", " return self.positions.get_value(borrow=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing our Sampler\n", "\n", "We test our implementation of HMC by sampling from a multi-variate\n", "Gaussian distribution. We start by generating a random mean vector\n", "`mu` and covariance matrix `cov`, which allows us to define the\n", "energy function of the corresponding Gaussian distribution:\n", "`gaussian_energy`. We then initialize the state of the sampler by\n", "allocating a `position` shared variable. It is passed to the\n", "constructor of `HMC_sampler` along with our target energy function.\n", "\n", "Following a burn-in period, we then generate a large number of samples\n", "and compare the empirical mean and covariance matrix to their true\n", "values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Simulation hyperparameters\n", "burnin=2000\n", "n_samples=1000\n", "dim=5\n", "batchsize=3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's set up the model: a multivariate normal distribution." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.RandomState(123)\n", "\n", "# Define a covariance and mu for a gaussian\n", "mu = np.array(rng.rand(dim) * 10, dtype=aesara.config.floatX)\n", "cov = np.array(rng.rand(dim, dim), dtype=aesara.config.floatX)\n", "cov = (cov + cov.T) / 2.\n", "cov[np.arange(dim), np.arange(dim)] = 1.0\n", "cov_inv = np.linalg.inv(cov)\n", "\n", "# Define energy function for a multi-variate Gaussian\n", "def gaussian_energy(x):\n", " return 0.5 * (at.dot((x - mu), cov_inv) * (x - mu)).sum(axis=1)\n", "\n", "# Declared shared random variable for positions\n", "position = shared(rng.randn(batchsize, dim).astype(aesara.config.floatX))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's initialize our sampler:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sampler = HMC(position, gaussian_energy,\n", " initial_stepsize=1e-3, stepsize_max=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Execute some tuning samples, to be discarded." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "burn_me = [sampler.draw() for r in range(burnin)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now draw `n_samples` for inference. This returns a tensor of dimension `(n_samples, batchsize, dim)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "samples = np.asarray([sampler.draw() for r in range(n_samples)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "samples.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "flat_samples = samples.T.reshape(dim, -1).T" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('****** TARGET VALUES ******')\n", "print('target mean:', mu)\n", "print('target cov:\\n', cov)\n", "\n", "print('****** EMPIRICAL MEAN/COV USING HMC ******')\n", "print('empirical mean: ', flat_samples.mean(axis=0))\n", "print('empirical_cov:\\n', np.cov(flat_samples.T))\n", "\n", "print('****** HMC INTERNALS ******')\n", "print('final stepsize', sampler.stepsize.get_value())\n", "print('final acceptance_rate', sampler.avg_acceptance_rate.get_value())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen above, the samples generated by our HMC sampler yield an\n", "empirical mean and covariance matrix, which are very close to the true\n", "underlying parameters. The adaptive algorithm also seemed to work well\n", "as the final acceptance rate is close to our target of `0.9`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "## References\n", "\n", "1. Team TTD, Al-Rfou R, Alain G, et al. (2016) [Theano: A Python framework for fast computation of mathematical expressions](https://arxiv.org/abs/1605.02688). arXiv.org.\n", "2. Neal, R. M. (2010) [MCMC using Hamiltonian dynamics](http://www.mcmchandbook.net/HandbookChapter5.pdf), in the Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng (editors), Chapman & Hall / CRC Press, pp. 113-162.\n", "3. Betancourt, M. (2017). [A Conceptual Introduction to Hamiltonian Monte Carlo](https://arxiv.org/abs/1701.02434). arXiv.org." 
] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" }, "latex_envs": { "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0 } }, "nbformat": 4, "nbformat_minor": 4 }