{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using a \"black box\" likelihood function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Cython extension is already loaded. To reload it, use:\n",
      "  %reload_ext Cython\n",
      "Running on PyMC3 v3.6\n"
     ]
    },
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'corner'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-2-1cf6341006dd>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m     19\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mmatplotlib\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     20\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0memcee\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 21\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mcorner\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     22\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     23\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Python version:     {}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mplatform\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpython_version\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'corner'"
     ]
    }
   ],
   "source": [
    "%load_ext Cython\n",
    "\n",
    "import arviz as az\n",
    "import corner\n",
    "import cython\n",
    "import emcee\n",
    "import IPython\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import os\n",
    "import platform\n",
    "import pymc3 as pm\n",
    "import theano\n",
    "import theano.tensor as tt\n",
    "\n",
    "print('Running on PyMC3 v{}'.format(pm.__version__))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%config InlineBackend.figure_format = 'retina'\n",
    "az.style.use('arviz-darkgrid')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[PyMC3](https://docs.pymc.io/index.html) is a great tool for doing Bayesian inference and parameter estimation. It has a load of [in-built probability distributions](https://docs.pymc.io/api/distributions.html) that you can use to set up priors and likelihood functions for your particular model. You can even create your own [custom distributions](https://docs.pymc.io/prob_dists.html#custom-distributions).\n",
    "\n",
    "However, this is not necessarily that simple if you have a model function, or probability distribution, that, for example, relies on an external code that you have little/no control over (and may even be, for example, wrapped `C` code rather than Python). This can be problematic went you need to pass parameters set as PyMC3 distributions to these external functions; your external function probably wants you to pass it floating point numbers rather than PyMC3 distributions!\n",
    "\n",
    "```python\n",
    "import pymc3 as pm:\n",
    "from external_module import my_external_func  # your external function!\n",
    "\n",
    "# set up your model\n",
    "with pm.Model():\n",
    "    # your external function takes two parameters, a and b, with Uniform priors\n",
    "    a = pm.Uniform('a', lower=0., upper=1.)\n",
    "    b = pm.Uniform('b', lower=0., upper=1.)\n",
    "    \n",
    "    m = my_external_func(a, b)  # <--- this is not going to work!\n",
    "```\n",
    "\n",
    "Another issue is that if you want to be able to use the [gradient-based step samplers](https://docs.pymc.io/notebooks/getting_started.html#Gradient-based-sampling-methods) like [NUTS](https://docs.pymc.io/api/inference.html#module-pymc3.step_methods.hmc.nuts) and [Hamiltonian Monte Carlo (HMC)](https://docs.pymc.io/api/inference.html#hamiltonian-monte-carlo), then your model/likelihood needs a gradient to be defined. If you have a model that is defined as a set of Theano operators then this is no problem - internally it will be able to do automatic differentiation - but if your model is essentially a \"black box\" then you won't necessarily know what the gradients are.\n",
    "\n",
    "Defining a model/likelihood that PyMC3 can use and that calls your \"black box\" function is possible, but it relies on creating a [custom Theano Op](https://docs.pymc.io/advanced_theano.html#writing-custom-theano-ops). This is, hopefully, a clear description of how to do this, including one way of writing a gradient function that could be generally applicable.\n",
    "\n",
    "In the examples below, we create a very simple model and log-likelihood function in [Cython](http://cython.org/). Cython is used just as an example to show what you might need to do if calling external `C` codes, but you could in fact be using pure Python codes. The log-likelihood function used is actually just a [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), so defining this yourself is obviously overkill (and I'll compare it to doing the same thing purely with the pre-defined PyMC3 [Normal](https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.Normal) distribution), but it should provide a simple to follow demonstration.\n",
    "\n",
    "First, let's define a _super-complicated_&trade; model (a straight line!), which is parameterised by two variables (a gradient `m` and a y-intercept `c`) and calculated at a vector of points `x`. Here the model is defined in [Cython](http://cython.org/) and calls [GSL](https://www.gnu.org/software/gsl/) functions. This is just to show that you could be calling some other `C` library that you need. In this example, the model parameters are all packed into a list/array/tuple called `theta`.\n",
    "\n",
    "Let's also define a _really-complicated_&trade; log-likelihood function (a Normal log-likelihood that ignores the normalisation), which takes in the list/array/tuple of model parameter values `theta`, the points at which to calculate the model `x`, the vector of \"observed\" data points `data`, and the standard deviation of the noise in the data `sigma`. This log-likelihood function calls the _super-complicated_&trade; model function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%cython -I/usr/include -L/usr/lib/x86_64-linux-gnu -lgsl -lgslcblas -lm\n",
    "\n",
    "import cython\n",
    "cimport cython\n",
    "\n",
    "import numpy as np\n",
    "cimport numpy as np\n",
    "\n",
    "### STUFF FOR USING GSL (FEEL FREE TO IGNORE!) ###\n",
    "\n",
    "# declare GSL vector structure and functions\n",
    "cdef extern from \"gsl/gsl_block.h\":\n",
    "    cdef struct gsl_block:\n",
    "        size_t size\n",
    "        double * data\n",
    "\n",
    "cdef extern from \"gsl/gsl_vector.h\":\n",
    "    cdef struct gsl_vector:\n",
    "        size_t size\n",
    "        size_t stride\n",
    "        double * data\n",
    "        gsl_block * block\n",
    "        int owner\n",
    "\n",
    "    ctypedef struct gsl_vector_view:\n",
    "        gsl_vector vector\n",
    "\n",
    "    int gsl_vector_scale (gsl_vector * a, const double x) nogil\n",
    "    int gsl_vector_add_constant (gsl_vector * a, const double x) nogil\n",
    "    gsl_vector_view gsl_vector_view_array (double * base, size_t n) nogil\n",
    "\n",
    "###################################################\n",
    "\n",
    "\n",
    "# define your super-complicated model that uses loads of external codes\n",
    "cpdef my_model(theta, np.ndarray[np.float64_t, ndim=1] x):\n",
    "    \"\"\"\n",
    "    A straight line!\n",
    "\n",
    "    Note:\n",
    "        This function could simply be:\n",
    "\n",
    "            m, c = thetha\n",
    "            return m*x + x\n",
    "\n",
    "        but I've made it more complicated for demonstration purposes\n",
    "    \"\"\"\n",
    "    m, c = theta  # unpack line gradient and y-intercept\n",
    "\n",
    "    cdef size_t length = len(x)  # length of x\n",
    "\n",
    "    cdef np.ndarray line = np.copy(x)  # make copy of x vector\n",
    "    cdef gsl_vector_view lineview      # create a view of the vector\n",
    "    lineview = gsl_vector_view_array(<double *>line.data, length) \n",
    "\n",
    "    # multiply x by m\n",
    "    gsl_vector_scale(&lineview.vector, <double>m)\n",
    "\n",
    "    # add c\n",
    "    gsl_vector_add_constant(&lineview.vector, <double>c)\n",
    "\n",
    "    # return the numpy array\n",
    "    return line\n",
    "\n",
    "\n",
    "# define your really-complicated likelihood function that uses loads of external codes\n",
    "cpdef my_loglike(theta, np.ndarray[np.float64_t, ndim=1] x,\n",
    "                 np.ndarray[np.float64_t, ndim=1] data, sigma):\n",
    "    \"\"\"\n",
    "    A Gaussian log-likelihood function for a model with parameters given in theta\n",
    "    \"\"\"\n",
    "\n",
    "    model = my_model(theta, x)\n",
    "\n",
    "    return -(0.5/sigma**2)*np.sum((data - model)**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, as things are, if we wanted to sample from this log-likelihood function, using certain prior distributions for the model parameters (gradient and y-intercept) using PyMC3, we might try something like this (using a [PyMC3 DensityDist](https://docs.pymc.io/prob_dists.html#custom-distributions)):\n",
    "\n",
    "```python\n",
    "import pymc3 as pm\n",
    "\n",
    "# create/read in our \"data\" (I'll show this in the real example below)\n",
    "x = ...\n",
    "sigma = ...\n",
    "data = ...\n",
    "\n",
    "with pm.Model():\n",
    "    # set priors on model gradient and y-intercept\n",
    "    m = pm.Uniform('m', lower=-10., upper=10.)\n",
    "    c = pm.Uniform('c', lower=-10., upper=10.)\n",
    "\n",
    "    # create custom distribution \n",
    "    pm.DensityDist('likelihood', my_loglike,\n",
    "                   observed={'theta': (m, c), 'x': x, 'data': data, 'sigma': sigma})\n",
    "    \n",
    "    # sample from the distribution\n",
    "    trace = pm.sample(1000)\n",
    "```\n",
    "\n",
    "But, this will give an error like:\n",
    "\n",
    "```\n",
    "ValueError: setting an array element with a sequence.\n",
    "```\n",
    "\n",
    "This is because `m` and `c` are Theano tensor-type objects.\n",
    "\n",
    "So, what we actually need to do is create a [Theano Op](http://deeplearning.net/software/theano/extending/extending_theano.html). This will be a new class that wraps our log-likelihood function (or just our model function, if that is all that is required) into something that can take in Theano tensor objects, but internally can cast them as floating point values that can be passed to our log-likelihood function. We will do this below, initially without defining a [grad() method](http://deeplearning.net/software/theano/extending/op.html#grad) for the Op."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a theano Op for our likelihood function\n",
    "class LogLike(tt.Op):\n",
    "\n",
    "    \"\"\"\n",
    "    Specify what type of object will be passed and returned to the Op when it is\n",
    "    called. In our case we will be passing it a vector of values (the parameters\n",
    "    that define our model) and returning a single \"scalar\" value (the\n",
    "    log-likelihood)\n",
    "    \"\"\"\n",
    "    itypes = [tt.dvector] # expects a vector of parameter values when called\n",
    "    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)\n",
    "\n",
    "    def __init__(self, loglike, data, x, sigma):\n",
    "        \"\"\"\n",
    "        Initialise the Op with various things that our log-likelihood function\n",
    "        requires. Below are the things that are needed in this particular\n",
    "        example.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        loglike:\n",
    "            The log-likelihood (or whatever) function we've defined\n",
    "        data:\n",
    "            The \"observed\" data that our log-likelihood function takes in\n",
    "        x:\n",
    "            The dependent variable (aka 'x') that our model requires\n",
    "        sigma:\n",
    "            The noise standard deviation that our function requires.\n",
    "        \"\"\"\n",
    "\n",
    "        # add inputs as class attributes\n",
    "        self.likelihood = loglike\n",
    "        self.data = data\n",
    "        self.x = x\n",
    "        self.sigma = sigma\n",
    "\n",
    "    def perform(self, node, inputs, outputs):\n",
    "        # the method that is used when calling the Op\n",
    "        theta, = inputs  # this will contain my variables\n",
    " \n",
    "        # call the log-likelihood function\n",
    "        logl = self.likelihood(theta, self.x, self.data, self.sigma)\n",
    "\n",
    "        outputs[0][0] = np.array(logl) # output the log-likelihood"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's use this Op to repeat the example shown above. To do this let's create some data containing a straight line with additive Gaussian noise (with a mean of zero and a standard deviation of `sigma`). For simplicity we set [uniform](https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.Uniform) prior distributions on the gradient and y-intercept. As we've not set the `grad()` method of the Op PyMC3 will not be able to use the gradient-based samplers, so will fall back to using the [Slice](https://docs.pymc.io/api/inference.html#module-pymc3.step_methods.slicer) sampler."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set up our data\n",
    "N = 10  # number of data points\n",
    "sigma = 1.  # standard deviation of noise\n",
    "x = np.linspace(0., 9., N)\n",
    "\n",
    "mtrue = 0.4  # true gradient\n",
    "ctrue = 3.   # true y-intercept\n",
    "\n",
    "truemodel = my_model([mtrue, ctrue], x)\n",
    "\n",
    "# make data\n",
    "np.random.seed(716742)  # set random seed, so the data is reproducible each time\n",
    "data = sigma*np.random.randn(N) + truemodel\n",
    "\n",
    "ndraws = 3000  # number of draws from the distribution\n",
    "nburn = 1000   # number of \"burn-in points\" (which we'll discard)\n",
    "\n",
    "# create our Op\n",
    "logl = LogLike(my_loglike, data, x, sigma)\n",
    "\n",
    "# use PyMC3 to sampler from log-likelihood\n",
    "with pm.Model():\n",
    "    # uniform priors on m and c\n",
    "    m = pm.Uniform('m', lower=-10., upper=10.)\n",
    "    c = pm.Uniform('c', lower=-10., upper=10.)\n",
    "\n",
    "    # convert m and c to a tensor vector\n",
    "    theta = tt.as_tensor_variable([m, c])\n",
    "\n",
    "    # use a DensityDist (use a lamdba function to \"call\" the Op)\n",
    "    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})\n",
    "    \n",
    "    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)\n",
    "\n",
    "# plot the traces\n",
    "_ = pm.traceplot(trace, lines={'m': mtrue, 'c': ctrue})\n",
    "\n",
    "# put the chains in an array (for later!)\n",
    "samples_pymc3 = np.vstack((trace['m'], trace['c'])).T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if we wanted to use NUTS or HMC? If we knew the analytical derivatives of the model/likelihood function then we could add a [grad() method](http://deeplearning.net/software/theano/extending/op.html#grad) to the Op using that analytical form.\n",
    "\n",
    "But, what if we don't know the analytical form. If our model/likelihood is purely Python and made up of standard maths operators and Numpy functions, then the [autograd](https://github.com/HIPS/autograd) module could potentially be used to find gradients (also, see [here](https://github.com/ActiveState/code/blob/master/recipes/Python/580610_Auto_differentiation/recipe-580610.py) for a nice Python example of automatic differentiation). But, if our model/likelihood truely is a \"black box\" then we can just use the good-old-fashioned [finite difference](https://en.wikipedia.org/wiki/Finite_difference) to find the gradients - this can be slow, especially if there are a large number of variables, or the model takes a long time to evaluate. Below, a function to find gradients has been defined that uses the finite difference (the central difference) - it uses an iterative method with successively smaller interval sizes to check that the gradient converges. But, you could do something far simpler and just use, for example, the SciPy [approx_fprime](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html) function. Here, the gradient function is defined in Cython for speed, but if the function it evaluates to find the gradients is the performance bottle neck then having this as a pure Python function may not make a significant speed difference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%cython\n",
    "\n",
    "import cython\n",
    "cimport cython\n",
    "\n",
    "import numpy as np\n",
    "cimport numpy as np\n",
    "\n",
    "import warnings\n",
    "\n",
    "def gradients(vals, func, releps=1e-3, abseps=None, mineps=1e-9, reltol=1e-3,\n",
    "              epsscale=0.5):\n",
    "    \"\"\"\n",
    "    Calculate the partial derivatives of a function at a set of values. The\n",
    "    derivatives are calculated using the central difference, using an iterative\n",
    "    method to check that the values converge as step size decreases.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    vals: array_like\n",
    "        A set of values, that are passed to a function, at which to calculate\n",
    "        the gradient of that function\n",
    "    func:\n",
    "        A function that takes in an array of values.\n",
    "    releps: float, array_like, 1e-3\n",
    "        The initial relative step size for calculating the derivative.\n",
    "    abseps: float, array_like, None\n",
    "        The initial absolute step size for calculating the derivative.\n",
    "        This overrides `releps` if set.\n",
    "        `releps` is set then that is used.\n",
    "    mineps: float, 1e-9\n",
    "        The minimum relative step size at which to stop iterations if no\n",
    "        convergence is achieved.\n",
    "    epsscale: float, 0.5\n",
    "        The factor by which releps if scaled in each iteration.\n",
    "\n",
    "    Returns\n",
    "    -------\n",
    "    grads: array_like\n",
    "        An array of gradients for each non-fixed value.\n",
    "    \"\"\"\n",
    "\n",
    "    grads = np.zeros(len(vals))\n",
    "\n",
    "    # maximum number of times the gradient can change sign\n",
    "    flipflopmax = 10.\n",
    "\n",
    "    # set steps\n",
    "    if abseps is None:\n",
    "        if isinstance(releps, float):\n",
    "            eps = np.abs(vals)*releps\n",
    "            eps[eps == 0.] = releps  # if any values are zero set eps to releps\n",
    "            teps = releps*np.ones(len(vals))\n",
    "        elif isinstance(releps, (list, np.ndarray)):\n",
    "            if len(releps) != len(vals):\n",
    "                raise ValueError(\"Problem with input relative step sizes\")\n",
    "            eps = np.multiply(np.abs(vals), releps)\n",
    "            eps[eps == 0.] = np.array(releps)[eps == 0.]\n",
    "            teps = releps\n",
    "        else:\n",
    "            raise RuntimeError(\"Relative step sizes are not a recognised type!\")\n",
    "    else:\n",
    "        if isinstance(abseps, float):\n",
    "            eps = abseps*np.ones(len(vals))\n",
    "        elif isinstance(abseps, (list, np.ndarray)):\n",
    "            if len(abseps) != len(vals):\n",
    "                raise ValueError(\"Problem with input absolute step sizes\")\n",
    "            eps = np.array(abseps)\n",
    "        else:\n",
    "            raise RuntimeError(\"Absolute step sizes are not a recognised type!\")\n",
    "        teps = eps\n",
    "\n",
    "    # for each value in vals calculate the gradient\n",
    "    count = 0\n",
    "    for i in range(len(vals)):\n",
    "        # initial parameter diffs\n",
    "        leps = eps[i]\n",
    "        cureps = teps[i]\n",
    "\n",
    "        flipflop = 0\n",
    "\n",
    "        # get central finite difference\n",
    "        fvals = np.copy(vals)\n",
    "        bvals = np.copy(vals)\n",
    "\n",
    "        # central difference\n",
    "        fvals[i] += 0.5*leps  # change forwards distance to half eps\n",
    "        bvals[i] -= 0.5*leps  # change backwards distance to half eps\n",
    "        cdiff = (func(fvals)-func(bvals))/leps\n",
    "\n",
    "        while 1:\n",
    "            fvals[i] -= 0.5*leps  # remove old step\n",
    "            bvals[i] += 0.5*leps\n",
    "\n",
    "            # change the difference by a factor of two\n",
    "            cureps *= epsscale\n",
    "            if cureps < mineps or flipflop > flipflopmax:\n",
    "                # if no convergence set flat derivative (TODO: check if there is a better thing to do instead)\n",
    "                warnings.warn(\"Derivative calculation did not converge: setting flat derivative.\")\n",
    "                grads[count] = 0.\n",
    "                break\n",
    "            leps *= epsscale\n",
    "\n",
    "            # central difference\n",
    "            fvals[i] += 0.5*leps  # change forwards distance to half eps\n",
    "            bvals[i] -= 0.5*leps  # change backwards distance to half eps\n",
    "            cdiffnew = (func(fvals)-func(bvals))/leps\n",
    "\n",
    "            if cdiffnew == cdiff:\n",
    "                grads[count] = cdiff\n",
    "                break\n",
    "\n",
    "            # check whether previous diff and current diff are the same within reltol\n",
    "            rat = (cdiff/cdiffnew)\n",
    "            if np.isfinite(rat) and rat > 0.:\n",
    "                # gradient has not changed sign\n",
    "                if np.abs(1.-rat) < reltol:\n",
    "                    grads[count] = cdiffnew\n",
    "                    break\n",
    "                else:\n",
    "                    cdiff = cdiffnew\n",
    "                    continue\n",
    "            else:\n",
    "                cdiff = cdiffnew\n",
    "                flipflop += 1\n",
    "                continue\n",
    "\n",
    "        count += 1\n",
    "\n",
    "    return grads"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, now we can just redefine our Op with a `grad()` method, right?\n",
    "\n",
    "It's not quite so simple! The `grad()` method itself requires that its inputs are Theano tensor variables, whereas our `gradients` function above, like our `my_loglike` function, wants a list of floating point values. So, we need to define another Op that calculates the gradients. Below, I define a new version of the `LogLike` Op, called `LogLikeWithGrad` this time, that has a `grad()` method. This is followed by anothor Op called `LogLikeGrad` that, when called with a vector of Theano tensor variables, returns another vector of values that are the gradients (i.e., the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)) of our log-likelihood function at those values. Note that the `grad()` method itself does not return the gradients directly, but instead returns the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)-vector product (you can hopefully just copy what I've done and not worry about what this means too much!)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define a theano Op for our likelihood function\n",
    "class LogLikeWithGrad(tt.Op):\n",
    "\n",
    "    itypes = [tt.dvector] # expects a vector of parameter values when called\n",
    "    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)\n",
    "\n",
    "    def __init__(self, loglike, data, x, sigma):\n",
    "        \"\"\"\n",
    "        Initialise with various things that the function requires. Below\n",
    "        are the things that are needed in this particular example.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        loglike:\n",
    "            The log-likelihood (or whatever) function we've defined\n",
    "        data:\n",
    "            The \"observed\" data that our log-likelihood function takes in\n",
    "        x:\n",
    "            The dependent variable (aka 'x') that our model requires\n",
    "        sigma:\n",
    "            The noise standard deviation that out function requires.\n",
    "        \"\"\"\n",
    "\n",
    "        # add inputs as class attributes\n",
    "        self.likelihood = loglike\n",
    "        self.data = data\n",
    "        self.x = x\n",
    "        self.sigma = sigma\n",
    "\n",
    "        # initialise the gradient Op (below)\n",
    "        self.logpgrad = LogLikeGrad(self.likelihood, self.data, self.x, self.sigma)\n",
    "\n",
    "    def perform(self, node, inputs, outputs):\n",
    "        # the method that is used when calling the Op\n",
    "        theta, = inputs  # this will contain my variables\n",
    " \n",
    "        # call the log-likelihood function\n",
    "        logl = self.likelihood(theta, self.x, self.data, self.sigma)\n",
    "\n",
    "        outputs[0][0] = np.array(logl) # output the log-likelihood\n",
    "\n",
    "    def grad(self, inputs, g):\n",
    "        # the method that calculates the gradients - it actually returns the\n",
    "        # vector-Jacobian product - g[0] is a vector of parameter values \n",
    "        theta, = inputs  # our parameters \n",
    "        return [g[0]*self.logpgrad(theta)]\n",
    "\n",
    "\n",
    "class LogLikeGrad(tt.Op):\n",
    "\n",
    "    \"\"\"\n",
    "    This Op will be called with a vector of values and also return a vector of\n",
    "    values - the gradients in each dimension.\n",
    "    \"\"\"\n",
    "    itypes = [tt.dvector]\n",
    "    otypes = [tt.dvector]\n",
    "\n",
    "    def __init__(self, loglike, data, x, sigma):\n",
    "        \"\"\"\n",
    "        Initialise with various things that the function requires. Below\n",
    "        are the things that are needed in this particular example.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        loglike:\n",
    "            The log-likelihood (or whatever) function we've defined\n",
    "        data:\n",
    "            The \"observed\" data that our log-likelihood function takes in\n",
    "        x:\n",
    "            The dependent variable (aka 'x') that our model requires\n",
    "        sigma:\n",
    "            The noise standard deviation that out function requires.\n",
    "        \"\"\"\n",
    "\n",
    "        # add inputs as class attributes\n",
    "        self.likelihood = loglike\n",
    "        self.data = data\n",
    "        self.x = x\n",
    "        self.sigma = sigma\n",
    "\n",
    "    def perform(self, node, inputs, outputs):\n",
    "        theta, = inputs\n",
    "\n",
    "        # define version of likelihood function to pass to derivative function\n",
    "        def lnlike(values):\n",
    "            return self.likelihood(values, self.x, self.data, self.sigma)\n",
    "\n",
    "        # calculate gradients\n",
    "        grads = gradients(theta, lnlike)\n",
    "\n",
    "        outputs[0][0] = grads"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's re-run PyMC3 with our new \"grad\"-ed Op. This time it will be able to automatically use NUTS."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create our Op\n",
    "logl = LogLikeWithGrad(my_loglike, data, x, sigma)\n",
    "\n",
    "# use PyMC3 to sampler from log-likelihood\n",
    "with pm.Model() as opmodel:\n",
    "    # uniform priors on m and c\n",
    "    m = pm.Uniform('m', lower=-10., upper=10.)\n",
    "    c = pm.Uniform('c', lower=-10., upper=10.)\n",
    "\n",
    "    # convert m and c to a tensor vector\n",
    "    theta = tt.as_tensor_variable([m, c])\n",
    "\n",
    "    # use a DensityDist\n",
    "    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})\n",
    "\n",
    "    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)\n",
    "\n",
    "# plot the traces\n",
    "_ = pm.traceplot(trace, lines={'m': mtrue, 'c': ctrue})\n",
    "\n",
    "# put the chains in an array (for later!)\n",
    "samples_pymc3_2 = np.vstack((trace['m'], trace['c'])).T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, finally, just to check things actually worked as we might expect, let's do the same thing purely using PyMC3 distributions (because in this simple example we can!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with pm.Model() as pymodel:\n",
    "    # uniform priors on m and c\n",
    "    m = pm.Uniform('m', lower=-10., upper=10.)\n",
    "    c = pm.Uniform('c', lower=-10., upper=10.)\n",
    "\n",
    "    # convert m and c to a tensor vector\n",
    "    theta = tt.as_tensor_variable([m, c])\n",
    "\n",
    "    # use a Normal distribution\n",
    "    pm.Normal('likelihood', mu=(m*x + c), sd = sigma, observed=data)\n",
    "\n",
    "    trace = pm.sample(ndraws, tune=nburn, discard_tuned_samples=True)\n",
    "\n",
    "# plot the traces\n",
    "_ = pm.traceplot(trace, lines={'m': mtrue, 'c': ctrue})\n",
    "\n",
    "# put the chains in an array (for later!)\n",
    "samples_pymc3_3 = np.vstack((trace['m'], trace['c'])).T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check that they match let's plot all the examples together and also find the autocorrelation lengths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.simplefilter(action='ignore', category=FutureWarning)  # supress emcee autocorr FutureWarning\n",
    "\n",
    "matplotlib.rcParams['font.size'] = 18\n",
    "\n",
    "hist2dkwargs = {'plot_datapoints': False,\n",
    "                'plot_density': False,\n",
    "                'levels': 1.0 - np.exp(-0.5 * np.arange(1.5, 2.1, 0.5) ** 2)} # roughly 1 and 2 sigma\n",
    "\n",
    "colors = ['r', 'g', 'b']\n",
    "labels = ['Theanp Op (no grad)', 'Theano Op (with grad)', 'Pure PyMC3']\n",
    "\n",
    "for i, samples in enumerate([samples_pymc3, samples_pymc3_2, samples_pymc3_3]):\n",
    "    # get maximum chain autocorrelartion length\n",
    "    autocorrlen = int(np.max(emcee.autocorr.integrated_time(samples, c=3)));\n",
    "    print('Auto-correlation length ({}): {}'.format(labels[i], autocorrlen))\n",
    "\n",
    "    if i == 0:\n",
    "        fig = corner.corner(samples, labels=[r\"$m$\", r\"$c$\"], color=colors[i],\n",
    "                            hist_kwargs={'density': True}, **hist2dkwargs,\n",
    "                            truths=[mtrue, ctrue])\n",
    "    else:\n",
    "        corner.corner(samples, color=colors[i], hist_kwargs={'density': True},\n",
    "                      fig=fig, **hist2dkwargs)\n",
    "\n",
    "fig.set_size_inches(8, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now check that the gradient Op works was we expect it to. First, just create and call the `LogLikeGrad` class, which should return the gradient directly (note that we have to create a [Theano function](http://deeplearning.net/software/theano/library/compile/function.html) to convert the output of the Op to an array). Secondly, we call the gradient from `LogLikeWithGrad` by using the [Theano tensor gradient](http://deeplearning.net/software/theano/library/gradient.html#theano.gradient.grad) function. Finally, we will check the gradient returned by the PyMC3 model for a Normal distribution, which should be the same as the log-likelihood function we defined. In all cases we evaluate the gradients at the true values of the model function (the straight line) that was created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# test the gradient Op by direct call\n",
    "theano.config.compute_test_value = \"ignore\"\n",
    "theano.config.exception_verbosity = \"high\"\n",
    "\n",
    "var = tt.dvector()\n",
    "test_grad_op = LogLikeGrad(my_loglike, data, x, sigma)\n",
    "test_grad_op_func = theano.function([var], test_grad_op(var))\n",
    "grad_vals = test_grad_op_func([mtrue, ctrue])\n",
    "\n",
    "print('Gradient returned by \"LogLikeGrad\": {}'.format(grad_vals))\n",
    "\n",
    "# test the gradient called through LogLikeWithGrad\n",
    "test_gradded_op = LogLikeWithGrad(my_loglike, data, x, sigma)\n",
    "test_gradded_op_grad = tt.grad(test_gradded_op(var), var)\n",
    "test_gradded_op_grad_func = theano.function([var], test_gradded_op_grad)\n",
    "grad_vals_2 = test_gradded_op_grad_func([mtrue, ctrue])\n",
    "\n",
    "print('Gradient returned by \"LogLikeWithGrad\": {}'.format(grad_vals_2))\n",
    "\n",
    "# test the gradient that PyMC3 uses for the Normal log likelihood\n",
    "test_model = pm.Model()\n",
    "with test_model:\n",
    "    m = pm.Uniform('m', lower=-10., upper=10.)\n",
    "    c = pm.Uniform('c', lower=-10., upper=10.)\n",
    "\n",
    "    pm.Normal('likelihood', mu=(m*x + c), sigma=sigma, observed=data)\n",
    "\n",
    "    gradfunc = test_model.logp_dlogp_function([m, c], dtype=None)\n",
    "    gradfunc.set_extra_values({'m_interval__': mtrue, 'c_interval__': ctrue})\n",
    "    grad_vals_pymc3 = gradfunc(np.array([mtrue, ctrue]))[1]  # get dlogp values\n",
    "    \n",
    "print('Gradient returned by PyMC3 \"Normal\" distribution: {}'.format(grad_vals_pymc3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also do some [profiling](http://docs.pymc.io/notebooks/profiling.html) of the Op, as used within a PyMC3 Model, to check performance. First, we'll profile using the `LogLikeWithGrad` Op, and then doing the same thing purely using PyMC3 distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# profile logpt using our Op\n",
    "opmodel.profile(opmodel.logpt).summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# profile using our PyMC3 distribution\n",
    "pymodel.profile(pymodel.logpt).summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Authors\n",
    "\n",
    "* Adapted from a blog post by [Matt Pitkin](http://mattpitkin.github.io/samplers-demo/pages/pymc3-blackbox-likelihood/) on August 27, 2018. That post was based on an example provided by [Jørgen Midtbø](https://github.com/jorgenem/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "pymc3 3.8\n",
      "arviz 0.8.3\n",
      "numpy 1.17.5\n",
      "last updated: Thu Jun 11 2020 \n",
      "\n",
      "CPython 3.8.2\n",
      "IPython 7.11.0\n",
      "watermark 2.0.2\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -n -u -v -iv -w"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}