{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 4.1: Writing your own MCMC sampler (70 pts)\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write your own MCMC sampler that employs a Metropolis-Hastings algorithm to sample continuous parameters (it does not need to handle discrete parameters) that uses a Normal proposal distribution. Since you are sampling multiple parameters, your proposal distribution will be multi-dimensional. You can use a Normal proposal distribution with a diagonal covariance. In other words, you generate a proposal for each variable in the target distribution independently.\n", "\n", "You can organize your code how you like, but here is a suggestion.\n", "\n", "* Write a function that takes a Metropolis-Hastings step. It should look something like the below (obviously where it does something instead of `pass`ing)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def mh_step(x, logtarget, logtarget_current, sigma, args=()):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " x : ndarray, shape (n_variables,)\n", " The present location of the walker in parameter space.\n", " logtarget : function\n", " The function to compute the log target. It has call\n", " signature `logtarget(x, *args)`.\n", " logtarget_current : float\n", " The current value of the log target.\n", " sigma : ndarray, shape (n_variables, )\n", " The standard deviations for the proposal distribution.\n", " args : tuple\n", " Additional arguments passed to `logtarget()` function.\n", "\n", " Returns\n", " -------\n", " new_x : ndarray, shape (n_variables,)\n", " The position of the walker after the Metropolis-Hastings\n", " step. 
If no step is taken, returns the inputted `x`.\n", " new_logtarget : float\n", " The updated log probability density of the target.\n", " accepted : bool\n", " True if the proposed step was taken and False otherwise.\n", " \"\"\"\n", " \n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Write another function that calls that function over and over again to do the sampling. It should look something like the below. (Note that I am using `n_burn` as opposed to `n_warmup`, because you are just going to step as you normally would, and then \"burn\" the samples. This is in contrast to Stan, which tunes the stepping procedure during the warm-up phase.)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def mh_sample(\n", " logtarget, x0, sigma, args=(), n_burn=1000, n_steps=1000, variable_names=None\n", "):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " logtarget : function\n", " The function to compute the log target. It has call\n", " signature `logtarget(x, *args)`.\n", " x0 : ndarray, shape (n_variables,)\n", " The starting location of a walker in parameter space.\n", " sigma : ndarray, shape (n_variables,)\n", " The standard deviations for the proposal distribution.\n", " args : tuple\n", " Additional arguments passed to the `logtarget()` function.\n", " n_burn : int, default 1000\n", " Number of burn-in steps.\n", " n_steps : int, default 1000\n", " Number of steps to take after burn-in.\n", " variable_names : list, length n_variables\n", " List of names of variables. If None, then variable names\n", " are sequential integers.\n", " \n", " Returns\n", " -------\n", " output : DataFrame\n", " The first `n_variables` columns contain the samples.\n", " Additionally, column 'lnprob' has the log target value\n", " at each sample.\n", " \"\"\"\n", "\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** To test your code, we will get samples out of a known distribution. 
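As a hedged reminder before testing: with a symmetric (e.g., Normal) proposal, the Hastings ratio reduces to the ratio of target densities, so the accept/reject decision in `mh_step` amounts to comparing the difference of log target values against the log of a uniform draw. A minimal sketch of just that decision follows; the name `mh_accept` and its arguments are illustrative, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng()

def mh_accept(logtarget_proposed, logtarget_current):
    # Accept with probability min(1, exp(logtarget_proposed - logtarget_current)).
    # Working in logs: accept if log(u) < log ratio, with u ~ Uniform(0, 1).
    # The min is implicit because log(u) <= 0 always.
    return bool(np.log(rng.uniform()) < logtarget_proposed - logtarget_current)
```

Note that proposals that do not decrease the log target are always accepted, since `log(u) < 0` for a uniform draw from the unit interval.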
We will use a bivariate Normal with a mean of $\\boldsymbol{\\mu} = (10, 20)$ and covariance matrix of \n", "\n", "\\begin{align}\n", "\\mathsf{\\Sigma} = \\begin{pmatrix}\n", "4 & -2 \\\\\n", "-2 & 6\n", "\\end{pmatrix}\n", "\\end{align}\n", "\n", "I have written the log-PDF function to be unnormalized and JITted with [numba](https://numba.pydata.org) for optimal speed.\n", "\n", "*Do not be confused*: In this test function we are sampling $\\mathbf{x}$ out of $P(\\mathbf{x}\\mid \\boldsymbol{\\mu},\\mathsf{\\Sigma})$. This is not sampling a posterior; it is just a test for your code. (Remember, MCMC samplers can sample out of arbitrary distributions, not just posteriors. In general, the distribution we sample out of is called a target distribution.) Furthermore, though both are Normal, this test distribution is not the proposal distribution. You will pass `log_test_distribution` as the `logtarget` argument in the above functions." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "mu = np.array([10.0, 20])\n", "cov = np.array([[4, -2], [-2, 6]])\n", "inv_cov = np.linalg.inv(cov)\n", "\n", "@numba.njit\n", "def log_test_distribution(x, mu, inv_cov):\n", " \"\"\"\n", " Unnormalized log target of a multivariate Gaussian.\n", " \"\"\"\n", " return -np.dot((x-mu), np.dot(inv_cov, (x-mu))) / 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you compute the means and covariance (using `np.cov()`) of your samples, you should come close to the inputted means and covariance. You might also want to plot your samples using `bebi103.viz.corner()` to make sure everything makes sense.\n", "\n", "You may want to add in some logic to your Metropolis-Hastings sampler to enable *tuning*. This is the process of automatically adjusting the $\\sigma$ in the proposal distribution such that the acceptance rate is desirable. A good target acceptance rate is about 0.4. 
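A minimal sketch of such a tuning helper, called periodically during burn-in with the fraction of steps accepted since the last adjustment, might look like this (the thresholds and multiplicative factors here are illustrative placeholders, not prescribed values):

```python
import numpy as np

def tune_sigma(sigma, acceptance_rate):
    # Multiplicatively adjust the proposal standard deviations.
    # Thresholds and factors below are illustrative only.
    if acceptance_rate < 0.2:
        return sigma * 0.9  # rejecting too often: take smaller steps
    elif acceptance_rate > 0.5:
        return sigma * 1.1  # accepting too often: take bigger steps
    return sigma  # acceptance rate in a reasonable range: leave unchanged
```
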
The developers of [PyMC](https://docs.pymc.io) use the scheme below, which is reasonable.\n", "\n", "|Acceptance rate|Standard deviation adaptation|\n", "|:---:|:-------------------:|\n", "| < 0.001 |$\\times$ 0.1|\n", "|< 0.05 |$\\times$ 0.5|\n", "|< 0.2 |$\\times$ 0.9|\n", "|> 0.5 |$\\times$ 1.1|\n", "|> 0.75 |$\\times$ 2|\n", "|> 0.95 |$\\times$ 10|\n", "\n", "Be sure to test your code to demonstrate that it works." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }