{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Homework 4.1: Writing your own MCMC sampler (70 pts)\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import numba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Write your own MCMC sampler that employs a Metropolis-Hastings algorithm to sample continuous parameters (it does not need to handle discrete parameters) that uses a Normal proposal distribution. Since you are sampling multiple parameters, your proposal distribution will be multi-dimensional. You can use a Normal proposal distribution with a diagonal covariance. In other words, you generate a proposal for each variable in the target distribution independently.\n", "\n", "You can organize your code how you like, but here is a suggestion.\n", "\n", "* Write a function that takes a Metropolis-Hastings step. It should look something like the below (obviously where it does something instead of `pass`ing)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def mh_step(x, logtarget, logtarget_current, sigma, args=()):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " x : ndarray, shape (n_variables,)\n", " The present location of the walker in parameter space.\n", " logtarget : function\n", " The function to compute the log target. It has call\n", " signature `logtarget(x, *args)`.\n", " logtarget_current : float\n", " The current value of the log target.\n", " sigma : ndarray, shape (n_variables, )\n", " The standard deviations for the proposal distribution.\n", " args : tuple\n", " Additional arguments passed to `logtarget()` function.\n", "\n", " Returns\n", " -------\n", " new_x : ndarray, shape (n_variables,)\n", " The position of the walker after the Metropolis-Hastings\n", " step. 
If no step is taken, returns the inputted `x`.\n", " new_logtarget : float\n", " The updated log probability density of the target.\n", " accepted : bool\n", " True if the proposed step was taken and False otherwise.\n", " \"\"\"\n", " \n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Write another function that calls that function over and over again to do the sampling. It should look something like the below. (Note that I am using `n_burn` as opposed to `n_warmup`, because you are just going to step as you normally would, and then \"burn\" the samples. This is in contrast to Stan, which tunes the stepping procedure during the warm-up phase.)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def mh_sample(\n", " logtarget, x0, sigma, args=(), n_burn=1000, n_steps=1000, variable_names=None\n", "):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " logtarget : function\n", " The function to compute the log target. It has call\n", " signature `logtarget(x, *args)`.\n", " x0 : ndarray, shape (n_variables,)\n", " The starting location of a walker in parameter space.\n", " sigma : ndarray, shape (n_variables,)\n", " The standard deviations for the proposal distribution.\n", " args : tuple\n", " Additional arguments passed to the `logtarget()` function.\n", " n_burn : int, default 1000\n", " Number of burn-in steps.\n", " n_steps : int, default 1000\n", " Number of steps to take after burn-in.\n", " variable_names : list, length n_variables\n", " List of names of variables. If None, then variable names\n", " are sequential integers.\n", " \n", " Returns\n", " -------\n", " output : DataFrame\n", " The first `n_variables` columns contain the samples.\n", " Additionally, column 'lnprob' has the log target value\n", " at each sample.\n", " \"\"\"\n", "\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** To test your code, we will get samples out of a known distribution. 
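As a hedged reminder before testing: with a symmetric (e.g., Normal) proposal, the Hastings ratio reduces to the ratio of target densities, so the accept/reject decision in `mh_step` amounts to comparing the difference of log target values against the log of a uniform draw. A minimal sketch of just that decision follows; the name `mh_accept` and its arguments are illustrative, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng()

def mh_accept(logtarget_proposed, logtarget_current):
    # Accept with probability min(1, exp(logtarget_proposed - logtarget_current)).
    # Working in logs: accept if log(u) < log ratio, with u ~ Uniform(0, 1).
    # The min is implicit because log(u) <= 0 always.
    return bool(np.log(rng.uniform()) < logtarget_proposed - logtarget_current)
```

Note that proposals that do not decrease the log target are always accepted, since `log(u) < 0` for a uniform draw from the unit interval.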
We will use a bivariate Normal with a mean of $\\boldsymbol{\\mu} = (10, 20)$ and covariance matrix of \n", "\n", "\\begin{align}\n", "\\mathsf{\\Sigma} = \\begin{pmatrix}\n", "4 & -2 \\\\\n", "-2 & 6\n", "\\end{pmatrix}\n", "\\end{align}\n", "\n", "I have written the log-PDF function to be unnormalized and JITted with [numba](https://numba.pydata.org) for optimal speed.\n", "\n", "*Do not be confused*: In this test function we are sampling $\\mathbf{x}$ out of $P(\\mathbf{x}\\mid \\boldsymbol{\\mu},\\mathsf{\\Sigma})$. This is not sampling a posterior; it is just a test for your code. (Remember, MCMC samplers can sample out of arbitrary distributions, not just posteriors. In general, the distribution we sample out of is called a target distribution.) Furthermore, though both are Normal, this test distribution is not the proposal distribution. You will pass `log_test_distribution` as the `logtarget` argument in the above functions." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "mu = np.array([10.0, 20])\n", "cov = np.array([[4, -2], [-2, 6]])\n", "inv_cov = np.linalg.inv(cov)\n", "\n", "@numba.njit\n", "def log_test_distribution(x, mu, inv_cov):\n", " \"\"\"\n", " Unnormalized log target of a multivariate Gaussian.\n", " \"\"\"\n", " return -np.dot((x-mu), np.dot(inv_cov, (x-mu))) / 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you compute the means and covariance (using `np.cov()`) of your samples, you should come close to the inputted means and covariance. You might also want to plot your samples using `bebi103.viz.corner()` to make sure everything makes sense.\n", "\n", "You may want to add in some logic to your Metropolis-Hastings sampler to enable *tuning*. This is the process of automatically adjusting the $\\sigma$ in the proposal distribution such that the acceptance rate is desirable. A good target acceptance rate is about 0.4. 
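A minimal sketch of such a tuning helper, called periodically during burn-in with the fraction of steps accepted since the last adjustment, might look like this (the thresholds and multiplicative factors here are illustrative placeholders, not prescribed values):

```python
import numpy as np

def tune_sigma(sigma, acceptance_rate):
    # Multiplicatively adjust the proposal standard deviations.
    # Thresholds and factors below are illustrative only.
    if acceptance_rate < 0.2:
        return sigma * 0.9  # rejecting too often: take smaller steps
    elif acceptance_rate > 0.5:
        return sigma * 1.1  # accepting too often: take bigger steps
    return sigma  # acceptance rate in a reasonable range: leave unchanged
```
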
The developers of [PyMC](https://docs.pymc.io) use the scheme below, which is reasonable.\n", "\n", "|Acceptance rate|Standard deviation adaptation|\n", "|:---:|:-------------------:|\n", "| < 0.001 |$\\times$ 0.1|\n", "|< 0.05 |$\\times$ 0.5|\n", "|< 0.2 |$\\times$ 0.9|\n", "|> 0.5 |$\\times$ 1.1|\n", "|> 0.75 |$\\times$ 2|\n", "|> 0.95 |$\\times$ 10|\n", "\n", "Be sure to test your code to demonstrate that it works." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }