{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fitting and error estimation with MCMC\n",
    "\n",
    "## Introduction\n",
    "\n",
    "The goal of Markov Chain Monte Carlo (MCMC) algorithms is to approximate the posterior distribution of your model parameters by random sampling in a probabilistic space. For most readers this sentence was probably not very helpful so here we'll start straight with and example but you should read the more detailed mathematical approaches of the method [here](https://www.pas.rochester.edu/~sybenzvi/courses/phy403/2015s/p403_17_mcmc.pdf) and [here](https://github.com/jakevdp/BayesianAstronomy/blob/master/03-Bayesian-Modeling-With-MCMC.ipynb).\n",
    "\n",
    "### How does it work ?\n",
    "\n",
    "The idea is that we use a number of walkers that will sample the posterior distribution (i.e. sample the Likelihood profile).\n",
    "\n",
    "The goal is to produce a \"chain\", i.e. a list of $\\theta$ values, where each $\\theta$ is a vector of parameters for your model.<br>\n",
    "If you start far away from the truth value, the chain will take some time to converge until it reaches a stationary state. Once it has reached this stage, each successive elements of the chain are samples of the target posterior distribution.<br>\n",
    "This means that, once we have obtained the chain of samples, we have everything we need. We can compute the  distribution of each parameter by simply approximating it with the histogram of the samples projected into the parameter space. This will provide the errors and correlations between parameters.\n",
    "\n",
    "\n",
    "Now let's try to put a picture on the ideas described above. With this notebook, we have simulated and carried out a MCMC analysis for a source with the following parameters:<br>\n",
    "$Index=2.0$, $Norm=5\\times10^{-12}$ cm$^{-2}$ s$^{-1}$ TeV$^{-1}$, $Lambda =(1/Ecut) = 0.02$ TeV$^{-1}$ (50 TeV) for 20 hours.\n",
    "\n",
    "The results that you can get from a MCMC analysis will look like this :\n",
    "\n",
    "<img src=\"images/gammapy_mcmc.png\" width=\"800\">\n",
    "\n",
    "On the first two top panels, we show the pseudo-random walk of one walker from an offset starting value to see it evolve to a better solution.\n",
    "In the bottom right panel, we show the trace of each 16 walkers for 500 runs (the chain described previsouly). For the first 100 runs, the parameter evolve towards a solution (can be viewed as a fitting step). Then they explore the local minimum for 400 runs which will be used to estimate the parameters correlations and errors.\n",
    "The choice of the Nburn value (when walkers have reached a stationary stage) can be done by eye but you can also look at the autocorrelation time.\n",
    "\n",
    "### Why should I use it ?\n",
    "\n",
    "When it comes to evaluate errors and investigate parameter correlation, one typically estimate the Likelihood in a gridded search (2D Likelihood profiles). Each point of the grid implies a new model fitting. If we use 10 steps for each parameters, we will need to carry out 100 fitting procedures. \n",
    "\n",
    "Now let's say that I have a model with $N$ parameters, we need to carry out that gridded analysis $N*(N-1)$ times. \n",
    "So for 5 free parameters you need 20 gridded search, resulting in 2000 individual fit. \n",
    "Clearly this strategy doesn't scale well to high-dimensional models.\n",
    "\n",
    "Just for fun: if each fit procedure takes 10s, we're talking about 5h of computing time to estimate the correlation plots. \n",
    "\n",
    "There are many MCMC packages in the python ecosystem but here we will focus on [emcee](http://url), a lightweight Python package. A description is provided here : [Foreman-Mackey, Hogg, Lang & Goodman (2012)](https://arxiv.org/abs/1202.3665)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import astropy.units as u\n",
    "from astropy.coordinates import SkyCoord\n",
    "from gammapy.irf import load_cta_irfs\n",
    "from gammapy.maps import WcsGeom, MapAxis\n",
    "from gammapy.spectrum.models import ExponentialCutoffPowerLaw\n",
    "from gammapy.image.models import SkyGaussian\n",
    "from gammapy.cube.models import SkyModel\n",
    "from gammapy.cube.simulate import simulate_dataset\n",
    "from gammapy.utils.fitting import Fit\n",
    "\n",
    "import emcee\n",
    "import corner"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simulate an observation\n",
    "\n",
    "Here we will start by simulating an observation using the `simulate_dataset` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "irfs = load_cta_irfs(\"$GAMMAPY_DATA/cta-1dc/caldb/data/cta/1dc/bcf/South_z20_50h/irf_file.fits\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define sky model to simulate the data\n",
    "spatial_model = SkyGaussian(lon_0=\"0 deg\", lat_0=\"0 deg\", sigma=\"0.2 deg\")\n",
    "\n",
    "spectral_model = ExponentialCutoffPowerLaw(\n",
    "    index=2,\n",
    "    amplitude=\"3e-12 cm-2 s-1 TeV-1\",\n",
    "    reference=\"1 TeV\",\n",
    "    lambda_=\"0.05 TeV-1\",\n",
    ")\n",
    "\n",
    "sky_model_simu = SkyModel(\n",
    "    spatial_model=spatial_model, spectral_model=spectral_model\n",
    ")\n",
    "print(sky_model_simu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define map geometry\n",
    "axis = MapAxis.from_edges(\n",
    "    np.logspace(-1, 2, 30), unit=\"TeV\", name=\"energy\", interp=\"log\"\n",
    ")\n",
    "geom = WcsGeom.create(\n",
    "    skydir=(0, 0), binsz=0.05, width=(2, 2), coordsys=\"GAL\", axes=[axis]\n",
    ")\n",
    "\n",
    "# Define some observation parameters\n",
    "pointing = SkyCoord(0 * u.deg, 0 * u.deg, frame=\"galactic\")\n",
    "\n",
    "\n",
    "dataset = simulate_dataset(\n",
    "    sky_model_simu, geom, pointing, irfs, livetime=20 * u.h, random_state=42\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.counts.sum_over_axes().plot(add_cbar=True);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If you want to fit the data for comparison with MCMC later\n",
    "\n",
    "# fit = Fit(dataset)\n",
    "# result = fit.run(optimize_opts={\"print_level\": 1})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimate parameter correlations with MCMC\n",
    "\n",
    "Now let's analyse the simulated data.\n",
    "Here we just fit it again with the same model we had before as a starting point.\n",
    "The data that would be needed are the following: \n",
    "- counts cube, psf cube, exposure cube and background model\n",
    "\n",
    "Luckily all those maps are already in the Dataset object.\n",
    "\n",
    "We will need to define a Likelihood function and define priors on parameters.<br>\n",
    "Here we will assume a uniform prior reading the min, max parameters from the sky model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prior functions\n",
    "\n",
    "\n",
    "def uniform_prior(value, umin, umax):\n",
    "    \"\"\"Uniform prior distribution.\"\"\"\n",
    "    if umin <= value <= umax:\n",
    "        return 0.0\n",
    "    else:\n",
    "        return -np.inf\n",
    "\n",
    "\n",
    "def normal_prior(value, mean, sigma):\n",
    "    \"\"\"Normal prior distribution.\"\"\"\n",
    "    return -0.5 * (2 * np.pi * sigma) - (value - mean) ** 2 / (2.0 * sigma)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def model_to_par(dataset):\n",
    "    \"\"\"\n",
    "    Return a tuple of the factor parameters of all \n",
    "    free parameters in the dataset sky model.\n",
    "    \"\"\"\n",
    "    pars = []\n",
    "    for p in dataset.parameters.free_parameters:\n",
    "        pars.append(p.factor)\n",
    "\n",
    "    return pars\n",
    "\n",
    "\n",
    "def par_to_model(dataset, pars):\n",
    "    \"\"\"Update model in dataset with a list of free parameters factors\"\"\"\n",
    "    for i, p in enumerate(dataset.parameters.free_parameters):\n",
    "        p.factor = pars[i]\n",
    "\n",
    "\n",
    "def lnprior(dataset):\n",
    "    \"\"\"\n",
    "    Return probability of parameter values according to prior knowledge.\n",
    "    Parameter limits should be done here through uniform prior ditributions\n",
    "    \"\"\"\n",
    "    logprob = 0\n",
    "    for par in dataset.parameters.free_parameters:\n",
    "        logprob += uniform_prior(par.value, par.min, par.max)\n",
    "\n",
    "    return logprob\n",
    "\n",
    "\n",
    "def lnprob(pars, dataset, verb=False):\n",
    "    \"\"\"Estimate the likelihood of a model including prior on parameters.\"\"\"\n",
    "    # Update model parameters factors inplace\n",
    "    for factor, par in zip(pars, dataset.parameters.free_parameters):\n",
    "        par.factor = factor\n",
    "\n",
    "    lnprob_priors = lnprior(dataset)\n",
    "\n",
    "    # dataset.likelihood returns Cash statistics values\n",
    "    # emcee will maximisise the LogLikelihood so we need -dataset.likelihood\n",
    "    total_lnprob = -dataset.likelihood(None) + lnprob_priors\n",
    "\n",
    "    if verb:\n",
    "        print(\"Parameters are:\", pars)\n",
    "        print(\"LL=\", total_lnprob)\n",
    "        for p in dataset.parameters.free_parameters:\n",
    "            print(p)\n",
    "        print(\"\")\n",
    "\n",
    "    return total_lnprob"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define priors\n",
    "\n",
    "This steps is a bit manual for the moment until we find a better API to define priors.<br>\n",
    "Note the you **need** to define priors for each parameter otherwise your walkers can explore uncharted territories (e.g. negative norms)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define the free parameters and min, max values\n",
    "\n",
    "dataset.parameters[\"sigma\"].frozen = True\n",
    "dataset.parameters[\"lon_0\"].frozen = True\n",
    "dataset.parameters[\"lat_0\"].frozen = True\n",
    "dataset.parameters[\"amplitude\"].frozen = False\n",
    "dataset.parameters[\"index\"].frozen = False\n",
    "dataset.parameters[\"lambda_\"].frozen = False\n",
    "\n",
    "\n",
    "dataset.background_model.parameters[\"norm\"].frozen = False\n",
    "dataset.background_model.parameters[\"tilt\"].frozen = True\n",
    "\n",
    "dataset.background_model.parameters[\"norm\"].min = 0.5\n",
    "dataset.background_model.parameters[\"norm\"].max = 2\n",
    "\n",
    "dataset.parameters[\"index\"].min = 1\n",
    "dataset.parameters[\"index\"].max = 5\n",
    "dataset.parameters[\"lambda_\"].min = 1e-3\n",
    "dataset.parameters[\"lambda_\"].max = 1\n",
    "\n",
    "dataset.parameters[\"amplitude\"].min = (\n",
    "    0.01 * dataset.parameters[\"amplitude\"].value\n",
    ")\n",
    "dataset.parameters[\"amplitude\"].max = (\n",
    "    100 * dataset.parameters[\"amplitude\"].value\n",
    ")\n",
    "\n",
    "dataset.parameters[\"sigma\"].min = 0.05\n",
    "dataset.parameters[\"sigma\"].max = 1\n",
    "\n",
    "# Setting amplitude init values a bit offset to see evolution\n",
    "# Here starting close to the real value\n",
    "dataset.parameters[\"index\"].value = 2.0\n",
    "dataset.parameters[\"amplitude\"].value = 3.3e-12\n",
    "dataset.parameters[\"lambda_\"].value = 0.05\n",
    "\n",
    "print(dataset.model)\n",
    "print(\"LL =\", dataset.likelihood(dataset.parameters))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# Now let's define a function to init parameters and run the MCMC with emcee\n",
    "# Depending on your number of walkers, Nrun and dimensionality, this can take a while (> minutes)\n",
    "\n",
    "def run_mcmc(dataset, nwalkers=12, nrun=500, threads=1):\n",
    "    \"\"\"\n",
    "    Run the MCMC sampler.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    dataset : `MapDataset`  \n",
    "        A gammapy dataset object. This contains the observed counts cube,\n",
    "        the exposure cube, the psf cube, and the sky model and model.\n",
    "        Each free parameter in the sky model is considered as parameter for the MCMC.\n",
    "    nwalkers: int\n",
    "        Required integer number of walkers to use in ensemble.\n",
    "        Minimum is 2*nparam+2, but more than that is usually better.\n",
    "        Must be even to use MPI mode.\n",
    "    nrun: int\n",
    "        Number of steps for walkers. Typically at least a few hundreds (but depends on dimensionality).\n",
    "        Low nrun (<100?) will underestimate the errors. \n",
    "        Samples that would populate the distribution are nrun*nwalkers.\n",
    "        This step can be ~seen as the error estimation step. \n",
    "    \"\"\"\n",
    "    dataset.parameters.autoscale()  # Autoscale parameters\n",
    "    pars = model_to_par(dataset)  # get a tuple of parameters from dataset\n",
    "    ndim = len(pars)\n",
    "\n",
    "    # Initialize walkers in a ball of relative size 0.5% in all dimensions if the\n",
    "    # parameters have been fit, or to 10% otherwise\n",
    "    spread = 0.5 / 100\n",
    "    p0var = np.array([spread * pp for pp in pars])\n",
    "    p0 = emcee.utils.sample_ball(pars, p0var, nwalkers)\n",
    "\n",
    "    labels = []\n",
    "    for par in dataset.parameters.free_parameters:\n",
    "        labels.append(par.name)\n",
    "        if (par.min is np.nan) and (par.max is np.nan):\n",
    "            print(\n",
    "                \"Warning: no priors have been set for parameter %s\\n The MCMC will likely not work !\"\n",
    "                % (par.name)\n",
    "            )\n",
    "\n",
    "    print(\"List of free parameters:\", labels)\n",
    "    print(\"{} walkers will run for {} steps\".format(nwalkers, nrun))\n",
    "    print(\"Parameters init value for 1st walker:\", p0[0])\n",
    "    sampler = emcee.EnsembleSampler(\n",
    "        nwalkers, ndim, lnprob, args=[dataset], threads=threads\n",
    "    )\n",
    "\n",
    "    for idx, result in enumerate(sampler.sample(p0, iterations=nrun)):\n",
    "        if (idx + 1) % 100 == 0:\n",
    "            print(\"{0:5.0%}\".format(idx / nrun))\n",
    "\n",
    "    return sampler\n",
    "\n",
    "\n",
    "sampler = run_mcmc(dataset, nwalkers=8, nrun=500)  # to speedup the notebook\n",
    "# sampler=run_mcmc(dataset,nwalkers=16,nrun=1000) # more accurate contours"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting the results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot the results\n",
    "\n",
    "The MCMC will return a sampler object containing the trace of all walkers.<br>\n",
    "The most important part is the chain attribute which is an array of shape:<br>\n",
    "_(nwalkers, nrun, nfreeparam)_\n",
    "\n",
    "The chain is then used to plot the trace of the walkers and estimate the burnin period (the time for the walkers to reach a stationary stage)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def plot_trace(sampler, dataset):\n",
    "    \"\"\"\n",
    "    Plot the trace of walkers for every steps\n",
    "    \"\"\"\n",
    "    labels = []\n",
    "    for par in dataset.parameters.free_parameters:\n",
    "        labels.append(par.name)\n",
    "\n",
    "    fig, ax = plt.subplots(len(labels), sharex=True)\n",
    "    for i in range(len(ax)):\n",
    "        ax[i].plot(sampler.chain[:, :, i].T, \"-k\", alpha=0.2)\n",
    "        ax[i].set_ylabel(labels[i])\n",
    "    plt.xlabel(\"Nrun\")\n",
    "\n",
    "\n",
    "plot_trace(sampler, dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_corner(sampler, dataset, nburn=0):\n",
    "    \"\"\"\n",
    "    Corner plot for each parameter explored by the walkers.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    sampler : `EnsembleSample`\n",
    "        Sample instance.\n",
    "    \n",
    "    nburn: int\n",
    "        Number of runs that will be discarded (burnt) until reaching ~stationary states for walkers.\n",
    "        Hard to guess. Depends how close to best-fit you are. \n",
    "        A good nbrun value can be estimated from the trace plot.\n",
    "        This step can be ~seen as the fitting step.    \n",
    "    \n",
    "    \"\"\"\n",
    "    labels = [par.name for par in dataset.parameters.free_parameters]\n",
    "\n",
    "    samples = sampler.chain[:, nburn:, :].reshape((-1, len(labels)))\n",
    "\n",
    "    corner.corner(\n",
    "        samples, labels=labels, quantiles=[0.16, 0.5, 0.84], show_titles=True\n",
    "    )\n",
    "\n",
    "\n",
    "plot_corner(sampler, dataset, nburn=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot the model dispersion\n",
    "\n",
    "Using the samples from the chain after the burn period, we can plot the different models compared to the truth model. To do this we need to the spectral models for each parameter state in the sample."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "emin, emax = [0.1, 100] * u.TeV\n",
    "nburn = 300\n",
    "\n",
    "fig, ax = plt.subplots(1, 1, figsize=(12, 6))\n",
    "\n",
    "for nwalk in range(0, 8):\n",
    "    for n in range(nburn, nburn + 100):\n",
    "        pars = sampler.chain[nwalk, n, :]\n",
    "\n",
    "        # set model parameters\n",
    "        par_to_model(dataset, pars)\n",
    "        spectral_model = dataset.model.skymodels[0].spectral_model\n",
    "\n",
    "        spectral_model.plot(\n",
    "            energy_range=(emin, emax),\n",
    "            ax=ax,\n",
    "            energy_power=2,\n",
    "            alpha=0.02,\n",
    "            color=\"grey\",\n",
    "        )\n",
    "\n",
    "\n",
    "sky_model_simu.spectral_model.plot(\n",
    "    energy_range=(emin, emax), energy_power=2, ax=ax, color=\"red\"\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fun Zone\n",
    "\n",
    "Now that you have the sampler chain, you have in your hands the entire history of each walkers in the N-Dimensional parameter space. <br>\n",
    "You can for example trace the steps of each walker in any parameter space."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we plot the trace of one walker in a given parameter space\n",
    "parx, pary = 0, 1\n",
    "\n",
    "plt.plot(sampler.chain[0, :, parx], sampler.chain[0, :, pary], \"ko\", ms=1)\n",
    "plt.plot(\n",
    "    sampler.chain[0, :, parx],\n",
    "    sampler.chain[0, :, pary],\n",
    "    ls=\":\",\n",
    "    color=\"grey\",\n",
    "    alpha=0.5,\n",
    ")\n",
    "\n",
    "plt.xlabel(\"Index\")\n",
    "plt.ylabel(\"Amplitude\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PeVatrons in CTA ?\n",
    "\n",
    "Now it's your turn to play with this MCMC notebook. For example to test the CTA performance to measure a cutoff at very high energies (100 TeV ?).\n",
    "\n",
    "After defining your Skymodel it can be as simple as this :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# dataset = simulate_dataset(model, geom, pointing, irfs)\n",
    "# sampler = run_mcmc(dataset)\n",
    "# plot_trace(sampler, dataset)\n",
    "# plot_corner(sampler, dataset, nburn=200)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}