{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "# Modifiers\n",
    "\n",
    "In our simple examples so far, we've only used two types of modifiers, but HistFactory allows for a handful of modifiers that have proven to be sufficient to model a wide range of uncertainties. Each of the modifiers is further described in the [Modifiers section](https://pyhf.readthedocs.io/en/v0.7.4/likelihood.html#modifiers) of the pyhf docs on model specification.\n",
    "\n",
    "There is an addtional table of [Modifiers and Constraints](https://pyhf.readthedocs.io/en/v0.7.4/intro.html#id24) in the introduction of the pyhf documentation to use as reference.\n",
    "\n",
    "![modifiers and constraints](assets/modifiers_and_constraints.png)\n",
    "\n",
    "In each of the sections below, we will explore the impact of modifiers on the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ipywidgets as widgets\n",
    "import pyhf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Unconstrained Normalisation (normfactor)\n",
    "\n",
    "This is a simple scaling of a sample (primarily used for signal strengths). Controlled by a single nuisance parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"my_normfactor\",\n",
    "                                \"type\": \"normfactor\",\n",
    "                                \"data\": None,\n",
    "                            }\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"  aux data: {model.config.auxdata}\")\n",
    "print(f\"   nominal: {model.expected_data([1.0])}\")\n",
    "print(f\"2x nominal: {model.expected_data([2.0])}\")\n",
    "print(f\"3x nominal: {model.expected_data([3.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(normfactor=(0, 10, 1))\n",
    "def interact(normfactor=1):\n",
    "    print(f\"normfactor = {normfactor:2d}: {model.expected_data([normfactor])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `normfactor` modifier could be applied to scale the signal rate of a possible BSM signal or simultaneous in-situ measurements of background samples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normalisation Uncertainty (normsys)\n",
    "\n",
    "This is a simple constrained scaling of a sample representing the sample normalisation uncertainty, applied to every bin in the sample. Controlled by a single nuisance parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"my_normsys\",\n",
    "                                \"type\": \"normsys\",\n",
    "                                \"data\": {\"hi\": 0.9, \"lo\": 1.1},\n",
    "                            }\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"  aux data: {model.config.auxdata}\")\n",
    "print(f\"      down: {model.expected_data([-1.0])}\")\n",
    "print(f\"   nominal: {model.expected_data([0.0])}\")\n",
    "print(f\"        up: {model.expected_data([1.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(normsys=(-1, 1, 0.1))\n",
    "def interact(normsys=0):\n",
    "    print(f\"normsys = {normsys:4.1f}: {model.expected_data([normsys])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What do you think happens if we switch `\"hi\"` and `\"lo\"`?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `normsys` modifier could represent part of the uncertainty associated with a jet energy scale or a jet energy resolution (which can contribute both a `normsys` and a `histosys` to a sample)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Correlated Shape (histosys)\n",
    "\n",
    "A `histosys` modifier applies a correlated per-bin shape unceratinty to a sample. The `hi_data` (`lo_data`) represent the expected event rates for upward (downward) variations for each bin of the sample. Controlled by a single nuisance parameter.\n",
    "\n",
    "Unlike most other modifiers, this is an additive effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"my_histosys\",\n",
    "                                \"type\": \"histosys\",\n",
    "                                \"data\": {\"hi_data\": [15, 22], \"lo_data\": [5, 18]},\n",
    "                            }\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"  aux data: {model.config.auxdata}\")\n",
    "print(f\"      down: {model.expected_data([-1.0])}\")\n",
    "print(f\"   nominal: {model.expected_data([0.0])}\")\n",
    "print(f\"        up: {model.expected_data([1.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(histosys=(-1, 1, 0.1))\n",
    "def interact(histosys=0):\n",
    "    print(f\"histosys = {histosys:4.1f}: {model.expected_data([histosys])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What's going on with the data? As this is a shape-related uncertainty, we're providing (absolute) histograms for the modifier to interpolate the up/down variations with respect to the nominal (the sample data)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `histosys` modifier could represent part of the uncertainty associated with a jet energy scale or a jet energy resolution (which can contribute both a `normsys` and a `histosys` to a sample)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uncorrelated Shape (shapesys)\n",
    "\n",
    "A shape uncertainty, but fully **uncorrelated** across bins (applied to each bin in a sample independently). Controlled by $n$ nuisance parameters (one for each bin)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\"name\": \"my_shapesys\", \"type\": \"shapesys\", \"data\": [1, 4]}\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"aux data: {model.config.auxdata}\")\n",
    "print(f\"(1x, 1x): {model.expected_data([1.0, 1.0])}\")\n",
    "print(f\"(2x, 2x): {model.expected_data([2.0, 2.0])}\")\n",
    "print(f\"(3x, 3x): {model.expected_data([3.0, 3.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(shapesys_0=(0, 10, 1), shapesys_1=(0, 10, 1))\n",
    "def interact(shapesys_0=1, shapesys_1=1):\n",
    "    print(\n",
    "        f\"shapesys = ({shapesys_0:2d}, {shapesys_1:2d}): {model.expected_data([shapesys_0, shapesys_1])}\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the differences with `histosys`. Here, we specify the absolute uncertainty which is fed into the Poisson constraint. This tends to feel more like a bin-by-bin (bin-wise?) constrained `normfactor`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `shapesys` modifier could represent the statistical uncertainty of each bin in a sample."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MC Statistical Uncertainty (staterror)\n",
    "\n",
    "A `staterror` modifier is applied as a set of bin-wise scale factors to each sample. This is used to model the uncertainty in the bins due to Monte Carlo statistics. Controlled by $n$ nuisance parameters (one for each bin).\n",
    "\n",
    "In particular, this tends to be correlated across samples as they're usually modeled by the same MC generator. Unlike `shapesys` which is constrained by a Poisson, this modifier is constrained by a Gaussian."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"my_staterror\",\n",
    "                                \"type\": \"staterror\",\n",
    "                                \"data\": [1.0, 2.0],\n",
    "                            }\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"aux data: {model.config.auxdata}\")\n",
    "print(f\"(1x, 1x): {model.expected_data([1.0, 1.0])}\")\n",
    "print(f\"(2x, 2x): {model.expected_data([2.0, 2.0])}\")\n",
    "print(f\"(3x, 3x): {model.expected_data([3.0, 3.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(staterror_0=(0, 10, 1), staterror_1=(0, 10, 1))\n",
    "def interact(staterror_0=1, staterror_1=1):\n",
    "    print(\n",
    "        f\"staterror = ({staterror_0:2d}, {staterror_1:2d}): {model.expected_data([staterror_0, staterror_1])}\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This looks/feels a lot like `shapesys`. Is there a difference in the expected data? What happens to the data for the `staterror`? (*Hint: it has to do with the Gaussian constraint.*)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `staterror` modifier represents the inherent statistical uncertainty due to the finite sample size of Monte Carlo simulations of physics processes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Luminosity (lumi)\n",
    "\n",
    "A `lumi` modifier is applied as a global scale factor across all samples in a channel and sample boundaries. Sample rates derived from theory calculations, as opposed to data-driven estimates (`shapefactor`), are scaled to the integrated luminosity corresponding to the observed data. Controlled by a single nuisance parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [{\"name\": \"lumi\", \"type\": \"lumi\", \"data\": None}],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ],\n",
    "        \"parameters\": [\n",
    "            {\n",
    "                \"name\": \"lumi\",\n",
    "                \"auxdata\": [1.0],\n",
    "                \"sigmas\": [0.017],\n",
    "                \"bounds\": [[0.915, 1.085]],\n",
    "                \"inits\": [1.0],\n",
    "            }\n",
    "        ],\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"aux data: {model.config.auxdata}\")\n",
    "print(f\"(1x, 1x): {model.expected_data([1.0, 1.0])}\")\n",
    "print(f\"(2x, 2x): {model.expected_data([2.0, 2.0])}\")\n",
    "print(f\"(3x, 3x): {model.expected_data([3.0, 3.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(lumi=(0, 10, 1))\n",
    "def interact(lumi=1):\n",
    "    print(f\"lumi = {lumi:2d}: {model.expected_data([lumi])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `lumi` modifier represents the uncertainty of luminosity measurement applied to the samples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data-driven Shape (shapefactor)\n",
    "\n",
    "A `shapefactor` modifier is applied as an unconstrained, bin-wise multipiciative factor and represents a data-driven estimation of sample rates (e.g. think of estimating multi-jet backgrounds). Controlled by $n$ nuisance parameters (one for each bin).\n",
    "\n",
    "This feels like a bin-wise `normfactor`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"my_shapefactor\",\n",
    "                                \"type\": \"shapefactor\",\n",
    "                                \"data\": None,\n",
    "                            }\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"aux data: {model.config.auxdata}\")\n",
    "print(f\"(1x, 1x): {model.expected_data([1.0, 1.0])}\")\n",
    "print(f\"(2x, 2x): {model.expected_data([2.0, 2.0])}\")\n",
    "print(f\"(3x, 3x): {model.expected_data([3.0, 3.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(shapefactor_0=(0, 10, 1), shapefactor_1=(0, 10, 1))\n",
    "def interact(shapefactor_0=1, shapefactor_1=1):\n",
    "    print(\n",
    "        f\"shapefactor = ({shapefactor_0:2d}, {shapefactor_1:2d}): {model.expected_data([shapefactor_0, shapefactor_1])}\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example use case\n",
    "\n",
    "A `shapefactor` modifier could represent a scale factor applied to multijet backgrounds derived from measurements in data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Correlating Modifiers\n",
    "\n",
    "Like in HistFactory, modifiers are controlled by parameters which are named based on the name you assign to the modifier. Therefore, as long as the modifiers you want to correlate are \"mostly\" compatible (e.g. same number of nuisance parameters allocated), you can correlate them!\n",
    "\n",
    "Let's repeat the `normsys` example, and add in a `histosys` uncertainty — both controlled by a **shared** single nuisance parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = pyhf.Model(\n",
    "    {\n",
    "        \"channels\": [\n",
    "            {\n",
    "                \"name\": \"singlechannel\",\n",
    "                \"samples\": [\n",
    "                    {\n",
    "                        \"name\": \"signal\",\n",
    "                        \"data\": [5.0, 10.0],\n",
    "                        \"modifiers\": [\n",
    "                            {\n",
    "                                \"name\": \"shared_parameter\",\n",
    "                                \"type\": \"normsys\",\n",
    "                                \"data\": {\"hi\": 0.9, \"lo\": 1.1},\n",
    "                            },\n",
    "                            {\n",
    "                                \"name\": \"shared_parameter\",\n",
    "                                \"type\": \"histosys\",\n",
    "                                \"data\": {\"hi_data\": [15, 22], \"lo_data\": [5, 18]},\n",
    "                            },\n",
    "                        ],\n",
    "                    }\n",
    "                ],\n",
    "            }\n",
    "        ]\n",
    "    },\n",
    "    poi_name=None,\n",
    ")\n",
    "\n",
    "print(f\"  aux data: {model.config.auxdata}\")\n",
    "print(f\"      down: {model.expected_data([-1.0])}\")\n",
    "print(f\"   nominal: {model.expected_data([0.0])}\")\n",
    "print(f\"        up: {model.expected_data([1.0])}\")\n",
    "\n",
    "\n",
    "@widgets.interact(shared_parameter=(-1, 1, 0.1))\n",
    "def interact(shared_parameter=0):\n",
    "    print(\n",
    "        f\"shared_parameter = {shared_parameter:4.1f}: {model.expected_data([shared_parameter])}\"\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're seeing the impact of a multiplicative bin-wise correlated shape modification convoluted with an additive normalization uncertainty — both constrained by a Gaussian."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}