{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as sts\n", "from celluloid import Camera\n", "from IPython.display import HTML\n", "import jax.numpy as jnp\n", "import jax.scipy as jsc\n", "from jax.nn import softmax\n", "\n", "plt.rc(\"figure\", figsize=(10.0, 3.0), dpi=120, facecolor=\"w\")\n", "np.random.seed(222)\n", "\n", "\n", "def kde(x, data, h):\n", " return jnp.mean(jsc.stats.norm.pdf(x.reshape(-1, 1), loc=data, scale=h), axis=1)\n", "\n", "\n", "def kde_hist(events, bins, bandwidth=None, density=False):\n", " \"\"\"\n", " Args:\n", " events: (jax array-like) data to filter.\n", "\n", " bins: (jax array-like) intervals to calculate counts.\n", "\n", " bandwidth: (float) value that specifies the width of the individual\n", " distributions (kernels) whose cdfs are averaged over each bin. Defaults\n", " to Scott's rule -- the same as the scipy implementation of kde.\n", "\n", " density: (bool) whether or not to normalize the histogram to unit area.\n", " Returns:\n", " binned counts, calculated by kde!\n", " \"\"\"\n", " bandwidth = bandwidth or events.shape[-1] ** -0.25 # Scott's rule\n", "\n", " edge_hi = bins[1:] # ending bin edges ||<-\n", " edge_lo = bins[:-1] # starting bin edges ->||\n", "\n", " # get cumulative counts (area under kde) for each set of bin edges\n", " cdf_up = jsc.stats.norm.cdf(edge_hi.reshape(-1, 1), loc=events, scale=bandwidth)\n", " cdf_dn = jsc.stats.norm.cdf(edge_lo.reshape(-1, 1), loc=events, scale=bandwidth)\n", " # sum kde contributions in each bin\n", " counts = (cdf_up - cdf_dn).sum(axis=1)\n", "\n", " if density: # normalize by bin width and counts for total area = 1\n", " db = jnp.array(jnp.diff(bins), float) # bin spacing\n", " return counts / db / counts.sum(axis=0)\n", "\n", " return counts" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This presentation is a Jupyter notebook! (thanks to https://github.com/damianavila/RISE )\n", "2 * 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Physics analysis as a differentiable program\n", "**Nathan Simpson** \n", "\n", "(supervised by & in collaboration with **Lukas Heinrich**)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# We three things of presentation are:\n", "\n", "- **What does it mean for analysis to be 'differentiable'?**\n", " - *Why even care?*\n", "\n", "- **How can we differentiate physics?**\n", " - feat. smoothing, pyhf, autodiff\n", "\n", "- **Seeing it in action: *neos***\n", " - **m a c h i n e l e a r n i n g**!!!!! omg!!!!!!!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**A typical analysis:** \n", "\n", "datacutflowobservablemodeltest statistichypothesis test*limits* (p-values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Free parameters:** cut values, parameters of a multivariate observable\n", "\n", "*Can we optimize them?*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sure we can!** \n", "\n", "Here's what we might do:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- *Cuts:* grid search of cut positions\n", " - Objective: maximize approximate median significance $Z_{A}^{\\left(\\sigma_{b}=0\\right)}=\\sqrt{2((s+b) \\ln (1+s / b)-s)}$\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- *Observable (e.g. neural network):* train parameters $\\phi$ via gradient descent\n", " - Objective: minimize binary cross entropy (signal vs background)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Why do the methods and objectives differ if they're in the same pipeline?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**They don't have to!**\n", "\n", "- *Sigmoid* cut optimized wrt $Z_A$ using gradient descent: ([notebook from Alex Held](https://mybinder.org/v2/gh/alexander-held/differentiable-analysis-example/master?filepath=Significance_optimization.ipynb))\n", " - Green curve is the cut profile, centered around the grey dotted line\n", "\n", "
\n", " \n", " \n", "
\n", "\n", "- Using $1/Z_A$ as a loss function: https://arxiv.org/abs/1806.00322" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One could then imagine a *jointly-optimized pipeline* with respect to a single objective, trained using gradient descent.\n", "\n", "This raises an important question:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **What should that objective be?**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Picking the right objective\n", "\n", "Insights from Sec 3.1, [Deep Learning and its Application to LHC Physics](https://arxiv.org/abs/1806.11484):\n", "\n", "> \"tools are often optimized for performance on a particular task that is several steps removed from the ultimate physical goal of searching for a new particle or testing a new\n", "physical theory\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Picking the right objective\n", "\n", "Insights from Sec 3.1, [Deep Learning and its Application to LHC Physics](https://arxiv.org/abs/1806.11484):\n", "\n", "> \"sensitivity to high-level physics questions must account for systematic uncertainties, which involve a nonlinear\n", "trade-off between the typical machine learning performance metrics and the systematic uncertainty estimates.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want:\n", "- a goal that is directly related to our physics objective\n", "- to take into account the *full profile likelihood* in order to be robust to systematic variations of nusiance parameters\n", "\n", "*But don't we already have this?*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can optimize our analysis *end-to-end* with respect to the actual significance... if we can evaluate its gradient." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Differentiable analysis:\n", "\n", "**Analysis with free parameters $\\varphi$ that can be optimized end-to-end using gradient-based methods.**\n", "\n", "$$\\varphi' = \\varphi - \\frac{\\partial \\, \\mathsf{objective} }{\\partial \\, \\mathsf{\\varphi}}\\times \\mathsf{learning~rate} $$\n", "\n", "This is an example of **differentiable programming**, which is a superset of machine learning.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Due to the chain rule, *every step in-between the objective and the parameters must also be differentiable.*\n", "\n", "e.g. $$\\frac{\\partial \\, \\mathsf{objective} }{\\partial \\, \\mathsf{\\varphi}} = \\frac{\\partial \\, \\mathsf{objective} }{\\partial \\, \\mathsf{likelihood}} \\times \\frac{\\partial \\, \\mathsf{likelihood} }{\\partial \\, \\mathsf{model~parameters}}\\times\\frac{\\partial \\, \\mathsf{model~parameters} }{\\partial \\, \\mathsf{cut~values}}\\times~...$$\n", "\n", "... not guaranteed that all terms are finite, tractable, or even exist at all!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One could then imagine a *jointly-optimized pipeline* with respect to a single objective, trained using gradient descent.\n", "\n", "This raises ~an~ **two** important question**s**:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- What should that objective be?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **Can we evaluate its gradient?**\n", "\n", " - $\\frac{\\partial \\, \\mathsf{objective} }{\\partial \\, \\mathsf{parameters}}$ = ? \n", " - Is *every step* of the analysis differentiable?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making physics differentiable: histograms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The height of the bins in a histogram *do not vary smoothly* with respect to shifts in the data. (remember, gradients are the language of small changes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- But this is how optimizing analysis works -- updating the parameters (cuts, weights, etc.) changes the shape of the data downstream.\n", "- We need that change to be well-behaved, as that's how we update the parameters in the first place!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7kAAAEnCAYAAABsVITaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAASdAAAEnQB3mYfeAAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAG5ElEQVR4nO3XwQkAIBDAMHX/nc8lBKEkE/TbPTOzAAAAIOD8DgAAAIBXTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBxAXS2Bkr9u/+EAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plt.figure()\n", "cam = Camera(fig)\n", "plt.xlim([-1, 4])\n", "plt.axis(\"off\")\n", "bins = np.linspace(-1, 4, 7)\n", "centers = bins[:-1] + np.diff(bins) / 2.0\n", "grid = np.linspace(-1, 4, 500)\n", "mu_range = np.linspace(1, 2, 100)\n", "data = np.random.normal(size=100)\n", "truths = sts.norm(loc=mu_range.reshape(-1, 1)).pdf(grid)\n", "for i, mu in enumerate(mu_range):\n", " plt.plot(grid, truths[i], color=\"C1\") # plot true data distribution\n", " plt.hist(\n", " data + mu, bins=bins, density=True, color=\"C0\", alpha=0.6\n", " ) # histogram data\n", " plt.axvline(mu, color=\"slategray\", linestyle=\":\", alpha=0.6)\n", " cam.snap()\n", "animation = cam.animate()\n", "# HTML(animation.to_html5_video()) # uncomment for animation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making physics differentiable: histograms\n", "\n", "An alternative: kernel density estimation\n", "\n", "$$\\mathsf{kde}(x ; h, \\mathsf{data}) = \\mathsf{mean}_{i\\,\\in\\,\\mathsf{len(data)}}\\left[K\\left(\\frac{x-\\mathsf{data}[i]}{h}\\right)\\right]\\times \\frac{1}{h}$$\n", "\n", "given some point $x$, kernel function $K$, and bandwith $h$ (smoothing parameter). In practice, *the choice of $h$ has significantly more impact on the shape of the kde than the kernel choice*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normally (ahem), we use the *standard normal distribution* for $K$ \n", "\n", "$$\\Rightarrow \\mathsf{kde}(x ; h) = \\mathsf{mean}_{i\\,\\in\\,\\mathsf{len(data)}}[\\mathsf{Normal}(\\mu_i = \\mathsf{data}[i], \\sigma_i = h )(x)]$$ \n", "\n", "\n", "**This is just an average of gaussians centered at each data point!** We control their widths *globally* using the bandwidth (one size fits all)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What happens when we vary the bandwidth?\n", "\n", "Example for a 1-D gaussian:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to estimate counts in a binned interval, we can still do that!\n", "- The kde is just a *mixture of gaussians*, and has a tractable pdf/cdf\n", "- For a bin with edges $[a,b]$, one can just evaluate the density $f(a\\leqslant x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA7kAAAEnCAYAAABsVITaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAASdAAAEnQB3mYfeAAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAG5ElEQVR4nO3XwQkAIBDAMHX/nc8lBKEkE/TbPTOzAAAAIOD8DgAAAIBXTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBhcgEAAMgwuQAAAGSYXAAAADJMLgAAABkmFwAAgAyTCwAAQIbJBQAAIMPkAgAAkGFyAQAAyDC5AAAAZJhcAAAAMkwuAAAAGSYXAACADJMLAABAhskFAAAgw+QCAACQYXIBAADIMLkAAABkmFwAAAAyTC4AAAAZJhcAAIAMkwsAAECGyQUAACDD5AIAAJBxAXS2Bkr9u/+EAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bw = 0.6\n", "fig = plt.figure()\n", "cam = Camera(fig)\n", "plt.xlim([-1, 4])\n", "plt.axis(\"off\")\n", "for i, mu in enumerate(mu_range):\n", " plt.plot(grid, truths[i], color=\"C1\")\n", " plt.plot(grid, kde(grid, data + mu, h=bw), color=\"C9\", linestyle=\":\")\n", " plt.bar(\n", " centers,\n", " kde_hist(data + mu, bins=bins, bandwidth=bw, density=True),\n", " color=\"C9\",\n", " width=5 / (len(bins) - 1),\n", " alpha=0.6,\n", " ) # histogram data\n", " plt.axvline(mu, color=\"slategray\", linestyle=\":\", alpha=0.6)\n", " cam.snap()\n", "animation = cam.animate()\n", "# animation.save('ani.gif',writer='imagemagick')\n", "# HTML(animation.to_html5_video()) # uncomment for animation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Another 'soft' histogram:\n", "\n", "Could also use a **softmax**: $$ \\mathsf{softmax}(\\mathbf{x}; \\tau)=\\frac{e^{x_i / \\tau}}{\\sum_{i=0}^{\\mathsf{len(\\mathbf{x})}} e^{x_i / \\tau}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- $\\tau$ is a softness hyperparameter, just like bandwidth!\n", "- Softmax is a natural choice when working with a neural network that maps events to an observable\n", " - $\\mathbf{x}$ would be the output of $f$ for one event\n", " - Then get normalized counts, with len(bins) = len(output of $f$)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making physics differentiable: likelihood function\n", "\n", "For the HistFactory-prescribed likelihood, [**pyhf**](https://github.com/scikit-hep/pyhf) has us covered$^*$!\n", "\n", "$^*$not *everything* in pyhf is differentiable (yet)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Big thanks to the team :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Making physics differentiable: profile likelihood ratio\n", "\n", "Things get trickier when we consider the profile likelihood ratio: \n", "\n", "$$\\lambda(\\mu)=\\frac{L\\left(\\mu, \\hat{\\hat{\\boldsymbol{\\theta}}}\\right)}{L\\left(\\hat{\\mu}, \\hat{\\boldsymbol{\\theta}}\\right)}$$\n", "\n", "Both $\\hat{\\hat{\\boldsymbol{\\theta}}}$ and $(\\hat{\\boldsymbol{\\theta}},\\hat{\\boldsymbol{\\mu}})$ are defined by $\\mathsf{argmax}$ operators, which involve doing maximum likelihood fits.\n", "\n", "*Fitting is not trivially differentiable!*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Why not?\n", "- Normally involves a large number of iterations, which would be unrolled one-by-one when tracing computations\n", " \n", " - We don't need all that information -- *we just want to know how the final fitted values change!*\n", "- The number of iterations doesn't vary smoothly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "Solution? **Fixed-point differentiation!**\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take advantage of the fact that $\\mathsf{maximize}$ (or equivalently $\\mathsf{minimize}$) routines have a **fixed point**:\n", "- $\\mathsf{minimize}$($L$, $x_{\\mathsf{init}}$) = $x_*$\n", "- $\\mathsf{minimize}$($L$, $x_*$) = $x_*$\n", "- $\\Rightarrow\\mathsf{minimize}(L,x_*)-x_*=0~~,$ so $x_*$ is a fixed point of $\\mathsf{minimize}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know a fixed point exists, we can calculate the gradient using the *converged iterations only!*\n", "\n", "\n", "\n", "(graphic from [this paper on adjoints of fixed-point iterations](https://hal.inria.fr/hal-01079185))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Which gradient, exactly?\n", "\n", "The pyhf likelihood (and its local maxima) is *implicitly* a function of the signal and background yields. We need this gradient in the wider context of optimizing wrt the profile likelihood ratio:\n", "\n", "$$\\frac{\\partial \\, \\mathsf{profile~likelihood} }{\\partial \\, \\mathsf{\\varphi}} = \\frac{\\partial \\, \\mathsf{profile~likelihood} }{\\partial (s,b)} \\times \\frac{\\partial (s,b)}{\\partial \\, \\mathsf{\\mathsf{\\varphi}}}$$\n", "\n", "But then $\\frac{\\partial \\, \\mathsf{profile~likelihood} }{\\partial (s,b)}$ involves $ \\frac{\\partial \\, \\mathsf{profile~likelihood} }{\\partial \\hat{\\hat{\\boldsymbol{\\theta}}}} \\times \\frac{\\partial \\, \\hat{\\hat{\\boldsymbol{\\theta}}}}{\\partial (s,b)}$ etc.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Think about the moving gaussian example: changing the parameter $\\mu$ changed the shape of the histogram, but we don't write **histogram(data,$\\mu$)**. Likewise, we don't write $\\hat{\\hat{\\boldsymbol{\\theta}}}(s,b)$, *but we can still find the gradient*!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Putting it all together:\n", "Recap:\n", "- Using 'soft' histograms and cuts, we can differentiate computations up to the stage of constructing an observable.\n", "- Thanks to pyhf, we have differentiable model building & likelihoods that build off of this observable.\n", "- Combined with fixed-point differentiation, we can extend this differentiability downstream to the profile likelihood." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We now have the ingredients to differentiate a whole analysis!**\n", "\n", "Now comes the fun part: lets *optimize* it :)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Differentiable analysis in practice:\n", "\n", "\n", "\n", "\n", "because we don't need a neural net for differentiable analysis (though we do use one :D)\n", "\n", "[Click me for Github! (and maybe leave a star... haha just kidding... unless?)](https://github.com/pyhf/neos)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "neos is implemented in Python using [**jax**](https://github.com/google/jax) (think numpy, but differentiable)\n", "\n", "\n", "\n", "- Lets you write numpythonic code, and get gradients for free!\n", "\n", "We also make use of [**fax**](https://github.com/gehring/fax/), which contains a wrapper function that lets you take gradients of fixed-point loops using jax!\n", "\n", "**Many thanks to both teams!**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional jax demo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## neos demo: softmax, 2 bins, 2 background blobs\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## neos demo: kde, 3 bins, 3 background blobs (up/down variations)\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps:\n", "\n", "A high-level roadmap:\n", "\n", "- Actually release a paper (the software is citable though: https://zenodo.org/record/3697981#.Xw8KmJNKhhE)\n", "- Make pyhf *fully differentiable*\n", " - Would also be awesome to have awkward-array support autodiff (having this conversation on the side)\n", "- More studies\n", "- More differentiable tools\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps:\n", "\n", "Many of these goals are the target of gradHEP!\n", "\n", "gradHEP is an effort to consolidate differentiable building blocks for analysis into a set of common tools, and apply them.\n", "\n", "See the ['Differentiable computing' HSF activity](https://hepsoftwarefoundation.org/activities/differentiablecomputing.html) to find ways to get involved -- all are welcome at this very early stage! :)\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extra thanks:\n", "- Pablo de Castro and Tommaso Dorigo for [INFERNO](https://arxiv.org/abs/1806.04743) (an inspiration for this work) and for discussions\n", "- Folks in IRIS-HEP for their support & discussions\n", "- Lukas for being a great supervisor\n", "- The organisers of PyHEP! (you rock)\n", "\n", "**and thanks to you for listening!! :)**\n", "\n", "Twitters: [@phi_nate](https://twitter.com/phi_nate), [@lukasheinrich_](https://twitter.com/lukasheinrich_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backup: which bandwidth?\n", "\n", "Can get an optimal bandwidth wrt 'mean integrated square error' if you do adaptive kdes: see [this paper by Kyle Cranmer](https://arxiv.org/pdf/hep-ex/0011057.pdf)\n", "\n", "Have also been doing some studies on bandwidth vs number of samples:\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backup: softmax v kde?\n", "\n", "A future study to be done! See [this issue on github](https://github.com/gradhep/center/issues/3)\n", "\n", "It's debatable whether you get any additional expressivity from putting a kde on top of a neural network (and in practice, it seems like the softmax converges more stably), but:\n", "\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 4 }