{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple template fitting using pyroofit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note for contributors: Remember to run Kernel > Restart & Clear output before adding any changes to git!
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial we will perform template fits with ``pyroofit``.\n", "\n", "``pyroofit`` is a python wrapper for the ``RooFit`` package. Documentation can be found here: \n", " http://www.desy.de/~swehle/pyroofit/\n", " \n", "The project's code can be found at https://github.com/simonUU/PyrooFit\n", "\n", "``pyroofit`` can be installed with ``pip3 install --user pyroofit``." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:40.498401Z", "start_time": "2020-04-22T14:13:37.675901Z" } }, "outputs": [], "source": [ "import ROOT\n", "from pyroofit.models import Gauss, Chebychev\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Show pictures in this notebook\n", "from IPython.display import Image\n", "\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:40.520138Z", "start_time": "2020-04-22T14:13:40.501471Z" } }, "outputs": [], "source": [ "# Create some test data\n", "data = pd.DataFrame(\n", " np.concatenate((\n", " np.random.normal(-2, 1, 1000), \n", " np.random.normal(3, 2, 1000), \n", " -5 + 10* np.random.random_sample(2000)\n", " )),\n", " columns=[\"x\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note: We will fit the distribution of the above data, so note the conceptual difference to the x, y data from the tutorial fitting_curves_data.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Question [medium]: Can you guess what the corresponding histogram to this data will look like and why?
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No? Then cheat and look at the histogram:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:40.858719Z", "start_time": "2020-04-22T14:13:40.522064Z" }, "scrolled": true }, "outputs": [], "source": [ "data.hist(bins=30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First try: Fit single gaussian as signal and line as background" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.005861Z", "start_time": "2020-04-22T14:13:40.860684Z" } }, "outputs": [], "source": [ "# Gaussian for signal\n", "pdf_sig = Gauss(('x', -5, 5), mean=(-1, 0, 1), title=\"Signal\")\n", "\n", "# Straight line for background\n", "pdf_bkg = Chebychev(('x', -5, 5), n=1, title=\"Background\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Question: Why is the model for the straight line called 'Chebychev'?
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we build a compound PDF from the two simple PDFs.\n", "``pyroofit`` is quite nice in that this has a very pythonic syntax (we overload the ``+`` operator):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.044424Z", "start_time": "2020-04-22T14:13:41.009495Z" } }, "outputs": [], "source": [ "pdf = pdf_sig + pdf_bkg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're ready to fit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.453677Z", "start_time": "2020-04-22T14:13:41.047718Z" } }, "outputs": [], "source": [ "pdf.fit(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Don't be deterred by the amount of output, but let's look at the results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.492265Z", "start_time": "2020-04-22T14:13:41.455755Z" } }, "outputs": [], "source": [ "pdf.get()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Hint:** In order to get the results as a dictionary, use ``get_parameters()`` instead:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.528772Z", "start_time": "2020-04-22T14:13:41.494124Z" } }, "outputs": [], "source": [ "pdf.get_parameters()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.988752Z", "start_time": "2020-04-22T14:13:41.530548Z" } }, "outputs": [], "source": [ "pdf.plot(\"test.png\", legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(\"test.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Question 2 [easy]: Why are the results so terrible?
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second try: Fit two gaussians as signal" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:41.999889Z", "start_time": "2020-04-22T14:13:41.990917Z" }, "run_control": { "marked": false } }, "outputs": [], "source": [ "gauss1 = Gauss(('x', -5, 5), mean=(-5, -3, 0), title=\"signal\", name=\"gauss1\")\n", "gauss2 = Gauss(('x', -5, 5), mean=(0, 3, 5), title=\"signal\", name=\"gauss2\")\n", "pdf_sig = gauss1 + gauss2\n", "pdf_bkg = Chebychev(('x', -5, 5), n=1, title=\"Background\")\n", "pdf = pdf_sig + pdf_bkg" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:42.348051Z", "start_time": "2020-04-22T14:13:42.001790Z" }, "run_control": { "marked": true } }, "outputs": [], "source": [ "pdf.fit(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:42.385421Z", "start_time": "2020-04-22T14:13:42.350093Z" } }, "outputs": [], "source": [ "print(\"Gauss 1:\")\n", "print(gauss1.get())\n", "print(\"Gauss 2:\")\n", "print(gauss2.get())\n", "print(\"Bkg:\")\n", "print(pdf_bkg.get())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:42.553699Z", "start_time": "2020-04-22T14:13:42.387565Z" } }, "outputs": [], "source": [ "pdf.plot(\"test2.png\", legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(\"test2.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 1" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-04-22T13:54:32.164691Z", "start_time": "2020-04-22T13:54:32.157603Z" } }, "source": [ "
\n", "Exercise 1 [easy]: Fit one gaussian for signal and a linear background model to the following dataset:\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:13:42.584024Z", "start_time": "2020-04-22T14:13:42.555506Z" } }, "outputs": [], "source": [ "data = pd.DataFrame(\n", " np.concatenate((\n", " np.random.normal(-2, 1, 1000), \n", " -5 + 10* np.random.random_sample(2000)\n", " )),\n", " columns=[\"x\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fixing templates from MC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous examples, we simply \"knew\" that our signal was shaped like a (two) Gaussian(s) and the background was linear.\n", "\n", "Usually however, the situation isn't as simple and we first have to learn how our signal and background looks like by looking at MC data. Remember that in MC we always know signal from background (it's simulated data after all).\n", "\n", "Thus, we can first fit our signal and background model to the MC, then fix the parameters. Now we have two \n", "PDFs $\\mathrm{pdf}_\\mathrm{sig}$ and $\\mathrm{pdf}_\\mathrm{bkg}$ and get the signal and background yields by\n", "fitting the data with $\\mu_\\mathrm{sig}\\mathrm{pdf}_\\mathrm{sig} + \\mu_\\mathrm{bkg}\\mathrm{pdf}_\\mathrm{bkg}$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:44.240647Z", "start_time": "2020-04-22T14:16:44.214231Z" } }, "outputs": [], "source": [ "mc_signal = pd.DataFrame(\n", " np.random.normal(-2, 1, 1000),\n", " columns=[\"x\"]\n", ")\n", "\n", "mc_bkg = pd.DataFrame(\n", " np.concatenate((\n", " np.random.normal(2, 1, 1000), \n", " -5 + 10* np.random.random_sample(2000)\n", " )),\n", " columns=[\"x\"]\n", ")\n", "\n", "data = pd.DataFrame(\n", " np.concatenate((\n", " np.random.normal(2, 1, int(1.2*1000)), \n", " -5 + 10* np.random.random_sample(int(1.2*2000)),\n", " np.random.normal(-2, 1, int(0.3*1000))\n", " )),\n", " columns=[\"x\"]\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:45.775886Z", "start_time": "2020-04-22T14:16:45.452423Z" } }, "outputs": [], "source": [ "mc_signal.hist(bins=30)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:47.058863Z", "start_time": "2020-04-22T14:16:46.788296Z" } }, "outputs": [], "source": [ "mc_bkg.hist(bins=30)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:47.491617Z", "start_time": "2020-04-22T14:16:47.484001Z" } }, "outputs": [], "source": [ "pdf_sig = Gauss(('x', -5, 5), mean=(-4, -2, 0), title=\"signal\", name=\"gauss1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:48.333720Z", "start_time": "2020-04-22T14:16:48.305218Z" } }, "outputs": [], "source": [ "pdf_sig.fit(mc_signal)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:49.514597Z", "start_time": "2020-04-22T14:16:49.420275Z" } }, "outputs": [], "source": [ "pdf_sig.plot(\"mc_signal_fit.png\", legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(\"mc_signal_fit.png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:50.303290Z", "start_time": "2020-04-22T14:16:50.285513Z" } }, "outputs": [], "source": [ "pdf_sig.fix()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:52.163399Z", "start_time": "2020-04-22T14:16:52.155576Z" } }, "outputs": [], "source": [ "pdf_bkg = (\n", " Gauss(('x', -5, 5), mean=(-5, 3, 5), title=\"Background\", name=\"gauss2\")\n", " + Chebychev(('x', -5, 5), n=1, title=\"Background\")\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:53.335247Z", "start_time": "2020-04-22T14:16:53.138835Z" } }, "outputs": [], "source": [ "pdf_bkg.fit(mc_bkg)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:54.462858Z", "start_time": "2020-04-22T14:16:54.330521Z" } }, "outputs": [], "source": [ "pdf_bkg.plot(\"mc_bkg_fit.png\", legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(\"mc_bkg_fit.png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:55.495021Z", "start_time": "2020-04-22T14:16:55.482416Z" } }, "outputs": [], "source": [ "pdf_bkg.fix()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:56.517537Z", "start_time": "2020-04-22T14:16:56.503584Z" } }, "outputs": [], "source": [ "pdf_bkg.get()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:56.526524Z", "start_time": "2020-04-22T14:16:56.519294Z" } }, "outputs": [], "source": [ "pdf = pdf_sig + pdf_bkg" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:56.582830Z", "start_time": "2020-04-22T14:16:56.528231Z" } }, "outputs": [], "source": [ "pdf.fit(data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:16:56.717651Z", "start_time": "2020-04-22T14:16:56.584874Z" } }, "outputs": [], "source": [ "pdf.plot(\"fit_to_data.png\", legend=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Image(\"fit_to_data.png\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:17:01.161121Z", "start_time": "2020-04-22T14:17:01.142243Z" } }, "outputs": [], "source": [ "pdf.get()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-04-22T14:17:04.142816Z", "start_time": "2020-04-22T14:17:04.128794Z" } }, "outputs": [], "source": [ "pdf_bkg.get()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "position": { "height": "466.85px", "left": "1550px", "right": "20px", "top": "120px", "width": "350px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }