{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# `order`: An introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example we get to know the most important classes of *order* and how they are related to describe your analysis and all external data. We will set up a simple but scalable example analysis that involves most of the API. For more info, see the full [API documentation](http://python-order.readthedocs.io/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Classes and Relations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| Name | Purpose |\n", "| :-------------------------------------------------------- | :---------------------------------------------------------------------------------------------------: |\n", "| [`Analysis`](http://python-order.readthedocs.io#analysis) | Represents the central object of a physics analysis. |\n", "| [`Campaign`](http://python-order.readthedocs.io#campaign) | Provides data of a well-defined range of data-taking, detector alignment, MC settings, datasets, etc. |\n", "| [`Config`](http://python-order.readthedocs.io#config) | Holds analysis information related to a campaign instance (most configuration happens here!). |\n", "| [`Dataset`](http://python-order.readthedocs.io#dataset) | Definition of a dataset, produced for / measured in a campaign. |\n", "| [`Process`](http://python-order.readthedocs.io#process) | Physics process with cross sections for multiple center-of-mass energies, labels, etc. |\n", "| [`Channel`](http://python-order.readthedocs.io#channel) | Analysis channel, often defined by a particular decay resulting in distinct final state objects. |\n", "| [`Category`](http://python-order.readthedocs.io#category) | Category definition, (optionally) within the phase-space of an analysis channel. 
|\n", "| [`Variable`](http://python-order.readthedocs.io#variable) | Generic variable description providing expression and selection statements, titles, binning, etc. |\n", "| [`Shift`](http://python-order.readthedocs.io#shift) | Represents a systematic shift with a name, direction and type. |" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "*Relations* between these classes are the glue that holds an analysis together. If you have ever performed a HEP analysis, they might look pretty familiar to you." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `Analysis` ↔ `Campaign` ↔ `Config`\n", "\n", "1. An analysis is not limited to a single campaign (e.g. for combining results across several data-taking periods or even experiments).\n", "2. A campaign is independent of the analyses it is used in. In general, it could be defined externally / centrally.\n", "3. An analysis stores campaign-related data in config objects.\n", "4. An analysis can store multiple config objects that are related to the same campaign." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `Campaign`, `Config` ↔ `Dataset`\n", "\n", "1. A campaign can contain all datasets that were recorded / produced for its era and settings.\n", "2. A config contains a subset of its campaign's datasets, depending on what is required in its analysis.\n", "3. A dataset belongs to a campaign, and since a config is distinctly assigned to a campaign, a dataset is also related to a config." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `Dataset` ↔ `Process`\n", "\n", "1. A dataset contains physics processes.\n", "2. A process can be contained in multiple datasets.\n", "3. Processes can have *child* and *parent* processes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `Channel` ↔ `Category`\n", "\n", "1. 
A category describes a sub-phase-space of a channel; it therefore belongs to a channel, and channels have categories.\n", "2. Channels can have *child* channels and a *parent* channel.\n", "3. Categories can have *child* and *parent* categories." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `Config` ↔ `Channel`, `Variable`, `Shift`\n", "\n", "1. A config has channels.\n", "2. A config has variables.\n", "3. A config has shifts." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we define a toy $t\\bar{t}H$ analysis with a signal dataset, a $t\\bar{t}$ background and real data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import order as od\n", "import scinum as sn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### General, Analysis-unrelated Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a campaign and its datasets, and link processes to the datasets. This could be done externally or even by importing a centrally maintained repository." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 1, 1]\n" ] } ], "source": [ "# campaign\n", "c_2017 = od.Campaign(\"2017_13Tev_25ns\", 1, ecm=13, bx=25)\n", "\n", "# processes\n", "p_data = od.Process(\"data\", 1,\n", " is_data=True,\n", " label=\"data\",\n", ")\n", "p_ttH = od.Process(\"ttH\", 2,\n", " label=r\"$t\\bar{t}H$\",\n", " xsecs={\n", " 13: sn.Number(0.5071, {\"scale\": (sn.Number.REL, 0.058, 0.092)}),\n", " },\n", ")\n", "p_tt = od.Process(\"tt\", 3,\n", " label=r\"$t\\bar{t}$\",\n", " xsecs={\n", " 13: sn.Number(831.76, {\"scale\": (19.77, 29.20)}),\n", " },\n", ")\n", "\n", "# datasets\n", "d_data = od.Dataset(\"data\", 1,\n", " campaign=c_2017,\n", " is_data=True,\n", " n_files=100,\n", " n_events=200000,\n", " keys=[\"/data/2017.../AOD\"],\n", ")\n", "d_ttH = od.Dataset(\"ttH\", 2,\n", " campaign=c_2017,\n", " n_files=50,\n", " n_events=100000,\n", " keys=[\"/ttH_powheg.../.../AOD\"],\n", ")\n", "d_tt = od.Dataset(\"tt\", 3,\n", " campaign=c_2017,\n", " n_files=500,\n", " n_events=87654321,\n", " keys=[\"/tt_powheg.../.../AOD\"],\n", ")\n", "d_WW = od.Dataset(\"WW\", 4,\n", " campaign=c_2017,\n", " n_files=100,\n", " n_events=54321,\n", " keys=[\"/WW_madgraph.../.../AOD\"],\n", ")\n", "\n", "# link processes to datasets\n", "d_data.add_process(p_data)\n", "d_ttH.add_process(p_ttH)\n", "d_tt.add_process(p_tt)\n", "print([len(d.processes) for d in [d_data, d_ttH, d_tt]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Get the cross section of the process in the ttH dataset at the energy of its campaign." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$0.5071\\;^{+0.0294118}_{-0.0466532}\\;\\left(\\text{scale}\\right)$" ], "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d_ttH.get_process(\"ttH\").get_xsec(d_ttH.campaign.ecm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Analysis Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, define the analysis object and create a config for the ``2017_13Tev_25ns`` campaign:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ana = od.Analysis(\"ttH\", 1)\n", "\n", "# create a config by passing the campaign, so id and name will be identical\n", "cfg = ana.add_config(c_2017)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Add processes we're interested in and datasets that we want to use:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# add processes manually\n", "cfg.add_process(p_data)\n", "cfg.add_process(p_ttH)\n", "cfg.add_process(p_tt)\n", "\n", "# add datasets in a loop\n", "for name in [\"data\", \"ttH\", \"tt\"]:\n", " cfg.add_dataset(c_2017.get_dataset(name))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Get the mean number of events per file in the `ttH` dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2000.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cfg.get_dataset(\"ttH\").n_events / float(cfg.get_dataset(\"ttH\").n_files)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Define channels and categories:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "ch_bb = cfg.add_channel(\"ttH_bb\", 1)\n", "cat_5j = ch_bb.add_category(\"eq5j\",\n", " label=\"5 jets\",\n", " selection=\"n_jets == 5\",\n", ")\n", "cat_6j = ch_bb.add_category(\"ge6j\",\n", " label=r\"$\\geq$ 6 jets\",\n", " selection=\"n_jets >= 6\",\n", ")\n", "\n", "# divide the 6j category further\n", "cat_6j_3b = cat_6j.add_category(\"ge6j_eq3b\",\n", " label=r\"$\\geq$ 6 jets, 3 b-tags\",\n", " selection=\"n_jets >= 6 && n_btags == 3\",\n", ")\n", "cat_6j_4b = cat_6j.add_category(\"ge6j_ge4b\",\n", " label=r\"$\\geq$ 6 jets, $\\geq$ 4 b-tags\",\n", " selection=\"n_jets >= 6 && n_btags >= 4\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Get the ROOT-latex label of the 6j4b category by using only the config." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'#geq 6 jets, #geq 4 b-tags'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cfg.get_channel(\"ttH_bb\").get_category(\"ge6j_ge4b\", deep=True).label_root" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Systematic shifts we're going to study:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n" ] } ], "source": [ "cfg.add_shift(\"nominal\", 1)\n", "cfg.add_shift(\"lumi_up\", 2, type=\"rate\")\n", "cfg.add_shift(\"lumi_down\", 3, type=\"rate\")\n", "cfg.add_shift(\"scale_up\", 4, type=\"shape\")\n", "cfg.add_shift(\"scale_down\", 5, type=\"shape\")\n", "print(len(cfg.shifts))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Determine all shift objects starting with the *source* of the `scale_down` shift." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[, ]\n" ] } ], "source": [ "shifts = [s for s in cfg.shifts if s.source == \"scale\"]\n", "print(shifts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Add some variables that we want to project via ROOT trees (or numpy arrays / pandas dataframes with numexpr)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n" ] } ], "source": [ "cfg.add_variable(\"jet1_pt\",\n", " expression=\"Reco__jet1__pt\",\n", " binning=(25, 0., 500,),\n", " unit=\"GeV\",\n", " x_title=r\"Leading jet $p_{T}$\",\n", ")\n", "cfg.add_variable(\"jet1_px\",\n", " expression=\"Reco__jet1__pt * cos(Reco__jet1__phi)\",\n", " binning=(25, 0., 500,),\n", " unit=\"GeV\",\n", " x_title=r\"Leading jet $p_{x}$\",\n", ")\n", "print(len(cfg.variables))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Get the full ROOT histogram title (i.e. + axis labels) of the `jet1_px` variable." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cfg.get_variable(\"jet1_px\").get_full_title(root=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Add \"soft\" information as auxiliary data." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3\n" ] } ], "source": [ "cfg.set_aux(\"lumi\", 40.)\n", "cfg.set_aux((\"globalTag\", \"data\"), \"80X_dataRun2...\")\n", "cfg.set_aux((\"globalTag\", \"mc\"), \"80X_mcRun2...\")\n", "print(len(cfg.aux))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Task**: Get the MC global tag." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "80X_mcRun2...\n" ] } ], "source": [ "print(cfg.get_aux((\"globalTag\", \"mc\")))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Now, we can start to use the analysis objects in a \"framework\" ..." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.14" } }, "nbformat": 4, "nbformat_minor": 1 }