{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# `order`: An introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we get to know the most important classes of *order* and how they are related to describe your analysis and all external data. We will set up a simple but scalable example analysis that involves most of the API. For more info, see the full [API documentation](http://python-order.readthedocs.io/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Classes and Relations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| Name                                                      | Purpose                                                                                               |\n",
    "| :-------------------------------------------------------- | :---------------------------------------------------------------------------------------------------: |\n",
    "| [`Analysis`](http://python-order.readthedocs.io#analysis) | Represents the central object of a physics analysis.                                                  |\n",
    "| [`Campaign`](http://python-order.readthedocs.io#campaign) | Provides data of a well-defined range of data-taking, detector alignment, MC settings, datasets, etc. |\n",
    "| [`Config`](http://python-order.readthedocs.io#config)     | Holds analysis information related to a campaign instance (most configuration happens here!).         |\n",
    "| [`Dataset`](http://python-order.readthedocs.io#dataset)   | Definition of a dataset, produced for / measured in a campaign.                                       |\n",
    "| [`Process`](http://python-order.readthedocs.io#process)   | Phyiscs process with cross sections for multiple center-of-mass energies, labels, etc.                |\n",
    "| [`Channel`](http://python-order.readthedocs.io#channel)   | Analysis channel, often defined by a particular decay resulting in distinct final state objects.      |\n",
    "| [`Category`](http://python-order.readthedocs.io#category) | Category definition, (optionally) within the phase-space of an analysis channel.                      |\n",
    "| [`Variable`](http://python-order.readthedocs.io#variable) | Generic variable description providing expression and selection statements, titles, binning, etc.     |\n",
    "| [`Shift`](http://python-order.readthedocs.io#shift)       | Represents a systematic shift with a name, direction and type.                                        |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "*Relations* between these classes are the glue that hold an analysis together. If you have ever performed a HEP analysis, they might look pretty familiar to you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  `Analysis` &harr; `Campaign` &harr; `Config`\n",
    "\n",
    "1. An analysis is not limited to a single campaign (e.g. for combining results across several data-taking periods or even experiments).\n",
    "2. A campaign is independent of analyses it is used in. In general, it could be defined externally / centrally.\n",
    "3. An analysis stores campaign-related data in config objects.\n",
    "4. An analysis can store multiple config objects that are related to the same campaign."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  `Campaign`, `Config` &harr; `Dataset`\n",
    "\n",
    "1. A campaign can contain all datasets that were recorded / produced for its era and settings.\n",
    "2. A config contains a subset of its campaign's datasets, depending on what is required in its analysis.\n",
    "3. A dataset belongs to a campaign and since a config is distinctly assigned to a campaign, a dataset is also related to a config."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "####  `Dataset` &harr; `Process`\n",
    "\n",
    "1. A dataset contains physics processes.\n",
    "2. A process can be contained in multiple datasets.\n",
    "3. Processes can have *child* and *parent* processes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `Channel` &harr; `Category`\n",
    "\n",
    "1. A category describes a sub-phase-space of a channel, therefore, it belongs to a channel and channels have categories.\n",
    "2. Channels can have *child* channels and a *parent* channel.\n",
    "3. Categories can have *child* and *parent* categories."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `Config` &harr; `Channel`, `Variable`, `Shift`\n",
    "\n",
    "1. A config has channels.\n",
    "2. A config has variables.\n",
    "3. A config has shifts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example Analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we define a toy $t\\bar{t}H$ analysis with a signal dataset, a $t\\bar{t}$ background and real data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import order as od\n",
    "import scinum as sn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### General, Analysis-unrelated Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define a campaign, its datasets and link processes. This could be done externally or even via importing a centrally maintained repository."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 1, 1]\n"
     ]
    }
   ],
   "source": [
    "# campaign\n",
    "c_2017 = od.Campaign(\"2017_13Tev_25ns\", 1, ecm=13, bx=25)\n",
    "\n",
    "# processes\n",
    "p_data = od.Process(\"data\", 1,\n",
    "    is_data=True,\n",
    "    label=\"data\",\n",
    ")\n",
    "p_ttH = od.Process(\"ttH\", 2,\n",
    "    label=r\"$t\\bar{t}H$\",\n",
    "    xsecs={\n",
    "        13: sn.Number(0.5071, {\"scale\": (sn.Number.REL, 0.058, 0.092)}),\n",
    "    },\n",
    ")\n",
    "p_tt = od.Process(\"tt\", 3,\n",
    "    label=r\"$t\\bar{t}$\",\n",
    "    xsecs={\n",
    "        13: sn.Number(831.76, {\"scale\": (19.77, 29.20)}),\n",
    "    },\n",
    ")\n",
    "\n",
    "# datasets\n",
    "d_data = od.Dataset(\"data\", 1,\n",
    "    campaign=c_2017,\n",
    "    is_data=True,\n",
    "    n_files=100,\n",
    "    n_events=200000,\n",
    "    keys=[\"/data/2017.../AOD\"],\n",
    ")\n",
    "d_ttH = od.Dataset(\"ttH\", 2,\n",
    "    campaign=c_2017,\n",
    "    n_files=50,\n",
    "    n_events=100000,\n",
    "    keys=[\"/ttH_powheg.../.../AOD\"],\n",
    ")\n",
    "d_tt = od.Dataset(\"tt\", 3,\n",
    "    campaign=c_2017,\n",
    "    n_files=500,\n",
    "    n_events=87654321,\n",
    "    keys=[\"/tt_powheg.../.../AOD\"],\n",
    ")\n",
    "d_WW = od.Dataset(\"WW\", 4,\n",
    "    campaign=c_2017,\n",
    "    n_files=100,\n",
    "    n_events=54321,\n",
    "    keys=[\"/WW_madgraph.../.../AOD\"],\n",
    ")\n",
    "\n",
    "# link processes to datasets\n",
    "d_data.add_process(p_data)\n",
    "d_ttH.add_process(p_ttH)\n",
    "d_tt.add_process(p_tt)\n",
    "print([len(d.processes) for d in [d_data, d_ttH, d_tt]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Get the cross section of the process in the ttH dataset at the energy of its campaign."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "$0.5071\\;^{+0.0294118}_{-0.0466532}\\;\\left(\\text{scale}\\right)$"
      ],
      "text/plain": [
       "<Number at 0x10b8d2fd0, '0.5071 +0.0294118-0.0466532 (scale)'>"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d_ttH.get_process(\"ttH\").get_xsec(d_ttH.campaign.ecm)"
   ]
  },
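  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `scale` uncertainty of the `ttH` cross section was passed as relative values (`sn.Number.REL`), while the number shown above reports absolute ones. As a minimal sketch of that conversion in plain Python (the variable names are only illustrative, and this does not reproduce how *scinum* works internally), the relative values are multiplied by the nominal value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# values copied from the ttH process definition above\n",
    "nominal = 0.5071\n",
    "rel_up, rel_down = 0.058, 0.092\n",
    "\n",
    "# convert relative to absolute uncertainties\n",
    "abs_up = nominal * rel_up\n",
    "abs_down = nominal * rel_down\n",
    "print(\"+%.7f -%.7f\" % (abs_up, abs_down))"
   ]
  },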
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Analysis Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, define the analysis object and create a config for the ``2017_13Tev_25ns`` campaign:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ana = od.Analysis(\"ttH\", 1)\n",
    "\n",
    "# create a config by passing the campaign, so id and name will be identical\n",
    "cfg = ana.add_config(c_2017)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Add processes we're interested in and datasets that we want to use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add processes manually\n",
    "cfg.add_process(p_data)\n",
    "cfg.add_process(p_ttH)\n",
    "cfg.add_process(p_tt)\n",
    "\n",
    "# add datasets in a loop\n",
    "for name in [\"data\", \"ttH\", \"tt\"]:\n",
    "    cfg.add_dataset(c_2017.get_dataset(name))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Get the mean number of events per file in the `ttH` dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2000.0"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cfg.get_dataset(\"ttH\").n_events / float(cfg.get_dataset(\"ttH\").n_files)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Define channels and categories:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "ch_bb = cfg.add_channel(\"ttH_bb\", 1)\n",
    "cat_5j = ch_bb.add_category(\"eq5j\",\n",
    "    label=\"5 jets\",\n",
    "    selection=\"n_jets == 5\",\n",
    ")\n",
    "cat_6j = ch_bb.add_category(\"ge6j\",\n",
    "    label=r\"$\\geq$ 6 jets\",\n",
    "    selection=\"n_jets >= 6\",\n",
    ")\n",
    "\n",
    "# divide the 6j category further\n",
    "cat_6j_3b = cat_6j.add_category(\"ge6j_eq3b\",\n",
    "    label=r\"$\\geq$ 6 jets, 3 b-tags\",\n",
    "    selection=\"n_jets >= 6 && n_btags == 3\",\n",
    ")\n",
    "cat_6j_4b = cat_6j.add_category(\"ge6j_ge4b\",\n",
    "    label=r\"$\\geq$ 6 jets, $\\geq$ 4 b-tags\",\n",
    "    selection=\"n_jets >= 6 && n_btags >= 4\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Get the ROOT-latex label of the 6j4b category by using only the config."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'#geq 6 jets, #geq 4 b-tags'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cfg.get_channel(\"ttH_bb\").get_category(\"ge6j_ge4b\", deep=True).label_root"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Systematic shifts we're going to study:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n"
     ]
    }
   ],
   "source": [
    "cfg.add_shift(\"nominal\", 1)\n",
    "cfg.add_shift(\"lumi_up\", 2, type=\"rate\")\n",
    "cfg.add_shift(\"lumi_down\", 3, type=\"rate\")\n",
    "cfg.add_shift(\"scale_up\", 4, type=\"shape\")\n",
    "cfg.add_shift(\"scale_down\", 5, type=\"shape\")\n",
    "print(len(cfg.shifts))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Determine all shift objects starting wiht the *source* of the `scale_down` shift."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[<Shift at 0x10b8d2050, name=scale_up, id=4, context=shift>, <Shift at 0x10b8c7b50, name=scale_down, id=5, context=shift>]\n"
     ]
    }
   ],
   "source": [
    "shifts = [s for s in cfg.shifts if s.source == \"scale\"]\n",
    "print(shifts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Add some variables that we want to project via ROOT trees (or numpy arrays / pandas dataframes with numexpr)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "cfg.add_variable(\"jet1_pt\",\n",
    "    expression=\"Reco__jet1__pt\",\n",
    "    binning=(25, 0., 500,),\n",
    "    unit=\"GeV\",\n",
    "    x_title=r\"Leading jet $p_{T}$\",\n",
    ")\n",
    "cfg.add_variable(\"jet1_px\",\n",
    "    expression=\"Reco__jet1__pt * cos(Reco__jet1__phi)\",\n",
    "    binning=(25, 0., 500,),\n",
    "    unit=\"GeV\",\n",
    "    x_title=r\"Leading jet $p_{x}$\",\n",
    ")\n",
    "print(len(cfg.variables))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Get the full ROOT histogram title (i.e. + axis labels) of the `jet1_px` variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'jet1_px;Leading jet p_{x} / GeV;Entries / 20.0 GeV'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cfg.get_variable(\"jet1_px\").get_full_title(root=True)"
   ]
  },
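  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `Entries / 20.0 GeV` part of the title follows from the binning `(25, 0., 500)`, read here as (number of bins, lower edge, upper edge). A small sketch of that arithmetic in plain Python, with purely illustrative variable names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# binning values copied from the jet1_px variable definition above\n",
    "n_bins, x_min, x_max = 25, 0., 500.\n",
    "\n",
    "# width of each equally sized bin, in GeV\n",
    "bin_width = (x_max - x_min) / n_bins\n",
    "print(bin_width)"
   ]
  },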
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Add \"soft\" information as auxiliary data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n"
     ]
    }
   ],
   "source": [
    "cfg.set_aux(\"lumi\", 40.)\n",
    "cfg.set_aux((\"globalTag\", \"data\"), \"80X_dataRun2...\")\n",
    "cfg.set_aux((\"globalTag\", \"mc\"), \"80X_mcRun2...\")\n",
    "print(len(cfg.aux))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Task**: Get the MC global tag."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "80X_mcRun2...\n"
     ]
    }
   ],
   "source": [
    "print(cfg.get_aux((\"globalTag\", \"mc\")))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr />\n",
    "Now, we can start to use the analysis objects in a \"framework\" ..."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}