{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ZZ to 4 leptons analysis with CMS open data and ADL/CutLang\n",
    "\n",
    "This is an exercise showing a simple analysis exploring the ZZ -> 4 lepton final state, focusing on the e+e-μ+μ- channel.  The analysis aims to explore the kinematics of ZZ --> e+e-μ+μ- events.\n",
    "\n",
    "The analysis is performed based on CMS open data MC ntuples.\n",
    "\n",
    "The analysis consists of two parts:\n",
    "1. Applying some event selection to the input events and making distributions.  This part is performed using a special language called ADL, and via a software called CutLang that can read and process ADL.\n",
    "2. Drawing plots produced by the previous step.  This part is performed using ROOT (with Python syntax).  ROOT is the main analysis software used at CERN.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget --progress=dot:giga https://www.dropbox.com/s/hak5sqxamgkrfa2/ZZTo2e2mu.root\n",
    "# Get the ROOT file containing the ZZ -> eemumu background events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before starting the analysis\n",
    "\n",
    "Please import the requirements by running the cell below to avoid error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ROOT\n",
    "%jsroot on"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Writing the analysis with ADL and running with CutLang\n",
    "\n",
    "**Writing the analysis with ADL:** In the following cell, part of the analysis is written using the ADL syntax.  However there are some parts missing. Please follow the instructions in the comments to complete the missing parts.  If you feel adventurous, you could modify the object or event selections, add new variables or new histograms.\n",
    "\n",
    "**Running the analysis with CutLang:** Executing the cell will run the analysis on both the signal (SMHiggsToZZTo4L.root) and background (ZZTo2e2mu.root) events.  The run parameters are given in the first line of the cell:\n",
    "- **file** : input root file\n",
    "- **filetype** : input event format (do not change!)\n",
    "- **adlfile** : the name we use for labeling the analysis \n",
    "- **events** : number of events used from each file\n",
    "- **verbose** : frequency of processed event numbers written in output text\n",
    "- **parallel** : enter 0 to speed up analysis with multiprocessing\n",
    "\n",
    "NOTE: When running jupyter/binder via direct link, if your run does not complete due to memory issues, please reduce the number of events via the \"events\" parameter.\n",
    "\n",
    "**Analysis output:** Running the analysis will produce two outputs:\n",
    "  * Text output shown cell output: This includes \"cutflows\" for each region, i.e. the selections applied and how many events survive the various selections.  Histograms are also listed.  You should see a separate output for each ROOT file that is run.\n",
    "  * ROOT output: One ROOT file called histoOut-\\<adlfile name\\>-\\<file name\\>.root that includes all the histograms produced by the analysis.  These ROOT files will be used in the next step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%cutlang file=ZZTo2e2mu.root filetype=CMSODR2 adlfile=ZZ4L events=100000 verbose=20000\n",
    "\n",
    "# ADL file for ZZ->eemumu analysis\n",
    "\n",
    "# Object selection\n",
    "# Take input electrons, labeled \"ele\" and obtain a set of selected electrons \"elesel\"\n",
    "object elesel\n",
    "  take ele\n",
    "  select pT(ele) > 20\n",
    "  select abs(eta(ele)) < 2.5\n",
    "\n",
    "# Take input muons, labeled \"muo\" and obtain a set of selected muons \"muosel\"\n",
    "object muosel\n",
    "  take muo\n",
    "  select pT(muo) > 20\n",
    "  select abs(eta(muo)) < 2.4\n",
    "\n",
    "# Event selection\n",
    "    \n",
    "# Select all events and make histograms of lepton multiplicities\n",
    "region overview\n",
    "  select ALL # to count all events\n",
    "  histo hneinp, \"number of input electrons\", 10, 0, 10, size(ele)\n",
    "  histo hnesel, \"number of selected electrons\", 10, 0, 10, size(elesel)\n",
    "  histo hnminp, \"number of input muons\", 10, 0, 10, size(muo)\n",
    "  histo hnmsel, \"number of selected muons\", 10, 0, 10, size(muosel)\n",
    "  histo hnenminp, \"number of input electrons vs muons\", 10, 0, 10, 10, 0, 10, size(ele), size(muo)\n",
    "  histo hnenmsel, \"number of selected electrons vs muons\", 10, 0, 10, 10, 0, 10, size(elesel), size(muosel)\n",
    "\n",
    "# Selection requiring 1 Z->ee in the event using input electrons\n",
    "region rZeeinp\n",
    "  select ALL\n",
    "  select size(ele) == 2\n",
    "  select q(ele[0]) + q(ele[1]) == 0\n",
    "  histo hZeeinp, \"Z(->ee,inp) candidate mass (GeV)\", 50, 50, 150, m(ele[0] ele[1])\n",
    "\n",
    "# Selection requiring 1 Z->mumu in the event using selected electrons\n",
    "region rZeesel\n",
    "  select ALL\n",
    "  select size(elesel) == 2\n",
    "  select q(elesel[0]) + q(elesel[1]) == 0\n",
    "  histo hZeesel, \"Z(->ee,sel) candidate mass (GeV)\", 50, 50, 150, m(elesel[0] elesel[1])\n",
    "\n",
    "# Can you write here the 2 regions requiring 1 Z->mumu in the event?\n",
    "region rZmminp\n",
    "  select ALL\n",
    "  # Please complete the rest\n",
    "\n",
    "region rZmmsel\n",
    "  select ALL\n",
    "  # Please complete the rest\n",
    "\n",
    "# Now let's apply a selection with 2Zs, Z->ee and Z->mumu\n",
    "region rZeemminp\n",
    "  select ALL\n",
    "  select size(ele) == 2 and size(muo) == 2\n",
    "  select q(ele[0]) + q(ele[1]) == 0\n",
    "  select q(muo[0]) + q(muo[1]) == 0\n",
    "  histo hZeeinp, \"Z(->ee,inp) candidate mass (GeV)\", 50, 50, 150, m(ele[0] ele[1])\n",
    "  histo hZmminp, \"Z(->mm,inp) candidate mass (GeV)\", 50, 50, 150, m(muo[0] muo[1])\n",
    "  # Can you make a 2D histogram plotting m(ee) vs m(mumu) ?\n",
    "  # histo hZeemminp,  \n",
    "\n",
    "# Can you write the same region using the selected electrons and muons?\n",
    "region rZeemmsel\n",
    "  select ALL\n",
    "  # Please complete the rest\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking the analysis output with ROOT\n",
    "\n",
    "Now let's make some plots using the ROOT package in python (which is widely used at CERN).\n",
    "Instructions are shown within comments in the following cells.\n",
    "\n",
    "What to do:\n",
    "  * Compare some of the histograms you made:\n",
    "    * Electrons vs. muons\n",
    "    * Input leptons vs. selected leptons\n",
    "    * Different selection regions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's start with importing the needed modules\n",
    "from ROOT import gStyle, TFile, TH1, TH1D, TH2D, TCanvas, TLegend, TColor\n",
    "\n",
    "# Now let's set some ROOT styling parameters:\n",
    "# You do not need to know what they mean, but can directly use these settings\n",
    "\n",
    "gStyle.SetOptStat(0)\n",
    "gStyle.SetPalette(1)\n",
    "\n",
    "gStyle.SetTextFont(42)\n",
    "\n",
    "gStyle.SetTitleStyle(0000)\n",
    "gStyle.SetTitleBorderSize(0)\n",
    "gStyle.SetTitleFont(42)\n",
    "gStyle.SetTitleFontSize(0.055)\n",
    "\n",
    "gStyle.SetTitleFont(42, \"xyz\")\n",
    "gStyle.SetTitleSize(0.5, \"xyz\")\n",
    "gStyle.SetLabelFont(42, \"xyz\")\n",
    "gStyle.SetLabelSize(0.45, \"xyz\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's open the output file produced by CutLang: \n",
    "# (If you changed the adlfile option when running cutlang, you will need to change the file names)\n",
    "f = TFile(\"histoOut-ZZ4L-ZZTo2e2mu.root\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can see what is inside the signal file:\n",
    "f.ls()\n",
    "# There should be a directory (TDirectoryFile) per selection region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's check out what is inside \"baseline\":\n",
    "f.cd(\"rZeeinp\")\n",
    "f.ls()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the histograms\n",
    "# Overview region:\n",
    "hneinp = f.Get(\"overview/hneinp\")\n",
    "hnminp = f.Get(\"overview/hnminp\")\n",
    "hnesel = f.Get(\"overview/hnesel\")\n",
    "hnmsel = f.Get(\"overview/hnmsel\")\n",
    "hnenminp = f.Get(\"overview/hnenminp\")\n",
    "hnenmsel = f.Get(\"overview/hnenmsel\")\n",
    "\n",
    "# Zeeinp region\n",
    "hZeeinp = f.Get(\"rZeeinp/hZeeinp\")\n",
    "# Zeesel region\n",
    "hZeesel = f.Get(\"rZeesel/hZeesel\")\n",
    "# Zmminp region\n",
    "hZmminp = f.Get(\"rZmminp/hZmminp\")\n",
    "# Zmmsel region\n",
    "hZmmsel = f.Get(\"rZmmsel/hZmmsel\")\n",
    "# Zeemminp region\n",
    "hZeeinp2 = f.Get(\"rZeemminp/hZeeinp\")\n",
    "hZmminp2 = f.Get(\"rZeemminp/hZmminp\")\n",
    "hZeemminp = f.Get(\"rZeemminp/hZeemminp\")\n",
    "# Zeemmsel region\n",
    "hZeesel2 = f.Get(\"rZeemmsel/hZeesel\")\n",
    "hZmmsel2 = f.Get(\"rZeemmsel/hZmmsel\")\n",
    "hZeemmsel = f.Get(\"rZeemmsel/hZeemmsel\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# In order to be able to make many plots, let's define two generic histogrms to which we can \n",
    "# assign any of the histograms above:\n",
    "h1 = hneinp\n",
    "h2 = hnminp\n",
    "\n",
    "# Now we format the histograms: lines, colors, axes titles, etc..  \n",
    "# You do not need to learn the commands here unless you are really curious.\n",
    "# Otherwise just execute the cell.\n",
    "\n",
    "# Color numbers can be retrived from https://root.cern.ch/doc/master/classTColor.html\n",
    "# (check for color wheel)\n",
    "h1.SetLineColor(600) # kBlue\n",
    "h2.SetLineColor(416+2) # kGreen + 2\n",
    "\n",
    "# Titles, labels.  \n",
    "# It is enough to set these variables ONLY FOR THE FIRST HISTOGRAM YOU WILL DRAW\n",
    "# i.e., the one you will call by .Draw().  The rest you will draw by .Draw(\"same\") will only \n",
    "# contribute with the historam curve.\n",
    "\n",
    "# Make the x-axis title:\n",
    "rawtitle = h1.GetTitle()\n",
    "if (\"electron\" in rawtitle): \n",
    "    title = rawtitle.replace(\"electron\", \"lepton\")\n",
    "elif (\"muon\" in rawtitle): \n",
    "    title = rawtitle.replace(\"muon\", \"lepton\")\n",
    "elif (\"ee\" in rawtitle): \n",
    "    title = rawtitle.replace(\"ee\", \"ll\")\n",
    "elif (\"mm\" in rawtitle): \n",
    "    title = rawtitle.replace(\"mm\", \"ll\")\n",
    "print(title)\n",
    "    \n",
    "h1.SetTitle(\"\")\n",
    "h1.GetXaxis().SetTitle(title)\n",
    "h1.GetXaxis().SetTitleOffset(1.25)\n",
    "h1.GetXaxis().SetTitleSize(0.05)\n",
    "h1.GetXaxis().SetLabelSize(0.045)\n",
    "h1.GetXaxis().SetNdivisions(8, 5, 0)\n",
    "h1.GetYaxis().SetTitle(\"number of events\")\n",
    "h1.GetYaxis().SetTitleOffset(1.4)\n",
    "h1.GetYaxis().SetTitleSize(0.05)\n",
    "h1.GetYaxis().SetLabelSize(0.045)\n",
    "\n",
    "# Set the maximum of the y axis:\n",
    "if (h2.GetMaximum()>h1.GetMaximum()):\n",
    "    h1.SetMaximum(h2.GetMaximum()*1.1)\n",
    "    \n",
    "# Make a generically usable legend\n",
    "l = TLegend(0.65, 0.75, 0.88, 0.87)\n",
    "l.SetBorderSize(0)\n",
    "l.SetFillStyle(0000)\n",
    "# You can change the legend titles from here based on what you are plotting\n",
    "l.AddEntry(h1,h1.GetName(), \"l\")\n",
    "l.AddEntry(h2,h2.GetName(), \"l\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now we make a canvas and draw our histograms\n",
    "c = TCanvas(\"c\", \"c\", 620, 500)\n",
    "c.SetBottomMargin(0.15)\n",
    "c.SetLeftMargin(0.15)\n",
    "c.SetRightMargin(0.15)\n",
    "h1.Draw()\n",
    "h2.Draw(\"same\")\n",
    "l.Draw(\"same\")\n",
    "c.Draw()\n",
    "# Don't worry about the error that appears below!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c2 = TCanvas(\"c2\", \"c2\", 620, 500)\n",
    "c2.SetBottomMargin(0.15)\n",
    "c2.SetLeftMargin(0.15)\n",
    "c2.SetRightMargin(0.15)\n",
    "hZeemmsel.Draw(\"colz\")\n",
    "c2.Draw()\n",
    "# Don't worry about the error that appears below!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}