{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cd70edc7",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Save dataset to ROOT file after processing\n",
    "\n",
    "With RDataFrame, you can read your dataset, add new columns with processed values and finally use `Snapshot` to save the resulting data to a ROOT file in TTree format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "308c56f0",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "import ROOT\n",
    "\n",
    "df = ROOT.RDataFrame(\"dataset\",\"data/example_file.root\")\n",
    "df1 = df.Define(\"c\",\"a+b\")\n",
    "\n",
    "out_treename = \"outtree\"\n",
    "out_filename = \"outtree.root\"\n",
    "out_columns = [\"a\",\"b\",\"c\"]\n",
    "snapdf = df1.Snapshot(out_treename, out_filename, out_columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecaaed15",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "We can now check that the dataset was correctly stored in a file:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ca9de7b",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "rootls -lt outtree.root"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55b7bc7f",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Result of a Snapshot is still an RDataFrame that can be further used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23f46a0b",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "snapdf.Display().Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d98928de",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Cutflow reports\n",
    "Filters applied to the dataset can be given a name. The `Report` method will gather information about filter efficiency and show the data flow between subsequent cuts on the original dataset.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7610f52",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "df = ROOT.RDataFrame(\"sig_tree\", \"https://root.cern/files/Higgs_data.root\")\n",
    "\n",
    "filter1 = df.Filter(\"lepton_eta > 0\", \"Lepton eta cut\")\n",
    "filter2 = filter1.Filter(\"lepton_phi < 1\", \"Lepton phi cut\")\n",
    "\n",
    "rep = df.Report()\n",
    "rep.Print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3be5b9d",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Using C++ functions in Python\n",
    "- We still want to perform complex operations in Python but plain Python code is prone to be slow and not thread-safe. \n",
    "\n",
    "- Instead, you can inject C++ functions that will do the work in your event loop during runtime. \n",
    "\n",
    "- This mechanism uses the C++ interpreter cling shipped with ROOT, making this possible in a single line of code. \n",
    "\n",
    "- Let's start by defining a function that will allow us to change the type of a the RDataFrame dataset entry numbers (stored in the special column \"rdfentry\") from `unsigned long long` to `float`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a1bcee4",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%cpp\n",
    "\n",
    "float asfloat(unsigned long long entrynumber){\n",
    "    return entrynumber;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b8f4bd1",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Then let's define another function that takes a `float` values and computes its square."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d3a8b4f",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "%%cpp\n",
    "\n",
    "float square(float val){\n",
    "    return val * val;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90522e44",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "And now let's use these functions with RDataFrame! \n",
    "\n",
    "We start by creating an empty RDataFrame with 100 consecutive entries and defining new columns on it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0edd70d3",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# Create a new RDataFrame from scratch with 100 consecutive entries\n",
    "df = ROOT.RDataFrame(100)\n",
    "\n",
    "# Create a new column using the previously declared C++ functions\n",
    "df1 = df.Define(\"a\", \"asfloat(rdfentry_)\")\n",
    "df2 = df1.Define(\"b\", \"square(a)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b1005d7",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "We can now plot the values of the columns in a graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18a35cd0",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# Show the two columns created in a graph\n",
    "c = ROOT.TCanvas()\n",
    "graph = df2.Graph(\"a\",\"b\")\n",
    "graph.SetMarkerStyle(20)\n",
    "graph.SetMarkerSize(0.5)\n",
    "graph.SetMarkerColor(ROOT.kBlue)\n",
    "graph.SetTitle(\"My graph\")\n",
    "graph.Draw(\"AP\")\n",
    "c.Draw()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "072ae85d",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Using all cores of your machine with multi-threaded RDataFrame\n",
    "- RDataFrame can transparently perform multi-threaded event loops to speed up the execution of its actions. \n",
    "\n",
    "- Users have to call `ROOT::EnableImplicitMT()` before constructing the RDataFrame object to indicate that it should take advantage of a pool of worker threads. \n",
    "\n",
    "- Each worker thread processes a distinct subset of entries, and their partial results are merged before returning the final values to the user.\n",
    "\n",
    "- RDataFrame operations such as Histo1D or Snapshot are guaranteed to work correctly in multi-thread event loops. \n",
    "\n",
    "- User-defined expressions, such as strings or lambdas passed to `Filter`, `Define`, `Foreach`, `Reduce` or `Aggregate` will have to be thread-safe, i.e. it should be possible to call them concurrently from different threads."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2d4528b",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# Get a first baseline measurement\n",
    "\n",
    "treename = \"Events\"\n",
    "filename = \"root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root\"\n",
    "df = ROOT.RDataFrame(treename, filename)\n",
    "\n",
    "df.Sum(\"nMuon\").GetValue()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec2afbc4",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "%%time\n",
    "# Activate multithreading capabilities\n",
    "# By default takes all available cores on the machine\n",
    "ROOT.EnableImplicitMT()\n",
    "\n",
    "treename = \"Events\"\n",
    "filename = \"root://eospublic.cern.ch//eos/opendata/cms/derived-data/AOD2NanoAODOutreachTool/Run2012BC_DoubleMuParked_Muons.root\"\n",
    "df = ROOT.RDataFrame(treename, filename)\n",
    "\n",
    "df.Sum(\"nMuon\").GetValue()\n",
    "\n",
    "# Disable implicit multithreading when done\n",
    "ROOT.DisableImplicitMT()"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}