{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# NanoEvents awkward1 demo\n", "\n", "NanoEvents is a Coffea utility to wrap the CMS [NanoAOD](https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_06021.pdf) or similar flat nTuple structure into a single awkward array with appropriate object methods (such as Lorentz vector methods), cross-references, and nested objects, all lazily accessed from the source ROOT TTree via uproot.\n", "\n", "NanoEvents is in an **experimental** stage, and has been available in awkward0 for about 6 months. Quite recently, it was ported to awkward1. Here we demo using an awkward1-based NanoEvents array.\n", "\n", "It can be instantiated using the [NanoEventsFactory](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#nanoeventsfactory):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import awkward1 as ak\n", "from coffea.nanoevents import NanoEventsFactory\n", "\n", "fname = \"https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root\"\n", "cache = {}\n", "factory = NanoEventsFactory(fname, cache=cache)\n", "events = factory.events()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `events` object is an awkward array, which at its top level is a record array with one record for each \"collection\", where a collection is a grouping of column (TBranch) names, categorized based on the available columns as follows:\n", "\n", " * one branch exists named `name` and no branches start with `name_`, interpreted as a single flat array;\n", " * one branch exists named `name`, one named `n{name}`, and no branches start with `name_`, interpreted as a single jagged array;\n", " * no branch exists named `n{name}` and many branches start with `name_*`, interpreted as a flat table; or\n", " * one branch exists named `n{name}` and many branches start with `name_*`, interpreted as a jagged table.\n",
"\n", "*Any ROOT TTree that follows such a naming convention should be readable as a NanoEvents array.*\n", "\n", "For example, in the file we opened, the branches:\n", "```\n", "Generator_binvar\n", "Generator_scalePDF\n", "Generator_weight\n", "Generator_x1\n", "Generator_x2\n", "Generator_xpdf1\n", "Generator_xpdf2\n", "Generator_id1\n", "Generator_id2\n", "```\n", "are grouped into one sub-record named `Generator` which can be accessed using either getitem or getattr syntax, i.e. `events[\"Generator\"]` or `events.Generator`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.Generator.id1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# all column names can be listed with:\n", "events.Generator.columns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# In CMS NanoAOD, each TBranch title is a help string, which is carried into the NanoEvents\n", "# e.g. executing the following cell should produce a help pop-up \"id of first parton\"\n", "events.Generator.id1?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on a collection's name, some collections acquire additional _methods_, which are extra features exposed by the code in the mixin classes of the [nanoaod.methods](https://coffeateam.github.io/coffea/modules/coffea.nanoevents.methods.html) module. 
For example, although `events.GenJet` has only the columns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.GenJet.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "we can access additional attributes associated with each generated jet, because the jets can be interpreted as Lorentz vectors:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.GenJet.energy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can call more complex methods, like computing the distance $\\Delta R = \\sqrt{\\Delta \\eta^2 + \\Delta \\phi ^2}$ between two LorentzVector objects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# find distance between leading jet and all electrons in each event\n", "events.Jet[:, 0].delta_r(events.Electron)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The mapping from collection name to methods is controlled by [NanoEventsFactory.default_mixins](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#coffea.nanoevents.NanoEventsFactory.default_mixins) and can be overridden with new mappings in the NanoEventsFactory constructor, if desired.\n", "Additional methods provide convenience functions for interpreting some branches, e.g." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# unpacked Jet_jetId flags\n", "events.Jet.isTight" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# unpacked GenPart_statusFlags\n", "events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "CMS NanoAOD also contains pre-computed cross-references for some types of collections. 
For example, there is a TBranch `Electron_genPartIdx` which indexes the `GenPart` collection per event to give the matched generated particle, or `-1` if no match is found. NanoEvents transforms these indices into an awkward _indexed array_ pointing to the collection, so that one can directly access the matched particle using getattr syntax:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.Electron.matched_gen.pdgId" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.Muon.matched_jet.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For generated particles, the parent index is similarly mapped:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.GenPart.parent.pdgId" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, using the parent index, a helper method computes the inverse mapping, namely, `children`. As such, one can find particle siblings with:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.GenPart.parent.children.pdgId\n", "# notice this is a doubly-jagged array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since one often wants to skip over repeated copies of the same particle in a decay sequence, a helper method `distinctParent` is also available. 
Here we use it to find the parent particle ID for all prompt electrons:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "events.GenPart[\n", "    (abs(events.GenPart.pdgId) == 11)\n", "    & events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])\n", "].distinctParent.pdgId" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Events can be filtered like any other awkward array using boolean fancy-indexing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mmevents = events[ak.num(events.Muon) == 2]\n", "zmm = mmevents.Muon[:, 0] + mmevents.Muon[:, 1]\n", "zmm.mass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can assign new variables to the arrays, with some caveats:\n", "\n", " * Assignment must use setitem (`events[\"path\", \"to\", \"name\"] = value`)\n", " * Assignment to a sliced `events` won't be accessible from the original variable\n", " * New variables are not visible from cross-references" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mmevents[\"Electron\", \"myvar2\"] = mmevents.Electron.pt + zmm.mass\n", "mmevents.Electron.myvar2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to demonstrate that everything is lazily accessed, here are all the cache items that have built up through the execution of this demo:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"\\n\".join(sorted(cache.keys())))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }