{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Quick start of AlphaTwirl\n",
    "\n",
    "\n",
    "Tai Sakuma\n",
    "\n",
    "[AlphaTwirl](https://github.com/alphatwirl/alphatwirl) is a python library for summarizing event data into multivariate categorical data. [qtwirl](https://github.com/alphatwirl/qtwirl) is one-function interface to AlphaTwirl. In this notebook, I will shows you by example how to summarize event data in flat ROOT trees and create [Pandas'](http://pandas.pydata.org/) data frames with qtwirl.\n",
    "\n",
    "***\n",
    "\n",
    "## Table of contents\n",
    "\n",
    "- [**Preparation**](#%E2%9E%BC-Preparation)\n",
    "    - [Import common modules](#%E2%9E%BB-Import-common-modules)\n",
    "    - [Import the function qtwirl](#%E2%9E%BB-Import-the-function-qtwirl)\n",
    "- [**The first example**](#%E2%9E%BC-The-first-example)\n",
    "    - [1-D histogram of discrete variable](#%E2%9E%BB-1-D-histogram-of-discrete-variable)\n",
    "- [**More simple examples**](#%E2%9E%BC-More-simple-examples)\n",
    "    - [Continous variable and binnings](#%E2%9E%BB-continous-variable-and-binnings)\n",
    "    - [Multiple data frames](#%E2%9E%BB-Multiple-data-frames)\n",
    "    - [Multi-dimensional categorical data](#%E2%9E%BB-Multi-dimensional-categorical-data)\n",
    "- [**Examples of reading array and indices**](#%E2%9E%BC-Examples-of-reading-array-and-indices)\n",
    "    - [Specific element](#%E2%9E%BB-Specific-element)\n",
    "    - [All elements&mdash;wildcard](#%E2%9E%BB-All-elements%E2%80%94wildcard)\n",
    "    - [Multiple dimensions](#%E2%9E%BB-Multiple-dimensions)\n",
    "    - [All combinations](#%E2%9E%BB--All-combinations)\n",
    "    - [Back references](#%E2%9E%BB--Back-references)\n",
    "- [**Event selections**](#%E2%9E%BC-Event-selections)\n",
    "- [**Adding variables&mdash;scribblers**](#%E2%9E%BC-Adding-variables%E2%80%94scribblers)\n",
    "- [**Multiple input files**](#%E2%9E%BC-multiple-input-files)\n",
    "    - [First example](#%E2%9E%BB-First-example)\n",
    "    - [Skip zombie files](#%E2%9E%BB-Skip-zombie-files)\n",
    "    - [Don't skip zombie files](#%E2%9E%BB-Don't-skip-zombie-files)\n",
    "- [**Concurrency**](#%E2%9E%BC--Concurrency)\n",
    "    - [Multiprocessing](#%E2%9E%BB-Multiprocessing)\n",
    "    - [HTCondor](#%E2%9E%BB-HTCondor)\n",
    "\n",
    "***\n",
    "\n",
    "\n",
    "## ➼ Preparation\n",
    "\n",
    "### ➻ Import common modules\n",
    "\n",
    "We start by importing some common modules."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ Import the function qtwirl\n",
    "\n",
    "\n",
    "Import the function `qtwirl()` from the package `qtwirl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from qtwirl import qtwirl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## ➼ The first example\n",
    "\n",
    "### ➻ 1-D histogram of discrete variable\n",
    "\n",
    "We will run a simple example of using qtwirl. We will read a ROOT file and create a 1-D histogram of a [discrete variable](http://www.statisticshowto.com/discrete-variable/).\n",
    "\n",
    "The file used in this example is stored on [github](https://github.com/alphatwirl/qtwirl/tree/v0.03.1/tests/data). This file contains one tree, named `tree`, with 1,000 events. Let us store the URL to the file to the variable `filapath`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = 'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_01.root'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we run the first example of `qtwirl()`.\n",
    "\n",
    "The function `qtwirl()` requires only a few arguments but also accepts many optional arguments. Here, we use three arguments.\n",
    "- `data`: the path to the input data file. This can be a list of paths if you have multiple input files.\n",
    "- `tree_name`: the name of the tree with event data to read in the file.\n",
    "- `reader_cfg`: a dict or a list of dicts—specifing how to summarize the event data. In this first example, it is simply a dict with one item. This argument can be long and complex.\n",
    "\n",
    "This example counts the number of the events for each value of `njets`. In other words, it creates a histogram of `njets`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='njets'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above doesn't print anything (except a progress bar).\n",
    "\n",
    "The return value `results` is a [Padans' data frame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "expected output: `pandas.core.frame.DataFrame`.\n",
    "\n",
    "Look at `results`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "expected output:\n",
    "\n",
    "| <span></span> | njets |   n |  nvar |\n",
    "| ---|:-----:| ---:| -----:|\n",
    "|  0 |     0 |  20 |    20 |\n",
    "|  1 |     1 |  84 |    84 |\n",
    "|  2 |     2 | 139 |   139 |\n",
    "|  3 |     3 | 198 |   198 |\n",
    "|  4 |     4 | 203 |   203 |\n",
    "|  5 |     5 | 152 |   152 |\n",
    "|  6 |     6 | 100 |   100 |\n",
    "|  7 |     7 |  44 |    44 |\n",
    "|  8 |     8 |  43 |    43 |\n",
    "|  9 |     9 |  10 |    10 |\n",
    "| 10 |    10 |   7 |     7 |\n",
    "\n",
    "\n",
    "The `n` is the number of events with each value of `njets`. The `nvar` is the sum of the square of the weights. In this example, `n` and `nvar` have the same value in each row because the weight is one. The `nvar` is useful when you need statistical uncertainty of weighted histograms.\n",
    "\n",
    "#### Histograms in terms of the split-apply-combine strategy\n",
    "\n",
    "The result of this simple example is a 1-D histogram. The histogram is created by the [split-apply-combine strategy](http://dx.doi.org/10.18637/jss.v040.i01).\n",
    "- **split**: split the data into *groups* in terms of the variables specified by `key_name`, which can be a variable name or a list of variable names.\n",
    "- **apply**: apply a function to summarize the data in each group. The default function in AlphaTwirl is to count the number of the entries in the group. Thus, by default, the summary of the data in each group is the number of the entries in the group.\n",
    "- **combine**: combine the summaries of data from all groups. By default, qtwirl combines summaries into Pandas' data frames.\n",
    "\n",
    "With split-apply-combine strategy, AlphaTwirl summarizes event data into *multivariate categorical data*. Histograms are particular cases of multivariate categorical data. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***\n",
    "\n",
    "## ➼ More simple examples\n",
    "\n",
    "I will show a few more simple examples.\n",
    "\n",
    "### ➻ continous variable and binnings\n",
    "\n",
    "In the previous example, we used a discrete variable, `njets`, to specify groups into which we split data. We can also use a [continous variable](http://www.statisticshowto.com/continuous-variable/) if we *bin* it, i.e., put each value into one of pre-arranged intervals.\n",
    "\n",
    "#### Functor RoundLog\n",
    "\n",
    "You can specify binnings by any function that takes a value and returns a discrete value. Here, we use one of the binning functors defined in AlphaTwirl, `RoundLog`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from alphatwirl.binning import RoundLog"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bin boundaries in `RoundLog` are separated by a equal width in log scale, which is convenint for variables with steeply falling distributions, such as jet pT.\n",
    "\n",
    "Bin boundaries are fully determined if the log of width and a boundary are specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning = RoundLog(0.2, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The log10 of the bin width is 0.2 and 100 is a boundary. Therefore, the bin boudaries are \n",
    "\n",
    "$$\n",
    "[\\cdots, 10^{1.8}, 10^{2.0}, 10^{2.2}, 10^{2.4}, \\cdots].\n",
    "$$\n",
    "\n",
    "In `RoundLog`, the lower edges are closed and the upper edges are open; the intervales are\n",
    "\n",
    "$$\n",
    "\\cdots, [10^{1.8}, 10^{2.0}), [10^{2.0}, 10^{2.2}), [10^{2.2}, 10^{2.4}), \\cdots.\n",
    "$$\n",
    "\n",
    "The functor `RoundLog` uses the lower edge of a bin as the bin lable. So, if you call `binning` with a value, it returns the lower edge of the bin that the value belongs. For example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In my environment, I got `158.48931924611142`. Because of the [floating-point arithmetic](https://en.wikipedia.org/wiki/Floating-point_arithmetic), you might get a slighly different value.\n",
    "\n",
    "The value 200 is within the interval $[10^{2.2}, 10^{2.4})$. Therefre, the lower edge of this interval $10^{2.2} \\approx 158.49$ is returned.\n",
    "\n",
    "By default, `RoundLog` has no lowest or highest bin. So, `binning` puts any positive value in an interal. For example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(0.0005)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I got `0.00039810717055349654`, which is within the interval $[10^{-3.4}, 10^{-3.2})$.\n",
    "\n",
    "$\\log(0)$ is mathematically undefined. `binning(0)` returns `0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "expected output: `0`\n",
    "\n",
    "`binning()` returns `None` for a negative value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(binning(-1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "expected output: `None`\n",
    "\n",
    "You can also give a very large number to `binning()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(1000000000000000000000000000000000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I got `9.999999999998528e+32`, which is approximately $10^{33.0}$, the lower edge of the interval $[10^{33.0}, 10^{33.2})$.\n",
    "\n",
    "The largest possible value is determined by the system. For exampe, you can try giving a value larger than `sys.float_info.max`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(binning(np.float128(sys.float_info.max)*1.0001))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I got `None`. In the above code, I used `np.float128` before multiplying by `1.0001` because `sys.float_info.max*1.0001` is already `inf`. The `binning(float('inf'))` returns `None`.\n",
    "\n",
    "\n",
    "You can specify the lowest and highest bins of `RoundLog`.\n",
    "\n",
    "For example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning = RoundLog(0.2, 100, min=50, underflow_bin=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The lowest bin, specified by the option `min`, is the bin that 50 belongs, which is around 39.8 as you can see by executing `binning(50)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In my environment, I got `39.810717055349734`.\n",
    "\n",
    "\n",
    "If you give a value smaller than this value, this function returns `0`, the underflow bin specified above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "binning(30)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "expected output: `0`\n",
    "\n",
    "#### back to qtwirl\n",
    "\n",
    "After so much about `RoundLog`, we come back to qtwirl.\n",
    "\n",
    "Now, as an example of summarizing a continous variable into categorical data, we create a 1-D histogram of `met` with binnings defined by `RoundLog`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='met', key_binning=RoundLog(0.2, 100, min=50, underflow_bin=0)))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results inlcude all non-empty bins and the next bins of all non-empty bins. In the above example, the last bin (met=1000.00) is empty but is included because its previous bin (min=630.96) is not empty. The next bins of all non-empty bins are included so that you can find the upper edges of all non-empty bins. Notice that upper edges of empty bins have no infomratin about data.\n",
    "\n",
    "### ➻ Multiple data frames\n",
    "\n",
    "In the above evempls, we created one data frame at a time.\n",
    "\n",
    "You can also create multiple data frames with a single execution of qtwirl. You can give a list of configs to `reader_cfg`. Here, we create the two histograms at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=[\n",
    "        dict(key_name='njets'),\n",
    "        dict(key_name='met', key_binning=RoundLog(0.2, 100, min=50, underflow_bin=0))])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `results` are a list of two data frames."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ Multi-dimensional categorical data\n",
    "\n",
    "So far, we have been creating 1-D histograms, 1-D categorical data. You can also create multi-dimensional categorical data by giving a list of variable names to `key_name` and a list of binning funcitons to `key_binning`. If an input is a discrete variable and unnecessary to bin, you can give `None` to the correspoinding element of the list for `key_binning`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(\n",
    "        key_name=('njets', 'met'),\n",
    "        key_binning=(None, RoundLog(0.2, 100, min=50, underflow_bin=0))))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In principle, you can create any dimensions of categorical data.\n",
    "\n",
    "In the case of two dimensions, the results might be easier to browse if you pivot them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results.pivot(index='met', columns='njets', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ➼ Examples of reading array and indices\n",
    "\n",
    "In the above example, we created histograms of the variables `njets` and `met`, i.e., we defined the groups into which we split data by the variables `njets` and `met`. These variables are scalars, i.e., they each have one value per event. Events can also have arrays, e.g., `jet_pt`. Here, I will show how to specify indices of arrays.\n",
    "\n",
    "### ➻ Specific element\n",
    "\n",
    "You can give the array index to `key_index`. The next example creates the histogram of the 1st element of `jet_pt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='jet_pt', key_index=0, key_binning=RoundLog(0.2, 100, min=50, underflow_bin=0)))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ All elements&mdash;wildcard\n",
    "\n",
    "Instead of a specific element of an array, you can use all elments inclusively, by giving the *wildcard* `'*'` to `key_index`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='jet_pt', key_index='*', key_binning=RoundLog(0.2, 100, min=50, underflow_bin=0)))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ Multiple dimensions\n",
    "\n",
    "If you are giving a list to `key_name`, you need also give a list to `key_index`. When some variables in `key_name` are scalars and don't need indices, you can give `None` to the indices for those variables. In the following exmample, we give `None` to the index for `met`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(\n",
    "        key_name=('met', 'jet_pt'),\n",
    "        key_index=(None, 0),\n",
    "        key_binning=(RoundLog(0.25, 100, min=10, underflow_bin=0), RoundLog(0.2, 100, min=50, underflow_bin=0))))\n",
    "results.pivot(index='met', columns='jet_pt', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, you need to give lists with the same length to `key_name`, `key_index`, and `key_binnning` unless you omit or give `None`.\n",
    "\n",
    "**Note**: Internally, a scalar variable is treated as an array with one element. So, you get the same results, if you give `0` instead of `None` to the index for a scalar variable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻  All combinations\n",
    "\n",
    "If you use the wildcard `'*'` to two variables, all possible pairs of the two variables will be used. We will see an example here.\n",
    "\n",
    "In the next example, we use another binning functor, `Round`, from AlphaTwirl."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from alphatwirl.binning import Round"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`Round` works in a similar way to `RoundLog`. While `RoundLog` uses the same bin width in the log scale, `Round` uses the same bin width.\n",
    "\n",
    "We use the wildcard to `jet_pt` and `jet_eta`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(\n",
    "        key_name=('jet_pt', 'jet_eta'),\n",
    "        key_index=('*', '*'),\n",
    "        key_binning=(RoundLog(0.25, 100, min=10, underflow_bin=0), Round(1, 0.5))))\n",
    "results.pivot(index='jet_pt', columns='jet_eta', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results are a 2-D histogram of all possible pairs of `jet_pt` and `jet_eta`. All possible pairs can be useful for some pairs of variables but not for this particular pair. We will make a more sensible 2-D histogram of pT and eta of inclusive jets in the next example.\n",
    "\n",
    "### ➻  Back references\n",
    "\n",
    "It makes more sense to use the same index for both `jet_pt` and `jet_eta`, which can be accomplisehd by *back references* as in the following example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(\n",
    "        key_name=('jet_pt', 'jet_eta'),\n",
    "        key_index=('(*)', '\\\\1'),\n",
    "        key_binning=(RoundLog(0.25, 100, min=10), Round(1))))\n",
    "results.pivot(index='jet_pt', columns='jet_eta', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The syntax for the back reference is inspired by the regeular expression. The index for `jet_eta` in the example is `'\\\\1'`. This instructs AlphaTwirl to use the same index as the first wildcard wihtin the parenthes `'(*)'`. In general, `'\\\\n'` means to use the same index as the n-th `'(*)'`.\n",
    "\n",
    "## ➼ Event selections\n",
    "\n",
    "\n",
    "In AlphaTwirl, the conditions of event selections can be specified as a nested combinations of `All` and `Any` in python dicts.\n",
    "\n",
    "```python\n",
    "        dict(All=('ev: ev.ht[0] >= 400',\n",
    "                  'ev: ev.mht[0] >= 200',\n",
    "                  dict(Any=('ev: ev.nJet[0] == 1',\n",
    "                       dict(All=('ev: ev.nJet[0] >= 2',\n",
    "                                 'ev: ev.minChi[0] >= 0.7'))\n",
    "        ))))\n",
    "```\n",
    "Events need to satisfy all conditions listed in a value for the key `All`. Events need to satisfy at least one of the conditions listed in a value for the key `Any`.\n",
    "\n",
    "**Note**: Strings like `'ev: ev.ht[0] >= 400'` will be parsed and will be given to python lambda. In the case of `'ev: ev.ht[0] >= 400'`, `ev` is the event object in AlphaTwirl; the value of the first element of the attribue `ht` needs to be greater than or equal to `400`.\n",
    "\n",
    "**Note**: Strings are used instead of lambdas themselves because lambdas are not picklable.\n",
    "\n",
    "The following example shows how to apply event selections. It uses a simple condition, `dict(All=('ev: ev.njets[0] > 4', )`. It creates the same histograms before and after the event selection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=[\n",
    "        dict(key_name=('njets', 'met'),\n",
    "             key_binning=(None, RoundLog(0.2, 100, min=50, underflow_bin=0))),\n",
    "        dict(selection_cfg=dict(All=('ev: ev.njets[0] > 4', ))),\n",
    "        dict(key_name=('njets', 'met'),\n",
    "             key_binning=(None, RoundLog(0.2, 100, min=50, underflow_bin=0)))\n",
    "    ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The histogram before the event selection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[0].pivot(index='met', columns='njets', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The histogram after the event selection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results[1].pivot(index='met', columns='njets', values='n').fillna(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ➼ Adding variables&mdash;scribblers\n",
    "\n",
    "An example of adding variables"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scribblers.essentials import FuncOnNumpyArrays\n",
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=[\n",
    "        dict(reader=FuncOnNumpyArrays(\n",
    "            src_arrays=['jet_pt'],\n",
    "            out_name='ht',\n",
    "            func=np.sum)),\n",
    "        dict(key_name='ht',\n",
    "             key_binning=RoundLog(0.2, 100, min=50, underflow_bin=0)),\n",
    "    ])\n",
    "results[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ➼ multiple input files\n",
    "\n",
    "### ➻ First example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = [\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_01.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_02.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_04.root',\n",
    "]\n",
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='njets'))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ Skip zombie files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = [\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_01.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_02.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_03_zombie.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_04.root',\n",
    "]\n",
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='njets'))\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ➻ Don't skip zombie files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "filepath = [\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_01.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_02.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_03_zombie.root',\n",
    "    'https://github.com/alphatwirl/qtwirl/raw/v0.06.0/tests/data/root/sample_chain_04.root',\n",
    "]\n",
    "results = qtwirl(\n",
    "    filepath, tree_name='tree',\n",
    "    reader_cfg=dict(key_name='njets'),\n",
    "    skip_error_files=False)\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ➼  Concurrency\n",
    "\n",
    "coming..\n",
    "\n",
    "### ➻ Multiprocessing\n",
    "\n",
    "coming..\n",
    "\n",
    "\n",
    "### ➻ HTCondor\n",
    "\n",
    "coming.."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}