{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## EnergyFlow Demo\n",
    "\n",
    "[EnergyFlow website](https://energyflow.network)\n",
    "\n",
    "### Energy Flow Polynomials\n",
    "\n",
    "In this tutorial, we introduce the EnergyFlow package for computing arbitrary Energy Flow Polynomials (EFPs) and collections of EFPs. The package is built with both usability, flexibility, and computational efficiency in mind.\n",
    "\n",
    "For a collection of $M$ particles with energy measure $z_i$ and pairwise angular distance measures $\\theta_{ij}$, the EFPs are multiparticle energy correlator observables, indexed by multigraphs $G$, and defined as:\n",
    "$$ \\mathrm{EFP}_G = \\sum_{i_1=1}^M \\sum_{i_2=1}^M\\cdots \\sum_{i_N=1}^M z_{i_1}z_{i_2}\\cdots z_{i_N} \\prod_{(k,\\ell)\\in G} \\theta_{i_ki_\\ell},$$\n",
    "where $(k,\\ell)\\in G$ iterates over the edges in the multigraph.\n",
    "\n",
    "### Choices of Measure\n",
    "\n",
    "The specific choice of energy and angular measures depends on the collider context. We provide the following `measure` options (default is `hadrdot`):\n",
    "\n",
    "`hadr`: $$ z_i = p_{T,i}^\\kappa,\\,\\,\\,\\,\\,\\, \\theta_{ij} = ((y_i - y_j)^2 + (\\phi_i - \\phi_j)^2)^{\\beta/2} $$\n",
    "`hadrdot`: $$ z_i = p_{T,i}^\\kappa,\\,\\,\\,\\,\\,\\, \\theta_{ij} = (2\\, \\hat p_i^\\mu \\hat p_{i\\,\\mu})^{\\beta/2},\\,\\,\\,\\,\\,\\, \\mathrm{where }\\,\\,\\,\\,\\,\\,\\hat p_i^\\mu \\equiv p_i^\\mu/p_{T,i}  $$\n",
    "`ee`: $$ z_i = E_i^\\kappa,\\,\\,\\,\\,\\,\\, \\theta_{ij} = (2\\, \\hat p_i^\\mu \\hat p_{i\\,\\mu})^{\\beta/2},\\,\\,\\,\\,\\,\\, \\mathrm{where }\\,\\,\\,\\,\\,\\,\\hat p_i^\\mu \\equiv p_i^\\mu/E_i $$\n",
    "\n",
    "The energy and angular weighting parameters `kappa` and `beta` default to $\\kappa=1$ and $\\beta = 2$. The choice of $\\kappa = 1$ is required for infrared and collinear (IRC) safety of the observables. Any choice of $\\beta > 0$ guarantees IRC safety. We also provide the `normed` option (default is `True`) to use a normalized and dimensionless energy measure $z_i/\\sum_{j=1}^M z_j$ rather than $z_i$.\n",
    "\n",
    "With this refresher, we have enough to begin using EnergyFlow to compute arbitrary EFPs! Ensure you have EnergyFlow installed. It is easily `pip` installable by executing:\n",
    "` pip install energyflow `\n",
    "\n",
    "We start by importing the EnergyFlow package as well as some other helpful Python libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import energyflow as ef\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Obtaining Events\n",
    "Let's get some events. Typically these would be read in from your favorite event generator for your physics case of interest.\n",
    "\n",
    "For the purposes of this tutorial, we will uniformly sample massless $M$-body phase space with the use of our implementation of the [RAMBO](https://www.sciencedirect.com/science/article/pii/0010465586901190?via%3Dihub) algorithm via the function `ef.gen_massless_phase_space`. It returns `nevents` events consisting of `nparticles` massless four-momenta with center of mass energy of `energy` in the center of momentum frame. In general, EnergyFlow supports events as arrays of four-momenta `[E,px,py,pz]` or arrays of hadronic coordinates `[pT,y,phi]` for hadronic measures.\n",
    "\n",
    "Let's generate 50 events with 20 particles each at center of mass energy 100 GeV."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "events = ef.gen_massless_phase_space(nevents=50, nparticles=20, energy=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Defining Graphs\n",
    "\n",
    "To specify a particular EFP to be computed for our events, we must simply specify the corresponding multigraph.\n",
    "\n",
    "\n",
    "In EnergyFlow, multigraphs are specified as lists of edges, where edges are pairs of vertices. Here are several examples of graphs given as edge lists, where we label the vertices with integers from $0$ to $N-1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "dot      = []\n",
    "line     = [(0,1)]\n",
    "wedge    = [(0,1), (0,2)]\n",
    "triangle = [(0,1), (0,2), (1,2)]\n",
    "square   = [(0,1), (1,2), (2,3), (3,1)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Multigraphs can have multiple edges per pair of vertices. Here are several examples of multigraphs as edgelists, with one or more doubled edges."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "multiline      = [(0,1), (0,1)]\n",
    "multiwedge1    = [(0,1), (0,1), (0,2)]\n",
    "multiwedge2    = [(0,1), (0,1), (0,2), (0,2)]\n",
    "multitriangle1 = [(0,1), (0,2), (0,2), (1,2)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, arbitrary objects can be used to label the vertices. We typically use integer-labeling for simplicity and readibility, though this is not required. The following two edge lists both define the _same_ graph from the perspective of EnergyFlow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "graph_LHC  = [('atlas','cms'),('atlas','cms'),('atlas','lhcb'),('cms','lhcb'),('lhcb','alice')]\n",
    "graph_ints = [(0,1),(0,1),(0,2),(1,2),(2,3)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Computing an Energy Flow Polynomial\n",
    "\n",
    "EFPs are defined by passing a graph and measure choies to `ef.EFP`. The `compute` method of `ef.EFP` can then be called on an event in order to compute the EFP on that event.\n",
    "\n",
    "For concreteness, let's begin by defining the EFP corresponding to the line graph `[(0,1)]` (one edge connecting two vertices) which should be equal to twice the squared center of mass energy of the event with a suitable choice of measure parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EFP Value: 20000.000000000007\n"
     ]
    }
   ],
   "source": [
    "# Computing a single EFP on a single event\n",
    "\n",
    "# specify a graph and event\n",
    "graph = [(0,1)]\n",
    "event = events[0]\n",
    "\n",
    "# define the EFP corresponding to the specified graph\n",
    "EFP_graph = ef.EFP(graph, measure='hadrdot', beta=2, normed=False, coords='epxpypz')\n",
    "\n",
    "# compute the EFP on the specified event\n",
    "result = EFP_graph.compute(event)\n",
    "\n",
    "print(\"EFP Value:\", result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the value of this EFP is indeed $\\mathrm{EFP}_{[(0,1)]} = 2\\times E_\\mathrm{CM}^2 = 2\\times (100\\,\\mathrm{GeV})^2 = 20\\,000\\, \\mathrm{GeV}^2$, as expected for our events.\n",
    "\n",
    "The framework above can be immediately extended to computing the value of an EFP on a collection of many events. This can either be done with list comprehension or even more simply using the `batch_compute` method, which tries to use as many processes as there are CPUs in the machine. The number of worker processes can be controlled by passing in `n_jobs`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EFP w. list comprehension: [ 3425.95623748  7415.20417255 10639.10756502   538.54450213\n",
      "  2257.7204505   1388.99529475  1804.2802471   2379.91818331\n",
      "   744.76895496  2448.44326014   791.65788366  1777.23263608\n",
      "  2549.01010263  1345.0581705   3173.89847025   583.10565749\n",
      "  6402.76565968   588.39404401  1111.39724888  1300.48733859\n",
      "  1124.37600351  2011.89562027  2421.43308899   366.13033974\n",
      " 24689.53697442  1601.08307778  3088.81736771  3751.02824457\n",
      "   475.10900468  3256.96085805   545.84972603  5007.8556729\n",
      "   863.92331652  1750.0094433   2483.6595287  11504.25260136\n",
      "  2083.06997218  1249.40659285  1745.32638252  3072.47495042\n",
      "   884.73536234   628.91049857  4019.04287973  1189.00573617\n",
      "  1085.78678264  1998.76301115  2675.57238823 18365.35483771\n",
      "   355.40085464  1601.72095566]\n"
     ]
    }
   ],
   "source": [
    "# Computing a single EFP on many events\n",
    "\n",
    "# specify a graph\n",
    "graph = [(0,1), (0,2), (0,2), (1,2)]\n",
    "\n",
    "# define the EFP corresponding to the specified graph\n",
    "efp_graph = ef.EFP(graph, measure='hadr', beta=1, normed=True)\n",
    "\n",
    "# compute the EFP on the collection of events with list comprehension\n",
    "results = np.asarray([efp_graph.compute(event) for event in events])\n",
    "print(\"EFP w. list comprehension:\", results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Computing a set of Energy Flow Polynomials\n",
    "\n",
    "If we are interested in using the spanning properties of EFPs, we typically want to compute large numbers of EFPs. The EnergyFlow package does the heavy lifting for you! It contains information about all multigraphs with up to 10 edges, which can be easily and intuitively accessed using `EFPSet`.\n",
    "\n",
    "Relevant graph quantities which can easily be used to select a set of multigraphs are summarized below. More options are available: for a full list see the [documentation](https://thaler-lab.github.io/EnergyFlow/).\n",
    "\n",
    "`n` : Number of vertices in the multigraph.\n",
    "\n",
    "`d` : Degree, or number of edges in the multigraph.\n",
    "\n",
    "`v` : Maximum valency (number of edges touching a vertex).\n",
    "\n",
    "`c` : Variable Elimination computational complexity $\\mathcal O(M^c)$\n",
    "\n",
    "`p` : Number of prime factors (or connected components of the multigraph).\n",
    "\n",
    "As a basis, EFPs are typically organized by the number of edges `d`. Not only does this correspond to the degree of the polynomial, but there are also a _finite_ number of multigraphs up to a specified degree `d`. Lets get  all EFPs with up to five edges by passing `d<=5` to `EFPSet` and compute them on our events!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Originally Available EFPs:\n",
      "  Prime: 23691\n",
      "  Composite: 21540\n",
      "  Total:  45231\n",
      "Current Stored EFPs:\n",
      "  Prime: 54\n",
      "  Composite: 48\n",
      "  Total:  102\n",
      "Results: [[1.00000000e+00 6.47741037e+00 7.09165172e+01 ... 1.92731027e+04\n",
      "  1.33207431e+04 1.14026833e+04]\n",
      " [1.00000000e+00 8.38195548e+00 1.02475169e+02 ... 6.03468623e+04\n",
      "  4.40736076e+04 4.13739275e+04]\n",
      " [1.00000000e+00 9.14192496e+00 1.25226262e+02 ... 9.56771818e+04\n",
      "  6.87190500e+04 6.38540225e+04]\n",
      " ...\n",
      " [1.00000000e+00 1.08365705e+01 1.91047936e+02 ... 2.43118459e+05\n",
      "  1.62181681e+05 1.49437402e+05]\n",
      " [1.00000000e+00 3.92684387e+00 2.14485599e+01 ... 1.29876041e+03\n",
      "  9.77240038e+02 9.33723246e+02]\n",
      " [1.00000000e+00 5.61951356e+00 4.45468201e+01 ... 7.90520029e+03\n",
      "  5.97753065e+03 5.60394181e+03]]\n"
     ]
    }
   ],
   "source": [
    "# get all EFPs with d<=5 (up to d<=10 available by default)\n",
    "efpset = ef.EFPSet('d<=5', measure='hadr', beta=1, normed=True, verbose=True)\n",
    "\n",
    "# compute their values on our events\n",
    "results = np.asarray([efpset.compute(event) for event in events])\n",
    "\n",
    "print(\"Results:\", results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can very easily do much more sophisticated EFP selections using any of the graph quantities. For example to select all prime EFPs with at most 4 vertices and at most 5 edges, we simply use `EFPSet` with the following intuitive syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Originally Available EFPs:\n",
      "  Prime: 23691\n",
      "  Composite: 21540\n",
      "  Total:  45231\n",
      "Current Stored EFPs:\n",
      "  Prime: 54\n",
      "  Composite: 0\n",
      "  Total:  54\n"
     ]
    }
   ],
   "source": [
    "# get all EFPs with n<=4, d<=5, that are prime (i.e. p==1)\n",
    "efpset = ef.EFPSet(('n<=',6), ('d<=',5), ('p==',1), measure='hadr', beta=1, normed=True, verbose=True)\n",
    "\n",
    "# compute their values on our events\n",
    "results = np.asarray([efpset.compute(event) for event in events])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To study the properties of an individual EFP within `EFPSet` we can easily do this using `specs` and `graphs`. Suppose we are interested in the 95th EFP. We can find out its graph and all of its relevant information simply with the following syntax. Here we show only a subset of all the information that specs provides about an EFP: for a full list see the [documentation](https://thaler-lab.github.io/EnergyFlow/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Originally Available EFPs:\n",
      "  Prime: 23691\n",
      "  Composite: 21540\n",
      "  Total:  45231\n",
      "Current Stored EFPs:\n",
      "  Prime: 54\n",
      "  Composite: 48\n",
      "  Total:  102\n",
      "Graph: [(0, 1), (2, 3), (4, 5), (4, 6), (4, 7)]\n",
      "Number of vertices, n: 8\n",
      "Number of edges,    d: 5\n",
      "Maximum valency,    v: 3\n",
      "VE complexity,      c: 2\n",
      "Number of primes,   p: 3\n"
     ]
    }
   ],
   "source": [
    "efpset = ef.EFPSet('d<=5', measure='hadr', beta=1, normed=True, verbose=True)\n",
    "\n",
    "ind = 96\n",
    "graph = efpset.graphs(ind)\n",
    "n, _, d, v, _, c, p, _ = efpset.specs[ind]\n",
    "\n",
    "print(\"Graph:\", graph)\n",
    "print(\"Number of vertices, n:\", n)\n",
    "print(\"Number of edges,    d:\", d)\n",
    "print(\"Maximum valency,    v:\", v)\n",
    "print(\"VE complexity,      c:\", c)\n",
    "print(\"Number of primes,   p:\", p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom measures\n",
    "\n",
    "If you want to specify your own custom measure for $z_i$ and $\\theta_{ij}$ to be used in the EFP formula, that's also possible within the EnergyFlow package. You simply compute your own custom-defined `zs` and `thetas` on the events and pass them to the `compute` methods.\n",
    "\n",
    "We demonstrate this below by using random numbers for `zs` and `thetas`, which can be replaced with any custom values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1.04637461e+01 5.15570630e+01 3.20219020e+01 ... 4.38845484e+06\n",
      "  3.52265327e+07 3.64284630e+08]\n",
      " [1.36306266e+01 9.25341706e+01 6.12626231e+01 ... 4.85402489e+07\n",
      "  5.05375289e+08 6.78438811e+09]\n",
      " [1.11692742e+01 6.42131647e+01 4.32774631e+01 ... 1.14586649e+07\n",
      "  9.95863740e+07 1.09174289e+09]\n",
      " ...\n",
      " [1.14937327e+01 6.52426006e+01 4.37276509e+01 ... 1.21436695e+07\n",
      "  1.03871374e+08 1.18210572e+09]\n",
      " [1.33279983e+01 8.80756053e+01 5.82535104e+01 ... 3.98005442e+07\n",
      "  4.01299768e+08 5.30002822e+09]\n",
      " [1.20689116e+01 7.29417525e+01 4.87920981e+01 ... 1.89355565e+07\n",
      "  1.73257390e+08 2.06481415e+09]]\n"
     ]
    }
   ],
   "source": [
    "# zs and thetas can be passed in explicitly if you want to use a custom measure\n",
    "(zs, thetas) = (np.random.rand(100,25), np.random.rand(100,25,25))\n",
    "\n",
    "results = np.asarray([efpset.compute(zs=z, thetas=theta) for (z,theta) in zip(zs,thetas)])\n",
    "print(results)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And that's it! Now you should be able to specify any EFP (i.e. multigraph) or set of EFPs that you want to comput with `EFP` or `EFPSet`, compute them on a set of events with `compute` or `batch_compute`, and study the results with `specs` and `graphs`! As always, see the [documentation](https://thaler-lab.github.io/EnergyFlow/) for a full description of the EnergyFlow package."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}