{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Enabling notebook extension splitcell/splitcell...\n",
      "      - Validating: \u001b[32mOK\u001b[0m\n",
      "Enabling notebook extension rise/main...\n",
      "      - Validating: \u001b[32mOK\u001b[0m\n"
     ]
    }
   ],
   "source": [
    "# early imports\n",
    "import IPython\n",
    "import numpy as np\n",
    "\n",
    "# setup notebook extensions\n",
    "!jupyter nbextension enable splitcell/splitcell\n",
    "!jupyter nbextension enable rise/main"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![law logo](https://raw.githubusercontent.com/riga/law/master/logo.png)\n",
    "[github.com/riga/law](https://github.com/riga/law)\n",
    "\n",
    "*Design Pattern for Analysis Automation on Interchangeable,<br>\n",
    "Distributed Resources using Luigi Analysis Workflows*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## A typical high-energy physics analyses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "#### Search for ttH (H → bb) production ([1804.03682](https://arxiv.org/abs/1804.03682))\n",
    "\n",
    "- ~500k different *tasks* to process\n",
    "- Processing on the WLCG, HTCondor batch systems, GPU clusters, local machines\n",
    "- O(100TB) *user* storage across several sites, local storage, DropBox\n",
    "- Complex data flow due to sophisticated analyses techniques and requirements\n",
    "    - BDTs, DNNs, Matrix Element Method\n",
    "    - ~80 systematic variations\n",
    "- Must be operable by everyone any time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/tth.png\" style=\"margin-top: 10%;\" />\n",
    "    <img src=\"images/dnn.png\" />\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Landscape of HEP analyses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "#### Metrics to describe analyses\n",
    "\n",
    "- Scale: Measure of resource consumption and amount of data to be processed\n",
    "    \n",
    "- Complexity: Measure of granularity and inhomogeneity of workloads"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/scale_complexity.png\" width=\"500\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Todays (and definitely future) analyses tend to be **both** large-scale **and** complex:\n",
    "    - Undocumented requirements between workloads, only exist in the physicist’s head\n",
    "    - Manual execution & steering of jobs, bookkeeping of data, ...\n",
    "    - Error-prone & time-consuming"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/ttbb_workflow.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/ttbb_workflow_zoom.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Motivational questions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- Does the analysis depend on *where* it runs,<br> or *where* it stores data?<br>\n",
    "**→ Decisions should not dictate code design!**\n",
    "\n",
    "- When a Postdoc / PhD student leaves,<br> can someone else run the analysis,<br> is a new *framework* required?<br>\n",
    "**→ Workflow often not documented, <br> only exists in the physicists head!**\n",
    "\n",
    "- After an analysis is published, are people investing time to preserve their work,<br> can it be repeated after O(w/m/y)?<br>\n",
    "**→ Daily working environment must provide preservation features out-of-the-box!**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/remote.png\">\n",
    "    <img src=\"images/lost.png\">\n",
    "    <img src=\"images/preservation.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## <a href=\"https://github.com/spotify/luigi\"><span style='color:#63cc10;'>luigi</span></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- Python package for building complex pipelines\n",
    "- Development started at Spotify (2012),<br > now open-source and community driven\n",
    "\n",
    "#### Simple building blocks\n",
    "\n",
    "1. Workloads defined as <span style=\"color:blue; font-family:monospace;\">Tasks</span>\n",
    "2. <span style=\"color:blue; font-family:monospace;\">Tasks</span> **require** other <span style=\"color:blue; font-family:monospace;\">Tasks</span>\n",
    "3. <span style=\"color:blue; font-family:monospace;\">Tasks</span> **output** <span style=\"color:green; font-family:monospace;\">Targets</span>\n",
    "4. <span style=\"color:red; font-family:monospace;\">Parameters</span> customize behavior\n",
    "\n",
    "#### Benefits\n",
    "\n",
    "- Only processes what is really necessary\n",
    "- Error handling & automatic re-scheduling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"https://raw.githubusercontent.com/spotify/luigi/master/doc/luigi.png\" width=\"250\">\n",
    "    <img src=\"images/scheduler.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## <span style='color:#63cc10;'>luigi</span> in a nutshell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# analysis.py\n",
    "\n",
    "import luigi\n",
    "\n",
    "class Inference(luigi.Task):\n",
    "    \n",
    "    split = luigi.ChoiceParameter(default=\"test\", choices=(\"test\", \"valid\", \"train\"))\n",
    "    # parameters are translated into command-line interface arguments\n",
    "    \n",
    "    def requires(self):\n",
    "        from mva import MVAEvaluation\n",
    "        return MVAEvaluation(split=self.split)  # pass parameters upstream\n",
    "    \n",
    "    def output(self):\n",
    "        return luigi.LocalTarget(f\"data_{self.split}.h5\")  # encode *significant* parameters into output path\n",
    "    \n",
    "    def run(self):\n",
    "        outp = self.output()  # this is the LocalTarget from above\n",
    "        inp = self.input()    # this is output() of MVAEvaluation\n",
    "        \n",
    "        do_whathever_an_inference_does(inp.path, outp.path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## `make`-like execution system"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "```shell\n",
    "luigi --module analysis Inference \\\n",
    "    --split test \\\n",
    "    --other-parameters ...\n",
    "```\n",
    "\n",
    "#### What <span style='color:#63cc10;'>luigi</span> does\n",
    "\n",
    "1. Create dependency tree for triggered task\n",
    "2. Determine tasks to actually run:\n",
    "    - Top-down tree traversal\n",
    "    - Consider a task **complete** when all of its output ``Target``'s exist\n",
    "3. Run tree with configurable number of workers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/example_tree1.png\">\n",
    "</center>"
   ]
  },
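  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same run can also be triggered from Python instead of the command line. A minimal sketch using luigi's `luigi.build` helper, assuming the `analysis.py` module sketched on the previous slide is importable (the worker count is just an illustration):\n",
    "\n",
    "```python\n",
    "# programmatic counterpart to the `luigi --module analysis Inference ...` call\n",
    "import luigi\n",
    "from analysis import Inference  # the analysis.py module from the previous slide\n",
    "\n",
    "# luigi builds the dependency tree, skips tasks whose outputs already exist,\n",
    "# and runs the remaining ones with 2 workers\n",
    "luigi.build([Inference(split=\"test\")], workers=2, local_scheduler=True)\n",
    "```"
   ]
  },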
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Work of a B.Sc. student after 2 weeks "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/example_tree2.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# law: <span style='color:#63cc10;'>luigi</span> <span style='color:#3c69a1;'>analysis workflow</span> package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- <span style='color:#63cc10;'>luigi</span> is a perfect tool to model complex workflows, simple structure, easy to extend\n",
    "\n",
    "- <span style='color:#63cc10;'>l</span><span style='color:#3c69a1;'>aw</span> *extends* <span style='color:#63cc10;'>luigi</span> (i.e. it does not replace it)\n",
    "\n",
    "- **Main goal**: decouple algorithm code from<br> 1. *run locations*,<br> 2. *storage locations*, and<br> 3. *software environments*\n",
    "\n",
    "- Provides a toolbox to follow a design pattern\n",
    "- No constraints on data format, language, ...\n",
    "- No fixation on dedicated resources\n",
    "- Not a *framework*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"https://raw.githubusercontent.com/riga/law/master/logo.png\"><br>\n",
    "    <img src=\"images/workload.png\" width=\"45%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Example task: multiply input parameter by 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before we start, import <span style='color:#63cc10;'>luigi</span> and <span style='color:#63cc10;'>l</span><span style='color:#3c69a1;'>aw</span>, and load IPython magics to execute tasks from within notebooks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[0;49;32mINFO\u001b[0m: law.contrib.ipython.magic - magics successfully registered: %law, %ilaw\n"
     ]
    }
   ],
   "source": [
    "# basic imports\n",
    "import luigi\n",
    "import law\n",
    "import json\n",
    "\n",
    "# load law ipython magics\n",
    "law.contrib.load(\"ipython\")\n",
    "law.ipython.register_magics(log_level=\"INFO\")\n",
    "# drop-in replacement for base task with some interactive features\n",
    "Task = law.ipython.Task"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "IPython magics:\n",
    "\n",
    "- `%ilaw` runs a task inside the current session\n",
    "- `%law` runs a task as a subprocess (not used in this notebook)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Example task: multiply an input parameter by 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "class TimesTwo(Task):\n",
    "\n",
    "    n = luigi.IntParameter()  # no default!\n",
    "    \n",
    "    def output(self):\n",
    "        return law.LocalFileTarget(\n",
    "            f\"data/n{self.n}.json\")\n",
    "    \n",
    "    def run(self):        \n",
    "        # method 1: the verbose way\n",
    "        output = self.output()\n",
    "        output.parent.touch()  # creates the data/ dir\n",
    "\n",
    "        # define data to save\n",
    "        # note: self.n is the value of the \"n\" parameter\n",
    "        # and != self.__class__.n (parameter instance!)\n",
    "        data = {\"in\": self.n, \"out\": self.n * 2}\n",
    "\n",
    "        # pythonic way to save data\n",
    "        with open(output.path, \"w\") as f:\n",
    "            json.dump(data, f, indent=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "%ilaw run TimesTwo --n 5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Inspect results with `--print-status <tree_depth>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "%ilaw run TimesTwo --n 5 --print-status 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "!cat data/n5.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Remove results with `--remove-output <tree_depth>`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split",
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%ilaw run TimesTwo --n 5 --remove-output 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "!cat data/n5.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Example task: multiply an input parameter by 2, refactored"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "class TimesTwo(Task):\n",
    "\n",
    "    n = luigi.IntParameter()  # no default!\n",
    "    \n",
    "    def output(self):\n",
    "        return law.LocalFileTarget(\n",
    "            f\"data/n{self.n}.json\")\n",
    "    \n",
    "    def run(self):\n",
    "        # method 1: using target *formatters*\n",
    "        data = {\"in\": self.n, \"out\": self.n * 2}\n",
    "        self.output().dump(data, formatter=\"json\",\n",
    "                           indent=4)\n",
    "        \n",
    "        # all arguments passed to \"dump\" implementation\n",
    "\n",
    "        # variety of available formatters: yaml, numpy,\n",
    "        # h5py, root, matplotlib, tensorflow, keras,\n",
    "        # coffea, zip, tar, pickle, ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "%ilaw run TimesTwo --n 6"
   ]
  },
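  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The counterpart to `dump` is `load`, which reads a target back through the same formatter. A minimal sketch, assuming the output of `TimesTwo --n 6` from above exists:\n",
    "\n",
    "```python\n",
    "# read the JSON file written above via the target formatter API\n",
    "data = TimesTwo(n=6).output().load(formatter=\"json\")\n",
    "print(data[\"out\"])  # -> 12\n",
    "```"
   ]
  },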
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Abstract storage: <span style=\"color:#589af7;\">remote targets</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- Idea: work with remote files / directories<br> as if they were local\n",
    "\n",
    "- Remote `Target`s based on GFAL2 Python bindings, supports all WLCG protocols (dCache, XRootD, GridFTP, SRM, ...) + DropBox\n",
    "\n",
    "- Implement **identical** target API\n",
    "\n",
    "- Automatic retries\n",
    "\n",
    "- Round-robin (over different doors)\n",
    "\n",
    "- Local caching"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/workload.png\" width=\"50%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Example: DropBox targets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<br>\n",
    "\n",
    "Configure DropBox access in law.cfg:\n",
    "\n",
    "```\n",
    "[dropbox]\n",
    "\n",
    "base: dropbox://dropbox.com/my_dir\n",
    "app_key: ...\n",
    "app_secret: ...\n",
    "access_token: ...\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "# load the dropbox and numpy contrib packages\n",
    "law.contrib.load(\"dropbox\", \"numpy\")\n",
    "\n",
    "# define a directory\n",
    "my_dir = law.dropbox.DropboxFileTarget(\"/\")\n",
    "\n",
    "# save a numpy array in a new file\n",
    "my_file = my_dir.child(\"data.npz\", type=\"f\")\n",
    "my_file.dump(np.zeros((10, 20)), formatter=\"numpy\")\n",
    "\n",
    "# directory listing\n",
    "my_dir.listdir()  # -> [\"data.npz\"]\n",
    "\n",
    "# load the data again\n",
    "zeros = my_file.load(formatter=\"numpy\")\n",
    "\n",
    "# play around with objects\n",
    "my_file.parent == my_dir  # -> True\n",
    "\n",
    "my_dir.child(\"other.txt\", type=\"f\").touch()\n",
    "my_file.sibling(\"other.txt\").exists()  # > True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See [examples/dropbox_targets](https://github.com/riga/law/tree/master/examples/dropbox_targets) for examples and more infos on configuring access to your DropBox."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Abstract job interface: <span style=\"color:#f38f23;\">remote workflows</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- Idea: submission built into tasks, **no need** to write extra code\n",
    "\n",
    "- Currently supported job systems:<br>\n",
    "HTCondor, LSF, gLite, ARC, (Slurm)\n",
    "\n",
    "- Automatic resubmission\n",
    "\n",
    "- Full job control (# tasks per job, # parallel jobs, # of job retries, early stopping, ... )\n",
    "\n",
    "- Dashboard interface"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/workload.png\" width=\"50%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## From the [htcondor_at_cern](https://github.com/riga/law/tree/master/examples/htcondor_at_cern) example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "A `Workflow` task has **branches**, defined in `create_branch_map`:\n",
    "\n",
    "- Add `--branch <n>` to the command line to run a *certain branch* locally\n",
    "- Leave it empty to run **all** branches at a configurable location (local, HTCondor, ...)\n",
    "\n",
    "The task to the right has 26 branches (branch 0 to 25), with each branch writing one character of the alphabet into a file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "class CreateChars(Task, law.LocalWorkflow, HTCondorWorkflow):\n",
    "\n",
    "    def create_branch_map(self):\n",
    "        # map branch numbers 0-25 to\n",
    "        # ascii numbers 97-122 (= a-z)\n",
    "        return {\n",
    "            i: num for i, num in\n",
    "            enumerate(range(97, 123))\n",
    "        }\n",
    "\n",
    "    def output(self):\n",
    "        return law.LocalFileTarget(\n",
    "            f\"output_{self.branch}.json\")\n",
    "\n",
    "    def run(self):\n",
    "        # branch_data holds the integer number to convert\n",
    "        char = chr(self.branch_data)\n",
    "\n",
    "        # write the output\n",
    "        self.output().dump({\"char\": char})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- `law run CreateChars --branch 0` would write `{\"char\": \"a\"}` into `output_0.json`.\n",
    "- `law run CreateChars` would run all branches locally."
   ]
  },
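  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the full htcondor_at_cern example, a downstream task collects the outputs of all branches. A rough sketch of that pattern follows; the `CreateAlphabet` name, the `collection` input key and the formatter choices are assumptions based on the linked example, not code from this notebook:\n",
    "\n",
    "```python\n",
    "class CreateAlphabet(Task):\n",
    "\n",
    "    def requires(self):\n",
    "        # require the whole workflow, i.e. all 26 branches\n",
    "        return CreateChars.req(self)\n",
    "\n",
    "    def output(self):\n",
    "        return law.LocalFileTarget(\"data/alphabet.txt\")\n",
    "\n",
    "    def run(self):\n",
    "        # the workflow output is assumed to be a collection of the branch targets\n",
    "        inputs = self.input()[\"collection\"].targets\n",
    "        alphabet = \"\".join(\n",
    "            inp.load(formatter=\"json\")[\"char\"]\n",
    "            for inp in inputs.values()\n",
    "        )\n",
    "        self.output().dump(alphabet, formatter=\"text\")\n",
    "```"
   ]
  },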
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Select htcondor at execution time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "![](images/htcondor.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Checkout [examples/htcondor_at_cern](https://github.com/riga/law/tree/master/examples/htcondor_at_cern)."
   ]
  },
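  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, switching between local execution and HTCondor submission boils down to command-line parameters of the workflow task. A rough sketch using the `%ilaw` magic from above; the exact parameter names (e.g. `--tasks-per-job`, `--retries`) follow law's remote workflow interface and should be treated as assumptions, so see the linked example for the exact commands:\n",
    "\n",
    "```python\n",
    "# run all branches locally (the default workflow type of this task)\n",
    "%ilaw run CreateChars --workflow local\n",
    "\n",
    "# submit the branches as HTCondor jobs instead, grouping 2 branches per job\n",
    "# and retrying failed jobs up to 3 times\n",
    "%ilaw run CreateChars --workflow htcondor --tasks-per-job 2 --retries 3\n",
    "```"
   ]
  },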
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Abstract software environments: <span style=\"color:#69ba3a;\">sandboxes</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "- Diverging software requirements between workloads is a great feature / challenge / problem\n",
    "\n",
    "- Introduce sandboxing:<br> \"Run entire task in different environment\"\n",
    "\n",
    "- Existing sandbox implementations:\n",
    "    - Sub-shell with init file\n",
    "    - Docker images\n",
    "    - Singularity images\n",
    "    \n",
    "<br>\n",
    "<br>\n",
    "<center>\n",
    "    <img src=\"images/singularity_docker.png\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "<center>\n",
    "    <img src=\"images/workload.png\" width=\"35%\">\n",
    "    <img src=\"images/example_tree3.png\" width=\"50%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Example: Bash sandboxes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#!/usr/bin/env bash\r\n",
      "\r\n",
      "export MY_VARIABLE=\"foo\"\r\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "class SandboxExample(Task, law.SandboxTask):\n",
    "    \n",
    "    sandbox = \"bash::test_env1.sh\"\n",
    "    \n",
    "    # for docker container, use\n",
    "    # sandbox = \"docker::image_name\"\n",
    "\n",
    "    # for singularity container, use\n",
    "    # sandbox = \"singularity::image_name\"\n",
    "    \n",
    "    def output(self):\n",
    "        return law.LocalFileTarget(\"data/sandbox_variable.txt\")\n",
    "\n",
    "    # the run method is encapsulated \n",
    "    def run(self):\n",
    "        value = os.getenv(\"MY_VARIABLE\")\n",
    "\n",
    "        print(f\"MY_VARIABLE: {value}\")\n",
    "        \n",
    "        self.output().dump(value, formatter=\"text\")\n",
    "        \n",
    "\n",
    "!cat test_env1.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - Informed scheduler that task   SandboxExample__99914b932b   has status   PENDING\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - Done scheduling tasks\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - Running Worker with 1 processes\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - [pid 3238510] Worker Worker(salt=533918042, workers=1, host=lxplus806.cern.ch, username=mrieger, pid=3238510) running   SandboxExample()\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - [pid 3238510] Worker Worker(salt=533918042, workers=1, host=lxplus806.cern.ch, username=mrieger, pid=3238510) done      SandboxExample()\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - Informed scheduler that task   SandboxExample__99914b932b   has status   DONE\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - Worker Worker(salt=533918042, workers=1, host=lxplus806.cern.ch, username=mrieger, pid=3238510) was stopped. Shutting down Keep-Alive thread\n",
      "\u001b[0;49;32mINFO\u001b[0m: luigi-interface - \n",
      "===== Luigi Execution Summary =====\n",
      "\n",
      "Scheduled 1 tasks of which:\n",
      "* 1 ran successfully:\n",
      "    - 1 SandboxExample(...)\n",
      "\n",
      "This progress looks :) because there were no failed tasks or missing dependencies\n",
      "\n",
      "===== Luigi Execution Summary =====\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "======================== entering sandbox ========================\n",
      "sandbox: bash::test_env1.sh\n",
      "task   : SandboxExample__99914b932b\n",
      "==================================================================\n",
      "\n",
      "MY_VARIABLE: foo\n",
      "\n",
      "======================== leaving sandbox =========================\n",
      "sandbox: bash::test_env1.sh\n",
      "task   : SandboxExample__99914b932b\n",
      "==================================================================\n"
     ]
    }
   ],
   "source": [
    "%ilaw run SandboxExample\n",
    "# mockup output, please run via command line :)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "- Many HEP analyses already / likely to increase in scale and complexity\n",
    "- Resource-agnostic workflow management and automation essential\n",
    "- **Need for toolbox providing a design pattern, not another framework**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- <span style='color:#63cc10;'>luigi</span> is able to model even complex workflows\n",
    "- <span style='color:#63cc10;'>l</span><span style='color:#3c69a1;'>aw</span> provides convenience & scalability on HEP infrastructure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- All information transparently encoded in tasks, targets & dependencies\n",
    "- **Full automation** of end-to-end analyses"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Links\n",
    "\n",
    "#### law\n",
    "- Repository: [github.com/riga/law](https://github.com/riga/law)\n",
    "- Paper: [arXiv:1706.00955](https://arxiv.org/abs/1706.00955) (CHEP proceedings)\n",
    "- Documentation: [law.readthedocs.io](https://law.readthedocs.io) (in progress)\n",
    "- Minimal example: [github.com/riga/law/tree/master/examples/loremipsum](https://github.com/riga/law/tree/master/examples/loremipsum)\n",
    "- HTCondor example: [github.com/riga/law/tree/master/examples/htcondor_at_cern](https://github.com/riga/law/tree/master/examples/htcondor_at_cern)\n",
    "\n",
    "#### luigi\n",
    "- Repository: [github.com/spotify/luigi](https://github.com/spotify/luigi)\n",
    "- Documentation: [luigi.readthedocs.io](https://luigi.readthedocs.io)\n",
    "- \"Hello world\": [github.com/spotify/luigi/blob/master/examples/hello_world.py](https://github.com/spotify/luigi/blob/master/examples/hello_world.py)"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "rise": {
   "controls": false,
   "footer": "&nbsp;&nbsp; <b>PyHEP2020</b> | <span style='color:#63cc10;'>luigi</span> <span style='color:#3c69a1;'>analysis workflow</span> | Marcel Rieger, CERN",
   "scroll": true,
   "slideNumber": "c",
   "theme": "simple",
   "transition": "none"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}