{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Optimizing models using the PyTorch JIT\n", "\n", "*Thomas Viehmann, MathInf GmbH*\n", "\n", "Today we look at TorchScript, the language implemented by the PyTorch JIT (\"Just in Time compiler\"), PyTorch's solution for deployment and model optimization. \n", "We can use it to export models to work beyond Python, e.g. on mobile or embedded platforms, or just to escape the infamous Python Global Interpreter Lock during computation. This is possibly the more well-known application.\n", "\n", "But the JIT also lends itself to the implementation _holistic_ optimizations that consider several operations at once. This is as opposed to just writing a better implementation of any given PyTorch operation, although the JIT works for these, too, as we will see.\n", "\n", "We will start with a high-level overview of how PyTorch and the JIT work to then dive into the how it enables compiling fused kernels to optimize models at run time.\n", "\n", "*Sidenote:* If you want to take a look at exporting models, do check out Chapter 15 of our [book](https://www.manning.com/books/deep-learning-with-pytorch), from which I also took some diagrams below. There we introduce the JIT with a view towards running the model in C++ and on mobile. The book also as a comprehensive introduction from everything PyTorch to how to represent data and a detailed account of project to build an AI detecting cancerous lung nodules.\n", "\n", "This tutorial has been prepared in the context of work I did for AMD. Thank you!\n", "\n", "**Note:** This is the Notebook version of a [blog post](https://lernapparat.de/jit-optimization-intro/) on the subject." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The first thing we want to do when considering how the JIT works is consider the structure of PyTorch.\n", "\n", "\n", "(image from Deep Learning with PyTorch)\n", "\n", "PyTorch most prominently is a PyTorch library, I call this part classic PyTorch. Some parts are implemented in Python (e.g. the `torch.nn` modules and the optimizers), but the compute functions (like `torch.matmul`) are provided as a Python C++ extension.\n", "\n", "Looking a bit closer, this Python C++ extension is a thin wrapper around PyTorch's C++ library _LibTorch_. That in turn uses the ATen tensor library which itself dispatches into various backends.\n", "\n", "The PyTorch JIT now implements a virtual machine that takes in TorchScript programs (typically created through the `torch.jit` ) and runs them by calling into LibTorch itself, circumventing the Python parts.\n", "\n", "The JIT also is extendable by defining _Custom Ops_, we'll get back to this. To run PyTorch-exported programs in Torch Mobile or Torch Serving, the typical thing is to implement a wrapper around the JIT api to load and run modules.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## TorchScript\n", "\n", "Now that we know that we want to run our model in the JIT execution, we should see how to get our model into TorchScript, the form the JIT can process.\n", "\n", "*Sidenote:* TorchScript is used simultaneously for the language - mostly a typed subset of Python - and the representation (intermediate - IR).\n", "\n", "There are two main ways of achieving this (but they can be mixed), _scripting_ and _tracing_. Let's look at them." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "'Vega 20 [Radeon VII]'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "#sys.path.insert(0, '/home/tv/pytorch/pytorch/build/lib.linux-x86_64-3.9/')\n", "import torch\n", "\n", "%matplotlib inline\n", "from matplotlib import pyplot\n", "import numpy\n", "\n", "assert torch.cuda.is_available(), \"Some examples need the GPU\"\n", "torch.cuda.get_device_name()\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Scripting\n", "\n", "Scripting compiles (mostly) a subset of Python.\n", "It takes the Python source code and transforms it. \n", "\"Here is what the function should do\", just like normal programming.\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "@torch.jit.script\n", "def fn(x):\n", " return x * 2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(,\n", " graph(%x.1 : Tensor):\n", " %2 : int = prim::Constant[value=2]() # :3:15\n", " %3 : Tensor = aten::mul(%x.1, %2) # :3:11\n", " return (%3))" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fn, fn.graph" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Tracing\n", "\n", "Tracing runs the code and observers the calls into PyTorch with some sample input.\n", "\"Watch me, now you know how to do the same.\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "graph(%x : Float(5, strides=[1], requires_grad=0, device=cpu)):\n", " %1 : Long(requires_grad=0, device=cpu) = prim::Constant[value={2}]() # :2:0\n", " %2 : Float(5, strides=[1], requires_grad=0, device=cpu) = aten::mul(%x, %1) # :2:0\n", " return (%2)\n", " def fn(x: Tensor) -> Tensor:\n", " return torch.mul(x, CONSTANTS.c0)\n", "\n" ] } ], "source": [ "def fn(x):\n", " return x * 2\n", "fn = torch.jit.trace(fn, [torch.randn(5)])\n", "\n", "print(fn.graph, fn.code)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "N.B.: The specialization for the Tensor shape isn't relevant here and will be erased e.g. during saving of the model." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### What is TorchScript?\n", "\n", "Now that had a glimpse of TorchScript, what is it?\n", "\n", "One important difference between TorchScript and Python is that in TorchScript everything is typed. Important\n", "types are\n", "- `bool`, `int`, `long`, `double` for numbers (int = 32 bit integer, long = 64 bit integer)\n", "- `Tensor` for tensors (of arbitrary shape, dtype, ...)\n", "- `List[T]` a list with elements fo type T (one of the above)\n", "- Tuples are of fixed size with arbitrary but fixed element type, so e.g. 
`Tuple[Tensor, int]`.\n", "- `Optional[T]` for things that can be `None`\n", "\n", "`None` always is of type `Optional[T]` for some specific `T` (except in the rarest circumstances).\n", "\n", "PyTorch will mostly infer the intermediate and return types, but you need to annotate any non-Tensor inputs.\n", "\n", "Another important difference is the binding behaviour - when a given variable name is looked up to find the associated variable. Python uses late binding. If we write a function that calls `torch.matmul`, the Python interpreter will look up what `torch.matmul` is when it executes the statement in which it is used.\n", "\n", "This is in contrast to many other languages, which use early binding, as - you guessed it - TorchScript does: When we compile a function to TorchScript, the JIT looks it up then and there and puts it into our function (it even inlines the called functions, but that is a separate step).\n", "*Sidenote:* And while functions are looked up early, the *operators* being executed by the PyTorch JIT are found at runtime.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Tracing vs. Scripting\n", "\n", "Scripting will process all code but may not understand all of it. This means it captures all constructs (like control flow) it understands, but it will fail if it doesn't understand something.\n", "\n", "Tracing doesn't see anything that does not call into PyTorch and will happily ignore it (e.g. control flow). This is also the reason why it will loudly complain if you have non-tensor inputs.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "def fn(x):\n", " for i in range(x.dim()):\n", " x = x * x\n", " return x\n", "\n", "script_fn = torch.jit.script(fn)\n", "trace_fn = torch.jit.trace(fn, [torch.randn(5, 5)])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def fn(x: Tensor) -> Tensor:\n", " x0 = x\n", " for i in range(torch.dim(x)):\n", " x0 = torch.mul(x0, x0)\n", " return x0\n", "\n" ] } ], "source": [ "print(script_fn.code)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def fn(x: Tensor) -> Tensor:\n", " x0 = torch.mul(x, x)\n", " return torch.mul(x0, x0)\n", "\n" ] } ], "source": [ "print(trace_fn.code)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracing and Scripting Modules\n", "\n", "But our models often are not functions. What now?\n", "\n", "With tracing, we can work just like with functions. We get a `ScriptModule` subclass that behaves much like a\n", "`Module` with parameters, state dict etc."
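,
    "\n",
    "\n",
    "As a quick callback to the typing discussion above: non-Tensor arguments of a scripted function get ordinary Python annotations. A minimal sketch (name and logic are made up for illustration):\n",
    "\n",
    "```python\n",
    "@torch.jit.script\n",
    "def repeat_add(x: torch.Tensor, offset: float, repeats: int) -> torch.Tensor:\n",
    "    # non-Tensor inputs need annotations; intermediate and return types are inferred\n",
    "    for _ in range(repeats):\n",
    "        x = x + offset\n",
    "    return x\n",
    "```\n"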
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(torch.jit._trace.TopLevelTracedModule,\n", " Sequential(\n", " original_name=Sequential\n", " (0): Linear(original_name=Linear)\n", " (1): ReLU(original_name=ReLU)\n", " (2): Linear(original_name=Linear)\n", " ))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = torch.nn.Sequential(\n", " torch.nn.Linear(1, 10),\n", " torch.nn.ReLU(),\n", " torch.nn.Linear(10, 1))\n", " \n", "traced_model = torch.jit.trace(model, [torch.randn(8, 1)])\n", "type(traced_model), traced_model" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Saving is a bit different, here we include the model on purpose:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[1.0100],\n", " [0.4957],\n", " [0.5004],\n", " [0.6980],\n", " [0.8027],\n", " [0.5387],\n", " [0.6841],\n", " [0.7053]], grad_fn=)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "traced_model.save('./traced_model.pt')\n", "loaded_model = torch.jit.load('./traced_model.pt')\n", "\n", "loaded_model(torch.randn(8,1))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Scripting Modules\n", "\n", "Scripting modules is ... a bit tricky. We don't script the class in its entirety but instead take an instance (in particular past `__init__`) and process its data members and methods (the latters work like script functions)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def forward(self,\n", " input: Tensor) -> Tensor:\n", " _0 = getattr(self, \"0\")\n", " _1 = getattr(self, \"1\")\n", " _2 = getattr(self, \"2\")\n", " input0 = (_0).forward(input, )\n", " input1 = (_1).forward(input0, )\n", " return (_2).forward(input1, )\n", "\n" ] } ], "source": [ "scripted_model = torch.jit.script(model)\n", "print(scripted_model.code)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can also look at the graph including submodules, but it gets unwieldy rather fast:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "graph(%self : __torch__.torch.nn.modules.container.___torch_mangle_13.Sequential,\n", " %input.1 : Tensor):\n", " %2 : __torch__.torch.nn.modules.linear.___torch_mangle_10.Linear = prim::GetAttr[name=\"0\"](%self)\n", " %3 : __torch__.torch.nn.modules.activation.___torch_mangle_11.ReLU = prim::GetAttr[name=\"1\"](%self)\n", " %4 : __torch__.torch.nn.modules.linear.___torch_mangle_12.Linear = prim::GetAttr[name=\"2\"](%self)\n", " %8 : int = prim::Constant[value=1]()\n", " %9 : int = prim::Constant[value=2]() # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:22\n", " %10 : Tensor = prim::GetAttr[name=\"weight\"](%2)\n", " %11 : Tensor = prim::GetAttr[name=\"bias\"](%2)\n", " %12 : int = aten::dim(%input.1) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:7\n", " %13 : bool = aten::eq(%12, %9) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:7\n", " %input.3 : Tensor = prim::If(%13) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:4\n", " block0():\n", " %15 : Tensor = aten::t(%10) # 
/usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1665:39\n", " %ret.2 : Tensor = aten::addmm(%11, %input.1, %15, %8, %8) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1665:14\n", " -> (%ret.2)\n", " block1():\n", " %17 : Tensor = aten::t(%10) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1667:30\n", " %output.2 : Tensor = aten::matmul(%input.1, %17) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1667:17\n", " %output.4 : Tensor = aten::add_(%output.2, %11, %8) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1669:12\n", " -> (%output.4)\n", " %input.5 : Tensor = aten::relu(%input.3) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1111:17\n", " %21 : int = prim::Constant[value=1]()\n", " %22 : int = prim::Constant[value=2]() # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:22\n", " %23 : Tensor = prim::GetAttr[name=\"weight\"](%4)\n", " %24 : Tensor = prim::GetAttr[name=\"bias\"](%4)\n", " %25 : int = aten::dim(%input.5) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:7\n", " %26 : bool = aten::eq(%25, %22) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:7\n", " %input.7 : Tensor = prim::If(%26) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1663:4\n", " block0():\n", " %28 : Tensor = aten::t(%23) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1665:39\n", " %ret.1 : Tensor = aten::addmm(%24, %input.5, %28, %21, %21) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1665:14\n", " -> (%ret.1)\n", " block1():\n", " %30 : Tensor = aten::t(%23) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1667:30\n", " %output.1 : Tensor = aten::matmul(%input.5, %30) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1667:17\n", " %output.3 : Tensor = aten::add_(%output.1, %24, %21) # /usr/local/lib/python3.9/dist-packages/torch/nn/functional.py:1669:12\n", " -> (%output.3)\n", " return (%input.7)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scripted_model.forward.inlined_graph" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What can you do with scripted modules?\n", "\n", "- Run them as is, bypassing Python.\n", " - not as much speedup as often is expected (maybe 5%-10% for some models I tested),\n", " - but - sometimes crucially - it avoids the dreaded Python Global Interpreter Lock (GIL), so it is useful e.g.\n", " for multithreaded things like serving PyTorch models.\n", "- Export and run in C++ / Mobile / ..., export to other frameworks like [TVM](https://tvm.ai/).\n", "- Apply holistic optimizations (this is what a submodule, the JIT fuser does)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# How the JIT works at a very high level\n", "\n", "For the in-depth discussion of fusers it will be useful to look closer at how the JIT works under the hood.\n", "The JIT has several phases to get us from a function to running our programs. For our purposes, we think of the following three stages:\n", "\n", "- The first thing is to go from tracing or source to a graph.\n", "- Then there are a number of compiler passes through the graph to go from `.graph` to an optimized graph (that can be retrieved with `.graph_for(*inputs)`. 
We will meet some of them in detail below.\n", "- Finally, the `.graph` is compiled to a form of bytecode that is then executed by a virtual machine. We might hope to not meet the bytecode too often, but clearly we want this part to be fast, too. The virtual machine maintains the operands on a stack and then dispatches to the various operators registered by LibTorch or the _custom operators_ that extend the JIT.\n", "\n", "The unoptimized `.graph` is the \"household\" format here; in particular, this is what is serialized, and loading a scripted function will then have to re-do the optimizations.\n", "\n", "## Tracing or scripting to a .graph\n", "\n", "- When tracing a function, the LibTorch dispatcher will call a special function (found in `torch/csrc/autograd/generated/TraceTypeEverything.cpp` after you have built PyTorch) for every call of a LibTorch function (*Sidenote*: For more on the dispatcher, see Ed Yang's excellent blog post [Let's talk about the PyTorch Dispatcher](http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/)). This special function will record a graph node (the ones that show up in `.graph`) with source location and type information in a `TracingState` structure's `.graph`, and in between it re-dispatches to run the real LibTorch operation. This `.graph` is more or less directly what you can see as `.graph` of a traced function.\n", "\n", "- When tracing modules, the tracer will also hook into the module `__call__` method to record the current module as the scope to capture the module structure. This is done at the Python level in the `torch.nn.Module` class; see the `_slow_forward` method there.\n", "\n", "- When scripting a function from Python, the JIT grabs the Python source code (via the `inspect` module of the standard Python library) and then runs the Python parser from the `ast` (Abstract Syntax Tree) module. It then transforms the Python AST into a TorchScript (implemented in C++) one, from which it builds an initial graph form that still looks a lot like Python (i.e. before converting to [static single assignment](https://en.wikipedia.org/wiki/Static_single_assignment_form) (SSA) form). Any name lookup is also done at this stage (so TorchScript is (mostly) [statically binding](https://en.wikipedia.org/wiki/Name_resolution_(programming_languages)) rather than dynamically binding like Python), representing objects as _Sugared Values_ in between. Finally, the JIT transforms the graph into the SSA form that you can see with `.graph`.\n", "\n", "- There is a variant of scripting that can be called directly from C++ and does not use the Python `ast` but parses Python on its own. This is used internally by `AutoDiff` but is also a neat trick to use from C++.\n", "\n", "\n", "## Optimization passes\n", "\n", "The JIT compiler gets us from `.graph` to what we see with `.graph_for` above by running a series of optimization (and some other) passes. This is done by the JIT's GraphExecutor (actually there are two, the \"regular\" one and the profiling one) on the first run or first few runs in the case of the profiling executor. 
The optimized graphs are cached along with the bytecode.\n", "\n", "There are a number of passes that work without messing with AutoGrad, for example (this is not a complete list, and there are also analysis passes for shapes, types, and such):\n", "\n", "- Eliminating dead code and common subexpressions, pre-computing things that only involve constants,\n", "- Pooling redundant constants into single values, and some simple \"pattern matching\" optimizations (like eliminating `.t().t()`),\n", "- Unrolling small loops and batching matrix multiplications that result from unrolling loops.\n", "\n", "If the last one looks awfully special, it is, but it is quite commonly used in recurrent networks such as LSTMs with the input weights.\n", "\n", "As you might have guessed from the introduction, there are also some passes that can mess up AutoGrad, and we can only do them if we do not require gradients or have taken care of AutoGrad before. \n", "\n", "## Bytecode and execution\n", "\n", "Finally, the optimized graph is lowered to bytecode and run by the virtual machine. The virtual machine can also do function calls; this is used e.g. by the fallback mechanisms of the fusers. We will not deal much with this part.\n", "\n", "So this gives you a very high-level overview of what goes on in the JIT. As usual, things get complicated really soon, and the JIT is actively being worked on, making this a bit of a moving target in the details. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Excursion: GPU, efficiency, measurement\n", "\n", "Before we discuss optimization through the JIT we have to discuss measurement. In fact, one of my many informal mottos is _It's not optimization until you measure_. I'll only discuss the most basic measurement here; PyTorch offers a capable profiling facility, too.\n", "\n", "When you think about code being slow, it's important to figure out what is slow and why.\n", "To my mind, a lot of measurement can be done with very basic tools, e.g. IPython's `%timeit` magic.\n", "\n", "As GPU computation is and should be asynchronous, avoid unneeded synchronization points. Synchronization happens when the CPU waits for the GPU (to get the results).\n", "- Synchronizations can happen because the program needs to know something (e.g. sizes of tensors depending on\n", " the input). Often, these are unavoidable.\n", "- Typical sources of spurious synchronizations are too frequent \n", " `.to(device=\"cpu\")`, `.item()`, `.tolist()`, `print`.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "If we want to time GPU kernels, we want to be sure to synchronize before taking the start and end times.\n", "Typically, we also want to have some \"warm-up\", i.e. 
run the measured function before timing.\n", "\n", "Let's take the uniformity loss from [Wang and Isola: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere](https://arxiv.org/abs/2005.10242) (a great paper!).\n", "\n", "The uniformity loss is defined as a function of the pairwise distances over a largish set of vectors.\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "tensor(-3.9374, device='cuda:0', grad_fn=)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def lunif(x, t=2): # copied from the paper\n", " sq_pdist = torch.pdist(x, p=2).pow(2)\n", " return sq_pdist.mul(-t).exp().mean().log()\n", "\n", "x = torch.randn(1024, 128, device=\"cuda\")\n", "x /= x.norm(p=2, dim=1, keepdim=True).requires_grad_()\n", "\n", "lunif(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "One would think that the specialised `pdist` function is the right tool for the job.\n", "But is it? Let's time it." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "18.6 ms ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "def totime(fn):\n", " l = fn(x)\n", " g, = torch.autograd.grad(l, x)\n", " torch.cuda.synchronize()\n", "\n", "totime(lunif) # warmup\n", "%timeit totime(lunif)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's use $|x-y|^2 = |x|^2 + |y|^2 - 2 \\langle x, y \\rangle$ and compare." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0\n", "2.19 ms ± 9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "def lunif2(x, t=2):\n", " t=2\n", " xnorm = torch.norm(x, p=2, dim=1).pow(2)\n", " sq_pdist = xnorm[None] + xnorm[:, None] - 2 * torch.mm(x, x.t())\n", " exp = sq_pdist.mul(-t).exp().tril(diagonal=-1)\n", " N = x.size(0)\n", " res = exp.sum().mul(2/(N*N-N)).log()\n", " return res\n", "\n", "print((lunif2(x.to(torch.double)) - lunif(x.to(torch.double))).item())\n", "\n", "totime(lunif2)\n", "%timeit totime(lunif2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Even though we have stark inefficiencies (like taking `tril` and making a copy to do so), this is almost an order of magnitude faster!\n", "\n", "This is largely due to the implementation of `pdist`'s backward."
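,
    "\n",
    "\n",
    "If you prefer not to rely on the `%timeit` magic, CUDA events give you timings directly. A minimal sketch of such a helper (the name and defaults are mine, not a PyTorch API):\n",
    "\n",
    "```python\n",
    "def cuda_time_ms(fn, *args, warmup=3, reps=10):\n",
    "    # warm-up so that one-time costs (compilation, caching) stay out of the measurement\n",
    "    for _ in range(warmup):\n",
    "        fn(*args)\n",
    "    start = torch.cuda.Event(enable_timing=True)\n",
    "    end = torch.cuda.Event(enable_timing=True)\n",
    "    torch.cuda.synchronize()\n",
    "    start.record()\n",
    "    for _ in range(reps):\n",
    "        fn(*args)\n",
    "    end.record()\n",
    "    torch.cuda.synchronize()  # wait for the GPU before reading the clocks\n",
    "    return start.elapsed_time(end) / reps  # milliseconds per call\n",
    "```\n",
    "\n",
    "Usage would be e.g. `cuda_time_ms(lunif2, x)` for the forward pass only (unlike `totime`, which also measures the backward)."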
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Optimization\n", "\n", "### But Python is slow...\n", "\n", "Uniformity loss: \"what formulas you use\" is the real bottleneck (unless you optimize pdist).\n", "\n", "The \"what do we compute\" typically should be the first optimization target.\n", "\n", "But when we fix the task (\"what\"), how can we optimize?\n", "\n", "Conventional wisdom: **Python is slow**\n", "\n", "- certainly, Python isn't fast (`for` loop vs C++ `for` loop)\n", "- but, if the GPU is saturated $\\Rightarrow$ Python isn't the bottleneck\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### How PyTorch programs spend their time\n", "\n", "At a very high level, you can divide time spent into these parts:\n", "- Python program flow,\n", "- Data \"administrative overhead\" (creating `Tensor` data structures, autograd `Node`s etc.),\n", "- Data aquisition (I/O),\n", "- Computation roughly as\n", " - fixed overhead (kernel launches etc.),\n", " - reading / writing memory,\n", " - \"real computation\".\n", "\n", "**Thomas' rule of thumb**: As long as your operands are reasonably large (say 100s of elements, not single elements), Python and data \"administrative overhead\" probably isn't your main problem.\n", "\n", "So while the JIT takes away some Python overhead, this is not spectacular optimization.\n", "With this out of the way, let us get back to how the JIT helps us optimize things." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## An ad-hoc graph plotter (skip this)\n", "\n", "It will be handy to draw some graphs, so here is a function that plots our graphs. It's not complete by any means, but it helps us here." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def make_graph(gr):\n", " import graphviz\n", " dot = graphviz.Digraph(format='svg', graph_attr={'labelloc': 't'})\n", "\n", " nodes = {}\n", " for i in gr.inputs():\n", " nname = i.debugName()\n", " label = nname.split('.')[0]\n", " nodes[nname] = (nname, dot)\n", " dot.node(nname, label, color='blue')\n", "\n", " unseen_ops = {'prim::ListConstruct', 'aten::index', \n", " 'aten::size', 'aten::slice', 'aten::unsqueeze', 'aten::squeeze',\n", " 'aten::to', 'aten::view', 'aten::permute', 'aten::transpose', 'aten::contiguous',\n", " 'aten::permute', 'aten::Int', 'prim::TupleUnpack', 'prim::ListUnpack', 'aten::unbind',\n", " 'aten::select', 'aten::detach', 'aten::stack', 'aten::reshape', 'aten::split_with_sizes',\n", " 'aten::cat', 'aten::expand', 'aten::expand_as', 'aten::_shape_as_tensor',\n", " 'aten::_size_if_not_equal', 'prim::BroadcastSizes',\n", " 'prim::Constant',\n", " }\n", "\n", " def process_block(nodeit, dot):\n", " firstnode = None\n", " lastnode = None\n", " for n in nodeit:\n", " k = n.kind()\n", " outs = list(n.outputs())\n", " inps = list(n.inputs())\n", " type_outs = [o.type().kind() for o in outs]\n", " type_inps = [o.type().kind() for o in inps]\n", " if k == 'prim::If':\n", " label = 'If'\n", " nname = outs[0].debugName()\n", " for i in inps:\n", " src, srcdot = nodes.get(i.debugName(), (None, None))\n", " if src is not None:\n", " srcdot.edge(src, nname + '_in')\n", " dot.node(nname + '_in', 'If', shape='diamond')\n", " dot.node(nname, '', width='0.1', height='0.1')\n", " dot.edge(nname + '_in', nname, style='invis')\n", " nodes[nname] = (nname, dot)\n", " bl = list(n.blocks())\n", " for i, b in enumerate(bl):\n", " with dot.subgraph(name=f\"cluster_{nname}_{i}\", graph_attr={'label':''}) as sub_dot:\n", " firstnode, lastnode = process_block(b.nodes(), sub_dot)\n", " dot.edge(nname + '_in', firstnode, label=\"yn\"[i])\n", " dot.edge(lastnode, nname)\n", " if firstnode is None:\n", " firstnode = nname + '_in'\n", " lastnode = nname\n", " elif k == 'prim::DifferentiableGraph':\n", " label = 'DifferentiableGraph'\n", " nname = outs[0].debugName()\n", " nodes[nname] = (nname, dot)\n", " sg = n.g('Subgraph')\n", " nis = list(n.inputs())\n", " sgis = list(sg.inputs())\n", " assert len(nis) == len(sgis)\n", " for ni, sgi in zip(nis, sgis):\n", " if ni.debugName() in nodes:\n", " nodes[sgi.debugName()] = nodes[ni.debugName()]\n", " with dot.subgraph(name=f\"cluster_{nname}\", graph_attr={\n", " 'label': 'DifferentiableGraph', 'labelloc':'b', 'labeljust':'r'}) as sub_dot:\n", " firstnode, lastnode = process_block(sg.nodes(), sub_dot)\n", " nos = list(n.outputs())\n", " sgos = list(sg.outputs())\n", " assert len(nos) <= len(sgos)\n", " for no, sgo in zip(nos, sgos):\n", " if sgo.debugName() in nodes:\n", " nodes[no.debugName()] = (nodes[sgo.debugName()][0], dot)\n", " elif k not in unseen_ops:\n", " if k == 'prim::CallFunction':\n", " label = 'call ' + next(n.inputs()).node().s(\"name\")\n", " else:\n", " label = k.replace('aten::', '').replace('prim::', '')\n", " nname = outs[0].debugName()\n", " dot.node(nname, label, shape='box', style='rounded')\n", " for o in outs:\n", " nodes[o.debugName()] = (nname, dot)\n", " for i in inps:\n", " src, srcdot = nodes.get(i.debugName(), (None, None))\n", " if src is not None:\n", " srcdot.edge(src, nname)\n", " if firstnode is None:\n", " firstnode = nname\n", " lastnode = nname\n", " return firstnode, 
lastnode\n", "\n", " process_block(gr.nodes(), dot)\n", " dot.node('.outputs', 'outputs', color='blue')\n", " for i, o in enumerate(gr.outputs()):\n", " src, srcdot = nodes.get(o.debugName(), (None, None))\n", " if src is not None:\n", " dot.edge(src, '.outputs')\n", "\n", " return dot\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Holistic Optimizations - JIT fusers\n", "\n", "So currently the fuser is a hotspot of development, and PyTorch has no fewer than three fusers:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function fuser in module torch.jit._fuser:\n", "\n", "fuser(name)\n", " A context manager that facilitates switching between\n", " backend fusers.\n", " \n", " Valid names:\n", " * ``fuser0`` - enables only legacy fuser\n", " * ``fuser1`` - enables only NNC\n", " * ``fuser2`` - enables only nvFuser\n", "\n" ] } ], "source": [ "help(torch.jit.fuser)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How the JIT optimizes pointwise operations\n", "\n", "\n", "\n", "To get a taste of how the JIT fuser works, let us look at the intersection over union ratio for detection models.\n", "We have a two lists of rectangles given by the top left (as x and y coordinates) and width and height.\n", "To measure the pairwise agreement of the $i$th rectangle in the first and in the second list.\n", "We do this by the intersection over union ratio which computes the areas of the intersection and the union of the two rectangles. The quotient of the two is between 0 (no agreement at all) and 1 (perfect agreement).\n", "\n", "*Sidenote*: Another prominent example of pointwise operations is in LSTMs: They can be though of as two matrix multiplications followed by a series of pointwise operations for the gates. The case of LSTMs has been a show case for the JIT\n", "[show case](https://lernapparat.de/fast-lstm-pytorch/) [for JIT](https://lernapparat.de/more-jit-optimizations/) [optimizations](https://pytorch.org/blog/optimizing-cuda-rnn-with-torchscript/)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) # Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", "# we make a scripted function\n", "ratio_iou_scripted = torch.jit.script(ratio_iou)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a simple enough function with elementwise computation. Let us look at the function graph." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "xi.1\n", "\n", "max\n", "\n", "\n", "\n", "x1.1->xi.1\n", "\n", "\n", "\n", "\n", "\n", "17\n", "\n", "add\n", "\n", "\n", "\n", "x1.1->17\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "yi.1\n", "\n", "max\n", "\n", "\n", "\n", "y1.1->yi.1\n", "\n", "\n", "\n", "\n", "\n", "32\n", "\n", "add\n", "\n", "\n", "\n", "y1.1->32\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->17\n", "\n", "\n", "\n", "\n", "\n", "48\n", "\n", "mul\n", "\n", "\n", "\n", "w1.1->48\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->32\n", "\n", "\n", "\n", "\n", "\n", "h1.1->48\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->xi.1\n", "\n", "\n", "\n", "\n", "\n", "21\n", "\n", "add\n", "\n", "\n", "\n", "x2.1->21\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->yi.1\n", "\n", "\n", "\n", "\n", "\n", "36\n", "\n", "add\n", "\n", "\n", "\n", "y2.1->36\n", "\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "w2.1->21\n", "\n", "\n", "\n", "\n", "\n", "51\n", "\n", "mul\n", "\n", "\n", "\n", "w2.1->51\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->36\n", "\n", "\n", "\n", "\n", "\n", "h2.1->51\n", "\n", "\n", "\n", "\n", "\n", "25\n", "\n", "sub\n", "\n", "\n", "\n", "xi.1->25\n", "\n", "\n", "\n", "\n", "\n", "40\n", "\n", "sub\n", "\n", "\n", "\n", "yi.1->40\n", "\n", "\n", "\n", "\n", "\n", "22\n", "\n", "min\n", "\n", "\n", "\n", "17->22\n", "\n", "\n", "\n", "\n", "\n", "21->22\n", "\n", "\n", "\n", "\n", "\n", "22->25\n", "\n", "\n", "\n", "\n", "\n", "wi.1\n", "\n", "clamp\n", "\n", "\n", "\n", "25->wi.1\n", "\n", "\n", "\n", "\n", "\n", "area_i.1\n", "\n", "mul\n", "\n", "\n", "\n", "wi.1->area_i.1\n", "\n", "\n", "\n", "\n", "\n", "56\n", "\n", "mul\n", "\n", "\n", "\n", "wi.1->56\n", "\n", "\n", "\n", "\n", "\n", "37\n", "\n", "min\n", "\n", "\n", "\n", "32->37\n", "\n", "\n", "\n", "\n", "\n", "36->37\n", "\n", "\n", "\n", "\n", "\n", "37->40\n", "\n", "\n", "\n", "\n", "\n", "hi.1\n", "\n", "clamp\n", "\n", "\n", "\n", "40->hi.1\n", "\n", "\n", "\n", "\n", "\n", "hi.1->area_i.1\n", "\n", "\n", "\n", "\n", "\n", "hi.1->56\n", "\n", "\n", "\n", "\n", "\n", "64\n", "\n", "div\n", "\n", "\n", "\n", "area_i.1->64\n", "\n", "\n", "\n", "\n", "\n", "53\n", "\n", "add\n", "\n", "\n", "\n", "48->53\n", "\n", "\n", "\n", "\n", "\n", "51->53\n", "\n", "\n", "\n", "\n", "\n", "area_u.1\n", "\n", "sub\n", "\n", "\n", "\n", "53->area_u.1\n", "\n", "\n", "\n", "\n", "\n", "56->area_u.1\n", "\n", "\n", "\n", "\n", "\n", "63\n", "\n", "clamp\n", "\n", "\n", "\n", "area_u.1->63\n", "\n", "\n", "\n", "\n", "\n", "63->64\n", "\n", "\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "64->.outputs\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "make_graph(ratio_iou_scripted.graph)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "It is not complex as code, but it has quite a few operations. 
Now, in terms of execution, every of these ops launches a kernel (a function run on the GPU) that does three things:\n", "\n", "- Load the inputs (from the incoming edges) from memory,\n", "- compute the output,\n", "- store the result.\n", "\n", "These are 37 times loading inputs and 20 times storing outputs with only trivial computation.\n", "Clearly this is heavily limited by the memory transfers, even if we can get helped by caching.\n", "\n", "What if we could make it all into one large kernel and have 8 loads and 1 store?\n", "\n", "This is exactly what a fuser does and it does give us a good speedup:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "161 µs ± 938 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", "38.2 µs ± 485 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda').exp()\n", "\n", "def take_time(fn):\n", " _ = fn(x1, y1, w1, h1, x2, y2, w2, h2)\n", " torch.cuda.synchronize()\n", "\n", "take_time(ratio_iou) # warmup\n", "%timeit take_time(ratio_iou)\n", "\n", "for i in range(2):\n", " take_time(ratio_iou_scripted)\n", "%timeit take_time(ratio_iou_scripted)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can see in the graph specialised for the inputs which operations are fused:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "cluster_121_0\n", "\n", "\n", "\n", "cluster_121_1\n", "\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "112\n", "\n", "TypeCheck\n", "\n", "\n", "\n", "x1.1->112\n", "\n", "\n", "\n", "\n", "\n", "147\n", "\n", "call fallback_function\n", "\n", "\n", "\n", "x1.1->147\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "y1.1->112\n", "\n", "\n", "\n", "\n", "\n", "y1.1->147\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->112\n", "\n", "\n", "\n", "\n", "\n", "w1.1->147\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->112\n", "\n", "\n", "\n", "\n", "\n", "h1.1->147\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->112\n", "\n", "\n", "\n", "\n", "\n", "x2.1->147\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->112\n", "\n", "\n", "\n", "\n", "\n", "y2.1->147\n", "\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "w2.1->112\n", "\n", "\n", "\n", "\n", "\n", "w2.1->147\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->112\n", "\n", "\n", "\n", "\n", "\n", "h2.1->147\n", "\n", "\n", "\n", "\n", "\n", "121_in\n", "\n", "If\n", "\n", "\n", "\n", "112->121_in\n", "\n", "\n", "\n", "\n", "\n", "68\n", "\n", "TensorExprGroup\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "121\n", "\n", "\n", "\n", "\n", "\n", "121_in->68\n", "\n", "\n", "y\n", "\n", "\n", 
"\n", "121_in->147\n", "\n", "\n", "n\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "121->.outputs\n", "\n", "\n", "\n", "\n", "\n", "68->121\n", "\n", "\n", "\n", "\n", "\n", "147->121\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "make_graph(ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aha, so we do some type check and if that returns OK, we run a `TensorExprGroup`, which will be executed as one kernel. We keep a fallback just in case.\n", "In the text representation, we can actually see the `TensorExprGroup` and we can see which operations are fused:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "graph(%x1.1 : Tensor,\n", " %y1.1 : Tensor,\n", " %w1.1 : Tensor,\n", " %h1.1 : Tensor,\n", " %x2.1 : Tensor,\n", " %y2.1 : Tensor,\n", " %w2.1 : Tensor,\n", " %h2.1 : Tensor):\n", " %112 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %113 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %114 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %115 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %116 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %117 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %118 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %119 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %120 : bool = prim::TypeCheck(%w2.1, %h2.1, %w1.1, %h1.1, %y2.1, %y1.1, %x2.1, %x1.1)\n", " %121 : Tensor = prim::If(%120)\n", " block0():\n", " %68 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%112, %113, %114, %115, %116, %117, %118, %119)\n", " -> (%68)\n", " block1():\n", " %146 : Function = prim::Constant[name=\"fallback_function\", fallback=1]()\n", " %147 : (Tensor) = prim::CallFunction(%146, %w2.1, %h2.1, %w1.1, %h1.1, %y2.1, %y1.1, %x2.1, %x1.1)\n", " %148 : Tensor = prim::TupleUnpack(%147)\n", " -> (%148)\n", " return (%121)\n", "with prim::TensorExprGroup_0 = graph(%14 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %15 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %17 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %18 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %34 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %37 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %51 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %54 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0)):\n", " %4 : float = prim::Constant[value=1.0000000000000001e-05]()\n", " %42 : None = prim::Constant()\n", " %41 : float = prim::Constant[value=0.]()\n", " %55 : int = prim::Constant[value=1]()\n", " %xi.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::max(%54, %51) # :2:9\n", " %yi.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::max(%37, %34) # :3:9\n", " %56 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%54, %17, %55) # :4:31\n", " %53 : Float(100, 1000, strides=[1000, 1], requires_grad=0, 
device=cuda:0) = aten::add(%51, %14, %55) # :4:38\n", " %50 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::min(%56, %53) # :4:21\n", " %47 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%50, %xi.2, %55) # :4:21\n", " %wi.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%47, %41, %42) # :4:9\n", " %39 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%37, %18, %55) # :5:31\n", " %36 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%34, %15, %55) # :5:38\n", " %33 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::min(%39, %36) # :5:21\n", " %30 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%33, %yi.2, %55) # :5:21\n", " %hi.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%30, %41, %42) # :5:9\n", " %area_i.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%wi.2, %hi.2) # :6:13\n", " %19 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%17, %18) # :7:13\n", " %16 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%14, %15) # :7:23\n", " %13 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%19, %16, %55) # :7:13\n", " %area_u.2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%13, %area_i.2, %55) # :7:13\n", " %6 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%area_u.2, %4, %42) # :8:20\n", " %2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::div(%area_i.2, %6) # :8:11\n", " return (%2)" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2)" ] }, { "cell_type": "markdown", "metadata": { "scrolled": false }, "source": [ "We will look in some detail how these things work, but the core idea is that operations of the `TensorExprGroup` here will be compiled into a single kernel that then computes the result from the inputs in one go.\n", "\n", "## How the Fusers Work at a High Level\n", "\n", "At a high level, PyTorch's fusers work in three parts:\n", "\n", "- In a fusion JIT compiler pass, the operations that can be fused are arranged in a fusion group. By looking at which operations can be fused, we get a good glimpse of what the fusers (think) they can achieve. The classic (or legacy) PyTorch fuser only considers pointwise operations (like the IOU above, see `isSimpleMap` in `torch/csrc/jit/passes/graph_fuser.cpp`). The cuda fuser (or fuser2/nvFuser above), which is conceptually somewhat close but much more elaborate than the classic fuser also handles `sum` (see `IRParser`'s `registerJitOperator` in `torch/csrc/jit/codegen/cuda/parser.cpp`). The TensorExpr fuser (fuser1, the default) fuses pointwise and `softmax` and `log_softmax` in addition to `sum` if reduction support is enabled (see `isSupported` in `torch/csrc/jit/passes/tensorexpr_graph_fuser.cpp`). It generates a fusion group node of some sort, but, in the case of the newer two fusers also inserts a check (`TypeCheck` or ...) and an explicit fallback. 
Interestingly, the fusers also support `rand_like`, which is very useful functionality for things like dropout.\n", "\n", "- At some point (typically the first invocation of the fusion group), it compiles a kernel for the computation. Typically this is specific to (some aspects of) the type and shape of the inputs. For the GPU, the fusers emit HIP/CUDA C code and compile using the GPU RTC (run time compile) library. For the CPU the classic fuser would also use C, but the TensorExpr fuser uses an LLVM backend (but note that the CPU is much less of a target and the main use case is the GPU). These kernels are cached.\n", "\n", "- When running a fusion group (the fuser registers an operator with the JIT that is then called), the fuser needs to launch the kernel. For the newer fusers, checking whether the inputs match expectations is done outside this node, but the classic fuser would do the fallback itself if needed.\n", "\n", "One thing to know about the fallback is that it itself will be optimized by the PyTorch JIT. So when we run a function that has been optimized with fusions with incompatible parameters (e.g. we change whether we want gradients), the failing type check would cause the JIT to call the fallback, and that would then get the optimizations for these parameters (and another level of check and fallback).\n", "\n", "\n", "### Code generation from TorchScript IR to GPU kernel\n", "\n", "In addition to the operator support, the code generation is where each fuser has a different approach.\n", "\n", "The CUDA fuser first transforms the TorchScript IR in the CudaFusionGroup to a Fusion IR.\n", "This is then further lowered to the Kernel IR and finally translated to C++ code from which the\n", "runtime compiler generates the kernel. The approach is conceptually relatively straightforward: there are optimizations for how the data access is laid out, and then pointwise operators are just loading, computing and storing. For reductions, there is a heuristic for how to deal with the reduction axes (this is somewhat similar to TensorIterators in ATen, and, indeed, the use-case is quite similar but with the compile-time vs. run-time distinction). But, as these things go, to get good results, there are quite a few things to take care of.\n", "\n", "The TensorExpr fuser (which is inspired by the lower levels of the [Apache TVM](https://tvm.apache.org/)) translates the TorchScript IR into a sequence of [Loop-Nest](https://en.wikipedia.org/wiki/Loop_nest_optimization) statements (this is done in `torch/csrc/jit/tensorexpr/kernel.cpp`, which implements the operator processing the `TensorExprGroup` TorchScript IR node). This is the TensorExpr IR (the quickest overview over the IR node types can maybe be had by looking at `torch/csrc/jit/tensorexpr/ir_visitor.h`). The statements are then optimized and lowered before they are passed to the code generators (CUDA source code for the GPU or LLVM for the CPU) that write kernel functions and then compile and run them (again, with caching).\n", "\n", "\n", "## Automatic Differentiation in TorchScript\n", "\n", "Things are a bit more complicated if we need gradients. The default mode of the JIT is to execute the LibTorch operations, and they will build an autograd graph just like in classic PyTorch. But when we want to fuse operators, things get a bit more complicated. The problem here is that AutoGrad needs intermediate results to compute the backward. This is OK, but our express purpose here is to skip storing and loading the intermediate results. 
This is mitigated by the PyTorch JIT's own automatic differentiation (AD) mechanism, AutoDiff (as opposed to AutoGrad in PyTorch). \n", "\n", "We can see it in action when we re-define our function and run it with gradient-requiring inputs: we get a `DifferentiableGraph` in there and the `TensorExprGroup` is inside that (usually this would be created as part of the fallback function but to start fresh and see this better we have to re-define the function here, just re-scripting isn't enough to clear the script):" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "graph(%x1.1 : Tensor,\n", " %y1.1 : Tensor,\n", " %w1.1 : Tensor,\n", " %h1.1 : Tensor,\n", " %x2.1 : Tensor,\n", " %y2.1 : Tensor,\n", " %w2.1 : Tensor,\n", " %h2.1 : Tensor):\n", " %68 : Tensor = prim::DifferentiableGraph_0(%h2.1, %h1.1, %w2.1, %w1.1, %y2.1, %y1.1, %x2.1, %x1.1)\n", " return (%68)\n", "with prim::DifferentiableGraph_0 = graph(%65 : Tensor,\n", " %70 : Tensor,\n", " %96 : Tensor,\n", " %101 : Tensor,\n", " %104 : Tensor,\n", " %106 : Tensor,\n", " %109 : Tensor,\n", " %111 : Tensor):\n", " %617 : int[] = aten::size(%111) # :3:44\n", " %620 : int[] = aten::size(%109) # :3:93\n", " %624 : int[] = aten::size(%106) # :3:44\n", " %627 : int[] = aten::size(%104) # :3:93\n", " %634 : int[] = aten::size(%101) # :3:93\n", " %641 : int[] = aten::size(%96) # :3:93\n", " %655 : int[] = aten::size(%70) # :3:93\n", " %662 : int[] = aten::size(%65) # :3:93\n", " %903 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %904 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %905 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %906 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %907 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %908 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %909 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %910 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %911 : bool = prim::TypeCheck(%96, %65, %101, %70, %104, %106, %109, %111)\n", " %912 : Tensor, %913 : Tensor, %914 : Tensor, %915 : Tensor, %916 : Tensor, %917 : Tensor, %918 : Tensor, %919 : Tensor, %920 : Tensor, %921 : Tensor, %922 : Tensor, %923 : Tensor = prim::If(%911)\n", " block0():\n", " %830 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %832 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %area_u.4 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %area_i.4 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %hi.4 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %846 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %850 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %852 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %wi.4 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %856 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %860 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0), %862 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = prim::TensorExprGroup_0(%903, %904, %905, %906, %907, %908, %909, %910)\n", " -> (%830, %832, %area_u.4, %area_i.4, %hi.4, %846, %850, %852, %wi.4, %856, %860, %862)\n", 
" block1():\n", " %959 : Function = prim::Constant[name=\"fallback_function\", fallback=1]()\n", " %960 : (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) = prim::CallFunction(%959, %96, %65, %101, %70, %104, %106, %109, %111)\n", " %961 : Tensor, %962 : Tensor, %963 : Tensor, %964 : Tensor, %965 : Tensor, %966 : Tensor, %967 : Tensor, %968 : Tensor, %969 : Tensor, %970 : Tensor, %971 : Tensor, %972 : Tensor = prim::TupleUnpack(%960)\n", " -> (%961, %962, %963, %964, %965, %966, %967, %968, %969, %970, %971, %972)\n", " %875 : int[] = aten::size(%912)\n", " %876 : int[] = aten::size(%913)\n", " %877 : int[] = aten::size(%914)\n", " %878 : int[] = aten::size(%915)\n", " %879 : int[] = aten::size(%916)\n", " %880 : int[] = aten::size(%917)\n", " %881 : int[] = aten::size(%918)\n", " %882 : int[] = aten::size(%919)\n", " %883 : int[] = aten::size(%920)\n", " %884 : int[] = aten::size(%921)\n", " %885 : int[] = aten::size(%922)\n", " %886 : int[] = aten::size(%923)\n", " %887 : int[] = prim::BroadcastSizes(%617, %620)\n", " %888 : int[] = prim::BroadcastSizes(%624, %627)\n", " %891 : int[] = prim::BroadcastSizes(%886, %885)\n", " %895 : int[] = prim::BroadcastSizes(%882, %881)\n", " %898 : int[] = prim::BroadcastSizes(%634, %655)\n", " %899 : int[] = prim::BroadcastSizes(%641, %662)\n", " %900 : int[] = prim::BroadcastSizes(%898, %899)\n", " %619 : int[]? = aten::_size_if_not_equal(%617, %887) # :3:19\n", " %622 : int[]? = aten::_size_if_not_equal(%620, %887) # :3:68\n", " %626 : int[]? = aten::_size_if_not_equal(%624, %888) # :3:19\n", " %629 : int[]? = aten::_size_if_not_equal(%627, %888) # :3:68\n", " %633 : int[]? = aten::_size_if_not_equal(%617, %886) # :3:19\n", " %636 : int[]? = aten::_size_if_not_equal(%634, %886) # :3:68\n", " %640 : int[]? = aten::_size_if_not_equal(%620, %885) # :3:19\n", " %643 : int[]? = aten::_size_if_not_equal(%641, %885) # :3:68\n", " %647 : int[]? = aten::_size_if_not_equal(%891, %884) # :3:19\n", " %650 : int[]? = aten::_size_if_not_equal(%887, %884) # :3:68\n", " %654 : int[]? = aten::_size_if_not_equal(%624, %882) # :3:19\n", " %657 : int[]? = aten::_size_if_not_equal(%655, %882) # :3:68\n", " %661 : int[]? = aten::_size_if_not_equal(%627, %881) # :3:19\n", " %664 : int[]? = aten::_size_if_not_equal(%662, %881) # :3:68\n", " %668 : int[]? = aten::_size_if_not_equal(%895, %880) # :3:19\n", " %671 : int[]? = aten::_size_if_not_equal(%888, %880) # :3:68\n", " %675 : int[]? = aten::_size_if_not_equal(%883, %878) # :3:19\n", " %678 : int[]? = aten::_size_if_not_equal(%879, %878) # :3:68\n", " %682 : int[]? = aten::_size_if_not_equal(%634, %898) # :3:19\n", " %685 : int[]? = aten::_size_if_not_equal(%655, %898) # :3:68\n", " %689 : int[]? = aten::_size_if_not_equal(%641, %899) # :3:19\n", " %692 : int[]? = aten::_size_if_not_equal(%662, %899) # :3:68\n", " %696 : int[]? = aten::_size_if_not_equal(%898, %900) # :3:19\n", " %699 : int[]? = aten::_size_if_not_equal(%899, %900) # :3:68\n", " %703 : int[]? = aten::_size_if_not_equal(%900, %877) # :3:19\n", " %706 : int[]? = aten::_size_if_not_equal(%878, %877) # :3:68\n", " %710 : int[]? = aten::_size_if_not_equal(%878, %875) # :3:19\n", " %713 : int[]? 
= aten::_size_if_not_equal(%876, %875) # :3:68\n", " return (%912, %111, %109, %619, %622, %106, %104, %626, %629, %101, %633, %636, %96, %640, %643, %923, %922, %647, %650, %921, %70, %654, %657, %65, %661, %664, %919, %918, %668, %671, %917, %920, %916, %675, %678, %682, %685, %689, %692, %696, %699, %915, %703, %706, %914, %913, %710, %713)\n", "with prim::TensorExprGroup_0 = graph(%14 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %15 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %17 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %18 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %34 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %37 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %51 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %54 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0)):\n", " %4 : float = prim::Constant[value=1.0000000000000001e-05]()\n", " %42 : None = prim::Constant()\n", " %41 : float = prim::Constant[value=0.]()\n", " %55 : int = prim::Constant[value=1]()\n", " %xi.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::max(%54, %51) # :2:9\n", " %yi.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::max(%37, %34) # :3:9\n", " %56 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%54, %17, %55) # :4:31\n", " %53 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%51, %14, %55) # :4:38\n", " %50 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::min(%56, %53) # :4:21\n", " %47 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%50, %xi.3, %55) # :4:21\n", " %wi.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%47, %41, %42) # :4:9\n", " %39 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%37, %18, %55) # :5:31\n", " %36 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%34, %15, %55) # :5:38\n", " %33 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::min(%39, %36) # :5:21\n", " %30 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%33, %yi.3, %55) # :5:21\n", " %hi.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%30, %41, %42) # :5:9\n", " %area_i.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%wi.3, %hi.3) # :6:13\n", " %19 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%17, %18) # :7:13\n", " %16 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::mul(%14, %15) # :7:23\n", " %13 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::add(%19, %16, %55) # :7:13\n", " %area_u.3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::sub(%13, %area_i.3, %55) # :7:13\n", " %6 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::clamp(%area_u.3, %4, %42) # :8:20\n", " %2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0) = aten::div(%area_i.3, %6) # :8:11\n", " return (%2, %6, %area_u.3, %area_i.3, %hi.3, %30, %36, %39, %wi.3, %47, %53, %56)\n", "\n" ] }, { "data": { "image/svg+xml": [ "\n", 
"\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "cluster_68\n", "\n", "DifferentiableGraph\n", "\n", "\n", "cluster_912_0\n", "\n", "\n", "\n", "cluster_912_1\n", "\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "903\n", "\n", "TypeCheck\n", "\n", "\n", "\n", "x1.1->903\n", "\n", "\n", "\n", "\n", "\n", "960\n", "\n", "call fallback_function\n", "\n", "\n", "\n", "x1.1->960\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "y1.1->903\n", "\n", "\n", "\n", "\n", "\n", "y1.1->960\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->903\n", "\n", "\n", "\n", "\n", "\n", "w1.1->960\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->903\n", "\n", "\n", "\n", "\n", "\n", "h1.1->960\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->903\n", "\n", "\n", "\n", "\n", "\n", "x2.1->960\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->903\n", "\n", "\n", "\n", "\n", "\n", "y2.1->960\n", "\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "w2.1->903\n", "\n", "\n", "\n", "\n", "\n", "w2.1->960\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->903\n", "\n", "\n", "\n", "\n", "\n", "h2.1->960\n", "\n", "\n", "\n", "\n", "\n", "912_in\n", "\n", "If\n", "\n", "\n", "\n", "903->912_in\n", "\n", "\n", "\n", "\n", "\n", "830\n", "\n", "TensorExprGroup\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "903->830\n", "\n", "\n", "\n", "\n", "\n", "912\n", "\n", "\n", "\n", "\n", "960->912\n", "\n", "\n", "\n", "\n", "\n", "912_in->960\n", "\n", "\n", "n\n", "\n", "\n", "\n", "\n", "912_in->830\n", "\n", "\n", "y\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "912->.outputs\n", "\n", "\n", "\n", "\n", "\n", "830->912\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) # Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", "ratio_iou_scripted = torch.jit.script(ratio_iou)\n", "\n", "x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=True).exp()\n", "\n", "for i in range(10):\n", " ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2)\n", "print(ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2))\n", "make_graph(ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To understand why this is, we need to look at how AutoDiff works. It has roughly three stages:\n", "\n", "- The first part of AutoDiff is a pass that creates these differentiable graphs (in the optimizations, notably before the fusing). 
AutoDiff has a catalogue of operations for which it can compute backwards (*Sidenote*: with their own derivative definitions, which could potentially differ from the AutoGrad ones), and it will move those into the `DifferentiableGraph`.\n", "\n", "- Then, when we run a graph containing `DifferentiableGraph` nodes (i.e. during the forward pass), the second part of AutoDiff will compute the gradient by going through the nodes of the forward graph. This is a form of source-to-source differentiation (but in contrast to classic symbolic differentiation, it is specialized to autograd-style Jacobian-vector products). This can amend the forward to output intermediates that are then captured for the backward, similar to the `save_for_backward` mechanism in an `autograd.Function` (you can see that the `TensorExprGroup` now returns a lot more values and the `DifferentiableGraph` itself adds all these size computations).\n", "\n", "- Finally, the PyTorch AutoGrad(!) mechanism is used by making a `DifferentiableGraphBackward` node that holds on to the intermediate values and, when backward is called, runs the backward graph constructed in the previous step (including letting the JIT optimize it, potentially fusing operations etc.).\n", "\n", "What is it with these sizes then? The convenient broadcasting semantics cause PyTorch to implicitly expand the operands of (mostly) binary operations. But these expansions have a gradient operation associated with them - a summation over any broadcast dimensions. The size operations check whether broadcasting has happened (i.e. whether the output shape is larger than an input shape of a binary operation) and if so record the target size for the summation, or `None` if no summation is needed - this is what the `aten::_size_if_not_equal` operation does.\n", "\n", "There is another thing to note here: the JIT currently does not have terribly smart logic to decide which values to capture and which might just as well be re-computed (done manually, one might well choose to recompute all the intermediates of our little function instead of capturing them), but it mimics what AutoGrad does (as defined by the AutoDiff backward specifications)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "221 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n", "91.5 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=True).exp()\n", "\n", "def take_time(fn):\n", "    _ = fn(x1, y1, w1, h1, x2, y2, w2, h2)\n", "    torch.cuda.synchronize()\n", "\n", "take_time(ratio_iou) # warmup\n", "%timeit take_time(ratio_iou)\n", "\n", "for i in range(2):\n", "    take_time(ratio_iou_scripted)\n", "%timeit take_time(ratio_iou_scripted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Profiling Executor\n", "\n", "We mentioned that the JIT fusers specialize on detailed tensor type information. How do they get this information? Through the Profiling Executor, which is in charge of running the JITed graphs.\n", "\n", "The profiling executor will record tensor type information (dtype, strides, sizes, requires gradient) in its profiling phase (the first few invocations). Currently it does a single profiling run, but this is configurable.\n", "\n",
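"For example - a minimal sketch using an internal (and therefore unstable) helper, which we will also use in a later cell of this notebook:\n", "\n", "```python\n", "# have the profiling executor profile the first two invocations instead of just one\n", "torch._C._jit_set_num_profiled_runs(2)\n", "```\n", "\n",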
"The recording itself is done by inserting special `prim::profile` nodes into the graph which then run an operator collecting and aggregating this information. The executor then uses this information to implement optimizations. (*Sidenote*: Lest you should be thinking of taking time measurements when hearing profiling - I sure did - this does not seem to be done here, currently.)\n", "\n", "Traditionally, the same thing (getting tensor type information attached to every value) has been done by propagating the types from the inputs through the graph. While this works great in general, it soon hits limitations, e.g. for convolutions the output shape (and thus the precise type information) depends on the value (not even just the type) of e.g. the padding input. This means that unless we detect that the output-shaping inputs are constants *and* have some way of accessing type propagation, we do not know the output shape. (*Sidenote*: The same topic is also addressed by people interested in type-checking tensor programs who coordinate on the Python [typing sig mailing list](https://mail.python.org/archives/list/typing-sig@python.org/).) My best guess on the design choice here is that this is the reason we instead observe shapes at runtime (my impression is that PyTorch operations would ideally provide type propagation information, but that could be me).\n", "\n", "So when the JIT fuser passes mentioned above go to work, they find these type annotations on all tensor values and can specialize accordingly.\n", "\n", "One interesting aspect is the difference between the type expectations encoded by `TypeCheck` for the TensorExpr fuser and by `CudaFuserGuard` for the CUDA fuser. (*Sidenote*: Interestingly, `TypeCheck` is wired into the JIT interpreter and JIT type system, while `CudaFuserGuard` is a regular operator whose check is implemented \"manually\" in the function `complyWith` in `torch/csrc/jit/codegen/cuda/interface.cpp`.) While `TypeCheck` nails down the exact tensor shape and layout, the CUDA fuser will use the same kernel on tensors of different sizes as long as the contiguity pattern (i.e. whether there are gaps in the storage between the values of the tensor, e.g. from slicing) is the same. \n", "\n", "\n", "## Looking at fallback graphs\n", "\n", "We mentioned the importance of fallbacks and how the fallbacks are themselves optimized again, but we have yet to see this in action.\n", "Sadly, the JIT's Python interface is lacking here or, hopefully, just lagging a bit.\n", "\n", "But we can hack around this by building our own little PyTorch extension that provides the missing functionality.\n", "Again, I recommend skipping this bit on first reading and revisiting it if you really want to know about types in the JIT (that would be another tutorial)." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using /home/tv/.cache/torch_extensions as PyTorch extensions root...\n", "Emitting ninja build file /home/tv/.cache/torch_extensions/functiontype_ext/build.ninja...\n", "Building extension module functiontype_ext...\n", "Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N)\n", "Loading extension module functiontype_ext...\n" ] } ], "source": [ "csrc = \"\"\"\n", "#include <torch/extension.h>\n", "\n", "using ::c10::Type;\n", "using ::torch::jit::FunctionType;\n", "\n", "PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n", "    py::class_<FunctionType, Type, std::shared_ptr<FunctionType>>(m, \"FunctionType\")\n", "        .def(\"name\", [](const std::shared_ptr<FunctionType>& self) {\n", "            return self->function()->name();\n", "        })\n", "        .def(\n", "            \"get_debug_state\",\n", "            [](const std::shared_ptr<FunctionType>& self) {\n", "                return self->function()->get_executor().getDebugState();\n", "            })\n", "        .def(\"optimized_graph\", [](const std::shared_ptr<FunctionType>& self) {\n", "            return self->function()->optimized_graph();\n", "        });\n", "}\n", "\"\"\"\n", "import torch.utils.cpp_extension\n", "ext = torch.utils.cpp_extension.load_inline(\"functiontype_ext\", [csrc], verbose=True)\n", "\n", "\n", "def find_function_types(graph_or_block, function_types=None):\n", "    # walk the graph (including nested blocks and subgraphs) and collect all FunctionType constants\n", "    if function_types is None:\n", "        function_types = []\n", "    for n in graph_or_block.nodes():\n", "        if n.kind() == 'prim::Constant':\n", "            t = n.output().type()\n", "            if t.kind() == 'FunctionType':\n", "                function_types.append(t)\n", "        else:\n", "            for b in n.blocks():\n", "                find_function_types(b, function_types=function_types)\n", "            if n.hasAttribute('Subgraph'):\n", "                find_function_types(n.g('Subgraph'), function_types=function_types)\n", "    return function_types\n", "\n", "def get_function_graphs(gr):\n", "    # map each function's name to the graph of its (first) recorded execution plan\n", "    return {t.name(): list(t.get_debug_state().execution_plans.values())[0].graph for t in find_function_types(gr)}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this, we can now extract the fallback. Let us run our function a few times, first without needing gradients and then with needing gradients.\n", "\n", "The original graph is the part that doesn't need gradients, as could be expected."
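, "\n", "As a small usage sketch of the helpers we just defined (hypothetical, assuming the extension above compiled and the scripted function has been run as before), we could list the names of the fallback functions stashed away in the last optimized graph:\n", "\n", "```python\n", "gr = torch.jit.last_executed_optimized_graph()\n", "for t in find_function_types(gr):\n", "    print(t.name())  # e.g. fallback_function\n", "```"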
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "cluster_121_0\n", "\n", "\n", "\n", "cluster_121_1\n", "\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "112\n", "\n", "TypeCheck\n", "\n", "\n", "\n", "x1.1->112\n", "\n", "\n", "\n", "\n", "\n", "147\n", "\n", "call fallback_function\n", "\n", "\n", "\n", "x1.1->147\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "y1.1->112\n", "\n", "\n", "\n", "\n", "\n", "y1.1->147\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->112\n", "\n", "\n", "\n", "\n", "\n", "w1.1->147\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->112\n", "\n", "\n", "\n", "\n", "\n", "h1.1->147\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->112\n", "\n", "\n", "\n", "\n", "\n", "x2.1->147\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->112\n", "\n", "\n", "\n", "\n", "\n", "y2.1->147\n", "\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "w2.1->112\n", "\n", "\n", "\n", "\n", "\n", "w2.1->147\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->112\n", "\n", "\n", "\n", "\n", "\n", "h2.1->147\n", "\n", "\n", "\n", "\n", "\n", "121_in\n", "\n", "If\n", "\n", "\n", "\n", "112->121_in\n", "\n", "\n", "\n", "\n", "\n", "68\n", "\n", "TensorExprGroup\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "112->68\n", "\n", "\n", "\n", "\n", "\n", "121\n", "\n", "\n", "\n", "\n", "\n", "121_in->68\n", "\n", "\n", "y\n", "\n", "\n", "\n", "121_in->147\n", "\n", "\n", "n\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "121->.outputs\n", "\n", "\n", "\n", "\n", "\n", "68->121\n", "\n", "\n", "\n", "\n", "\n", "147->121\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) 
# Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", "ratio_iou_scripted = torch.jit.script(ratio_iou)\n", "\n", "x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda').exp()\n", "\n", "for i in range(10):\n", " ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2)\n", "\n", "x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=True).exp()\n", "for i in range(10):\n", " ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2)\n", "\n", "gr = torch.jit.last_executed_optimized_graph()\n", "\n", "make_graph(gr)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "cluster_69\n", "\n", "DifferentiableGraph\n", "\n", "\n", "cluster_928_0\n", "\n", "\n", "\n", "cluster_928_1\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "919\n", "\n", "TypeCheck\n", "\n", "\n", "\n", "w2.1->919\n", "\n", "\n", "\n", "\n", "\n", "976\n", "\n", "call fallback_function\n", "\n", "\n", "\n", "w2.1->976\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->919\n", "\n", "\n", "\n", "\n", "\n", "h2.1->976\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->919\n", "\n", "\n", "\n", "\n", "\n", "w1.1->976\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->919\n", "\n", "\n", "\n", "\n", "\n", "h1.1->976\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->919\n", "\n", "\n", "\n", "\n", "\n", "y2.1->976\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "y1.1->919\n", "\n", "\n", "\n", "\n", "\n", "y1.1->976\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->919\n", "\n", "\n", "\n", "\n", "\n", "x2.1->976\n", "\n", "\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "x1.1->919\n", "\n", "\n", "\n", "\n", "\n", "x1.1->976\n", "\n", "\n", "\n", "\n", "\n", "928_in\n", "\n", "If\n", "\n", "\n", "\n", "919->928_in\n", "\n", "\n", "\n", "\n", "\n", "846\n", "\n", "TensorExprGroup\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "919->846\n", "\n", "\n", "\n", "\n", "\n", "928\n", "\n", "\n", "\n", "\n", "976->928\n", "\n", "\n", "\n", "\n", "\n", "928_in->976\n", "\n", "\n", "n\n", "\n", "\n", "\n", "\n", "928_in->846\n", "\n", "\n", "y\n", "\n", "\n", "\n", "67\n", "\n", "TupleConstruct\n", "\n", "\n", "\n", "928->67\n", "\n", "\n", "\n", "\n", "\n", "846->928\n", "\n", "\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "67->.outputs\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gr_fb1 = get_function_graphs(gr)['fallback_function']\n", "make_graph(gr_fb1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take this to several levels, but when we get an \"internal assert failed\" error regarding a missing optimized plan, it means that we have reached the end of 
the *optimized* fallback passes." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "RuntimeError", "evalue": "optimized_plan_ INTERNAL ASSERT FAILED at \"../torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp\":551, please report a bug to PyTorch. ", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mgr_fb2\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mget_function_graphs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgr_fb1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'fallback_function'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36mget_function_graphs\u001b[0;34m(gr)\u001b[0m\n\u001b[1;32m 40\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_function_graphs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 42\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_debug_state\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecution_plans\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgraph\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mfind_function_types\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 40\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_function_graphs\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 42\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_debug_state\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecution_plans\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgraph\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mfind_function_types\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mRuntimeError\u001b[0m: optimized_plan_ INTERNAL ASSERT FAILED at 
\"../torch/csrc/jit/runtime/profiling_graph_executor_impl.cpp\":551, please report a bug to PyTorch. " ] } ], "source": [ "gr_fb2 = get_function_graphs(gr_fb1)['fallback_function']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can inspect the unoptimized fallback, even if it may seem counterintuitive to the uninitiated like us that the unoptimized graph should be accessed via `optimized_graph` (Also note that the type annotations in the fallback branch are bogus. Oh well.):" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "graph(%0 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %1 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %2 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %3 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %4 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %5 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %6 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0),\n", " %7 : Float(100, 1000, strides=[1000, 1], requires_grad=0, device=cuda:0)):\n", " %11 : int = prim::Constant[value=1]()\n", " %10 : float = prim::Constant[value=0.]()\n", " %9 : None = prim::Constant()\n", " %8 : float = prim::Constant[value=1.0000000000000001e-05]()\n", " %xi.4 : Tensor = aten::max(%7, %6) # :2:9\n", " %yi.4 : Tensor = aten::max(%5, %4) # :3:9\n", " %14 : Tensor = aten::add(%7, %2, %11) # :4:31\n", " %15 : Tensor = aten::add(%6, %0, %11) # :4:38\n", " %16 : Tensor = aten::min(%14, %15) # :4:21\n", " %17 : Tensor = aten::sub(%16, %xi.4, %11) # :4:21\n", " %wi.4 : Tensor = aten::clamp(%17, %10, %9) # :4:9\n", " %19 : Tensor = aten::add(%5, %3, %11) # :5:31\n", " %20 : Tensor = aten::add(%4, %1, %11) # :5:38\n", " %21 : Tensor = aten::min(%19, %20) # :5:21\n", " %22 : Tensor = aten::sub(%21, %yi.4, %11) # :5:21\n", " %hi.4 : Tensor = aten::clamp(%22, %10, %9) # :5:9\n", " %area_i.4 : Tensor = aten::mul(%wi.4, %hi.4) # :6:13\n", " %25 : Tensor = aten::mul(%2, %3) # :7:13\n", " %26 : Tensor = aten::mul(%0, %1) # :7:23\n", " %27 : Tensor = aten::add(%25, %26, %11) # :7:13\n", " %area_u.4 : Tensor = aten::sub(%27, %area_i.4, %11) # :7:13\n", " %29 : Tensor = aten::clamp(%area_u.4, %8, %9) # :8:20\n", " %30 : Tensor = aten::div(%area_i.4, %29) # :8:11\n", " %31 : (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) = prim::TupleConstruct(%30, %29, %area_u.4, %area_i.4, %hi.4, %22, %20, %19, %wi.4, %17, %15, %14)\n", " return (%31)" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "find_function_types(gr_fb1)[0].optimized_graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How we could go at benchmarking\n", "\n", "We can now pitch the various fusers against each other if we want. We abuse the context manager in a non-contextmanagery way. Note that we do not time the backwards here, but it would be straightforward to do, too." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fuser: None, requires gradient: False\n", "159 µs ± 457 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", "fuser: fuser1, requires gradient: False\n", "37.1 µs ± 180 ns per loop (mean ± std. dev. 
of 7 runs, 10000 loops each)\n", "fuser: fuser2, requires gradient: False\n", "47 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", "fuser: None, requires gradient: True\n", "221 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n", "fuser: fuser1, requires gradient: True\n", "92.7 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n", "fuser: fuser2, requires gradient: True\n", "106 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "for rq in [False, True]:\n", " for fuser in [None, \"fuser1\", \"fuser2\"]:\n", " if fuser is not None:\n", " c = torch.jit.fuser(fuser) \n", " c.__enter__()\n", " \n", " def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) # Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", " ratio_iou_scripted = torch.jit.script(ratio_iou) if fuser is not None else ratio_iou\n", " \n", "\n", " x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=rq).exp()\n", " \n", " print(f\"fuser: {fuser}, requires gradient: {rq}\")\n", " for i in range(10):\n", " take_time(ratio_iou_scripted)\n", "\n", " %timeit take_time(ratio_iou_scripted)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Doing funny things to kick the tires a bit\n", "\n", "If you followed along, you will have noticed that the order of kernels to try depends on how we have called our scripted function before. This can lead to somewhat funny effects.\n", "\n", "One thing is that whether we end up running a `DifferentiableGraph` (and computing the intermediates) depends on what we did during the profiling and the fallback mechanisms for the fusion groups.\n", "In fact, there are bugs to be found (reported as [#49299](https://github.com/pytorch/pytorch/issues/49299)) where whether we get gradient requiring outputs does not match what we feed into the scripted function:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "fuser: fuser1 input requires grad: True output requires grad: True\n", "fuser: fuser1 input requires grad: False output requires grad: True\n", "fuser: fuser2 input requires grad: True output requires grad: True\n", "fuser: fuser2 input requires grad: False output requires grad: True\n" ] } ], "source": [ "for fuser in [\"fuser1\", \"fuser2\"]:\n", " for rq in [True, False]:\n", " c = torch.jit.fuser(fuser)\n", " c.__enter__()\n", "\n", " def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) 
# Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", " ratio_iou_scripted = torch.jit.script(ratio_iou)\n", "\n", " x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=not rq).exp()\n", " for i in range(10):\n", " ratio_iou_scripted.graph_for(x1, y1, w1, h1, x2, y2, w2, h2)\n", " x1, y1, w1, h1, x2, y2, w2, h2 = torch.randn(8, 100, 1000, device='cuda', requires_grad=rq).exp()\n", " print(\"fuser:\", fuser, \"input requires grad:\", x1.requires_grad, \"output requires grad:\", ratio_iou_scripted(x1, y1, w1, h1, x2, y2, w2, h2).requires_grad)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another fun thing to try is what happens when the profiling runs see different tensor sizes (this is a real thing, e.g. for Neural Machine Translation or other NLP applications).\n", "\n", "Do change the fuser between `fuser1` and `fuser2` here. We see that the CUDA fuser can handle both sizes with the same kernel while the TensorExpr fuser decides to not optimize this path.\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "%3\n", "\n", "\n", "\n", "x1.1\n", "\n", "x1\n", "\n", "\n", "\n", "xi.1\n", "\n", "max\n", "\n", "\n", "\n", "x1.1->xi.1\n", "\n", "\n", "\n", "\n", "\n", "20\n", "\n", "add\n", "\n", "\n", "\n", "x1.1->20\n", "\n", "\n", "\n", "\n", "\n", "y1.1\n", "\n", "y1\n", "\n", "\n", "\n", "yi.1\n", "\n", "max\n", "\n", "\n", "\n", "y1.1->yi.1\n", "\n", "\n", "\n", "\n", "\n", "34\n", "\n", "add\n", "\n", "\n", "\n", "y1.1->34\n", "\n", "\n", "\n", "\n", "\n", "w1.1\n", "\n", "w1\n", "\n", "\n", "\n", "w1.1->20\n", "\n", "\n", "\n", "\n", "\n", "51\n", "\n", "mul\n", "\n", "\n", "\n", "w1.1->51\n", "\n", "\n", "\n", "\n", "\n", "h1.1\n", "\n", "h1\n", "\n", "\n", "\n", "h1.1->34\n", "\n", "\n", "\n", "\n", "\n", "h1.1->51\n", "\n", "\n", "\n", "\n", "\n", "x2.1\n", "\n", "x2\n", "\n", "\n", "\n", "x2.1->xi.1\n", "\n", "\n", "\n", "\n", "\n", "23\n", "\n", "add\n", "\n", "\n", "\n", "x2.1->23\n", "\n", "\n", "\n", "\n", "\n", "y2.1\n", "\n", "y2\n", "\n", "\n", "\n", "y2.1->yi.1\n", "\n", "\n", "\n", "\n", "\n", "37\n", "\n", "add\n", "\n", "\n", "\n", "y2.1->37\n", "\n", "\n", "\n", "\n", "\n", "w2.1\n", "\n", "w2\n", "\n", "\n", "\n", "w2.1->23\n", "\n", "\n", "\n", "\n", "\n", "54\n", "\n", "mul\n", "\n", "\n", "\n", "w2.1->54\n", "\n", "\n", "\n", "\n", "\n", "h2.1\n", "\n", "h2\n", "\n", "\n", "\n", "h2.1->37\n", "\n", "\n", "\n", "\n", "\n", "h2.1->54\n", "\n", "\n", "\n", "\n", "\n", "29\n", "\n", "sub\n", "\n", "\n", "\n", "xi.1->29\n", "\n", "\n", "\n", "\n", "\n", "43\n", "\n", "sub\n", "\n", "\n", "\n", "yi.1->43\n", "\n", "\n", "\n", "\n", "\n", "26\n", "\n", "min\n", "\n", "\n", "\n", "20->26\n", "\n", "\n", "\n", "\n", "\n", "23->26\n", "\n", "\n", "\n", "\n", "\n", "26->29\n", "\n", "\n", "\n", "\n", "\n", "wi.1\n", "\n", "clamp\n", "\n", "\n", "\n", "29->wi.1\n", "\n", "\n", "\n", "\n", "\n", "area_i.1\n", "\n", "mul\n", "\n", "\n", "\n", "wi.1->area_i.1\n", "\n", "\n", "\n", "\n", "\n", "40\n", "\n", "min\n", "\n", "\n", "\n", "34->40\n", "\n", "\n", "\n", "\n", "\n", "37->40\n", "\n", "\n", "\n", "\n", "\n", "40->43\n", "\n", "\n", "\n", "\n", "\n", "hi.1\n", "\n", "clamp\n", "\n", "\n", "\n", "43->hi.1\n", "\n", "\n", "\n", "\n", "\n", "hi.1->area_i.1\n", "\n", "\n", "\n", 
"\n", "\n", "area_u.1\n", "\n", "sub\n", "\n", "\n", "\n", "area_i.1->area_u.1\n", "\n", "\n", "\n", "\n", "\n", "65\n", "\n", "div\n", "\n", "\n", "\n", "area_i.1->65\n", "\n", "\n", "\n", "\n", "\n", "57\n", "\n", "add\n", "\n", "\n", "\n", "51->57\n", "\n", "\n", "\n", "\n", "\n", "54->57\n", "\n", "\n", "\n", "\n", "\n", "57->area_u.1\n", "\n", "\n", "\n", "\n", "\n", "62\n", "\n", "clamp\n", "\n", "\n", "\n", "area_u.1->62\n", "\n", "\n", "\n", "\n", "\n", "62->65\n", "\n", "\n", "\n", "\n", "\n", ".outputs\n", "\n", "outputs\n", "\n", "\n", "\n", "65->.outputs\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = torch.jit.fuser(\"fuser1\")\n", "c.__enter__()\n", "torch._C._jit_set_num_profiled_runs(2)\n", "\n", "def ratio_iou(x1, y1, w1, h1, x2, y2, w2, h2):\n", " xi = torch.max(x1, x2) # Intersection left\n", " yi = torch.max(y1, y2) # Intersection top\n", " wi = torch.clamp(torch.min(x1+w1, x2+w2) - xi, min=0.) # Intersection width\n", " hi = torch.clamp(torch.min(y1+h1, y2+h2) - yi, min=0.) # Intersection height\n", " area_i = wi * hi # Area Intersection\n", " area_u = w1 * h1 + w2 * h2 - wi * hi # Area Union\n", " return area_i / torch.clamp(area_u, min=1e-5) # Intersection over Union\n", "\n", "ratio_iou_scripted = torch.jit.script(ratio_iou)\n", "\n", "inputs1 = torch.randn(8, 100, 1000, device='cuda').exp()\n", "inputs2 = torch.randn(8, 101, 1000, device='cuda').exp()\n", "\n", "for i in range(10):\n", " ratio_iou_scripted.graph_for(*inputs1)\n", " ratio_iou_scripted.graph_for(*inputs2)\n", " \n", "make_graph(ratio_iou_scripted.graph_for(*inputs1))\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting more debug output\n", "\n", "When we run the JIT on the command line, we can make use of its [debug logging facility](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md#jit-logging) to watch its parts in action more closely. \n", "\n", "The fusers also have various debugging facilities. The TensorExpr one uses the debug logging facility (grep for `GRAPH_` in `torch/csrc/jit/tensorexpr/`) and the CUDA one uses environment variables starting with `PYTORCH_CUDA_FUSER` (grep for that in `torch/csrc/jit/codegen/cuda/`).\n", "\n", "## Conclusion\n", "\n", "In this piece, we saw a bit how the JIT works, with a focus on the parts that make fusion optimizations possible and took a dive from a very high level to experimentation that try to show how some internals work.\n", "I hope you enjoyed this tour. 
As always, your feedback is appreciated: .\n", "\n", "*Sidenote*: There is also a more general technical overview in the file [`torch/csrc/jit/OVERVIEW.md`](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md) in the JIT directory of the PyTorch source code, and there are various bits of documentation in `.md` files and in comments throughout the source.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "celltoolbar": "Tags", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" }, "rise": { "header": "", "theme": "white", "transition": "off" } }, "nbformat": 4, "nbformat_minor": 4 }