{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Tracking Failure Origins\n", "\n", "The question of \"Where does this value come from?\" is fundamental for debugging. Which earlier variables could possibly have influenced the current erroneous state? And how did their values come to be?\n", "\n", "When programmers read code during debugging, they scan it for potential _origins_ of given values. This can be a tedious experience, notably, if the origins spread across multiple separate locations, possibly even in different modules. In this chapter, we thus investigate means to _determine such origins_ automatically – by collecting data and control dependencies during program execution." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.061193Z", "iopub.status.busy": "2023-11-12T12:40:19.061034Z", "iopub.status.idle": "2023-11-12T12:40:19.101419Z", "shell.execute_reply": "2023-11-12T12:40:19.101120Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo(\"sjf3cOR0lcI\")" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [Introduction to Debugging](Intro_Debugging).\n", "* To understand how to compute dependencies automatically (the second half of this chapter), you will need\n", " * advanced knowledge of Python semantics\n", " * knowledge on how to instrument and transform code\n", " * knowledge on how an interpreter works" ] }, { "cell_type": "code", 
"execution_count": 2, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:40:19.122806Z", "iopub.status.busy": "2023-11-12T12:40:19.122599Z", "iopub.status.idle": "2023-11-12T12:40:19.125087Z", "shell.execute_reply": "2023-11-12T12:40:19.124754Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.126834Z", "iopub.status.busy": "2023-11-12T12:40:19.126701Z", "iopub.status.idle": "2023-11-12T12:40:19.128522Z", "shell.execute_reply": "2023-11-12T12:40:19.128246Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import quiz, next_inputs, print_content" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.130368Z", "iopub.status.busy": "2023-11-12T12:40:19.130231Z", "iopub.status.idle": "2023-11-12T12:40:19.131867Z", "shell.execute_reply": "2023-11-12T12:40:19.131609Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect\n", "import warnings" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.133412Z", "iopub.status.busy": "2023-11-12T12:40:19.133303Z", "iopub.status.idle": "2023-11-12T12:40:19.135028Z", "shell.execute_reply": "2023-11-12T12:40:19.134779Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Set, List, Tuple, Any, Callable, Dict, Optional\n", "from typing import Union, Type, Generator, cast" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> 
from debuggingbook.Slicer import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides a `Slicer` class to automatically determine and visualize dynamic flows and dependencies. When we say that a variable $x$ _depends_ on a variable $y$ (and that $y$ _flows_ into $x$), we distinguish two kinds of dependencies:\n", "\n", "* **Data dependency**: $x$ is assigned a value computed from $y$.\n", "* **Control dependency**: A statement involving $x$ is executed _only_ because a _condition_ involving $y$ was evaluated, influencing the execution path.\n", "\n", "Such dependencies are crucial for debugging, as they allow determining the origins of individual values (and notably incorrect values).\n", "\n", "To determine dynamic dependencies in a function `func`, use\n", "\n", "```python\n", "with Slicer() as slicer:\n", " \n", "```\n", "\n", "and then `slicer.graph()` or `slicer.code()` to examine dependencies.\n", "\n", "You can also explicitly specify the functions to be instrumented, as in \n", "\n", "```python\n", "with Slicer(func, func_1, func_2) as slicer:\n", " \n", "```\n", "\n", "Here is an example. The `demo()` function computes some number from `x`:\n", "\n", "```python\n", ">>> def demo(x: int) -> int:\n", ">>> z = x\n", ">>> while x <= z <= 64:\n", ">>> z *= 2\n", ">>> return z\n", "```\n", "By using `with Slicer()`, we first instrument `demo()` and then execute it:\n", "\n", "```python\n", ">>> with Slicer() as slicer:\n", ">>> demo(10)\n", "```\n", "After execution is complete, you can output `slicer` to visualize the dependencies and flows as a graph. Data dependencies are shown as black solid edges; control dependencies are shown as grey dashed edges. 
The arrows indicate influence: If $y$ depends on $x$ (and thus $x$ flows into $y$), then we have an arrow $x \\rightarrow y$.\n", "We see how the parameter `x` flows into `z`, which is returned after some computation that is control dependent on a `` involving `z`.\n", "\n", "```python\n", ">>> slicer\n", "```\n", "![](PICS/Slicer-synopsis-1.svg)\n", "\n", "An alternate representation is `slicer.code()`, annotating the instrumented source code with (backward) dependencies. Data dependencies are shown with `<=`, control dependencies with `<-`; locations (lines) are shown in parentheses.\n", "\n", "```python\n", ">>> slicer.code()\n", "* 1 def demo(x: int) -> int:\n", "* 2 z = x # <= x (1)\n", "* 3 while x <= z <= 64: # <= x (1), z (4), z (2)\n", "* 4 z *= 2 # <= z (4), z (2); <- (3)\n", "* 5 return z # <= z (4)\n", "\n", "```\n", "Dependencies can also be retrieved programmatically. The `dependencies()` method returns a `Dependencies` object encapsulating the dependency graph.\n", "\n", "The method `all_vars()` returns all variables in the dependency graph. Each variable is encoded as a pair (_name_, _location_) where _location_ is a pair (_codename_, _lineno_).\n", "\n", "```python\n", ">>> slicer.dependencies().all_vars()\n", "{('', ( int>, 5)),\n", " ('', ( int>, 3)),\n", " ('x', ( int>, 1)),\n", " ('z', ( int>, 2)),\n", " ('z', ( int>, 4))}\n", "```\n", "`code()` and `graph()` methods can also be applied on dependencies. The method `backward_slice(var)` returns a backward slice for the given variable (again given as a pair (_name_, _location_)). To retrieve where `z` in Line 2 came from, use:\n", "\n", "```python\n", ">>> _, start_demo = inspect.getsourcelines(demo)\n", ">>> start_demo\n", "1\n", ">>> slicer.dependencies().backward_slice(('z', (demo, start_demo + 1))).graph() # type: ignore\n", "```\n", "![](PICS/Slicer-synopsis-2.svg)\n", "\n", "Here are the classes defined in this chapter. 
A `Slicer` instruments a program, using a `DependencyTracker` at run time to collect `Dependencies`.\n", "\n", "![](PICS/Slicer-synopsis-3.svg)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Dependencies\n", "\n", "In the [Introduction to debugging](Intro_Debugging.ipynb), we have seen how faults in a program state propagate to eventually become visible as failures. This induces a debugging strategy called _tracking origins_:" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "1. We start with a single faulty state _f_ – the failure.\n", "2. We determine _f_'s _origins_ – the parts of earlier states that could have caused the faulty state _f_.\n", "3. For each of these origins _e_, we determine whether they are faulty or not.\n", "4. For each of the faulty origins, we in turn determine _their_ origins.\n", "5. If we find a part of the state that is faulty, yet has only correct origins, we have found the defect." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "In all generality, a \"part of the state\" can be anything that can influence the program – some configuration setting, some database content, or the state of a device. Almost always, though, it is through _individual variables_ that a part of the state manifests itself.\n", "\n", "The good news is that variables do not take arbitrary values at arbitrary times – instead, they are set and accessed at precise moments in time, as determined by the program's semantics. This allows us to determine their _origins_ by reading program code." 
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us assume you have a piece of code that reads as follows. The `middle()` function is supposed to return the \"middle\" number of three values `x`, `y`, and `z` – that is, the one number that is neither the minimum nor the maximum." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.136793Z", "iopub.status.busy": "2023-11-12T12:40:19.136676Z", "iopub.status.idle": "2023-11-12T12:40:19.138564Z", "shell.execute_reply": "2023-11-12T12:40:19.138309Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def middle(x, y, z): # type: ignore\n", " if y < z:\n", " if x < y:\n", " return y\n", " elif x < z:\n", " return y\n", " else:\n", " if x > y:\n", " return y\n", " elif x > z:\n", " return x\n", " return z" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In most cases, `middle()` runs just fine:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.140161Z", "iopub.status.busy": "2023-11-12T12:40:19.140035Z", "iopub.status.idle": "2023-11-12T12:40:19.142345Z", "shell.execute_reply": "2023-11-12T12:40:19.142046Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = middle(1, 2, 3)\n", "m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In others, however, it returns the wrong value:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.144147Z", "iopub.status.busy": "2023-11-12T12:40:19.144012Z", "iopub.status.idle": 
"2023-11-12T12:40:19.146296Z", "shell.execute_reply": "2023-11-12T12:40:19.145950Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = middle(2, 1, 3)\n", "m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is a typical debugging situation: You see a value that is erroneous, and you want to find out where it came from. \n", "\n", "* In our case, we see that the erroneous value was returned from `middle()`, so we identify the five `return` statements in `middle()` that the value could have come from.\n", "* The value returned is the value of `y`, and neither `x`, `y`, nor `z` are altered during the execution of `middle()`. Hence, it must be one of the three `return y` statements that is the origin of `m`. But which one?\n", "\n", "For our small example, we can fire up an interactive debugger and simply step through the function; this reveals the conditions evaluated and the `return` statement executed." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.148073Z", "iopub.status.busy": "2023-11-12T12:40:19.147964Z", "iopub.status.idle": "2023-11-12T12:40:19.261207Z", "shell.execute_reply": "2023-11-12T12:40:19.260834Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import Debugger # minor dependency" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.263237Z", "iopub.status.busy": "2023-11-12T12:40:19.263094Z", "iopub.status.idle": "2023-11-12T12:40:19.264845Z", "shell.execute_reply": "2023-11-12T12:40:19.264555Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "next_inputs([\"step\", \"step\", \"step\", \"step\", \"quit\"]);" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.266448Z", "iopub.status.busy": "2023-11-12T12:40:19.266315Z", "iopub.status.idle": "2023-11-12T12:40:19.323633Z", "shell.execute_reply": "2023-11-12T12:40:19.323368Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calling middle(x = 2, y = 1, z = 3)\n" ] }, { "data": { "text/html": [ "(debugger) step" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "2 if y < z:\n" ] }, { "data": { "text/html": [ "(debugger) step" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "3 if x < y:\n" ] }, { "data": { "text/html": [ "(debugger) step" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "5 elif x < z:\n" ] }, { "data": { "text/html": [ "(debugger) step" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": 
"display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "6 return y\n" ] }, { "data": { "text/html": [ "(debugger) quit" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with Debugger.Debugger():\n", " middle(2, 1, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now see that it was the second `return` statement that returned the incorrect value. But why was it executed after all? To this end, we can resort to the `middle()` source code and have a look at those conditions that caused the `return y` statement to be executed. Indeed, the conditions `y < z`, `x > y`, and finally `x < z` again are _origins_ of the returned value – and in turn have `x`, `y`, and `z` as origins." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In our above reasoning about origins, we have encountered two kinds of origins:\n", "\n", "* earlier _data values_ (such as the value of `y` being returned) and\n", "* earlier _control conditions_ (such as the `if` conditions governing the `return y` statement).\n", "\n", "The later parts of the state that can be influenced by such origins are said to be _dependent_ on these origins. Speaking of variables, a variable $x$ _depends_ on the value of a variable $y$ (written as $x \\leftarrow y$) if a change in $y$ could affect the value of $x$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We distinguish two kinds of dependencies $x \\leftarrow y$, aligned with the two kinds of origins as outlined above:\n", "\n", "* **Data dependency**: $x$ is assigned a value computed from $y$. In our example, `m` is data dependent on the return value of `middle()`.\n", "* **Control dependency**: A statement involving $x$ is executed _only_ because a _condition_ involving $y$ was evaluated, influencing the execution path. 
In our example, the value returned by `return y` is control dependent on the several conditions along its path, which involve `x`, `y`, and `z`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us examine these dependencies in more detail." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Visualizing Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note: This is an excursion, diverting away from the main flow of the chapter. Unless you know what you are doing, you are encouraged to skip this part." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To illustrate our examples, we introduce a `Dependencies` class that captures dependencies between variables at specific locations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A Class for Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`Dependencies` holds two dependency graphs. `data` holds data dependencies, `control` holds control dependencies." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Each of the two is organized as a dictionary holding _nodes_ as keys and sets of nodes as values. Each node comes as a tuple\n", "\n", "```python\n", "(variable_name, location)\n", " ```\n", " \n", "where `variable_name` is a string and `location` is a pair\n", "\n", "\n", "```python\n", "(func, lineno)\n", " ```\n", " \n", "denoting a unique location in the code." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is also reflected in the following type definitions:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.326037Z", "iopub.status.busy": "2023-11-12T12:40:19.325812Z", "iopub.status.idle": "2023-11-12T12:40:19.327981Z", "shell.execute_reply": "2023-11-12T12:40:19.327645Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Location = Tuple[Callable, int]\n", "Node = Tuple[str, Location]\n", "Dependency = Dict[Node, Set[Node]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In this chapter, for many purposes, we need to look up a function's location, source code, or simply its definition. The class `StackInspector` provides a number of convenience functions for this purpose." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.329655Z", "iopub.status.busy": "2023-11-12T12:40:19.329543Z", "iopub.status.idle": "2023-11-12T12:40:19.331404Z", "shell.execute_reply": "2023-11-12T12:40:19.331118Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from StackInspector import StackInspector" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `Dependencies` class builds on `StackInspector` to capture dependencies." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.332943Z", "iopub.status.busy": "2023-11-12T12:40:19.332840Z", "iopub.status.idle": "2023-11-12T12:40:19.335433Z", "shell.execute_reply": "2023-11-12T12:40:19.335149Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(StackInspector):\n", " \"\"\"A dependency graph\"\"\"\n", "\n", " def __init__(self, \n", " data: Optional[Dependency] = None,\n", " control: Optional[Dependency] = None) -> None:\n", " \"\"\"\n", " Create a dependency graph from `data` and `control`.\n", " Both `data` and `control` are dictionaries\n", " holding _nodes_ as keys and sets of nodes as values.\n", " Each node comes as a tuple (variable_name, location)\n", " where `variable_name` is a string \n", " and `location` is a pair (function, lineno)\n", " where `function` is a callable and `lineno` is a line number\n", " denoting a unique location in the code.\n", " \"\"\"\n", "\n", " if data is None:\n", " data = {}\n", " if control is None:\n", " control = {}\n", "\n", " self.data = data\n", " self.control = control\n", "\n", " for var in self.data:\n", " self.control.setdefault(var, set())\n", " for var in self.control:\n", " self.data.setdefault(var, set())\n", "\n", " self.validate()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `validate()` method checks for consistency." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.337081Z", "iopub.status.busy": "2023-11-12T12:40:19.336950Z", "iopub.status.idle": "2023-11-12T12:40:19.339392Z", "shell.execute_reply": "2023-11-12T12:40:19.339107Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def validate(self) -> None:\n", " \"\"\"Check dependency structure.\"\"\"\n", " assert isinstance(self.data, dict)\n", " assert isinstance(self.control, dict)\n", "\n", " for node in (self.data.keys()) | set(self.control.keys()):\n", " var_name, location = node\n", " assert isinstance(var_name, str)\n", " func, lineno = location\n", " assert callable(func)\n", " assert isinstance(lineno, int)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `source()` method returns the source code for a given node." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.341234Z", "iopub.status.busy": "2023-11-12T12:40:19.341096Z", "iopub.status.idle": "2023-11-12T12:40:19.344375Z", "shell.execute_reply": "2023-11-12T12:40:19.343984Z" }, "ipub": { "ignore": true }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def _source(self, node: Node) -> str:\n", " # Return source line, or ''\n", " (name, location) = node\n", " func, lineno = location\n", " if not func: # type: ignore\n", " # No source\n", " return ''\n", "\n", " try:\n", " source_lines, first_lineno = inspect.getsourcelines(func)\n", " except OSError:\n", " warnings.warn(f\"Couldn't find source \"\n", " f\"for {func} ({func.__name__})\")\n", " return ''\n", "\n", " try:\n", " line = source_lines[lineno - first_lineno].strip()\n", " except IndexError:\n", " return ''\n", "\n", " return line\n", "\n", " def source(self, node: Node) -> 
str:\n", " \"\"\"Return the source code for a given node.\"\"\"\n", " line = self._source(node)\n", " if line:\n", " return line\n", "\n", " (name, location) = node\n", " func, lineno = location\n", " code_name = func.__name__\n", "\n", " if code_name.startswith('<'):\n", " return code_name\n", " else:\n", " return f'<{code_name}()>'" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.346164Z", "iopub.status.busy": "2023-11-12T12:40:19.346058Z", "iopub.status.idle": "2023-11-12T12:40:19.348743Z", "shell.execute_reply": "2023-11-12T12:40:19.348499Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'def middle(x, y, z): # type: ignore'" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_deps = Dependencies()\n", "test_deps.source(('z', (middle, 1)))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Drawing Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Both data and control form a graph between nodes, and can be visualized as such. We use the `graphviz` package for creating such visualizations." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.350264Z", "iopub.status.busy": "2023-11-12T12:40:19.350176Z", "iopub.status.idle": "2023-11-12T12:40:19.351903Z", "shell.execute_reply": "2023-11-12T12:40:19.351644Z" }, "ipub": { "ignore": true }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from graphviz import Digraph" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`make_graph()` sets the basic graph attributes." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.353350Z", "iopub.status.busy": "2023-11-12T12:40:19.353264Z", "iopub.status.idle": "2023-11-12T12:40:19.354849Z", "shell.execute_reply": "2023-11-12T12:40:19.354620Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import html" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.356268Z", "iopub.status.busy": "2023-11-12T12:40:19.356184Z", "iopub.status.idle": "2023-11-12T12:40:19.358786Z", "shell.execute_reply": "2023-11-12T12:40:19.358431Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " NODE_COLOR = 'peachpuff'\n", " FONT_NAME = 'Courier' # 'Fira Mono' may produce warnings in 'dot'\n", "\n", " def make_graph(self,\n", " name: str = \"dependencies\",\n", " comment: str = \"Dependencies\") -> Digraph:\n", " return Digraph(name=name, comment=comment,\n", " graph_attr={\n", " },\n", " node_attr={\n", " 'style': 'filled',\n", " 'shape': 'box',\n", " 'fillcolor': self.NODE_COLOR,\n", " 'fontname': self.FONT_NAME\n", " },\n", " edge_attr={\n", " 'fontname': self.FONT_NAME\n", " })" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`graph()` returns a graph visualization." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.360718Z", "iopub.status.busy": "2023-11-12T12:40:19.360569Z", "iopub.status.idle": "2023-11-12T12:40:19.363267Z", "shell.execute_reply": "2023-11-12T12:40:19.362919Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def graph(self, *, mode: str = 'flow') -> Digraph:\n", " \"\"\"\n", " Draw dependencies. 
`mode` is either\n", " * `'flow'`: arrows indicate information flow (from A to B); or\n", " * `'depend'`: arrows indicate dependencies (B depends on A)\n", " \"\"\"\n", " self.validate()\n", "\n", " g = self.make_graph()\n", " self.draw_dependencies(g, mode)\n", " self.add_hierarchy(g)\n", " return g\n", "\n", " def _repr_mimebundle_(self, include: Any = None, exclude: Any = None) -> Any:\n", " \"\"\"If the object is output in Jupyter, render dependencies as an SVG graph\"\"\"\n", " return self.graph()._repr_mimebundle_(include, exclude)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The main part of graph drawing takes place in two methods, `draw_dependencies()` and `add_hierarchy()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`draw_dependencies()` iterates over all variables in the dependencies, adding a node for each variable and an edge for each of its data and control dependencies." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.365002Z", "iopub.status.busy": "2023-11-12T12:40:19.364883Z", "iopub.status.idle": "2023-11-12T12:40:19.367479Z", "shell.execute_reply": "2023-11-12T12:40:19.367127Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def all_vars(self) -> Set[Node]:\n", " \"\"\"Return a set of all variables (as `var_name`, `location`) in the dependencies\"\"\"\n", " all_vars = set()\n", " for var in self.data:\n", " all_vars.add(var)\n", " for source in self.data[var]:\n", " all_vars.add(source)\n", "\n", " for var in self.control:\n", " all_vars.add(var)\n", " for source in self.control[var]:\n", " all_vars.add(source)\n", "\n", " return all_vars" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.369296Z", "iopub.status.busy": "2023-11-12T12:40:19.369169Z", "iopub.status.idle": 
"2023-11-12T12:40:19.372649Z", "shell.execute_reply": "2023-11-12T12:40:19.372327Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def draw_edge(self, g: Digraph, mode: str,\n", " node_from: str, node_to: str, **kwargs: Any) -> None:\n", " if mode == 'flow':\n", " g.edge(node_from, node_to, **kwargs)\n", " elif mode == 'depend':\n", " g.edge(node_from, node_to, dir=\"back\", **kwargs)\n", " else:\n", " raise ValueError(\"`mode` must be 'flow' or 'depend'\")\n", "\n", " def draw_dependencies(self, g: Digraph, mode: str) -> None:\n", " for var in self.all_vars():\n", " g.node(self.id(var),\n", " label=self.label(var),\n", " tooltip=self.tooltip(var))\n", "\n", " if var in self.data:\n", " for source in self.data[var]:\n", " self.draw_edge(g, mode, self.id(source), self.id(var))\n", "\n", " if var in self.control:\n", " for source in self.control[var]:\n", " self.draw_edge(g, mode, self.id(source), self.id(var),\n", " style='dashed', color='grey')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`draw_dependencies()` makes use of a few helper functions." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.374537Z", "iopub.status.busy": "2023-11-12T12:40:19.374392Z", "iopub.status.idle": "2023-11-12T12:40:19.378146Z", "shell.execute_reply": "2023-11-12T12:40:19.377770Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def id(self, var: Node) -> str:\n", " \"\"\"Return a unique ID for `var`.\"\"\"\n", " id = \"\"\n", " # Avoid non-identifier characters\n", " for c in repr(var):\n", " if c.isalnum() or c == '_':\n", " id += c\n", " if c == ':' or c == ',':\n", " id += '_'\n", " return id\n", "\n", " def label(self, var: Node) -> str:\n", " \"\"\"Render node `var` using HTML style.\"\"\"\n", " (name, location) = var\n", " source = self.source(var)\n", "\n", " title = html.escape(name)\n", " if name.startswith('<'):\n", " title = f'{title}'\n", "\n", " label = f'{title}'\n", " if source:\n", " label += (f'<BR/><BR/>'\n", " f'{html.escape(source)}'\n", " f'<BR/>')\n", " label = f'<{label}>'\n", " return label\n", "\n", " def tooltip(self, var: Node) -> str:\n", " \"\"\"Return a tooltip for node `var`.\"\"\"\n", " (name, location) = var\n", " func, lineno = location\n", " return f\"{func.__name__}:{lineno}\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In the second part of graph drawing, `add_hierarchy()` adds invisible edges to ensure that nodes with lower line numbers are drawn above nodes with higher line numbers." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.380189Z", "iopub.status.busy": "2023-11-12T12:40:19.380022Z", "iopub.status.idle": "2023-11-12T12:40:19.382653Z", "shell.execute_reply": "2023-11-12T12:40:19.382364Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def add_hierarchy(self, g: Digraph) -> Digraph:\n", " \"\"\"Add invisible edges for a proper hierarchy.\"\"\"\n", " functions = self.all_functions()\n", " for func in functions:\n", " last_var = None\n", " last_lineno = 0\n", " for (lineno, var) in functions[func]:\n", " if last_var is not None and lineno > last_lineno:\n", " g.edge(self.id(last_var),\n", " self.id(var),\n", " style='invis')\n", "\n", " last_var = var\n", " last_lineno = lineno\n", "\n", " return g" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.384206Z", "iopub.status.busy": "2023-11-12T12:40:19.384089Z", "iopub.status.idle": "2023-11-12T12:40:19.386927Z", "shell.execute_reply": "2023-11-12T12:40:19.386640Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def all_functions(self) -> Dict[Callable, List[Tuple[int, Node]]]:\n", " \"\"\"\n", " Return mapping \n", " {`function`: [(`lineno`, `var`), (`lineno`, `var`), ...], ...}\n", " for all 
functions in the dependencies.\n", " \"\"\"\n", " functions: Dict[Callable, List[Tuple[int, Node]]] = {}\n", " for var in self.all_vars():\n", " (name, location) = var\n", " func, lineno = location\n", " if func not in functions:\n", " functions[func] = []\n", " functions[func].append((lineno, var))\n", "\n", " for func in functions:\n", " functions[func].sort()\n", "\n", " return functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here comes the graph in all its glory:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.388465Z", "iopub.status.busy": "2023-11-12T12:40:19.388354Z", "iopub.status.idle": "2023-11-12T12:40:19.392377Z", "shell.execute_reply": "2023-11-12T12:40:19.391922Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def middle_deps() -> Dependencies:\n", " return Dependencies({('z', (middle, 1)): set(), ('y', (middle, 1)): set(), ('x', (middle, 1)): set(), ('', (middle, 2)): {('y', (middle, 1)), ('z', (middle, 1))}, ('', (middle, 3)): {('y', (middle, 1)), ('x', (middle, 1))}, ('', (middle, 5)): {('z', (middle, 1)), ('x', (middle, 1))}, ('', (middle, 6)): {('y', (middle, 1))}}, {('z', (middle, 1)): set(), ('y', (middle, 1)): set(), ('x', (middle, 1)): set(), ('', (middle, 2)): set(), ('', (middle, 3)): {('', (middle, 2))}, ('', (middle, 5)): {('', (middle, 3))}, ('', (middle, 6)): {('', (middle, 5))}})" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.394139Z", "iopub.status.busy": "2023-11-12T12:40:19.394039Z", "iopub.status.idle": "2023-11-12T12:40:19.867220Z", "shell.execute_reply": "2023-11-12T12:40:19.866835Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3\n", 
"\n", "\n", "<test>\n", "if x < y:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1\n", "\n", "\n", "x\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1\n", "\n", "\n", "y\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "<test>\n", "if y < z:\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1\n", "\n", "\n", "z\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Dependencies at 0x1048d55d0>" ] }, "execution_count": 28, "metadata": 
{}, "output_type": "execute_result" } ], "source": [ "middle_deps()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By default, the arrow direction indicates _flows_ – an arrow _A_ → _B_ indicates that information from _A_ _flows_ into _B_ (and thus the state in _A_ _causes_ the state in _B_). By setting the extra keyword parameter `mode` to `depend` instead of `flow` (default), you can reverse these arrows; then an arrow _A_ → _B_ indicates _A_ _depends_ on _B_." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:19.869164Z", "iopub.status.busy": "2023-11-12T12:40:19.868880Z", "iopub.status.idle": "2023-11-12T12:40:20.297356Z", "shell.execute_reply": "2023-11-12T12:40:20.296934Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "<test>\n", "if x < y:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1\n", "\n", "\n", "x\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1\n", "\n", "\n", "y\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "<test>\n", "if y < z:\n", "\n", "\n", "\n", "\n", "\n", 
"y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1\n", "\n", "\n", "z\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "middle_deps().graph(mode='depend')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Slices\n", "\n", "The method `backward_slice(*criteria, mode='cd')` returns a subset of dependencies, following dependencies backward from the given *slicing criteria* `criteria`. 
These criteria can be\n", "\n", "* variable names (such as `<test>`); or\n", "* `(function, lineno)` pairs (such as `(middle, 3)`); or\n", "* `(var_name, (function, lineno))` locations (such as `('x', (middle, 1))`).\n", "\n", "The extra parameter `mode` controls which dependencies are to be followed:\n", "\n", "* **`d`** = data dependencies\n", "* **`c`** = control dependencies" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:20.299348Z", "iopub.status.busy": "2023-11-12T12:40:20.299099Z", "iopub.status.idle": "2023-11-12T12:40:20.301355Z", "shell.execute_reply": "2023-11-12T12:40:20.301082Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "Criterion = Union[str, Location, Node]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:20.302868Z", "iopub.status.busy": "2023-11-12T12:40:20.302745Z", "iopub.status.idle": "2023-11-12T12:40:20.308321Z", "shell.execute_reply": "2023-11-12T12:40:20.307978Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def expand_criteria(self, criteria: List[Criterion]) -> List[Node]:\n", " \"\"\"Return list of vars matched by `criteria`.\"\"\"\n", " all_vars = []\n", " for criterion in criteria:\n", " criterion_var = None\n", " criterion_func = None\n", " criterion_lineno = None\n", "\n", " if isinstance(criterion, str):\n", " criterion_var = criterion\n", " elif len(criterion) == 2 and callable(criterion[0]):\n", " criterion_func, criterion_lineno = criterion\n", " elif len(criterion) == 2 and isinstance(criterion[0], str):\n", " criterion_var = criterion[0]\n", " criterion_func, criterion_lineno = criterion[1]\n", " else:\n", " raise ValueError(\"Invalid argument\")\n", "\n", " for var in self.all_vars():\n", " (var_name, location) = var\n", " func, lineno = location\n", "\n", " name_matches = 
(criterion_func is None or\n", " criterion_func == func or\n", " criterion_func.__name__ == func.__name__)\n", "\n", " location_matches = (criterion_lineno is None or\n", " criterion_lineno == lineno)\n", "\n", " var_matches = (criterion_var is None or\n", " criterion_var == var_name)\n", "\n", " if name_matches and location_matches and var_matches:\n", " all_vars.append(var)\n", "\n", " return all_vars\n", "\n", " def backward_slice(self, *criteria: Criterion, \n", " mode: str = 'cd', depth: int = -1) -> Dependencies:\n", " \"\"\"\n", " Create a backward slice from nodes `criteria`.\n", " `mode` can contain 'c' (draw control dependencies)\n", " and 'd' (draw data dependencies) (default: 'cd')\n", " \"\"\"\n", " data = {}\n", " control = {}\n", " queue = self.expand_criteria(criteria) # type: ignore\n", " seen = set()\n", "\n", " while len(queue) > 0 and depth != 0:\n", " var = queue[0]\n", " queue = queue[1:]\n", " seen.add(var)\n", "\n", " if 'd' in mode:\n", " # Follow data dependencies\n", " data[var] = self.data[var]\n", " for next_var in data[var]:\n", " if next_var not in seen:\n", " queue.append(next_var)\n", " else:\n", " data[var] = set()\n", "\n", " if 'c' in mode:\n", " # Follow control dependencies\n", " control[var] = self.control[var]\n", " for next_var in control[var]:\n", " if next_var not in seen:\n", " queue.append(next_var)\n", " else:\n", " control[var] = set()\n", "\n", " depth -= 1\n", "\n", " return Dependencies(data, control)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Data Dependencies\n", "\n", "Here is an example of a data dependency in our `middle()` program. The value `y` returned by `middle()` comes from the value `y` as originally passed as argument. 
We use arrows $x \\leftarrow y$ to indicate that a variable $x$ depends on an earlier variable $y$:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:20.310400Z", "iopub.status.busy": "2023-11-12T12:40:20.310248Z", "iopub.status.idle": "2023-11-12T12:40:20.721073Z", "shell.execute_reply": "2023-11-12T12:40:20.720692Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1\n", "\n", "\n", "y\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Dependencies at 0x1044f74c0>" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "middle_deps().backward_slice('', mode='d') # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here, we can see that the value `y` in the return statement is data dependent on the value of `y` as passed to `middle()`. An alternate interpretation of this graph is a *data flow*: The value of `y` in the upper node _flows_ into the value of `y` in the lower node." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Since we consider the values of variables at specific locations in the program, such data dependencies can also be interpreted as dependencies between _statements_ – the above `return` statement thus is data dependent on the initialization of `y` in the upper node." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Control Dependencies\n", "\n", "Here is an example of a control dependency. The execution of the above `return` statement is controlled by the earlier test `x < z`. We use gray dashed lines to indicate control dependencies:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:20.723079Z", "iopub.status.busy": "2023-11-12T12:40:20.722948Z", "iopub.status.idle": "2023-11-12T12:40:21.160748Z", "shell.execute_reply": "2023-11-12T12:40:21.160327Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Dependencies at 0x104ba0df0>" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "middle_deps().backward_slice('', mode='c', depth=1) # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This test in turn is controlled by earlier tests, so the full chain of control dependencies looks like this:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:21.162745Z", "iopub.status.busy": "2023-11-12T12:40:21.162617Z", "iopub.status.idle": "2023-11-12T12:40:21.597107Z", "shell.execute_reply": "2023-11-12T12:40:21.596756Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { 
"image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "<test>\n", "if x < y:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "<test>\n", "if y < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Dependencies at 0x104ba1900>" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "middle_deps().backward_slice('', mode='c') # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Dependency Graphs\n", "\n", "The above `` values (and their statements) are in turn also dependent on earlier data, namely the `x`, `y`, and `z` values as originally passed. 
We can draw all data and control dependencies in a single graph, called a _program dependency graph_:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:21.598792Z", "iopub.status.busy": "2023-11-12T12:40:21.598673Z", "iopub.status.idle": "2023-11-12T12:40:22.003201Z", "shell.execute_reply": "2023-11-12T12:40:22.002818Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "<test>\n", "if x < y:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_3->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1\n", "\n", "\n", "x\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1\n", "\n", "\n", "y\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_3\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "<test>\n", "if y < z:\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x1048dec20_1->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_2->test_functionmiddleat0x1048dec20_3\n", 
"\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x1048dec20_5->middlereturnvalue_functionmiddleat0x1048dec20_6\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1\n", "\n", "\n", "z\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x1048dec20_1->test_functionmiddleat0x1048dec20_5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Dependencies at 0x1044f45e0>" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "middle_deps()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This graph now gives us an idea on how to proceed to track the origins of the `middle()` return value at the bottom. Its value can come from any of the origins – namely the initialization of `y` at the function call, or from the `` that controls it. This test in turn depends on `x` and `z` and their associated statements, which we now can check one after the other." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Note that all these dependencies in the graph are _dynamic_ dependencies – that is, they refer to statements actually evaluated in the run at hand, as well as the decisions made in that very run. There also are _static_ dependency graphs coming from static analysis of the code; but for debugging, _dynamic_ dependencies specific to the failing run are more useful." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Showing Dependencies with Code\n", "\n", "While a graph gives us a representation about which possible data and control flows to track, integrating dependencies with actual program code results in a compact representation that is easy to reason about." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Excursion: Listing Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To show dependencies as text, we introduce a method `format_var()` that shows a single node (a variable) as text. By default, a node is referenced as\n", "\n", "```python\n", "NAME (FUNCTION:LINENO)\n", "```\n", "\n", "However, within a given function, it makes no sense to re-state the function name again and again, so we have a shorthand\n", "\n", "```python\n", "NAME (LINENO)\n", "```\n", "\n", "to state a dependency to variable `NAME` in line `LINENO`." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.005385Z", "iopub.status.busy": "2023-11-12T12:40:22.005247Z", "iopub.status.idle": "2023-11-12T12:40:22.008167Z", "shell.execute_reply": "2023-11-12T12:40:22.007775Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def format_var(self, var: Node, current_func: Optional[Callable] = None) -> str:\n", " \"\"\"Return string for `var` in `current_func`.\"\"\"\n", " name, location = var\n", " func, lineno = location\n", " if current_func and (func == current_func or func.__name__ == current_func.__name__):\n", " return f\"{name} ({lineno})\"\n", " else:\n", " return f\"{name} ({func.__name__}:{lineno})\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`format_var()` is used extensively in the `__str__()` string representation of dependencies, listing all nodes and their data (`<=`) and control (`<-`) dependencies." 
] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.010329Z", "iopub.status.busy": "2023-11-12T12:40:22.009967Z", "iopub.status.idle": "2023-11-12T12:40:22.014222Z", "shell.execute_reply": "2023-11-12T12:40:22.013868Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def __str__(self) -> str:\n", " \"\"\"Return string representation of dependencies\"\"\"\n", " self.validate()\n", "\n", " out = \"\"\n", " for func in self.all_functions():\n", " code_name = func.__name__\n", "\n", " if out != \"\":\n", " out += \"\\n\"\n", " out += f\"{code_name}():\\n\"\n", "\n", " all_vars = list(set(self.data.keys()) | set(self.control.keys()))\n", " all_vars.sort(key=lambda var: var[1][1])\n", "\n", " for var in all_vars:\n", " (name, location) = var\n", " var_func, var_lineno = location\n", " var_code_name = var_func.__name__\n", "\n", " if var_code_name != code_name:\n", " continue\n", "\n", " all_deps = \"\"\n", " for (source, arrow) in [(self.data, \"<=\"), (self.control, \"<-\")]:\n", " deps = \"\"\n", " for data_dep in source[var]:\n", " if deps == \"\":\n", " deps = f\" {arrow} \"\n", " else:\n", " deps += \", \"\n", " deps += self.format_var(data_dep, func)\n", "\n", " if deps != \"\":\n", " if all_deps != \"\":\n", " all_deps += \";\"\n", " all_deps += deps\n", "\n", " if all_deps == \"\":\n", " continue\n", "\n", " out += (\" \" + \n", " self.format_var(var, func) +\n", " all_deps + \"\\n\")\n", "\n", " return out" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is a compact string representation of dependencies. We see how the (last) `middle() return value` has a data dependency on `y` in Line 1 and a control dependency on the `<test>` in Line 5." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.016156Z", "iopub.status.busy": "2023-11-12T12:40:22.016004Z", "iopub.status.idle": "2023-11-12T12:40:22.018479Z", "shell.execute_reply": "2023-11-12T12:40:22.018098Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle():\n", " (2) <= z (1), y (1)\n", " (3) <= x (1), y (1); <- (2)\n", " (5) <= x (1), z (1); <- (3)\n", " (6) <= y (1); <- (5)\n", "\n" ] } ], "source": [ "print(middle_deps())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `__repr__()` method shows a raw form of dependencies, useful for creating dependencies from scratch." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.020239Z", "iopub.status.busy": "2023-11-12T12:40:22.020103Z", "iopub.status.idle": "2023-11-12T12:40:22.023339Z", "shell.execute_reply": "2023-11-12T12:40:22.023050Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def repr_var(self, var: Node) -> str:\n", " name, location = var\n", " func, lineno = location\n", " return f\"({repr(name)}, ({func.__name__}, {lineno}))\"\n", "\n", " def repr_deps(self, var_set: Set[Node]) -> str:\n", " if len(var_set) == 0:\n", " return \"set()\"\n", "\n", " return (\"{\" +\n", " \", \".join(f\"{self.repr_var(var)}\"\n", " for var in var_set) +\n", " \"}\")\n", "\n", " def repr_dependencies(self, vars: Dependency) -> str:\n", " return (\"{\\n \" +\n", " \",\\n \".join(\n", " f\"{self.repr_var(var)}: {self.repr_deps(vars[var])}\"\n", " for var in vars) +\n", " \"}\")\n", "\n", " def __repr__(self) -> str:\n", " \"\"\"Represent dependencies as a Python expression\"\"\"\n", " # Useful for saving and restoring values\n", " return (f\"Dependencies(\\n\" +\n", " f\" 
data={self.repr_dependencies(self.data)},\\n\" +\n", " f\" control={self.repr_dependencies(self.control)})\")" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.025017Z", "iopub.status.busy": "2023-11-12T12:40:22.024856Z", "iopub.status.idle": "2023-11-12T12:40:22.027138Z", "shell.execute_reply": "2023-11-12T12:40:22.026752Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dependencies(\n", " data={\n", " ('z', (middle, 1)): set(),\n", " ('y', (middle, 1)): set(),\n", " ('x', (middle, 1)): set(),\n", " ('', (middle, 2)): {('z', (middle, 1)), ('y', (middle, 1))},\n", " ('', (middle, 3)): {('x', (middle, 1)), ('y', (middle, 1))},\n", " ('', (middle, 5)): {('x', (middle, 1)), ('z', (middle, 1))},\n", " ('', (middle, 6)): {('y', (middle, 1))}},\n", " control={\n", " ('z', (middle, 1)): set(),\n", " ('y', (middle, 1)): set(),\n", " ('x', (middle, 1)): set(),\n", " ('', (middle, 2)): set(),\n", " ('', (middle, 3)): {('', (middle, 2))},\n", " ('', (middle, 5)): {('', (middle, 3))},\n", " ('', (middle, 6)): {('', (middle, 5))}})\n" ] } ], "source": [ "print(repr(middle_deps()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An even more useful representation comes when integrating these dependencies as comments into the code. The method `code(item_1, item_2, ...)` lists the given (function) items, including their dependencies; `code()` lists _all_ functions contained in the dependencies." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.028798Z", "iopub.status.busy": "2023-11-12T12:40:22.028655Z", "iopub.status.idle": "2023-11-12T12:40:22.033830Z", "shell.execute_reply": "2023-11-12T12:40:22.033510Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def code(self, *items: Callable, mode: str = 'cd') -> None:\n", " \"\"\"\n", " List `items` on standard output, including dependencies as comments. \n", " If `items` is empty, all included functions are listed.\n", " `mode` can contain 'c' (draw control dependencies) and 'd' (draw data dependencies)\n", " (default: 'cd').\n", " \"\"\"\n", "\n", " if len(items) == 0:\n", " items = cast(Tuple[Callable], self.all_functions().keys())\n", "\n", " for i, item in enumerate(items):\n", " if i > 0:\n", " print()\n", " self._code(item, mode)\n", "\n", " def _code(self, item: Callable, mode: str) -> None:\n", " # The functions in dependencies may be (instrumented) copies\n", " # of the original function. 
Find the function with the same name.\n", " func = item\n", " for fn in self.all_functions():\n", " if fn == item or fn.__name__ == item.__name__:\n", " func = fn\n", " break\n", "\n", " all_vars = self.all_vars()\n", " slice_locations = set(location for (name, location) in all_vars)\n", "\n", " source_lines, first_lineno = inspect.getsourcelines(func)\n", "\n", " n = first_lineno\n", " for line in source_lines:\n", " line_location = (func, n)\n", " if line_location in slice_locations:\n", " prefix = \"* \"\n", " else:\n", " prefix = \" \"\n", "\n", " print(f\"{prefix}{n:4} \", end=\"\")\n", "\n", " comment = \"\"\n", " for (mode_control, source, arrow) in [\n", " ('d', self.data, '<='),\n", " ('c', self.control, '<-')\n", " ]:\n", " if mode_control not in mode:\n", " continue\n", "\n", " deps = \"\"\n", " for var in source:\n", " name, location = var\n", " if location == line_location:\n", " for dep_var in source[var]:\n", " if deps == \"\":\n", " deps = arrow + \" \"\n", " else:\n", " deps += \", \"\n", " deps += self.format_var(dep_var, item)\n", "\n", " if deps != \"\":\n", " if comment != \"\":\n", " comment += \"; \"\n", " comment += deps\n", "\n", " if comment != \"\":\n", " line = line.rstrip() + \" # \" + comment\n", "\n", " print_content(line.rstrip(), '.py')\n", " print()\n", " n += 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The following listing shows such an integration. For each executed line (`*`), we see its data (`<=`) and control (`<-`) dependencies, listing the associated variables and line numbers. 
The comment\n", "\n", "```python\n", "# <= y (1); <- (5)\n", "```\n", "\n", "for Line 6, for instance, states that the return value is data dependent on the value of `y` in Line 1, and control dependent on the test in Line 5.\n", "\n", "Again, one can easily follow these dependencies back to track where a value came from (data dependencies) and why a statement was executed (control dependency)." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.035448Z", "iopub.status.busy": "2023-11-12T12:40:22.035330Z", "iopub.status.idle": "2023-11-12T12:40:22.438804Z", "shell.execute_reply": "2023-11-12T12:40:22.438523Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 2 \u001b[34mif\u001b[39;49;00m y < z: \u001b[37m# <= z (1), y (1)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 3 \u001b[34mif\u001b[39;49;00m x < y: \u001b[37m# <= x (1), y (1); <- (2)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 4 \u001b[34mreturn\u001b[39;49;00m y\u001b[37m\u001b[39;49;00m\n", "* 5 \u001b[34melif\u001b[39;49;00m x < z: \u001b[37m# <= x (1), z (1); <- (3)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 6 \u001b[34mreturn\u001b[39;49;00m y \u001b[37m# <= y (1); <- (5)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 7 \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " 8 \u001b[34mif\u001b[39;49;00m x > y:\u001b[37m\u001b[39;49;00m\n", " 9 \u001b[34mreturn\u001b[39;49;00m y\u001b[37m\u001b[39;49;00m\n", " 10 \u001b[34melif\u001b[39;49;00m x > z:\u001b[37m\u001b[39;49;00m\n", " 11 \u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m\n", " 12 \u001b[34mreturn\u001b[39;49;00m z\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "# ignore\n", "middle_deps().code() # type: ignore" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "One important aspect of dependencies is that they not only point to specific sources and causes of failures – but that they also _rule out_ parts of the program and its state as causes of failures.\n", "\n", "* In the above code, Lines 8 and later have no influence on the output, simply because they were not executed.\n", "* Furthermore, we see that we can start our investigation with Line 6, because that is the last one executed.\n", "* The data dependencies tell us that no statement has interfered with the value of `y` between the function call and its return.\n", "* Hence, the error must be in the conditions or the final `return` statement.\n", "\n", "With this in mind, recall that our original invocation was `middle(2, 1, 3)`. Why and how is the above code wrong?" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.440543Z", "iopub.status.busy": "2023-11-12T12:40:22.440415Z", "iopub.status.idle": "2023-11-12T12:40:22.445711Z", "shell.execute_reply": "2023-11-12T12:40:22.445418Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Which of the following `middle()` code lines should be fixed?\",\n", " [\n", " \"Line 2: `if y < z:`\",\n", " \"Line 3: `if x < y:`\",\n", " \"Line 5: `elif x < z:`\",\n", " \"Line 6: `return y`\",\n", " ], '(1 ** 0 + 1 ** 1) ** (1 ** 2 + 1 ** 3)')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, from the controlling conditions, we see that `y < z`, `x >= y`, and `x < z` all hold. Hence, `y <= x < z` holds, and it is `x`, not `y`, that should be returned." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Slices\n", "\n", "Given a dependency graph for a particular variable, we can identify the subset of the program that could have influenced it – the so-called _slice_. In the above code listing, these code locations are highlighted with `*` characters. Only these locations are part of the slice." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Slices are central to debugging for two reasons:\n", "\n", "* First, they _rule out_ those locations of the program that could _not_ have an effect on the failure. Hence, these locations need not be investigated when it comes to searching for the defect. Nor do they need to be considered for a fix, as any change outside the program slice by construction cannot affect the failure.\n", "* Second, they bring together possible origins that may be scattered across the code. Many dependencies in program code are _non-local_, with references to functions, classes, and modules defined in other locations, files, or libraries. A slice brings together all those locations in a single whole." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is an example of a slice – this time for our well-known `remove_html_markup()` function from [the introduction to debugging](Intro_Debugging.ipynb):" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.448812Z", "iopub.status.busy": "2023-11-12T12:40:22.448649Z", "iopub.status.idle": "2023-11-12T12:40:22.452207Z", "shell.execute_reply": "2023-11-12T12:40:22.451843Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Intro_Debugging import remove_html_markup" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.454035Z", "iopub.status.busy": "2023-11-12T12:40:22.453919Z", "iopub.status.idle": "2023-11-12T12:40:22.487259Z", "shell.execute_reply": "2023-11-12T12:40:22.486914Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mremove_html_markup\u001b[39;49;00m(s): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " quote = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " out = \u001b[33m\"\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mfor\u001b[39;49;00m c \u001b[35min\u001b[39;49;00m s:\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m tag \u001b[35mor\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m<\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", " tag = 
\u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m>\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", " tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m (c == \u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m c == \u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", " quote = \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", " out = out + c\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m out\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(inspect.getsource(remove_html_markup), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When we invoke `remove_html_markup()` as follows..." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.489026Z", "iopub.status.busy": "2023-11-12T12:40:22.488888Z", "iopub.status.idle": "2023-11-12T12:40:22.491187Z", "shell.execute_reply": "2023-11-12T12:40:22.490907Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'bar'" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup('bar')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... 
we obtain the following dependencies:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.493098Z", "iopub.status.busy": "2023-11-12T12:40:22.492957Z", "iopub.status.idle": "2023-11-12T12:40:22.498144Z", "shell.execute_reply": "2023-11-12T12:40:22.497846Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "def remove_html_markup_deps() -> Dependencies:\n", " return Dependencies({('s', (remove_html_markup, 136)): set(), ('tag', (remove_html_markup, 137)): set(), ('quote', (remove_html_markup, 138)): set(), ('out', (remove_html_markup, 139)): set(), ('c', (remove_html_markup, 141)): {('s', (remove_html_markup, 136))}, ('', (remove_html_markup, 144)): {('quote', (remove_html_markup, 138)), ('c', (remove_html_markup, 141))}, ('tag', (remove_html_markup, 145)): set(), ('', (remove_html_markup, 146)): {('quote', (remove_html_markup, 138)), ('c', (remove_html_markup, 141))}, ('', (remove_html_markup, 148)): {('c', (remove_html_markup, 141))}, ('', (remove_html_markup, 150)): {('tag', (remove_html_markup, 147)), ('tag', (remove_html_markup, 145))}, ('tag', (remove_html_markup, 147)): set(), ('out', (remove_html_markup, 151)): {('out', (remove_html_markup, 151)), ('c', (remove_html_markup, 141)), ('out', (remove_html_markup, 139))}, ('', (remove_html_markup, 153)): {('', (remove_html_markup, 146)), ('out', (remove_html_markup, 151))}}, {('s', (remove_html_markup, 136)): set(), ('tag', (remove_html_markup, 137)): set(), ('quote', (remove_html_markup, 138)): set(), ('out', (remove_html_markup, 139)): set(), ('c', (remove_html_markup, 141)): set(), ('', (remove_html_markup, 144)): set(), ('tag', (remove_html_markup, 145)): {('', (remove_html_markup, 144))}, ('', (remove_html_markup, 146)): {('', (remove_html_markup, 144))}, ('', (remove_html_markup, 148)): {('', (remove_html_markup, 146))}, ('', (remove_html_markup, 150)): {('', (remove_html_markup, 148))}, ('tag', 
(remove_html_markup, 147)): {('', (remove_html_markup, 146))}, ('out', (remove_html_markup, 151)): {('', (remove_html_markup, 150))}, ('', (remove_html_markup, 153)): set()})" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.499739Z", "iopub.status.busy": "2023-11-12T12:40:22.499639Z", "iopub.status.idle": "2023-11-12T12:40:22.922816Z", "shell.execute_reply": "2023-11-12T12:40:22.922389Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_146\n", "\n", "\n", "<test>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_148\n", "\n", "\n", "<test>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_146->test_functionremove_html_markupat0x104965090_148\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x104965090_147\n", "\n", "\n", "tag\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_146->tag_functionremove_html_markupat0x104965090_147\n", "\n", "\n", "\n", "\n", "\n", "\n", "remove_html_markupreturnvalue_functionremove_html_markupat0x104965090_153\n", "\n", "\n", "<remove_html_markup() return value>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_146->remove_html_markupreturnvalue_functionremove_html_markupat0x104965090_153\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x104965090_138\n", "\n", "\n", "quote\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x104965090_138->test_functionremove_html_markupat0x104965090_146\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", 
"<test>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x104965090_138->test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x104965090_139\n", "\n", "\n", "out\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141\n", "\n", "\n", "c\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141->test_functionremove_html_markupat0x104965090_146\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141->test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141->test_functionremove_html_markupat0x104965090_148\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x104965090_151\n", "\n", "\n", "out\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141->out_functionremove_html_markupat0x104965090_151\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_144->test_functionremove_html_markupat0x104965090_146\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x104965090_145\n", "\n", "\n", "tag\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_144->tag_functionremove_html_markupat0x104965090_145\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_150\n", "\n", "\n", "<test>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_148->test_functionremove_html_markupat0x104965090_150\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_150->out_functionremove_html_markupat0x104965090_151\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"tag_functionremove_html_markupat0x104965090_147->test_functionremove_html_markupat0x104965090_150\n", "\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x104965090_145->test_functionremove_html_markupat0x104965090_150\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x104965090_151->out_functionremove_html_markupat0x104965090_151\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x104965090_151->remove_html_markupreturnvalue_functionremove_html_markupat0x104965090_153\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x104965090_139->out_functionremove_html_markupat0x104965090_151\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x104965090_136\n", "\n", "\n", "s\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x104965090_136->c_functionremove_html_markupat0x104965090_141\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x104965090_137\n", "\n", "\n", "tag\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "remove_html_markup_deps().graph()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Again, we can read such a graph _forward_ (starting from, say, `s`) or _backward_ (starting from the return value). Starting forward, we see how the passed string `s` flows into the `for` loop, breaking `s` into individual characters `c` that are then checked on various occasions, before flowing into the `out` return value. We also see how the various `if` conditions are all influenced by `c`, `tag`, and `quote`." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.925132Z", "iopub.status.busy": "2023-11-12T12:40:22.924911Z", "iopub.status.idle": "2023-11-12T12:40:22.930761Z", "shell.execute_reply": "2023-11-12T12:40:22.930361Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Why does the first line `tag = False` not influence anything?\",\n", " [\n", " \"Because the input contains only tags\",\n", " \"Because `tag` is set to True with the first character\",\n", " \"Because `tag` is not read by any variable\",\n", " \"Because the input contains no tags\",\n", " ], '(1 << 1 + 1 >> 1)')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Which are the locations that set `tag` to True? To this end, we compute the slice of `tag` at `tag = True`:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:22.932686Z", "iopub.status.busy": "2023-11-12T12:40:22.932559Z", "iopub.status.idle": "2023-11-12T12:40:23.346295Z", "shell.execute_reply": "2023-11-12T12:40:23.345862Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141\n", "\n", "\n", "c\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", "<test>\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x104965090_141->test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x104965090_136\n", "\n", "\n", "s\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x104965090_136->c_functionremove_html_markupat0x104965090_141\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x104965090_138\n", "\n", "\n", "quote\n", "<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x104965090_145\n", "\n", "\n", "tag\n", 
"<remove_html_markup()>\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x104965090_144->tag_functionremove_html_markupat0x104965090_145\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x104965090_138->test_functionremove_html_markupat0x104965090_144\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " ('tag', (remove_html_markup, 145)): set(),\n", " ('', (remove_html_markup, 144)): {('quote', (remove_html_markup, 138)), ('c', (remove_html_markup, 141))},\n", " ('quote', (remove_html_markup, 138)): set(),\n", " ('c', (remove_html_markup, 141)): {('s', (remove_html_markup, 136))},\n", " ('s', (remove_html_markup, 136)): set()},\n", " control={\n", " ('tag', (remove_html_markup, 145)): {('', (remove_html_markup, 144))},\n", " ('', (remove_html_markup, 144)): set(),\n", " ('quote', (remove_html_markup, 138)): set(),\n", " ('c', (remove_html_markup, 141)): set(),\n", " ('s', (remove_html_markup, 136)): set()})" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "tag_deps = Dependencies({('tag', (remove_html_markup, 145)): set(), ('', (remove_html_markup, 144)): {('quote', (remove_html_markup, 138)), ('c', (remove_html_markup, 141))}, ('quote', (remove_html_markup, 138)): set(), ('c', (remove_html_markup, 141)): {('s', (remove_html_markup, 136))}, ('s', (remove_html_markup, 136)): set()}, {('tag', (remove_html_markup, 145)): {('', (remove_html_markup, 144))}, ('', (remove_html_markup, 144)): set(), ('quote', (remove_html_markup, 138)): set(), ('c', (remove_html_markup, 141)): set(), ('s', (remove_html_markup, 136)): set()})\n", "tag_deps" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see where the value of `tag` comes from: from the characters `c` in `s` as well as `quote`, which all cause it to be set. 
Again, we can combine these dependencies and the listing in a single, compact view. Note, again, that there are no other locations in the code that could possibly have affected `tag` in our run." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.348156Z", "iopub.status.busy": "2023-11-12T12:40:23.348024Z", "iopub.status.idle": "2023-11-12T12:40:23.870783Z", "shell.execute_reply": "2023-11-12T12:40:23.870483Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 238 \u001b[34mdef\u001b[39;49;00m \u001b[32mremove_html_markup\u001b[39;49;00m(s): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 239 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 240 quote = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 241 out = \u001b[33m\"\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 242 \u001b[37m\u001b[39;49;00m\n", " 243 \u001b[34mfor\u001b[39;49;00m c \u001b[35min\u001b[39;49;00m s:\u001b[37m\u001b[39;49;00m\n", " 244 \u001b[34massert\u001b[39;49;00m tag \u001b[35mor\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", " 245 \u001b[37m\u001b[39;49;00m\n", " 246 \u001b[34mif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m<\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", " 247 tag = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 248 \u001b[34melif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m>\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", " 249 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 250 \u001b[34melif\u001b[39;49;00m (c == 
\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m c == \u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", " 251 quote = \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", " 252 \u001b[34melif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", " 253 out = out + c\u001b[37m\u001b[39;49;00m\n", " 254 \u001b[37m\u001b[39;49;00m\n", " 255 \u001b[34mreturn\u001b[39;49;00m out\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "# ignore\n", "tag_deps.code()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.872647Z", "iopub.status.busy": "2023-11-12T12:40:23.872522Z", "iopub.status.idle": "2023-11-12T12:40:23.877248Z", "shell.execute_reply": "2023-11-12T12:40:23.876924Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"How does the slice of `tag = True` change \"\n", " \"for a different value of `s`?\",\n", " [\n", " \"Not at all\",\n", " \"If `s` contains a quote, the `quote` slice is included, too\",\n", " \"If `s` contains no HTML tag, the slice will be empty\"\n", " ], '[1, 2, 3][1:]')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, our dynamic slices reflect dependencies as they occurred within a single execution. As the execution changes, so do the dependencies." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracking Techniques\n", "\n", "For the remainder of this chapter, let us investigate means to _determine such dependencies_ automatically – by _collecting_ them during program execution. The idea is that with a single Python call, we can collect the dependencies for some computation, and present them to programmers – as graphs or as code annotations, as shown above." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To track dependencies, for every variable, we need to keep track of its _origins_ – where it obtained its value, and which tests controlled its assignments. There are two ways to do so:\n", "\n", "* Wrapping Data Objects\n", "* Wrapping Data Accesses" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Wrapping Data Objects" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "One way to track origins is to _wrap_ each value in a class that stores both a value and the origin of the value. 
If a variable `x` is initialized to zero in Line 3, for instance, we could store it as\n", "```\n", "x = (value=0, origin=)\n", "```\n", "and if it is copied in, say, Line 5 to another variable `y`, we could store this as\n", "```\n", "y = (value=0, origin=)\n", "```\n", "Such a scheme would allow us to track origins and dependencies right within the variable." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In a language like Python, it is actually possible to subclass basic types. Here's how we create a `MyInt` subclass of `int`:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.879125Z", "iopub.status.busy": "2023-11-12T12:40:23.879005Z", "iopub.status.idle": "2023-11-12T12:40:23.881089Z", "shell.execute_reply": "2023-11-12T12:40:23.880781Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class MyInt(int):\n", " def __new__(cls: Type, value: Any, *args: Any, **kwargs: Any) -> Any:\n", " return super(cls, cls).__new__(cls, value)\n", "\n", " def __repr__(self) -> str:\n", " return f\"{int(self)}\"" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.882628Z", "iopub.status.busy": "2023-11-12T12:40:23.882519Z", "iopub.status.idle": "2023-11-12T12:40:23.884144Z", "shell.execute_reply": "2023-11-12T12:40:23.883904Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "n: MyInt = MyInt(5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can access `n` just like any integer:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.885830Z", "iopub.status.busy": "2023-11-12T12:40:23.885714Z", "iopub.status.idle": "2023-11-12T12:40:23.887848Z", "shell.execute_reply": 
"2023-11-12T12:40:23.887591Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(5, 6)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n, n + 1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "However, we can also add extra attributes to it:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.889629Z", "iopub.status.busy": "2023-11-12T12:40:23.889504Z", "iopub.status.idle": "2023-11-12T12:40:23.891207Z", "shell.execute_reply": "2023-11-12T12:40:23.890929Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "n.origin = \"Line 5\" # type: ignore" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.892748Z", "iopub.status.busy": "2023-11-12T12:40:23.892643Z", "iopub.status.idle": "2023-11-12T12:40:23.894669Z", "shell.execute_reply": "2023-11-12T12:40:23.894421Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'Line 5'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n.origin # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Such a \"wrapping\" scheme has the advantage of _leaving program code untouched_ – simply pass \"wrapped\" objects instead of the original values. However, it also has a number of drawbacks.\n", "\n", "* First, we must make sure that the \"wrapper\" objects are still compatible with the original values – notably by converting them back whenever needed. 
(What happens if an internal Python function expects an `int` and gets a `MyInt` instead?)\n", "* Second, we have to make sure that origins do not get lost during computations – which involves overloading operators such as `+`, `-`, `*`, and so on. (Right now, `MyInt(1) + 1` gives us an `int` object, not a `MyInt`.)\n", "* Third, we have to do this for _all_ data types of a language, which is pretty tedious.\n", "* Fourth and last, however, we want to track whenever a value is assigned to another variable. Python has no support for this, and thus our dependencies will necessarily be incomplete." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Wrapping Data Accesses" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An alternate way of tracking origins is to _instrument_ the source code such that all _data read and write operations are tracked_. That is, the original data stays unchanged, but we change the code instead.\n", "\n", "In essence, for every occurrence of a variable `x` being _read_, we replace it with\n", "\n", "```python\n", "_data.get('x', x) # returns x\n", "```\n", "\n", "and for every occurrence of a value being _written_ to `x`, we replace the value with\n", "\n", "```python\n", "_data.set('x', value) # returns value\n", "```\n", "\n", "and let the `_data` object track these reads and writes.\n", "\n", "Hence, an assignment such as\n", "\n", "```python\n", "a = b + c\n", "```\n", "\n", "would get rewritten to\n", "\n", "```python\n", "a = _data.set('a', _data.get('b', b) + _data.get('c', c))\n", "```\n", "\n", "and with every access to `_data`, we would track \n", "\n", "1. the current _location_ in the code, and \n", "2. 
whether the respective variable was read or written.\n", "\n", "For the above statement, we could deduce that `b` and `c` were read, and `a` was written – which makes `a` data dependent on `b` and `c`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The advantage of such instrumentation is that it works with _arbitrary objects_ (in Python, that is) – we do not care whether `a`, `b`, and `c` would be integers, floats, strings, lists or any other type for which `+` would be defined. Also, the code semantics remain entirely unchanged.\n", "\n", "The disadvantage, however, is that it takes a bit of effort to exactly separate reads and writes into individual groups, and that a number of language features have to be handled separately. This is what we do in the remainder of this chapter." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A Data Tracker\n", "\n", "To implement `_data` accesses as shown above, we introduce the `DataTracker` class. As its name suggests, it keeps track of variables being read and written, and provides methods to determine the code location where this took place." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.896347Z", "iopub.status.busy": "2023-11-12T12:40:23.896240Z", "iopub.status.idle": "2023-11-12T12:40:23.898100Z", "shell.execute_reply": "2023-11-12T12:40:23.897851Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DataTracker(StackInspector):\n", " \"\"\"Track data accesses during execution\"\"\"\n", "\n", " def __init__(self, log: bool = False) -> None:\n", " \"\"\"Constructor. 
If `log` is set, turn on logging.\"\"\"\n", " self.log = log" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`set()` is invoked when a variable is set, as in\n", "\n", "```python\n", "pi = _data.set('pi', 3.1415)\n", "```\n", "\n", "By default, we simply log the access using name and value. (`loads` will be used later.)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.899684Z", "iopub.status.busy": "2023-11-12T12:40:23.899575Z", "iopub.status.idle": "2023-11-12T12:40:23.901724Z", "shell.execute_reply": "2023-11-12T12:40:23.901485Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def set(self, name: str, value: Any, loads: Optional[Set[str]] = None) -> Any:\n", " \"\"\"Track setting `name` to `value`.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: setting {name}\")\n", "\n", " return value" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`get()` is invoked when a variable is retrieved, as in\n", "\n", "```python\n", "print(_data.get('pi', pi))\n", "```\n", "By default, we simply log the access." 
] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.903257Z", "iopub.status.busy": "2023-11-12T12:40:23.903134Z", "iopub.status.idle": "2023-11-12T12:40:23.905169Z", "shell.execute_reply": "2023-11-12T12:40:23.904932Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def get(self, name: str, value: Any) -> Any:\n", " \"\"\"Track getting `value` from `name`.\"\"\"\n", "\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: getting {name}\")\n", "\n", " return value" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's an example of a logging `DataTracker`:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.906824Z", "iopub.status.busy": "2023-11-12T12:40:23.906708Z", "iopub.status.idle": "2023-11-12T12:40:23.908588Z", "shell.execute_reply": "2023-11-12T12:40:23.908336Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ":2: setting x\n" ] } ], "source": [ "_test_data = DataTracker(log=True)\n", "x = _test_data.set('x', 1)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.910080Z", "iopub.status.busy": "2023-11-12T12:40:23.909978Z", "iopub.status.idle": "2023-11-12T12:40:23.911987Z", "shell.execute_reply": "2023-11-12T12:40:23.911750Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ":1: getting x\n" ] }, { "data": { "text/plain": [ "1" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.get('x', x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } 
}, "source": [ "## Instrumenting Source Code\n", "\n", "How do we transform source code such that read and write accesses to variables would be automatically rewritten? To this end, we inspect the internal representation of source code, namely the _abstract syntax trees_ (ASTs). An AST represents the code as a tree, with specific node types for each syntactical element." ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.913572Z", "iopub.status.busy": "2023-11-12T12:40:23.913469Z", "iopub.status.idle": "2023-11-12T12:40:23.914941Z", "shell.execute_reply": "2023-11-12T12:40:23.914722Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import ast" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.916325Z", "iopub.status.busy": "2023-11-12T12:40:23.916244Z", "iopub.status.idle": "2023-11-12T12:40:23.917806Z", "shell.execute_reply": "2023-11-12T12:40:23.917580Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import show_ast" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is the tree representation for our `middle()` function. It starts with a `FunctionDef` node at the top (with the name `\"middle\"` and the three arguments `x`, `y`, `z` as children), followed by a subtree for each of the `If` statements, each of which contains a branch for when their condition evaluates to `True` and a branch for when their condition evaluates to `False`." 
] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:23.919215Z", "iopub.status.busy": "2023-11-12T12:40:23.919133Z", "iopub.status.idle": "2023-11-12T12:40:24.328725Z", "shell.execute_reply": "2023-11-12T12:40:24.328361Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: AST of middle(), rooted at a FunctionDef node with the name \"middle\" and the arguments x, y, z, followed by nested If, Compare, Name, and Return nodes]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "middle_tree = ast.parse(inspect.getsource(middle))\n", "show_ast(middle_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At the very bottom of the tree, you can see a number 
of `Name` nodes, referring to individual variables. These are the ones we want to transform." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Tracking Variable Accesses\n", "\n", "Our goal is to _traverse_ the tree, identify all `Name` nodes, and convert them into the corresponding `_data` accesses.\n", "To this end, we manipulate the AST through the Python `ast` module. The [official Python `ast` reference](http://docs.python.org/3/library/ast) is complete, but a bit brief; the documentation [\"Green Tree Snakes - the missing Python AST docs\"](https://greentreesnakes.readthedocs.io/en/latest/) provides an excellent introduction." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The Python `ast` module provides a class `NodeTransformer` that allows such transformations. By subclassing from it, we provide a method `visit_Name()` that is invoked for all `Name` nodes and replaces each of them with a new subtree from `make_get_data()`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.330538Z", "iopub.status.busy": "2023-11-12T12:40:24.330411Z", "iopub.status.idle": "2023-11-12T12:40:24.332275Z", "shell.execute_reply": "2023-11-12T12:40:24.332005Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ast import NodeTransformer, NodeVisitor, Name, AST" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.333724Z", "iopub.status.busy": "2023-11-12T12:40:24.333617Z", "iopub.status.idle": "2023-11-12T12:40:24.335378Z", "shell.execute_reply": "2023-11-12T12:40:24.335016Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import typing" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.336942Z", 
"iopub.status.busy": "2023-11-12T12:40:24.336816Z", "iopub.status.idle": "2023-11-12T12:40:24.338445Z", "shell.execute_reply": "2023-11-12T12:40:24.338175Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "DATA_TRACKER = '_data'" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.339765Z", "iopub.status.busy": "2023-11-12T12:40:24.339662Z", "iopub.status.idle": "2023-11-12T12:40:24.341504Z", "shell.execute_reply": "2023-11-12T12:40:24.341261Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def is_internal(id: str) -> bool:\n", " \"\"\"Return True if `id` is a built-in function or type\"\"\"\n", " return (id in dir(__builtins__) or id in dir(typing))" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.342887Z", "iopub.status.busy": "2023-11-12T12:40:24.342788Z", "iopub.status.idle": "2023-11-12T12:40:24.344663Z", "shell.execute_reply": "2023-11-12T12:40:24.344394Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_internal('int')\n", "assert is_internal('None')\n", "assert is_internal('Tuple')" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.346199Z", "iopub.status.busy": "2023-11-12T12:40:24.346078Z", "iopub.status.idle": "2023-11-12T12:40:24.348546Z", "shell.execute_reply": "2023-11-12T12:40:24.348271Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackGetTransformer(NodeTransformer):\n", " def visit_Name(self, node: Name) -> AST:\n", " self.generic_visit(node)\n", "\n", " if is_internal(node.id):\n", " # Do not change built-in names and types\n", " return node\n", "\n", " if node.id == DATA_TRACKER:\n", " # Do not change own accesses\n", " return node\n", "\n", " if not isinstance(node.ctx, Load):\n", " # Only change 
loads (not stores, not deletions)\n", "            return node\n", "\n", "        new_node = make_get_data(node.id)\n", "        ast.copy_location(new_node, node)\n", "        return new_node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our function `make_get_data(id, method)` returns a new subtree equivalent to the Python code `_data.method('id', id)`." ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.350058Z", "iopub.status.busy": "2023-11-12T12:40:24.349964Z", "iopub.status.idle": "2023-11-12T12:40:24.351864Z", "shell.execute_reply": "2023-11-12T12:40:24.351575Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ast import Module, Load, Store, \\\n", "    Attribute, With, withitem, keyword, Call, Expr, \\\n", "    Assign, AugAssign, AnnAssign, Assert" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.353399Z", "iopub.status.busy": "2023-11-12T12:40:24.353272Z", "iopub.status.idle": "2023-11-12T12:40:24.354888Z", "shell.execute_reply": "2023-11-12T12:40:24.354641Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Since Python 3.8, `ast.Num`, `ast.Str`, and `ast.NameConstant` are deprecated\n", "# aliases of `ast.Constant`; Python 3.12 removes them.\n", "# `make_get_data()` below therefore constructs an `ast.Constant` node." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.356262Z", "iopub.status.busy": "2023-11-12T12:40:24.356156Z", "iopub.status.idle": "2023-11-12T12:40:24.358340Z", "shell.execute_reply": "2023-11-12T12:40:24.358095Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def make_get_data(id: str, method: str = 'get') -> Call:\n", "    return Call(func=Attribute(value=Name(id=DATA_TRACKER, ctx=Load()),\n", "                               attr=method, ctx=Load()),\n", "                args=[ast.Constant(value=id), 
Name(id=id, ctx=Load())],\n", "                keywords=[])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is the tree that `make_get_data()` produces:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.359715Z", "iopub.status.busy": "2023-11-12T12:40:24.359621Z", "iopub.status.idle": "2023-11-12T12:40:24.788252Z", "shell.execute_reply": "2023-11-12T12:40:24.787867Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: AST produced by make_get_data(\"x\"): a Call node with an Attribute child (Name \"_data\", attribute \"get\"), a Constant \"x\", and a Name \"x\"]" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_ast(Module(body=[make_get_data(\"x\")]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How do we know that this is a correct subtree? We can carefully read the [official Python `ast` reference](http://docs.python.org/3/library/ast) and then proceed by trial and error (and apply [delta debugging](DeltaDebugger.ipynb) to determine error causes). Or – pro tip! 
– we can simply take a piece of Python code, parse it, and use `ast.dump()` to print out how to construct the resulting AST:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.790278Z", "iopub.status.busy": "2023-11-12T12:40:24.790128Z", "iopub.status.idle": "2023-11-12T12:40:24.792608Z", "shell.execute_reply": "2023-11-12T12:40:24.792108Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(body=[Expr(value=Call(func=Attribute(value=Name(id='_data', ctx=Load()), attr='get', ctx=Load()), args=[Constant(value='x'), Name(id='x', ctx=Load())], keywords=[]))], type_ignores=[])\n" ] } ], "source": [ "print(ast.dump(ast.parse(\"_data.get('x', x)\")))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If you compare the above output with the code of `make_get_data()`, you will see where the structure of `make_get_data()` comes from." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us put `TrackGetTransformer` into action. Its `visit()` method calls `visit_Name()` for each `Name` node, which in turn transforms the node as desired. This happens in place." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.794603Z", "iopub.status.busy": "2023-11-12T12:40:24.794474Z", "iopub.status.idle": "2023-11-12T12:40:24.796871Z", "shell.execute_reply": "2023-11-12T12:40:24.796577Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "TrackGetTransformer().visit(middle_tree);" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To see the effect of our transformations, we introduce a function `dump_tree()`, which prints the tree as source code and also compiles it to check for any inconsistencies." 
] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.798775Z", "iopub.status.busy": "2023-11-12T12:40:24.798648Z", "iopub.status.idle": "2023-11-12T12:40:24.800763Z", "shell.execute_reply": "2023-11-12T12:40:24.800387Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def dump_tree(tree: AST) -> None:\n", "    print_content(ast.unparse(tree), '.py')\n", "    ast.fix_missing_locations(tree)  # Must run this before compiling\n", "    _ = compile(cast(ast.Module, tree), '', 'exec')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that our transformer has properly replaced all variable accesses:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.802517Z", "iopub.status.busy": "2023-11-12T12:40:24.802396Z", "iopub.status.idle": "2023-11-12T12:40:24.837692Z", "shell.execute_reply": "2023-11-12T12:40:24.837400Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def middle(x, y, z):\n", "    if _data.get('y', y) < _data.get('z', z):\n", "        if _data.get('x', x) < _data.get('y', y):\n", "            return _data.get('y', y)\n", "        elif _data.get('x', x) < _data.get('z', z):\n", "            return _data.get('y', y)\n", "    elif _data.get('x', x) > _data.get('y', y):\n", "        return _data.get('y', y)\n", 
"    elif _data.get('x', x) > _data.get('z', z):\n", "        return _data.get('x', x)\n", "    return _data.get('z', z)" ] } ], "source": [ "dump_tree(middle_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us now execute this code together with the `DataTracker()` class we previously introduced. The class `DataTrackerTester()` takes a (transformed) tree and a function. Using it as\n", "\n", "```python\n", "with DataTrackerTester(tree, func):\n", "    func(...)\n", "```\n", "\n", "first executes the code in _tree_ (possibly instrumenting `func`) and then the `with` body. At the end, `func` is restored to its previous (non-instrumented) version." 
] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.839647Z", "iopub.status.busy": "2023-11-12T12:40:24.839513Z", "iopub.status.idle": "2023-11-12T12:40:24.841127Z", "shell.execute_reply": "2023-11-12T12:40:24.840860Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from types import TracebackType" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.842665Z", "iopub.status.busy": "2023-11-12T12:40:24.842538Z", "iopub.status.idle": "2023-11-12T12:40:24.846141Z", "shell.execute_reply": "2023-11-12T12:40:24.845793Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTrackerTester:\n", "    def __init__(self, tree: AST, func: Callable, log: bool = True) -> None:\n", "        \"\"\"Constructor. Execute the code in `tree` while instrumenting `func`.\"\"\"\n", "        # We pass the source file of `func` such that we can retrieve it\n", "        # when accessing the location of the new compiled code\n", "        source = cast(str, inspect.getsourcefile(func))\n", "        self.code = compile(cast(ast.Module, tree), source, 'exec')\n", "        self.func = func\n", "        self.log = log\n", "\n", "    def make_data_tracker(self) -> Any:\n", "        return DataTracker(log=self.log)\n", "\n", "    def __enter__(self) -> Any:\n", "        \"\"\"Rewrite function\"\"\"\n", "        tracker = self.make_data_tracker()\n", "        globals()[DATA_TRACKER] = tracker\n", "        exec(self.code, globals())\n", "        return tracker\n", "\n", "    def __exit__(self, exc_type: Type, exc_value: BaseException,\n", "                 traceback: TracebackType) -> Optional[bool]:\n", "        \"\"\"Restore function\"\"\"\n", "        globals()[self.func.__name__] = self.func\n", "        del globals()[DATA_TRACKER]\n", "        return None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is our `middle()` function:" ] }, { "cell_type": "code", "execution_count": 82, 
"metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.847820Z", "iopub.status.busy": "2023-11-12T12:40:24.847698Z", "iopub.status.idle": "2023-11-12T12:40:24.884115Z", "shell.execute_reply": "2023-11-12T12:40:24.882559Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1  def middle(x, y, z):  # type: ignore\n", " 2      if y < z:\n", " 3          if x < y:\n", " 4              return y\n", " 5          elif x < z:\n", " 6              return y\n", " 7      else:\n", " 8          if x > y:\n", " 9              return y\n", "10          elif x > z:\n", "11              return x\n", "12      return z" ] } ], "source": [ "print_content(inspect.getsource(middle), '.py', start_line_number=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And here is our instrumented `middle_tree` executed with a `DataTracker` object. We see how the `middle()` tests access one argument after another." 
] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.897915Z", "iopub.status.busy": "2023-11-12T12:40:24.897734Z", "iopub.status.idle": "2023-11-12T12:40:24.901220Z", "shell.execute_reply": "2023-11-12T12:40:24.900105Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle:2: getting y\n", "middle:2: getting z\n", "middle:3: getting x\n", "middle:3: getting y\n", "middle:5: getting x\n", "middle:5: getting z\n", "middle:6: getting y\n" ] } ], "source": [ "with DataTrackerTester(middle_tree, middle):\n", " middle(2, 1, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After `DataTrackerTester` is done, `middle` is reverted to its non-instrumented version:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.904700Z", "iopub.status.busy": "2023-11-12T12:40:24.904537Z", "iopub.status.idle": "2023-11-12T12:40:24.907138Z", "shell.execute_reply": "2023-11-12T12:40:24.906704Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "middle(2, 1, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For a complete picture of what happens during executions, we implement a number of additional code transformers." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For each assignment statement `x = y`, we change it to `x = _data.set('x', y)`. This allows __tracking assignments__." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Tracking Assignments and Assertions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For the remaining transformers, we follow the same steps as for `TrackGetTransformer`, except that our `visit_...()` methods focus on different nodes, and return different subtrees. Here, we focus on assignment nodes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We want to transform assignments `x = value` into `x = _data.set('x', value)` to track assignments to `x`." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If the left-hand side of the assignment is more complex, as in `x[y] = value`, we want to ensure the read access to `x` and `y` is also tracked. By transforming `x[y] = value` into `x[y] = _data.set('x', value, loads=(x, y))`, we ensure that `x` and `y` are marked as read (since the names in the otherwise ignored `loads` argument are in turn rewritten into `_data.get()` calls for `x` and `y`)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using `ast.dump()`, we reveal what the corresponding syntax tree has to look like:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.909595Z", "iopub.status.busy": "2023-11-12T12:40:24.909364Z", "iopub.status.idle": "2023-11-12T12:40:24.912020Z", "shell.execute_reply": "2023-11-12T12:40:24.911623Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(body=[Expr(value=Call(func=Attribute(value=Name(id='_data', ctx=Load()), attr='set', ctx=Load()), args=[Constant(value='x'), Name(id='value', ctx=Load())], keywords=[keyword(arg='loads', value=Tuple(elts=[Name(id='a', ctx=Load()), Name(id='b', ctx=Load())], ctx=Load()))]))], type_ignores=[])\n" ] } ], "source": [ "print(ast.dump(ast.parse(\"_data.set('x', value, loads=(a, b))\")))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using this structure, we can write a function `make_set_data()` which constructs such a subtree." ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.913670Z", "iopub.status.busy": "2023-11-12T12:40:24.913534Z", "iopub.status.idle": "2023-11-12T12:40:24.916665Z", "shell.execute_reply": "2023-11-12T12:40:24.916220Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def make_set_data(id: str, value: Any, \n", " loads: Optional[Set[str]] = None, method: str = 'set') -> Call:\n", " \"\"\"\n", " Construct a subtree _data.`method`('`id`', `value`). 
\n", "    If `loads` is set to [X1, X2, ...], make it\n", "    _data.`method`('`id`', `value`, loads=(X1, X2, ...))\n", "    \"\"\"\n", "\n", "    keywords = []\n", "\n", "    if loads:\n", "        keywords = [\n", "            keyword(arg='loads',\n", "                    value=ast.Tuple(\n", "                        elts=[Name(id=load, ctx=Load()) for load in loads],\n", "                        ctx=Load()\n", "                    ))\n", "        ]\n", "\n", "    new_node = Call(func=Attribute(value=Name(id=DATA_TRACKER, ctx=Load()),\n", "                                   attr=method, ctx=Load()),\n", "                    args=[ast.Constant(value=id), value],\n", "                    keywords=keywords)\n", "\n", "    ast.copy_location(new_node, value)\n", "\n", "    return new_node" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The problem is, however: How do we get the name of the variable being assigned to? The left-hand side of an assignment can be a complex expression such as `x[i]`. We use the leftmost name of the left-hand side as the name being assigned to." ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.918308Z", "iopub.status.busy": "2023-11-12T12:40:24.918187Z", "iopub.status.idle": "2023-11-12T12:40:24.920598Z", "shell.execute_reply": "2023-11-12T12:40:24.920299Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class LeftmostNameVisitor(NodeVisitor):\n", "    def __init__(self) -> None:\n", "        super().__init__()\n", "        self.leftmost_name: Optional[str] = None\n", "\n", "    def visit_Name(self, node: Name) -> None:\n", "        if self.leftmost_name is None:\n", "            self.leftmost_name = node.id\n", "        self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.921948Z", "iopub.status.busy": "2023-11-12T12:40:24.921861Z", "iopub.status.idle": "2023-11-12T12:40:24.923880Z", "shell.execute_reply": "2023-11-12T12:40:24.923491Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def 
leftmost_name(tree: AST) -> Optional[str]:\n", " visitor = LeftmostNameVisitor()\n", " visitor.visit(tree)\n", " return visitor.leftmost_name" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.925548Z", "iopub.status.busy": "2023-11-12T12:40:24.925391Z", "iopub.status.idle": "2023-11-12T12:40:24.927829Z", "shell.execute_reply": "2023-11-12T12:40:24.927452Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "leftmost_name(ast.parse('a[x] = 25'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Python also allows _tuple assignments_, as in `(a, b, c) = (1, 2, 3)`. We extract all variables being stored (that is, expressions whose `ctx` attribute is `Store()`) and extract their (leftmost) names." ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.929437Z", "iopub.status.busy": "2023-11-12T12:40:24.929333Z", "iopub.status.idle": "2023-11-12T12:40:24.931918Z", "shell.execute_reply": "2023-11-12T12:40:24.931627Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class StoreVisitor(NodeVisitor):\n", " def __init__(self) -> None:\n", " super().__init__()\n", " self.names: Set[str] = set()\n", "\n", " def visit(self, node: AST) -> None:\n", " if hasattr(node, 'ctx') and isinstance(node.ctx, Store): # type: ignore\n", " name = leftmost_name(node)\n", " if name:\n", " self.names.add(name)\n", "\n", " self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.933368Z", "iopub.status.busy": "2023-11-12T12:40:24.933280Z", "iopub.status.idle": "2023-11-12T12:40:24.935085Z", "shell.execute_reply": "2023-11-12T12:40:24.934834Z" }, 
"slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def store_names(tree: AST) -> Set[str]:\n", " visitor = StoreVisitor()\n", " visitor.visit(tree)\n", " return visitor.names" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.936362Z", "iopub.status.busy": "2023-11-12T12:40:24.936280Z", "iopub.status.idle": "2023-11-12T12:40:24.938726Z", "shell.execute_reply": "2023-11-12T12:40:24.938414Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'a', 'b', 'c'}" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "store_names(ast.parse('a[x], b[y], c = 1, 2, 3'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For complex assignments, we also want to access the names read in the left hand side of an expression." ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.940594Z", "iopub.status.busy": "2023-11-12T12:40:24.940277Z", "iopub.status.idle": "2023-11-12T12:40:24.943132Z", "shell.execute_reply": "2023-11-12T12:40:24.942742Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class LoadVisitor(NodeVisitor):\n", " def __init__(self) -> None:\n", " super().__init__()\n", " self.names: Set[str] = set()\n", "\n", " def visit(self, node: AST) -> None:\n", " if hasattr(node, 'ctx') and isinstance(node.ctx, Load): # type: ignore\n", " name = leftmost_name(node)\n", " if name is not None:\n", " self.names.add(name)\n", "\n", " self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.944813Z", "iopub.status.busy": "2023-11-12T12:40:24.944685Z", "iopub.status.idle": "2023-11-12T12:40:24.946535Z", "shell.execute_reply": "2023-11-12T12:40:24.946275Z" }, "slideshow": 
{ "slide_type": "subslide" } }, "outputs": [], "source": [ "def load_names(tree: AST) -> Set[str]:\n", " visitor = LoadVisitor()\n", " visitor.visit(tree)\n", " return visitor.names" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.948077Z", "iopub.status.busy": "2023-11-12T12:40:24.947963Z", "iopub.status.idle": "2023-11-12T12:40:24.950302Z", "shell.execute_reply": "2023-11-12T12:40:24.949908Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'a', 'b', 'x', 'y'}" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "load_names(ast.parse('a[x], b[y], c = 1, 2, 3'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this, we can now define `TrackSetTransformer` as a transformer for regular assignments. Note that in Python, an assignment can have multiple targets, as in `a = b = c`; we assign the data dependencies of `c` to them all." 
] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.951797Z", "iopub.status.busy": "2023-11-12T12:40:24.951674Z", "iopub.status.idle": "2023-11-12T12:40:24.954146Z", "shell.execute_reply": "2023-11-12T12:40:24.953839Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackSetTransformer(NodeTransformer):\n", " def visit_Assign(self, node: Assign) -> Assign:\n", " value = ast.unparse(node.value)\n", " if value.startswith(DATA_TRACKER + '.set'):\n", " return node # Do not apply twice\n", "\n", " for target in node.targets:\n", " loads = load_names(target)\n", " for store_name in store_names(target):\n", " node.value = make_set_data(store_name, node.value, \n", " loads=loads)\n", " loads = set()\n", "\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Augmented assignments need special treatment, since their target is both read and written. We change statements of the form `x += y` to `x += _data.augment('x', y)`." 
] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.955782Z", "iopub.status.busy": "2023-11-12T12:40:24.955663Z", "iopub.status.idle": "2023-11-12T12:40:24.957790Z", "shell.execute_reply": "2023-11-12T12:40:24.957433Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TrackSetTransformer(TrackSetTransformer):\n", " def visit_AugAssign(self, node: AugAssign) -> AugAssign:\n", " value = ast.unparse(node.value)\n", " if value.startswith(DATA_TRACKER):\n", " return node # Do not apply twice\n", "\n", " id = cast(str, leftmost_name(node.target))\n", " node.value = make_set_data(id, node.value, method='augment')\n", "\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The corresponding `augment()` method uses a combination of `set()` and `get()` to reflect the semantics." ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.959535Z", "iopub.status.busy": "2023-11-12T12:40:24.959420Z", "iopub.status.idle": "2023-11-12T12:40:24.961316Z", "shell.execute_reply": "2023-11-12T12:40:24.961034Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def augment(self, name: str, value: Any) -> Any:\n", " \"\"\"\n", " Track augmenting `name` with `value`.\n", " To be overloaded in subclasses.\n", " \"\"\"\n", " self.set(name, self.get(name, value))\n", " return value" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\"Annotated\" assignments come with a type, as in `x: int = 3`. We treat them like regular assignments." 
] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.962826Z", "iopub.status.busy": "2023-11-12T12:40:24.962711Z", "iopub.status.idle": "2023-11-12T12:40:24.965038Z", "shell.execute_reply": "2023-11-12T12:40:24.964818Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackSetTransformer(TrackSetTransformer):\n", " def visit_AnnAssign(self, node: AnnAssign) -> AnnAssign:\n", " if node.value is None:\n", " return node # just : without value\n", "\n", " value = ast.unparse(node.value)\n", " if value.startswith(DATA_TRACKER + '.set'):\n", " return node # Do not apply twice\n", "\n", " loads = load_names(node.target)\n", " for store_name in store_names(node.target):\n", " node.value = make_set_data(store_name, node.value, \n", " loads=loads)\n", " loads = set()\n", "\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Finally, we treat _assertions_ just as if they were assignments to special variables:" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.966923Z", "iopub.status.busy": "2023-11-12T12:40:24.966742Z", "iopub.status.idle": "2023-11-12T12:40:24.969059Z", "shell.execute_reply": "2023-11-12T12:40:24.968813Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TrackSetTransformer(TrackSetTransformer):\n", " def visit_Assert(self, node: Assert) -> Assert:\n", " value = ast.unparse(node.test)\n", " if value.startswith(DATA_TRACKER + '.set'):\n", " return node # Do not apply twice\n", "\n", " loads = load_names(node.test)\n", " node.test = make_set_data(\"\", node.test, loads=loads)\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's all of these transformers in action. 
Our original function has a number of assignments:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.970568Z", "iopub.status.busy": "2023-11-12T12:40:24.970463Z", "iopub.status.idle": "2023-11-12T12:40:24.972443Z", "shell.execute_reply": "2023-11-12T12:40:24.972201Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def assign_test(x: int) -> Tuple[int, str]: # type: ignore\n", " forty_two: int = 42\n", " forty_two = forty_two = 42\n", " a, b = 1, 2\n", " c = d = [a, b]\n", " c[d[a]].attr = 47 # type: ignore\n", " a *= b + 1\n", " assert a > 0\n", " return forty_two, \"Forty-Two\"" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.973836Z", "iopub.status.busy": "2023-11-12T12:40:24.973721Z", "iopub.status.idle": "2023-11-12T12:40:24.976186Z", "shell.execute_reply": "2023-11-12T12:40:24.975872Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assign_tree = ast.parse(inspect.getsource(assign_test))" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:24.978390Z", "iopub.status.busy": "2023-11-12T12:40:24.978120Z", "iopub.status.idle": "2023-11-12T12:40:25.012933Z", "shell.execute_reply": "2023-11-12T12:40:25.012613Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32massign_test\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> Tuple[\u001b[36mint\u001b[39;49;00m, \u001b[36mstr\u001b[39;49;00m]:\u001b[37m\u001b[39;49;00m\n", " forty_two: \u001b[36mint\u001b[39;49;00m = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m42\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " forty_two = forty_two = 
_data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m42\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", " (a, b) = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, (\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m)))\u001b[37m\u001b[39;49;00m\n", " c = d = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, [a, b]))\u001b[37m\u001b[39;49;00m\n", " c[d[a]].attr = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m47\u001b[39;49;00m, loads=(d, a, c))\u001b[37m\u001b[39;49;00m\n", " a *= _data.augment(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, b + \u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a > \u001b[34m0\u001b[39;49;00m, loads=(a,))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m (forty_two, \u001b[33m'\u001b[39;49;00m\u001b[33mForty-Two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "TrackSetTransformer().visit(assign_tree)\n", "dump_tree(assign_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we later apply our transformer for data accesses, we can see that we track all variable reads and writes." 
] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.014655Z", "iopub.status.busy": "2023-11-12T12:40:25.014533Z", "iopub.status.idle": "2023-11-12T12:40:25.049529Z", "shell.execute_reply": "2023-11-12T12:40:25.049151Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32massign_test\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> Tuple[\u001b[36mint\u001b[39;49;00m, \u001b[36mstr\u001b[39;49;00m]:\u001b[37m\u001b[39;49;00m\n", " forty_two: \u001b[36mint\u001b[39;49;00m = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m42\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " forty_two = forty_two = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m42\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", " (a, b) = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, (\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m)))\u001b[37m\u001b[39;49;00m\n", " c = d = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, [_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a), _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, b)]))\u001b[37m\u001b[39;49;00m\n", " _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
c)[_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, d)[_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a)]].attr = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m47\u001b[39;49;00m, loads=(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, d), _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a), _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, c)))\u001b[37m\u001b[39;49;00m\n", " a *= _data.augment(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, b) + \u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a) > \u001b[34m0\u001b[39;49;00m, loads=(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a),))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m (_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mforty_two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, forty_two), \u001b[33m'\u001b[39;49;00m\u001b[33mForty-Two\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "TrackGetTransformer().visit(assign_tree)\n", "dump_tree(assign_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Each return statement `return x` is transformed to `return _data.set('', x)`. 
This allows __tracking return values__." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Tracking Return Values" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our `TrackReturnTransformer` also makes use of `make_set_data()`." ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.051494Z", "iopub.status.busy": "2023-11-12T12:40:25.051361Z", "iopub.status.idle": "2023-11-12T12:40:25.055645Z", "shell.execute_reply": "2023-11-12T12:40:25.055361Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackReturnTransformer(NodeTransformer):\n", " def __init__(self) -> None:\n", " self.function_name: Optional[str] = None\n", " super().__init__()\n", "\n", " def visit_FunctionDef(self, node: Union[ast.FunctionDef, ast.AsyncFunctionDef]) -> AST:\n", " outer_name = self.function_name\n", " self.function_name = node.name # Save current name\n", " self.generic_visit(node)\n", " self.function_name = outer_name\n", " return node\n", "\n", " def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> AST:\n", " return self.visit_FunctionDef(node)\n", "\n", " def return_value(self, tp: str = \"return\") -> str:\n", " if self.function_name is None:\n", " return f\"<{tp} value>\"\n", " else:\n", " return f\"<{self.function_name}() {tp} value>\"\n", "\n", " def visit_return_or_yield(self, node: Union[ast.Return, ast.Yield, ast.YieldFrom],\n", " tp: str = \"return\") -> AST:\n", "\n", " if node.value is not None:\n", " value = ast.unparse(node.value)\n", " if not value.startswith(DATA_TRACKER + '.set'):\n", " node.value = make_set_data(self.return_value(tp), node.value)\n", "\n", " return node\n", "\n", " def visit_Return(self, node: ast.Return) -> AST:\n", " return self.visit_return_or_yield(node, tp=\"return\")\n", "\n", " def visit_Yield(self, node: 
ast.Yield) -> AST:\n", " return self.visit_return_or_yield(node, tp=\"yield\")\n", "\n", " def visit_YieldFrom(self, node: ast.YieldFrom) -> AST:\n", " return self.visit_return_or_yield(node, tp=\"yield\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is the effect of `TrackReturnTransformer`. We see that all return values are saved, and thus all locations of the corresponding return statements are tracked." ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.057448Z", "iopub.status.busy": "2023-11-12T12:40:25.057261Z", "iopub.status.idle": "2023-11-12T12:40:25.091018Z", "shell.execute_reply": "2023-11-12T12:40:25.090720Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
z):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "TrackReturnTransformer().visit(middle_tree)\n", "dump_tree(middle_tree)" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.092956Z", "iopub.status.busy": "2023-11-12T12:40:25.092830Z", "iopub.status.idle": "2023-11-12T12:40:25.094968Z", "shell.execute_reply": "2023-11-12T12:40:25.094742Z" }, "slideshow": { "slide_type": "subslide" } }, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle:2: getting y\n", "middle:2: getting z\n", "middle:3: getting x\n", "middle:3: getting y\n", "middle:5: getting x\n", "middle:5: getting z\n", "middle:6: getting y\n", "middle:6: setting \n" ] } ], "source": [ "with DataTrackerTester(middle_tree, middle):\n", " middle(2, 1, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To track __control dependencies__, for every block controlled by an `if`, `while`, or `for`:\n", "\n", "1. We wrap their tests in a `_data.test()` wrapper. This allows us to assign pseudo-variables like `` which hold the conditions.\n", "2. We wrap their controlled blocks in a `with` statement. This allows us to track the variables read right before the `with` (= the controlling variables), and to restore the current controlling variables when the block is left.\n", "\n", "A statement\n", "\n", "```python\n", "if cond:\n", " body\n", "```\n", "\n", "thus becomes\n", "\n", "```python\n", "if _data.test(cond):\n", " with _data:\n", " body\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Tracking Control" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To modify control statements, we traverse the tree, looking for `If` nodes:" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.096508Z", "iopub.status.busy": "2023-11-12T12:40:25.096404Z", "iopub.status.idle": "2023-11-12T12:40:25.098437Z", "shell.execute_reply": "2023-11-12T12:40:25.098181Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TrackControlTransformer(NodeTransformer):\n", " def visit_If(self, 
node: ast.If) -> ast.If:\n", " self.generic_visit(node)\n", " node.test = self.make_test(node.test)\n", " node.body = self.make_with(node.body)\n", " node.orelse = self.make_with(node.orelse)\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The subtrees come from helper functions `make_with()` and `make_test()`. Again, the structure of these subtrees was derived from the output of `ast.dump()`." ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.099876Z", "iopub.status.busy": "2023-11-12T12:40:25.099769Z", "iopub.status.idle": "2023-11-12T12:40:25.102743Z", "shell.execute_reply": "2023-11-12T12:40:25.102408Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackControlTransformer(TrackControlTransformer):\n", " def make_with(self, block: List[ast.stmt]) -> List[ast.stmt]:\n", " \"\"\"Create a subtree 'with _data: `block`'\"\"\"\n", " if len(block) == 0:\n", " return []\n", "\n", " block_as_text = ast.unparse(block[0])\n", " if block_as_text.startswith('with ' + DATA_TRACKER):\n", " return block # Do not apply twice\n", "\n", " new_node = With(\n", " items=[\n", " withitem(\n", " context_expr=Name(id=DATA_TRACKER, ctx=Load()),\n", " optional_vars=None)\n", " ],\n", " body=block\n", " )\n", " ast.copy_location(new_node, block[0])\n", " return [new_node]" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.104502Z", "iopub.status.busy": "2023-11-12T12:40:25.104354Z", "iopub.status.idle": "2023-11-12T12:40:25.106860Z", "shell.execute_reply": "2023-11-12T12:40:25.106594Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackControlTransformer(TrackControlTransformer):\n", " def make_test(self, test: ast.expr) -> ast.expr:\n", " test_as_text = ast.unparse(test)\n", " if test_as_text.startswith(DATA_TRACKER + 
'.test'):\n", " return test # Do not apply twice\n", "\n", " new_test = Call(func=Attribute(value=Name(id=DATA_TRACKER, ctx=Load()),\n", " attr='test',\n", " ctx=Load()),\n", " args=[test],\n", " keywords=[])\n", " ast.copy_location(new_test, test)\n", " return new_test" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`while` loops are handled just like `if` constructs." ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.108587Z", "iopub.status.busy": "2023-11-12T12:40:25.108286Z", "iopub.status.idle": "2023-11-12T12:40:25.110650Z", "shell.execute_reply": "2023-11-12T12:40:25.110352Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackControlTransformer(TrackControlTransformer):\n", " def visit_While(self, node: ast.While) -> ast.While:\n", " self.generic_visit(node)\n", " node.test = self.make_test(node.test)\n", " node.body = self.make_with(node.body)\n", " node.orelse = self.make_with(node.orelse)\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`for` loops get a different treatment, as there is no condition that would control the body. Still, we ensure that setting the iterator variable is properly tracked." 
] }, { "cell_type": "code", "execution_count": 112, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.112394Z", "iopub.status.busy": "2023-11-12T12:40:25.112247Z", "iopub.status.idle": "2023-11-12T12:40:25.115279Z", "shell.execute_reply": "2023-11-12T12:40:25.114948Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackControlTransformer(TrackControlTransformer):\n", " # regular `for` loop\n", " def visit_For(self, node: Union[ast.For, ast.AsyncFor]) -> AST:\n", " self.generic_visit(node)\n", " id = ast.unparse(node.target).strip()\n", " node.iter = make_set_data(id, node.iter)\n", "\n", " # Uncomment if you want iterators to control their bodies\n", " # node.body = self.make_with(node.body)\n", " # node.orelse = self.make_with(node.orelse)\n", " return node\n", "\n", " # `for` loops in async functions\n", " def visit_AsyncFor(self, node: ast.AsyncFor) -> AST:\n", " return self.visit_For(node)\n", "\n", " # `for` clause in comprehensions\n", " def visit_comprehension(self, node: ast.comprehension) -> AST:\n", " self.generic_visit(node)\n", " id = ast.unparse(node.target).strip()\n", " node.iter = make_set_data(id, node.iter)\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is the effect of `TrackControlTransformer`:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.117586Z", "iopub.status.busy": "2023-11-12T12:40:25.117466Z", "iopub.status.idle": "2023-11-12T12:40:25.182049Z", "shell.execute_reply": "2023-11-12T12:40:25.181682Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m 
_data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "TrackControlTransformer().visit(middle_tree)\n", "dump_tree(middle_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We extend `DataTracker` to also log these events:" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.184062Z", "iopub.status.busy": "2023-11-12T12:40:25.183928Z", "iopub.status.idle": "2023-11-12T12:40:25.186183Z", "shell.execute_reply": "2023-11-12T12:40:25.185911Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def 
test(self, cond: AST) -> AST:\n", " \"\"\"Test condition `cond`. To be overloaded in subclasses.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: testing condition\")\n", "\n", " return cond" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.188146Z", "iopub.status.busy": "2023-11-12T12:40:25.187892Z", "iopub.status.idle": "2023-11-12T12:40:25.190567Z", "shell.execute_reply": "2023-11-12T12:40:25.190226Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def __enter__(self) -> Any:\n", " \"\"\"Enter `with` block. To be overloaded in subclasses.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: entering block\")\n", " return self\n", "\n", " def __exit__(self, exc_type: Type, exc_value: BaseException, \n", " traceback: TracebackType) -> Optional[bool]:\n", " \"\"\"Exit `with` block. 
To be overloaded in subclasses.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: exiting block\")\n", " return None" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.192233Z", "iopub.status.busy": "2023-11-12T12:40:25.192099Z", "iopub.status.idle": "2023-11-12T12:40:25.194579Z", "shell.execute_reply": "2023-11-12T12:40:25.194193Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle:2: getting y\n", "middle:2: getting z\n", "middle:2: testing condition\n", "middle:3: entering block\n", "middle:3: getting x\n", "middle:3: getting y\n", "middle:3: testing condition\n", "middle:5: entering block\n", "middle:5: getting x\n", "middle:5: getting z\n", "middle:5: testing condition\n", "middle:6: entering block\n", "middle:6: getting y\n", "middle:6: setting \n", "middle:6: exiting block\n", "middle:5: exiting block\n", "middle:3: exiting block\n" ] } ], "source": [ "with DataTrackerTester(middle_tree, middle):\n", " middle(2, 1, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We also want to be able to __track calls__ across multiple functions. To this end, we wrap each call\n", "\n", "```python\n", "func(arg1, arg2, ...)\n", "```\n", "\n", "into\n", "\n", "```python\n", "_data.ret(_data.call(func)(_data.arg(arg1), _data.arg(arg2), ...))\n", "```\n", "\n", "Each of these three methods simply passes through its argument; together, they allow us to track the beginning of a call (`call()`), the computation of its arguments (`arg()`), and the return of the call (`ret()`)."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Tracking Calls and Arguments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our `TrackCallTransformer` visits all `Call` nodes, applying the transformations as shown above." ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.196405Z", "iopub.status.busy": "2023-11-12T12:40:25.196240Z", "iopub.status.idle": "2023-11-12T12:40:25.200542Z", "shell.execute_reply": "2023-11-12T12:40:25.200255Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackCallTransformer(NodeTransformer):\n", " def make_call(self, node: AST, func: str, \n", " pos: Optional[int] = None, kw: Optional[str] = None) -> Call:\n", " \"\"\"Return `_data.<func>(<node>)`, optionally passing `pos` and `kw`\"\"\"\n", " keywords = []\n", "\n", " # Use `Constant()`; `Num()` and `Str()` are deprecated\n", " if pos:\n", " keywords.append(keyword(arg='pos', value=ast.Constant(value=pos)))\n", " if kw:\n", " keywords.append(keyword(arg='kw', value=ast.Constant(value=kw)))\n", "\n", " return Call(func=Attribute(value=Name(id=DATA_TRACKER,\n", " ctx=Load()),\n", " attr=func,\n", " ctx=Load()),\n", " args=[node],\n", " keywords=keywords)\n", "\n", " def visit_Call(self, node: Call) -> Call:\n", " self.generic_visit(node)\n", "\n", " call_as_text = ast.unparse(node)\n", " if call_as_text.startswith(DATA_TRACKER + '.ret'):\n", " return node # Already applied\n", "\n", " func_as_text = ast.unparse(node.func)\n", " if func_as_text.startswith(DATA_TRACKER + '.'):\n", " return node # Own function\n", "\n", " new_args = []\n", " for n, arg in enumerate(node.args):\n", " new_args.append(self.make_call(arg, 'arg', pos=n + 1))\n", " node.args = cast(List[ast.expr], new_args)\n", "\n", " for kw in node.keywords:\n", " id = kw.arg if hasattr(kw, 'arg') else None\n", " kw.value = 
self.make_call(kw.value, 'arg', kw=id)\n", "\n", " node.func = self.make_call(node.func, 'call')\n", " return self.make_call(node, 'ret')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our example function `middle()` does not contain any calls, but here is a function that invokes `middle()` twice:" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.202341Z", "iopub.status.busy": "2023-11-12T12:40:25.202219Z", "iopub.status.idle": "2023-11-12T12:40:25.204201Z", "shell.execute_reply": "2023-11-12T12:40:25.203885Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def test_call() -> int:\n", " x = middle(1, 2, z=middle(1, 2, 3))\n", " return x" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.205961Z", "iopub.status.busy": "2023-11-12T12:40:25.205687Z", "iopub.status.idle": "2023-11-12T12:40:25.242077Z", "shell.execute_reply": "2023-11-12T12:40:25.241690Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mtest_call\u001b[39;49;00m() -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " x = middle(\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m, z=middle(\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m, \u001b[34m3\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "call_tree = ast.parse(inspect.getsource(test_call))\n", "dump_tree(call_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we invoke `TrackCallTransformer` on this testing function, we get the following transformed code:" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "execution": { 
"iopub.execute_input": "2023-11-12T12:40:25.244299Z", "iopub.status.busy": "2023-11-12T12:40:25.244159Z", "iopub.status.idle": "2023-11-12T12:40:25.246364Z", "shell.execute_reply": "2023-11-12T12:40:25.246022Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TrackCallTransformer().visit(call_tree);" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.248928Z", "iopub.status.busy": "2023-11-12T12:40:25.248608Z", "iopub.status.idle": "2023-11-12T12:40:25.283770Z", "shell.execute_reply": "2023-11-12T12:40:25.283390Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mtest_call\u001b[39;49;00m() -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " x = _data.ret(_data.call(middle)(_data.arg(\u001b[34m1\u001b[39;49;00m, pos=\u001b[34m1\u001b[39;49;00m), _data.arg(\u001b[34m2\u001b[39;49;00m, pos=\u001b[34m2\u001b[39;49;00m), z=_data.arg(_data.ret(_data.call(middle)(_data.arg(\u001b[34m1\u001b[39;49;00m, pos=\u001b[34m1\u001b[39;49;00m), _data.arg(\u001b[34m2\u001b[39;49;00m, pos=\u001b[34m2\u001b[39;49;00m), _data.arg(\u001b[34m3\u001b[39;49;00m, pos=\u001b[34m3\u001b[39;49;00m))), kw=\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "dump_tree(call_tree)" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.285847Z", "iopub.status.busy": "2023-11-12T12:40:25.285706Z", "iopub.status.idle": "2023-11-12T12:40:25.287969Z", "shell.execute_reply": "2023-11-12T12:40:25.287548Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def f() -> bool:\n", " return math.isclose(1, 1.0)" ] }, { "cell_type": "code", 
"execution_count": 123, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.290126Z", "iopub.status.busy": "2023-11-12T12:40:25.289878Z", "iopub.status.idle": "2023-11-12T12:40:25.324893Z", "shell.execute_reply": "2023-11-12T12:40:25.324499Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mf\u001b[39;49;00m() -> \u001b[36mbool\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m math.isclose(\u001b[34m1\u001b[39;49;00m, \u001b[34m1.0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "f_tree = ast.parse(inspect.getsource(f))\n", "dump_tree(f_tree)" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.328035Z", "iopub.status.busy": "2023-11-12T12:40:25.327812Z", "iopub.status.idle": "2023-11-12T12:40:25.330612Z", "shell.execute_reply": "2023-11-12T12:40:25.329928Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TrackCallTransformer().visit(f_tree);" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.334255Z", "iopub.status.busy": "2023-11-12T12:40:25.333937Z", "iopub.status.idle": "2023-11-12T12:40:25.371134Z", "shell.execute_reply": "2023-11-12T12:40:25.370804Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mf\u001b[39;49;00m() -> \u001b[36mbool\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.ret(_data.call(math.isclose)(_data.arg(\u001b[34m1\u001b[39;49;00m, pos=\u001b[34m1\u001b[39;49;00m), _data.arg(\u001b[34m1.0\u001b[39;49;00m, pos=\u001b[34m2\u001b[39;49;00m)))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "dump_tree(f_tree)" ] }, { "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "subslide" } }, "source": [ "As before, our default `arg()`, `ret()`, and `call()` methods simply log the event and pass through the given value." ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.372948Z", "iopub.status.busy": "2023-11-12T12:40:25.372821Z", "iopub.status.idle": "2023-11-12T12:40:25.375438Z", "shell.execute_reply": "2023-11-12T12:40:25.375167Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def arg(self, value: Any, pos: Optional[int] = None, kw: Optional[str] = None) -> Any:\n", " \"\"\"\n", " Track `value` being passed as argument.\n", " `pos` (if given) is the argument position (starting with 1).\n", " `kw` (if given) is the argument keyword.\n", " \"\"\"\n", "\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " info = \"\"\n", " if pos:\n", " info += f\" #{pos}\"\n", " if kw:\n", " info += f\" {repr(kw)}\"\n", "\n", " print(f\"{caller_func.__name__}:{lineno}: pushing arg{info}\")\n", "\n", " return value" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.376933Z", "iopub.status.busy": "2023-11-12T12:40:25.376762Z", "iopub.status.idle": "2023-11-12T12:40:25.379130Z", "shell.execute_reply": "2023-11-12T12:40:25.378710Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def ret(self, value: Any) -> Any:\n", " \"\"\"Track `value` being used as return value.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: returned from call\")\n", "\n", " return value" ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.380829Z", "iopub.status.busy": "2023-11-12T12:40:25.380708Z", 
"iopub.status.idle": "2023-11-12T12:40:25.383049Z", "shell.execute_reply": "2023-11-12T12:40:25.382767Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def instrument_call(self, func: Callable) -> Callable:\n", " \"\"\"Instrument a call to `func`. To be implemented in subclasses.\"\"\"\n", " return func\n", "\n", " def call(self, func: Callable) -> Callable:\n", " \"\"\"Track a call to `func`.\"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: calling {func}\")\n", "\n", " return self.instrument_call(func)" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.384719Z", "iopub.status.busy": "2023-11-12T12:40:25.384605Z", "iopub.status.idle": "2023-11-12T12:40:25.418565Z", "shell.execute_reply": "2023-11-12T12:40:25.418147Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mtest_call\u001b[39;49;00m() -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " x = _data.ret(_data.call(middle)(_data.arg(\u001b[34m1\u001b[39;49;00m, pos=\u001b[34m1\u001b[39;49;00m), _data.arg(\u001b[34m2\u001b[39;49;00m, pos=\u001b[34m2\u001b[39;49;00m), z=_data.arg(_data.ret(_data.call(middle)(_data.arg(\u001b[34m1\u001b[39;49;00m, pos=\u001b[34m1\u001b[39;49;00m), _data.arg(\u001b[34m2\u001b[39;49;00m, pos=\u001b[34m2\u001b[39;49;00m), _data.arg(\u001b[34m3\u001b[39;49;00m, pos=\u001b[34m3\u001b[39;49;00m))), kw=\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "dump_tree(call_tree)" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.420609Z", 
"iopub.status.busy": "2023-11-12T12:40:25.420443Z", "iopub.status.idle": "2023-11-12T12:40:25.422712Z", "shell.execute_reply": "2023-11-12T12:40:25.422434Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "test_call:2: calling \n", "test_call:2: pushing arg #1\n", "test_call:2: pushing arg #2\n", "test_call:2: calling \n", "test_call:2: pushing arg #1\n", "test_call:2: pushing arg #2\n", "test_call:2: pushing arg #3\n", "test_call:2: returned from call\n", "test_call:2: pushing arg 'z'\n", "test_call:2: returned from call\n" ] } ], "source": [ "with DataTrackerTester(call_tree, test_call):\n", " test_call()" ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.425116Z", "iopub.status.busy": "2023-11-12T12:40:25.424909Z", "iopub.status.idle": "2023-11-12T12:40:25.427683Z", "shell.execute_reply": "2023-11-12T12:40:25.427357Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_call()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "On the receiving end, for each function argument `x`, we insert a call `_data.param('x', x, [position info])` to initialize `x`. 
This is useful for __tracking parameters across function calls.__" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Tracking Parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we use `ast.dump()` to determine the correct syntax tree:" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.429796Z", "iopub.status.busy": "2023-11-12T12:40:25.429601Z", "iopub.status.idle": "2023-11-12T12:40:25.432304Z", "shell.execute_reply": "2023-11-12T12:40:25.431691Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(body=[Expr(value=Call(func=Attribute(value=Name(id='_data', ctx=Load()), attr='param', ctx=Load()), args=[Constant(value='x'), Name(id='x', ctx=Load())], keywords=[keyword(arg='pos', value=Constant(value=1)), keyword(arg='last', value=Constant(value=True))]))], type_ignores=[])\n" ] } ], "source": [ "print(ast.dump(ast.parse(\"_data.param('x', x, pos=1, last=True)\")))" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.434680Z", "iopub.status.busy": "2023-11-12T12:40:25.434514Z", "iopub.status.idle": "2023-11-12T12:40:25.438998Z", "shell.execute_reply": "2023-11-12T12:40:25.438553Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TrackParamsTransformer(NodeTransformer):\n", " def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:\n", " self.generic_visit(node)\n", "\n", " named_args = []\n", " for child in ast.iter_child_nodes(node.args):\n", " if isinstance(child, ast.arg):\n", " named_args.append(child)\n", "\n", " create_stmts = []\n", " for n, child in enumerate(named_args):\n", " keywords = [keyword(arg='pos', value=ast.Constant(value=n + 1))]\n", " if child is 
node.args.vararg:\n", " keywords.append(keyword(arg='vararg', value=ast.Constant(value='*')))\n", " if child is node.args.kwarg:\n", " keywords.append(keyword(arg='vararg', value=ast.Constant(value='**')))\n", " if n == len(named_args) - 1:\n", " keywords.append(keyword(arg='last',\n", " value=ast.Constant(value=True)))\n", "\n", " create_stmt = Expr(\n", " value=Call(\n", " func=Attribute(value=Name(id=DATA_TRACKER, ctx=Load()),\n", " attr='param', ctx=Load()),\n", " args=[ast.Constant(value=child.arg),\n", " Name(id=child.arg, ctx=Load())\n", " ],\n", " keywords=keywords\n", " )\n", " )\n", " ast.copy_location(create_stmt, node)\n", " create_stmts.append(create_stmt)\n", "\n", " node.body = cast(List[ast.stmt], create_stmts) + node.body\n", " return node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is the effect of `TrackParamsTransformer()`. You can see how all three parameters are initialized at the start of the function." ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.441251Z", "iopub.status.busy": "2023-11-12T12:40:25.441049Z", "iopub.status.idle": "2023-11-12T12:40:25.479437Z", "shell.execute_reply": "2023-11-12T12:40:25.479031Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z):\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y, pos=\u001b[34m2\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z, pos=\u001b[34m3\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " 
\u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > 
_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "TrackParamsTransformer().visit(middle_tree)\n", "dump_tree(middle_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "By default, the `DataTracker` `param()` method simply calls `set()` to set variables." 
] }, { "cell_type": "code", "execution_count": 135, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.481457Z", "iopub.status.busy": "2023-11-12T12:40:25.481308Z", "iopub.status.idle": "2023-11-12T12:40:25.484106Z", "shell.execute_reply": "2023-11-12T12:40:25.483811Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DataTracker(DataTracker):\n", " def param(self, name: str, value: Any, \n", " pos: Optional[int] = None, vararg: str = '', last: bool = False) -> Any:\n", " \"\"\"\n", " At the beginning of a function, track parameter `name` being set to `value`.\n", " `pos` is the position of the argument (starting with 1).\n", " `vararg` is \"*\" if `name` is a vararg parameter (as in *args),\n", " and \"**\" if `name` is a kwargs parameter (as in **kwargs).\n", " `last` is True if `name` is the last parameter.\n", " \"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " info = \"\"\n", " if pos is not None:\n", " info += f\" #{pos}\"\n", "\n", " print(f\"{caller_func.__name__}:{lineno}: initializing {vararg}{name}{info}\")\n", "\n", " return self.set(name, value)" ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.485688Z", "iopub.status.busy": "2023-11-12T12:40:25.485566Z", "iopub.status.idle": "2023-11-12T12:40:25.488057Z", "shell.execute_reply": "2023-11-12T12:40:25.487765Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle:1: initializing x #1\n", "middle:1: setting x\n", "middle:1: initializing y #2\n", "middle:1: setting y\n", "middle:1: initializing z #3\n", "middle:1: setting z\n", "middle:2: getting y\n", "middle:2: getting z\n", "middle:2: testing condition\n", "middle:3: entering block\n", "middle:3: getting x\n", "middle:3: getting y\n", "middle:3: testing condition\n", "middle:5: entering block\n", "middle:5: getting x\n", 
"middle:5: getting z\n", "middle:5: testing condition\n", "middle:6: entering block\n", "middle:6: getting y\n", "middle:6: setting \n", "middle:6: exiting block\n", "middle:5: exiting block\n", "middle:3: exiting block\n" ] } ], "source": [ "with DataTrackerTester(middle_tree, middle):\n", " middle(2, 1, 3)" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.489701Z", "iopub.status.busy": "2023-11-12T12:40:25.489542Z", "iopub.status.idle": "2023-11-12T12:40:25.491371Z", "shell.execute_reply": "2023-11-12T12:40:25.491102Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def args_test(x, *args, **kwargs): # type: ignore\n", " print(x, *args, **kwargs)" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.493928Z", "iopub.status.busy": "2023-11-12T12:40:25.493686Z", "iopub.status.idle": "2023-11-12T12:40:25.528219Z", "shell.execute_reply": "2023-11-12T12:40:25.527747Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32margs_test\u001b[39;49;00m(x, *args, **kwargs):\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33margs\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, args, pos=\u001b[34m2\u001b[39;49;00m, vararg=\u001b[33m'\u001b[39;49;00m\u001b[33m*\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mkwargs\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, kwargs, pos=\u001b[34m3\u001b[39;49;00m, vararg=\u001b[33m'\u001b[39;49;00m\u001b[33m**\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(x, *args, **kwargs)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "args_tree = ast.parse(inspect.getsource(args_test))\n", "TrackParamsTransformer().visit(args_tree)\n", "dump_tree(args_tree)" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.530525Z", "iopub.status.busy": "2023-11-12T12:40:25.530354Z", "iopub.status.idle": "2023-11-12T12:40:25.532682Z", "shell.execute_reply": "2023-11-12T12:40:25.532312Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "args_test:1: initializing x #1\n", "args_test:1: setting x\n", "args_test:1: initializing *args #2\n", "args_test:1: setting args\n", "args_test:1: initializing **kwargs #3\n", "args_test:1: setting kwargs\n", "1 2 3\n" ] } ], "source": [ "with DataTrackerTester(args_tree, args_test):\n", " args_test(1, 2, 3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What do we obtain after we have applied all these transformers on `middle()`? We see that the code now contains quite a load of instrumentation." 
] }, { "cell_type": "code", "execution_count": 140, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.535062Z", "iopub.status.busy": "2023-11-12T12:40:25.534914Z", "iopub.status.idle": "2023-11-12T12:40:25.570973Z", "shell.execute_reply": "2023-11-12T12:40:25.570592Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z):\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y, pos=\u001b[34m2\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z, pos=\u001b[34m3\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " 
\u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) < _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y))\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) > _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m 
_data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mz\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, z))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "dump_tree(middle_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And when we execute this code, we see that we can track quite a number of events, while the code semantics stay unchanged." ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.572723Z", "iopub.status.busy": "2023-11-12T12:40:25.572601Z", "iopub.status.idle": "2023-11-12T12:40:25.576239Z", "shell.execute_reply": "2023-11-12T12:40:25.575899Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle:1: initializing x #1\n", "middle:1: setting x\n", "middle:1: initializing y #2\n", "middle:1: setting y\n", "middle:1: initializing z #3\n", "middle:1: setting z\n", "middle:2: getting y\n", "middle:2: getting z\n", "middle:2: testing condition\n", "middle:3: entering block\n", "middle:3: getting x\n", "middle:3: getting y\n", "middle:3: testing condition\n", "middle:5: entering block\n", "middle:5: getting x\n", "middle:5: getting z\n", "middle:5: testing condition\n", "middle:6: entering block\n", "middle:6: getting y\n", "middle:6: setting \n", "middle:6: exiting block\n", "middle:5: exiting block\n", "middle:3: exiting block\n" ] }, { "data": { "text/plain": [ "1" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with DataTrackerTester(middle_tree, middle):\n", " m = middle(2, 1, 
3)\n", "m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Transformer Stress Test" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We stress test our transformers by instrumenting, transforming, and compiling a number of modules." ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.578775Z", "iopub.status.busy": "2023-11-12T12:40:25.578661Z", "iopub.status.idle": "2023-11-12T12:40:25.606137Z", "shell.execute_reply": "2023-11-12T12:40:25.605796Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import Assertions # minor dependency\n", "import Debugger # minor dependency" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:25.608244Z", "iopub.status.busy": "2023-11-12T12:40:25.608093Z", "iopub.status.idle": "2023-11-12T12:40:26.660239Z", "shell.execute_reply": "2023-11-12T12:40:26.659838Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'Assertions' instrumented successfully.\n", "'Debugger' instrumented successfully.\n", "'inspect' instrumented successfully.\n", "'ast' instrumented successfully.\n" ] } ], "source": [ "for module in [Assertions, Debugger, inspect, ast]:\n", " module_tree = ast.parse(inspect.getsource(module))\n", " assert isinstance(module_tree, ast.Module)\n", "\n", " TrackCallTransformer().visit(module_tree)\n", " TrackSetTransformer().visit(module_tree)\n", " TrackGetTransformer().visit(module_tree)\n", " TrackControlTransformer().visit(module_tree)\n", " TrackReturnTransformer().visit(module_tree)\n", " TrackParamsTransformer().visit(module_tree)\n", " # dump_tree(module_tree)\n", " ast.fix_missing_locations(module_tree) # Must run this before compiling\n", "\n", " module_code = 
compile(module_tree, '', 'exec')\n", " print(f\"{repr(module.__name__)} instrumented successfully.\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our next step will now be not only to _log_ these events, but to actually construct _dependencies_ from them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracking Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To construct dependencies from variable accesses, we subclass `DataTracker` into `DependencyTracker` – a class that actually keeps track of all these dependencies. Its constructor initializes a number of variables we will discuss below." ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.662143Z", "iopub.status.busy": "2023-11-12T12:40:26.662018Z", "iopub.status.idle": "2023-11-12T12:40:26.665073Z", "shell.execute_reply": "2023-11-12T12:40:26.664693Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DataTracker):\n", " \"\"\"Track dependencies during execution\"\"\"\n", "\n", " def __init__(self, *args: Any, **kwargs: Any) -> None:\n", " \"\"\"Constructor. 
Arguments are passed to DataTracker.__init__()\"\"\"\n", " super().__init__(*args, **kwargs)\n", "\n", " self.origins: Dict[str, Location] = {} # Where current variables were last set\n", " self.data_dependencies: Dependency = {} # As with Dependencies, above\n", " self.control_dependencies: Dependency = {}\n", "\n", " self.last_read: List[str] = [] # List of last read variables\n", " self.last_checked_location = (StackInspector.unknown, 1)\n", " self._ignore_location_change = False\n", "\n", " self.data: List[List[str]] = [[]] # Data stack\n", " self.control: List[List[str]] = [[]] # Control stack\n", "\n", " self.frames: List[Dict[Union[int, str], Any]] = [{}] # Argument stack\n", " self.args: Dict[Union[int, str], Any] = {} # Current args" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Data Dependencies\n", "\n", "The first job of our `DependencyTracker` is to construct dependencies between _read_ and _written_ variables." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Reading Variables\n", "\n", "As in `DataTracker`, the key method of `DependencyTracker` again is `get()`, invoked as `_data.get('x', x)` whenever a variable `x` is read. First and foremost, it appends the name of the read variable to the list `last_read`." 
] }, { "cell_type": "code", "execution_count": 145, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.666702Z", "iopub.status.busy": "2023-11-12T12:40:26.666602Z", "iopub.status.idle": "2023-11-12T12:40:26.668826Z", "shell.execute_reply": "2023-11-12T12:40:26.668551Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def get(self, name: str, value: Any) -> Any:\n", " \"\"\"Track a read access for variable `name` with value `value`\"\"\"\n", " self.check_location()\n", " self.last_read.append(name)\n", " return super().get(name, value)\n", "\n", " def check_location(self) -> None:\n", " pass # More on that below" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.670269Z", "iopub.status.busy": "2023-11-12T12:40:26.670182Z", "iopub.status.idle": "2023-11-12T12:40:26.671708Z", "shell.execute_reply": "2023-11-12T12:40:26.671466Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "x = 5\n", "y = 3" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.673093Z", "iopub.status.busy": "2023-11-12T12:40:26.673005Z", "iopub.status.idle": "2023-11-12T12:40:26.675656Z", "shell.execute_reply": "2023-11-12T12:40:26.675259Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ":2: getting x\n", ":2: getting y\n" ] }, { "data": { "text/plain": [ "8" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data = DependencyTracker(log=True)\n", "_test_data.get('x', x) + _test_data.get('y', y)" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.677496Z", "iopub.status.busy": "2023-11-12T12:40:26.677391Z", "iopub.status.idle": 
"2023-11-12T12:40:26.679680Z", "shell.execute_reply": "2023-11-12T12:40:26.679402Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['x', 'y']" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.last_read" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Checking Locations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "However, before appending the read variable to `last_read`, `_data.get()` does one more thing. By invoking `check_location()`, it clears the `last_read` list if execution has reached a new line. This avoids situations such as\n", "\n", "```python\n", "x\n", "y\n", "z = a + b\n", "```\n", "where `x` and `y` are read, but do not affect the last line. Therefore, with every new line, the list of last read variables is cleared." ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.681170Z", "iopub.status.busy": "2023-11-12T12:40:26.681082Z", "iopub.status.idle": "2023-11-12T12:40:26.684104Z", "shell.execute_reply": "2023-11-12T12:40:26.683850Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def clear_read(self) -> None:\n", " \"\"\"Clear set of read variables\"\"\"\n", " if self.log:\n", " direct_caller = inspect.currentframe().f_back.f_code.co_name # type: ignore\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"clearing read variables {self.last_read} \"\n", " f\"(from {direct_caller})\")\n", "\n", " self.last_read = []\n", "\n", " def check_location(self) -> None:\n", " \"\"\"If we are in a new location, clear set of read variables\"\"\"\n", " location = self.caller_location()\n", " func, lineno = location\n", " 
last_func, last_lineno = self.last_checked_location\n", "\n", " if self.last_checked_location != location:\n", " if self._ignore_location_change:\n", " self._ignore_location_change = False\n", " elif func.__name__.startswith('<'):\n", " # Entering list comprehension, eval(), exec(), ...\n", " pass\n", " elif last_func.__name__.startswith('<'):\n", " # Exiting list comprehension, eval(), exec(), ...\n", " pass\n", " else:\n", " # Standard case\n", " self.clear_read()\n", "\n", " self.last_checked_location = location" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Two methods can suppress this reset of the `last_read` list: \n", "\n", "* `ignore_next_location_change()` suppresses the reset for the next line. This is useful when returning from a function, when the return value is still in the list of \"read\" variables.\n", "* `ignore_location_change()` suppresses the reset for the current line. This is useful if we already have returned from a function call." 
] }, { "cell_type": "code", "execution_count": 150, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.685609Z", "iopub.status.busy": "2023-11-12T12:40:26.685528Z", "iopub.status.idle": "2023-11-12T12:40:26.687436Z", "shell.execute_reply": "2023-11-12T12:40:26.687182Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def ignore_next_location_change(self) -> None:\n", " self._ignore_location_change = True\n", "\n", " def ignore_location_change(self) -> None:\n", " self.last_checked_location = self.caller_location()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Watch how `DependencyTracker` resets `last_read` when a new line is executed:" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.689196Z", "iopub.status.busy": "2023-11-12T12:40:26.689094Z", "iopub.status.idle": "2023-11-12T12:40:26.690790Z", "shell.execute_reply": "2023-11-12T12:40:26.690529Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "_test_data = DependencyTracker()" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.692502Z", "iopub.status.busy": "2023-11-12T12:40:26.692358Z", "iopub.status.idle": "2023-11-12T12:40:26.694780Z", "shell.execute_reply": "2023-11-12T12:40:26.694399Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.get('x', x) + _test_data.get('y', y)" ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.696502Z", "iopub.status.busy": "2023-11-12T12:40:26.696370Z", "iopub.status.idle": "2023-11-12T12:40:26.698653Z", "shell.execute_reply": 
"2023-11-12T12:40:26.698298Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['x', 'y']" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.last_read" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.700712Z", "iopub.status.busy": "2023-11-12T12:40:26.700438Z", "iopub.status.idle": "2023-11-12T12:40:26.703061Z", "shell.execute_reply": "2023-11-12T12:40:26.702817Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "41" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "a = 42\n", "b = -1\n", "_test_data.get('a', a) + _test_data.get('b', b)" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.704783Z", "iopub.status.busy": "2023-11-12T12:40:26.704656Z", "iopub.status.idle": "2023-11-12T12:40:26.706665Z", "shell.execute_reply": "2023-11-12T12:40:26.706372Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['x', 'y', 'a', 'b']" ] }, "execution_count": 155, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.last_read" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Setting Variables\n", "\n", "The method `set()` creates dependencies. It is invoked as `_data.set('x', value)` whenever a variable `x` is set. \n", "\n", "First and foremost, it takes the list of variables read (`last_read`); for each of these variables $v$, it looks up its origin $o$ (the place where $v$ was last set) and adds the pair ($v$, $o$) to the set of data dependencies of `x`. It then does the same for control dependencies (more on these below), and finally records (in `self.origins`) the current location as the new origin of `x`." 
] }, { "cell_type": "code", "execution_count": 156, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.708486Z", "iopub.status.busy": "2023-11-12T12:40:26.708341Z", "iopub.status.idle": "2023-11-12T12:40:26.710211Z", "shell.execute_reply": "2023-11-12T12:40:26.709938Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import itertools" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.711841Z", "iopub.status.busy": "2023-11-12T12:40:26.711685Z", "iopub.status.idle": "2023-11-12T12:40:26.716053Z", "shell.execute_reply": "2023-11-12T12:40:26.715725Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " TEST = '<test>' # Name of pseudo-variables for testing conditions\n", "\n", " def set(self, name: str, value: Any, loads: Optional[Set[str]] = None) -> Any:\n", " \"\"\"Add a dependency for `name` = `value`\"\"\"\n", "\n", " def add_dependencies(dependencies: Set[Node], \n", " vars_read: List[str], tp: str) -> None:\n", " \"\"\"Add origins of `vars_read` to `dependencies`.\"\"\"\n", " for var_read in vars_read:\n", " if var_read in self.origins:\n", " if var_read == self.TEST and tp == \"data\":\n", " # Can't have data dependencies on conditions\n", " continue\n", "\n", " origin = self.origins[var_read]\n", " dependencies.add((var_read, origin))\n", "\n", " if self.log:\n", " origin_func, origin_lineno = origin\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"new {tp} dependency: \"\n", " f\"{name} <= {var_read} \"\n", " f\"({origin_func.__name__}:{origin_lineno})\")\n", "\n", " self.check_location()\n", " ret = super().set(name, value)\n", " location = self.caller_location()\n", "\n", " add_dependencies(self.data_dependencies.setdefault\n", " ((name, location), set()),\n", " self.last_read, tp=\"data\")\n", " 
add_dependencies(self.control_dependencies.setdefault\n", " ((name, location), set()),\n", " cast(List[str], itertools.chain.from_iterable(self.control)),\n", " tp=\"control\")\n", "\n", " self.origins[name] = location\n", "\n", " # Reset read info for next line\n", " self.last_read = [name]\n", "\n", " # Next line is a new location\n", " self._ignore_location_change = False\n", "\n", " return ret\n", "\n", " def dependencies(self) -> Dependencies:\n", " \"\"\"Return dependencies\"\"\"\n", " return Dependencies(self.data_dependencies,\n", " self.control_dependencies)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us illustrate `set()` by example. Here's a sequence of variable reads and writes:" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.717705Z", "iopub.status.busy": "2023-11-12T12:40:26.717603Z", "iopub.status.idle": "2023-11-12T12:40:26.719968Z", "shell.execute_reply": "2023-11-12T12:40:26.719679Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "_test_data = DependencyTracker()\n", "x = _test_data.set('x', 1)\n", "y = _test_data.set('y', _test_data.get('x', x))\n", "z = _test_data.set('z', _test_data.get('x', x) + _test_data.get('y', y))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The attribute `origins` records, for each variable, where it was last written:" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.721585Z", "iopub.status.busy": "2023-11-12T12:40:26.721482Z", "iopub.status.idle": "2023-11-12T12:40:26.723847Z", "shell.execute_reply": "2023-11-12T12:40:26.723547Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'x': (()>, 2),\n", " 'y': (()>, 3),\n", " 'z': (()>, 4)}" ] }, "execution_count": 159, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "_test_data.origins" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The attribute `data_dependencies` tracks, for each variable, the variables it depends on:" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.725687Z", "iopub.status.busy": "2023-11-12T12:40:26.725552Z", "iopub.status.idle": "2023-11-12T12:40:26.728080Z", "shell.execute_reply": "2023-11-12T12:40:26.727823Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{('x', (()>, 2)): set(),\n", " ('y',\n", " (()>, 3)): {('x',\n", " (()>, 2))},\n", " ('z',\n", " (()>, 4)): {('x',\n", " (()>, 2)), ('y',\n", " (()>, 3))}}" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.data_dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, the above code already gives us a small dependency graph:" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:26.729709Z", "iopub.status.busy": "2023-11-12T12:40:26.729582Z", "iopub.status.idle": "2023-11-12T12:40:27.150451Z", "shell.execute_reply": "2023-11-12T12:40:27.150062Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2\n", "\n", "\n", "x\n", "<cell line: 2>\n", "\n", "\n", "\n", "\n", "\n", "y_functioncellline_3at0x105d98a60_3\n", "\n", "\n", "y\n", "<cell line: 3>\n", "\n", "\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2->y_functioncellline_3at0x105d98a60_3\n", "\n", "\n", "\n", "\n", "\n", "z_functioncellline_4at0x105d98dc0_4\n", "\n", "\n", "z\n", "<cell line: 4>\n", "\n", "\n", "\n", "\n", "\n", 
"x_functioncellline_2at0x104c6a710_2->z_functioncellline_4at0x105d98dc0_4\n", "\n", "\n", "\n", "\n", "\n", "y_functioncellline_3at0x105d98a60_3->z_functioncellline_4at0x105d98dc0_4\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 161, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "_test_data.dependencies().graph()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In the remainder of this section, we define methods to\n", "\n", "* track control dependencies (`test()`, `__enter__()`, `__exit__()`)\n", "* track function calls and returns (`call()`, `ret()`)\n", "* track function arguments (`arg()`, `param()`)\n", "* check the validity of our dependencies (`validate()`).\n", "\n", "Like our `get()` and `set()` methods above, these work by refining the appropriate methods defined in the `DataTracker` class, building on our `NodeTransformer` transformations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Control Dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us detail control dependencies. As discussed with `DataTracker()`, we invoke `test()` methods for all control conditions, and place the controlled blocks into `with` clauses." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `test()` method simply sets a `<test>` variable; this also places it in `last_read`." 
] }, { "cell_type": "code", "execution_count": 162, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.152161Z", "iopub.status.busy": "2023-11-12T12:40:27.152043Z", "iopub.status.idle": "2023-11-12T12:40:27.154219Z", "shell.execute_reply": "2023-11-12T12:40:27.153943Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def test(self, value: Any) -> Any:\n", " \"\"\"Track a test for condition `value`\"\"\"\n", " self.set(self.TEST, value)\n", " return super().test(value)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When entering a `with` block, the `last_read` list holds the `<test>` variable just set. We save it on the `control` stack, so that any variables written from now on are marked as controlled by `<test>`." ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.155682Z", "iopub.status.busy": "2023-11-12T12:40:27.155563Z", "iopub.status.idle": "2023-11-12T12:40:27.157619Z", "shell.execute_reply": "2023-11-12T12:40:27.157344Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def __enter__(self) -> Any:\n", " \"\"\"Track entering an if/while/for block\"\"\"\n", " self.control.append(self.last_read)\n", " self.clear_read()\n", " return super().__enter__()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When we exit the `with` block, we restore earlier `last_read` values, preparing for `else` blocks." 
] }, { "cell_type": "code", "execution_count": 164, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.159153Z", "iopub.status.busy": "2023-11-12T12:40:27.159036Z", "iopub.status.idle": "2023-11-12T12:40:27.161222Z", "shell.execute_reply": "2023-11-12T12:40:27.160961Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def __exit__(self, exc_type: Type, exc_value: BaseException,\n", " traceback: TracebackType) -> Optional[bool]:\n", " \"\"\"Track exiting an if/while/for block\"\"\"\n", " self.clear_read()\n", " self.last_read = self.control.pop()\n", " self.ignore_next_location_change()\n", " return super().__exit__(exc_type, exc_value, traceback)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's an example of all these parts in action:" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.163231Z", "iopub.status.busy": "2023-11-12T12:40:27.163078Z", "iopub.status.idle": "2023-11-12T12:40:27.165259Z", "shell.execute_reply": "2023-11-12T12:40:27.164956Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "_test_data = DependencyTracker()\n", "x = _test_data.set('x', 1)\n", "y = _test_data.set('y', _test_data.get('x', x))" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.166771Z", "iopub.status.busy": "2023-11-12T12:40:27.166651Z", "iopub.status.idle": "2023-11-12T12:40:27.168803Z", "shell.execute_reply": "2023-11-12T12:40:27.168541Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "if _test_data.test(_test_data.get('x', x) >= _test_data.get('y', y)):\n", " with _test_data:\n", " z = _test_data.set('z',\n", " _test_data.get('x', x) + _test_data.get('y', y))" ] }, { "cell_type": "code", "execution_count": 167, 
"metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.170210Z", "iopub.status.busy": "2023-11-12T12:40:27.170096Z", "iopub.status.idle": "2023-11-12T12:40:27.172825Z", "shell.execute_reply": "2023-11-12T12:40:27.172541Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{('x', (()>, 2)): set(),\n", " ('y', (()>, 3)): set(),\n", " ('', (()>, 1)): set(),\n", " ('z',\n", " (()>, 3)): {('',\n", " (()>, 1))}}" ] }, "execution_count": 167, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_test_data.control_dependencies" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The control dependency for `z` is reflected in the dependency graph:" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.174270Z", "iopub.status.busy": "2023-11-12T12:40:27.174157Z", "iopub.status.idle": "2023-11-12T12:40:27.620469Z", "shell.execute_reply": "2023-11-12T12:40:27.620060Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2\n", "\n", "\n", "x\n", "<cell line: 2>\n", "\n", "\n", "\n", "\n", "\n", "y_functioncellline_3at0x105d98a60_3\n", "\n", "\n", "y\n", "<cell line: 3>\n", "\n", "\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2->y_functioncellline_3at0x105d98a60_3\n", "\n", "\n", "\n", "\n", "\n", "test_functioncellline_1at0x104c69fc0_1\n", "\n", "\n", "<test>\n", "_test_data.get('x', x)\n", "\n", "\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2->test_functioncellline_1at0x104c69fc0_1\n", "\n", "\n", "\n", "\n", "\n", "z_functioncellline_1at0x105d993f0_3\n", "\n", "\n", "z\n", "<cell line: 1>\n", "\n", "\n", "\n", "\n", "\n", "x_functioncellline_2at0x104c6a710_2->z_functioncellline_1at0x105d993f0_3\n", "\n", "\n", "\n", 
"\n", "\n", "y_functioncellline_3at0x105d98a60_3->test_functioncellline_1at0x104c69fc0_1\n", "\n", "\n", "\n", "\n", "\n", "y_functioncellline_3at0x105d98a60_3->z_functioncellline_1at0x105d993f0_3\n", "\n", "\n", "\n", "\n", "\n", "test_functioncellline_1at0x104c69fc0_1->z_functioncellline_1at0x105d993f0_3\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " ('x', (, 2)): set(),\n", " ('y', (, 3)): {('x', (, 2))},\n", " ('', (, 1)): {('x', (, 2)), ('y', (, 3))},\n", " ('z', (, 3)): {('x', (, 2)), ('y', (, 3))}},\n", " control={\n", " ('x', (, 2)): set(),\n", " ('y', (, 3)): set(),\n", " ('', (, 1)): set(),\n", " ('z', (, 3)): {('', (, 1))}})" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "_test_data.dependencies()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Calls and Returns" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.622285Z", "iopub.status.busy": "2023-11-12T12:40:27.622161Z", "iopub.status.idle": "2023-11-12T12:40:27.623997Z", "shell.execute_reply": "2023-11-12T12:40:27.623654Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import copy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To handle complex expressions involving functions, we introduce a _data stack_. Every time we invoke a function `func` (`call()` is invoked), we save the list of current variables read `last_read` on the `data` stack; when we return (`ret()` is invoked), we restore `last_read`. This also ensures that only those variables read while evaluating arguments will flow into the function call." 
] }, { "cell_type": "code", "execution_count": 170, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.625792Z", "iopub.status.busy": "2023-11-12T12:40:27.625661Z", "iopub.status.idle": "2023-11-12T12:40:27.628703Z", "shell.execute_reply": "2023-11-12T12:40:27.628376Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def call(self, func: Callable) -> Callable:\n", " \"\"\"Track a call of function `func`\"\"\"\n", " func = super().call(func)\n", "\n", " if inspect.isgeneratorfunction(func):\n", " return self.call_generator(func)\n", "\n", " # Save context\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"saving read variables {self.last_read}\")\n", "\n", " self.data.append(self.last_read)\n", " self.clear_read()\n", " self.ignore_next_location_change()\n", "\n", " self.frames.append(self.args)\n", " self.args = {}\n", "\n", " return func" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.630209Z", "iopub.status.busy": "2023-11-12T12:40:27.630114Z", "iopub.status.idle": "2023-11-12T12:40:27.633554Z", "shell.execute_reply": "2023-11-12T12:40:27.633249Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def ret(self, value: Any) -> Any:\n", " \"\"\"Track a function return\"\"\"\n", " value = super().ret(value)\n", "\n", " if self.in_generator():\n", " return self.ret_generator(value)\n", "\n", " # Restore old context and add return value\n", " ret_name = None\n", " for var in self.last_read:\n", " if var.startswith(\"<\"): # \"\"\n", " ret_name = var\n", "\n", " self.last_read = self.data.pop()\n", " if ret_name:\n", " self.last_read.append(ret_name)\n", "\n", " if self.args:\n", " # We return from an uninstrumented function:\n", " # Make 
return value depend on all args\n", " for key, deps in self.args.items():\n", " self.last_read += deps\n", "\n", " self.ignore_location_change()\n", "\n", " self.args = self.frames.pop()\n", "\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"restored read variables {self.last_read}\")\n", "\n", " return value" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Generator functions (those which `yield` a value) are not \"called\" in the sense that Python transfers control to them; instead, a \"call\" to a generator function creates a generator that is evaluated on demand. We mark generator function \"calls\" by saving `None` on the stacks. When the generator function returns the generator, we wrap the generator such that the arguments are being restored when it is invoked." ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.635090Z", "iopub.status.busy": "2023-11-12T12:40:27.634997Z", "iopub.status.idle": "2023-11-12T12:40:27.638676Z", "shell.execute_reply": "2023-11-12T12:40:27.638407Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def in_generator(self) -> bool:\n", " \"\"\"True if we are calling a generator function\"\"\"\n", " return len(self.data) > 0 and self.data[-1] is None\n", "\n", " def call_generator(self, func: Callable) -> Callable:\n", " \"\"\"Track a call of a generator function\"\"\"\n", " # Mark the fact that we're in a generator with `None` values\n", " self.data.append(None) # type: ignore\n", " self.frames.append(None) # type: ignore\n", " assert self.in_generator()\n", "\n", " self.clear_read()\n", " return func\n", "\n", " def ret_generator(self, generator: Any) -> Any:\n", " \"\"\"Track the return of a generator function\"\"\"\n", " # Pop the two 'None' values 
pushed earlier\n", " self.data.pop()\n", " self.frames.pop()\n", "\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"wrapping generator {generator} (args={self.args})\")\n", "\n", " # At this point, we already have collected the args.\n", " # The returned generator depends on all of them.\n", " for arg in self.args:\n", " self.last_read += self.args[arg]\n", "\n", " # Wrap the generator such that the args are restored \n", " # when it is actually invoked, such that we can map them\n", " # to parameters.\n", " saved_args = copy.deepcopy(self.args)\n", "\n", " def wrapper() -> Generator[Any, None, None]:\n", " self.args = saved_args\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"calling generator (args={self.args})\")\n", "\n", " self.ignore_next_location_change()\n", " yield from generator\n", "\n", " return wrapper()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An example of how function calls and returns work together with function arguments follows in the next section." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Function Arguments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, we handle parameters and arguments. The `args` dictionary maps each argument of the current call (by position or by keyword) to the `last_read` variables recorded while evaluating it; the `frames` stack holds the `args` dictionaries of the enclosing calls."
] }, { "cell_type": "code", "execution_count": 173, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.640288Z", "iopub.status.busy": "2023-11-12T12:40:27.640196Z", "iopub.status.idle": "2023-11-12T12:40:27.643357Z", "shell.execute_reply": "2023-11-12T12:40:27.643053Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def arg(self, value: Any, pos: Optional[int] = None, kw: Optional[str] = None) -> Any:\n", " \"\"\"\n", " Track passing an argument `value`\n", " (with given position `pos` 1..n or keyword `kw`)\n", " \"\"\"\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"saving args read {self.last_read}\")\n", "\n", " if pos:\n", " self.args[pos] = self.last_read\n", " if kw:\n", " self.args[kw] = self.last_read\n", "\n", " self.clear_read()\n", " return super().arg(value, pos, kw)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When accessing the arguments (with `param()`), we can retrieve this set of read variables for each argument." 
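,
"\n",
"\n",
"As a minimal sketch of this lookup (the `args` contents and the helper `reads_for()` are hypothetical and only mirror the logic of `param()`), assume a call `f(a, y=b)` whose arguments were recorded by position and by keyword:\n",
"\n",
"```python\n",
"from typing import Dict, List, Union\n",
"\n",
"# Reads recorded per argument for a hypothetical call f(a, y=b)\n",
"args: Dict[Union[int, str], List[str]] = {1: ['a'], 'y': ['b']}\n",
"\n",
"def reads_for(name: str, pos: int) -> List[str]:\n",
"    # Keyword arguments are found by name, positional ones by position\n",
"    if name in args:\n",
"        return args[name]\n",
"    if pos in args:\n",
"        return args[pos]\n",
"    return []\n",
"\n",
"assert reads_for('x', 1) == ['a']  # first parameter, passed positionally\n",
"assert reads_for('y', 2) == ['b']  # second parameter, passed as keyword\n",
"```\n",
"\n",
"The name is checked first, so a keyword argument is found regardless of the position of its parameter."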
] }, { "cell_type": "code", "execution_count": 174, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.644903Z", "iopub.status.busy": "2023-11-12T12:40:27.644788Z", "iopub.status.idle": "2023-11-12T12:40:27.648345Z", "shell.execute_reply": "2023-11-12T12:40:27.648070Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DependencyTracker(DependencyTracker):\n", " def param(self, name: str, value: Any,\n", " pos: Optional[int] = None, vararg: str = \"\", last: bool = False) -> Any:\n", " \"\"\"\n", " Track getting a parameter `name` with value `value`\n", " (with given position `pos`).\n", " vararg parameters are indicated by setting `vararg` to\n", " '*' (*args) or '**' (**kwargs)\n", " \"\"\"\n", " self.clear_read()\n", "\n", " if vararg == '*':\n", " # We over-approximate by setting `args` to _all_ positional args\n", " for index in self.args:\n", " if isinstance(index, int) and pos is not None and index >= pos:\n", " self.last_read += self.args[index]\n", " elif vararg == '**':\n", " # We over-approximate by setting `kwargs` to _all_ passed keyword args\n", " for index in self.args:\n", " if isinstance(index, str):\n", " self.last_read += self.args[index]\n", " elif name in self.args:\n", " self.last_read = self.args[name]\n", " elif pos in self.args:\n", " self.last_read = self.args[pos]\n", "\n", " if self.log:\n", " caller_func, lineno = self.caller_location()\n", " print(f\"{caller_func.__name__}:{lineno}: \"\n", " f\"restored params read {self.last_read}\")\n", "\n", " self.ignore_location_change()\n", " ret = super().param(name, value, pos)\n", "\n", " if last:\n", " self.clear_read()\n", " self.args = {} # Mark `args` as processed\n", "\n", " return ret" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us illustrate all of this on a small example."
] }, { "cell_type": "code", "execution_count": 175, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.649927Z", "iopub.status.busy": "2023-11-12T12:40:27.649837Z", "iopub.status.idle": "2023-11-12T12:40:27.652448Z", "shell.execute_reply": "2023-11-12T12:40:27.652190Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def call_test() -> int:\n", " c = 47\n", "\n", " def sq(n: int) -> int:\n", " return n * n\n", "\n", " def gen(e: int) -> Generator[int, None, None]:\n", " yield e * c\n", "\n", " def just_x(x: Any, y: Any) -> Any:\n", " return x\n", "\n", " a = 42\n", " b = gen(a)\n", " d = list(b)[0]\n", "\n", " xs = [1, 2, 3, 4]\n", " ys = [sq(elem) for elem in xs if elem > 2]\n", "\n", " return just_x(just_x(d, y=b), ys[0])" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.654098Z", "iopub.status.busy": "2023-11-12T12:40:27.653988Z", "iopub.status.idle": "2023-11-12T12:40:27.656377Z", "shell.execute_reply": "2023-11-12T12:40:27.656141Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1974" ] }, "execution_count": 176, "metadata": {}, "output_type": "execute_result" } ], "source": [ "call_test()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We apply all our transformers on this code:" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.657884Z", "iopub.status.busy": "2023-11-12T12:40:27.657768Z", "iopub.status.idle": "2023-11-12T12:40:27.702145Z", "shell.execute_reply": "2023-11-12T12:40:27.701821Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mcall_test\u001b[39;49;00m() -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " c = 
_data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m47\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mdef\u001b[39;49;00m \u001b[32msq\u001b[39;49;00m(n: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mn\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, n, pos=\u001b[34m1\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mn\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, n) * _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mn\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, n))\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mdef\u001b[39;49;00m \u001b[32mgen\u001b[39;49;00m(e: \u001b[36mint\u001b[39;49;00m) -> Generator[\u001b[36mint\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m]:\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33me\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, e, pos=\u001b[34m1\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34myield\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33me\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, e) * _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mc\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, c))\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34mdef\u001b[39;49;00m \u001b[32mjust_x\u001b[39;49;00m(x: Any, y: Any) -> Any:\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, 
pos=\u001b[34m1\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, y, pos=\u001b[34m2\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", " a = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34m42\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " b = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mgen\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, gen))(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33ma\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, a), pos=\u001b[34m1\u001b[39;49;00m))))\u001b[37m\u001b[39;49;00m\n", " d = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.ret(_data.call(\u001b[36mlist\u001b[39;49;00m)(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, b), pos=\u001b[34m1\u001b[39;49;00m)))[\u001b[34m0\u001b[39;49;00m])\u001b[37m\u001b[39;49;00m\n", " xs = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mxs\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, [\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m, \u001b[34m3\u001b[39;49;00m, \u001b[34m4\u001b[39;49;00m])\u001b[37m\u001b[39;49;00m\n", " ys = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mys\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, [_data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33msq\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
sq))(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33melem\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, elem), pos=\u001b[34m1\u001b[39;49;00m))) \u001b[34mfor\u001b[39;49;00m elem \u001b[35min\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33melem\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mxs\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, xs)) \u001b[34mif\u001b[39;49;00m _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33melem\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, elem) > \u001b[34m2\u001b[39;49;00m])\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mjust_x\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, just_x))(_data.arg(_data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mjust_x\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, just_x))(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33md\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, d), pos=\u001b[34m1\u001b[39;49;00m), y=_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mb\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, b), kw=\u001b[33m'\u001b[39;49;00m\u001b[33my\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m))), pos=\u001b[34m1\u001b[39;49;00m), _data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mys\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, ys)[\u001b[34m0\u001b[39;49;00m], pos=\u001b[34m2\u001b[39;49;00m))))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "call_tree = ast.parse(inspect.getsource(call_test))\n", "TrackCallTransformer().visit(call_tree)\n", "TrackSetTransformer().visit(call_tree)\n", "TrackGetTransformer().visit(call_tree)\n", "TrackControlTransformer().visit(call_tree)\n", "TrackReturnTransformer().visit(call_tree)\n", "TrackParamsTransformer().visit(call_tree)\n", "dump_tree(call_tree)" ] }, { "cell_type": "markdown", "metadata": { 
"slideshow": { "slide_type": "subslide" } }, "source": [ "Again, we capture the dependencies:" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.703924Z", "iopub.status.busy": "2023-11-12T12:40:27.703792Z", "iopub.status.idle": "2023-11-12T12:40:27.705822Z", "shell.execute_reply": "2023-11-12T12:40:27.705535Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class DependencyTrackerTester(DataTrackerTester):\n", " def make_data_tracker(self) -> DependencyTracker:\n", " return DependencyTracker(log=self.log)" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.707504Z", "iopub.status.busy": "2023-11-12T12:40:27.707359Z", "iopub.status.idle": "2023-11-12T12:40:27.711017Z", "shell.execute_reply": "2023-11-12T12:40:27.710667Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with DependencyTrackerTester(call_tree, call_test, log=False) as call_deps:\n", " call_test()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see how \n", "\n", "* `a` flows into the generator `b` and into the parameter `e` of `gen()`.\n", "* `xs` flows into `elem` which in turn flows into the parameter `n` of `sq()`. Both flow into `ys`.\n", "* `just_x()` returns only its `x` argument." 
] }, { "cell_type": "code", "execution_count": 180, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:27.712950Z", "iopub.status.busy": "2023-11-12T12:40:27.712836Z", "iopub.status.idle": "2023-11-12T12:40:28.248111Z", "shell.execute_reply": "2023-11-12T12:40:28.247655Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "e_functioncall_testlocalsgenat0x105d98790_7\n", "\n", "\n", "e\n", "def gen(e: int) -> Generator[int, None, None]:\n", "\n", "\n", "\n", "\n", "\n", "genyieldvalue_functioncall_testlocalsgenat0x105d98790_8\n", "\n", "\n", "<gen() yield value>\n", "yield e * c\n", "\n", "\n", "\n", "\n", "\n", "e_functioncall_testlocalsgenat0x105d98790_7->genyieldvalue_functioncall_testlocalsgenat0x105d98790_8\n", "\n", "\n", "\n", "\n", "\n", "\n", "a_functioncall_testat0x104da7e20_13\n", "\n", "\n", "a\n", "a = 42\n", "\n", "\n", "\n", "\n", "\n", "a_functioncall_testat0x104da7e20_13->e_functioncall_testlocalsgenat0x105d98790_7\n", "\n", "\n", "\n", "\n", "\n", "b_functioncall_testat0x104da7e20_14\n", "\n", "\n", "b\n", "b = gen(a)\n", "\n", "\n", "\n", "\n", "\n", "a_functioncall_testat0x104da7e20_13->b_functioncall_testat0x104da7e20_14\n", "\n", "\n", "\n", "\n", "\n", "\n", "elem_functioncall_testat0x104da7e20_18\n", "\n", "\n", "elem\n", "ys = [sq(elem) for elem in xs if elem > 2]\n", "\n", "\n", "\n", "\n", "\n", "n_functioncall_testlocalssqat0x104c68b80_4\n", "\n", "\n", "n\n", "def sq(n: int) -> int:\n", "\n", "\n", "\n", "\n", "\n", "elem_functioncall_testat0x104da7e20_18->n_functioncall_testlocalssqat0x104c68b80_4\n", "\n", "\n", "\n", "\n", "\n", "xs_functioncall_testat0x104da7e20_17\n", "\n", "\n", "xs\n", "xs = [1, 2, 3, 4]\n", "\n", "\n", "\n", "\n", "\n", "xs_functioncall_testat0x104da7e20_17->elem_functioncall_testat0x104da7e20_18\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"x_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "x\n", "def just_x(x: Any, y: Any) -> Any:\n", "\n", "\n", "\n", "\n", "\n", "just_xreturnvalue_functioncall_testlocalsjust_xat0x105d99cf0_11\n", "\n", "\n", "<just_x() return value>\n", "return x\n", "\n", "\n", "\n", "\n", "\n", "x_functioncall_testlocalsjust_xat0x105d99cf0_10->just_xreturnvalue_functioncall_testlocalsjust_xat0x105d99cf0_11\n", "\n", "\n", "\n", "\n", "\n", "d_functioncall_testat0x104da7e20_15\n", "\n", "\n", "d\n", "d = list(b)[0]\n", "\n", "\n", "\n", "\n", "\n", "\n", "d_functioncall_testat0x104da7e20_15->x_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "\n", "\n", "\n", "just_xreturnvalue_functioncall_testlocalsjust_xat0x105d99cf0_11->x_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "\n", "\n", "\n", "call_testreturnvalue_functioncall_testat0x104da7e20_20\n", "\n", "\n", "<call_test() return value>\n", "return just_x(just_x(d, y=b), ys[0])\n", "\n", "\n", "\n", "\n", "\n", "just_xreturnvalue_functioncall_testlocalsjust_xat0x105d99cf0_11->call_testreturnvalue_functioncall_testat0x104da7e20_20\n", "\n", "\n", "\n", "\n", "\n", "ys_functioncall_testat0x104da7e20_18\n", "\n", "\n", "ys\n", "ys = [sq(elem) for elem in xs if elem > 2]\n", "\n", "\n", "\n", "\n", "\n", "\n", "y_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "y\n", "def just_x(x: Any, y: Any) -> Any:\n", "\n", "\n", "\n", "\n", "\n", "ys_functioncall_testat0x104da7e20_18->y_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "\n", "\n", "\n", "c_functioncall_testat0x104da7e20_2\n", "\n", "\n", "c\n", "c = 47\n", "\n", "\n", "\n", "\n", "\n", "\n", "c_functioncall_testat0x104da7e20_2->genyieldvalue_functioncall_testlocalsgenat0x105d98790_8\n", "\n", "\n", "\n", "\n", "\n", "b_functioncall_testat0x104da7e20_14->d_functioncall_testat0x104da7e20_15\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"b_functioncall_testat0x104da7e20_14->y_functioncall_testlocalsjust_xat0x105d99cf0_10\n", "\n", "\n", "\n", "\n", "\n", "genyieldvalue_functioncall_testlocalsgenat0x105d98790_8->d_functioncall_testat0x104da7e20_15\n", "\n", "\n", "\n", "\n", "\n", "sqreturnvalue_functioncall_testlocalssqat0x104c68b80_5\n", "\n", "\n", "<sq() return value>\n", "return n * n\n", "\n", "\n", "\n", "\n", "\n", "n_functioncall_testlocalssqat0x104c68b80_4->sqreturnvalue_functioncall_testlocalssqat0x104c68b80_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " ('c', (call_test, 2)): set(),\n", " ('a', (call_test, 13)): set(),\n", " ('b', (call_test, 14)): {('a', (call_test, 13))},\n", " ('e', (gen, 7)): {('a', (call_test, 13))},\n", " ('', (gen, 8)): {('c', (call_test, 2)), ('e', (gen, 7))},\n", " ('d', (call_test, 15)): {('', (gen, 8)), ('b', (call_test, 14))},\n", " ('xs', (call_test, 17)): set(),\n", " ('elem', (call_test, 18)): {('xs', (call_test, 17))},\n", " ('n', (sq, 4)): {('elem', (call_test, 18))},\n", " ('', (sq, 5)): {('n', (sq, 4))},\n", " ('ys', (call_test, 18)): set(),\n", " ('x', (just_x, 10)): {('d', (call_test, 15)), ('', (just_x, 11))},\n", " ('y', (just_x, 10)): {('ys', (call_test, 18)), ('b', (call_test, 14))},\n", " ('', (just_x, 11)): {('x', (just_x, 10))},\n", " ('', (call_test, 20)): {('', (just_x, 11))}},\n", " control={\n", " ('c', (call_test, 2)): set(),\n", " ('a', (call_test, 13)): set(),\n", " ('b', (call_test, 14)): set(),\n", " ('e', (gen, 7)): set(),\n", " ('', (gen, 8)): set(),\n", " ('d', (call_test, 15)): set(),\n", " ('xs', (call_test, 17)): set(),\n", " ('elem', (call_test, 18)): set(),\n", " ('n', (sq, 4)): set(),\n", " ('', (sq, 5)): set(),\n", " ('ys', (call_test, 18)): set(),\n", " ('x', (just_x, 10)): set(),\n", " ('y', (just_x, 10)): set(),\n", " ('', (just_x, 11)): set(),\n", " ('', (call_test, 20)): set()})" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], 
"source": [ "call_deps.dependencies()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `code()` view lists each function separately:" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:28.249922Z", "iopub.status.busy": "2023-11-12T12:40:28.249760Z", "iopub.status.idle": "2023-11-12T12:40:29.043907Z", "shell.execute_reply": "2023-11-12T12:40:29.043529Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 7 \u001b[34mdef\u001b[39;49;00m \u001b[32mgen\u001b[39;49;00m(e: \u001b[36mint\u001b[39;49;00m) -> Generator[\u001b[36mint\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m]: \u001b[37m# <= a (call_test:13)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 8 \u001b[34myield\u001b[39;49;00m e * c \u001b[37m# <= c (call_test:2), e (7)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n", " 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mcall_test\u001b[39;49;00m() -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "* 2 c = \u001b[34m47\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 3 \u001b[37m\u001b[39;49;00m\n", " 4 \u001b[34mdef\u001b[39;49;00m \u001b[32msq\u001b[39;49;00m(n: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " 5 \u001b[34mreturn\u001b[39;49;00m n * n\u001b[37m\u001b[39;49;00m\n", " 6 \u001b[37m\u001b[39;49;00m\n", " 7 \u001b[34mdef\u001b[39;49;00m \u001b[32mgen\u001b[39;49;00m(e: \u001b[36mint\u001b[39;49;00m) -> Generator[\u001b[36mint\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m]:\u001b[37m\u001b[39;49;00m\n", " 8 \u001b[34myield\u001b[39;49;00m e * c\u001b[37m\u001b[39;49;00m\n", " 9 \u001b[37m\u001b[39;49;00m\n", " 10 \u001b[34mdef\u001b[39;49;00m \u001b[32mjust_x\u001b[39;49;00m(x: Any, y: Any) -> Any:\u001b[37m\u001b[39;49;00m\n", " 11 
\u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m\n", " 12 \u001b[37m\u001b[39;49;00m\n", "* 13 a = \u001b[34m42\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 14 b = gen(a) \u001b[37m# <= a (13)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 15 d = \u001b[36mlist\u001b[39;49;00m(b)[\u001b[34m0\u001b[39;49;00m] \u001b[37m# <= (gen:8), b (14)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 16 \u001b[37m\u001b[39;49;00m\n", "* 17 xs = [\u001b[34m1\u001b[39;49;00m, \u001b[34m2\u001b[39;49;00m, \u001b[34m3\u001b[39;49;00m, \u001b[34m4\u001b[39;49;00m]\u001b[37m\u001b[39;49;00m\n", "* 18 ys = [sq(elem) \u001b[34mfor\u001b[39;49;00m elem \u001b[35min\u001b[39;49;00m xs \u001b[34mif\u001b[39;49;00m elem > \u001b[34m2\u001b[39;49;00m] \u001b[37m# <= xs (17)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 19 \u001b[37m\u001b[39;49;00m\n", "* 20 \u001b[34mreturn\u001b[39;49;00m just_x(just_x(d, y=b), ys[\u001b[34m0\u001b[39;49;00m]) \u001b[37m# <= (just_x:11)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n", "* 10 \u001b[34mdef\u001b[39;49;00m \u001b[32mjust_x\u001b[39;49;00m(x: Any, y: Any) -> Any: \u001b[37m# <= d (call_test:15), (11), ys (call_test:18), b (call_test:14)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 11 \u001b[34mreturn\u001b[39;49;00m x \u001b[37m# <= x (10)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n", "* 4 \u001b[34mdef\u001b[39;49;00m \u001b[32msq\u001b[39;49;00m(n: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m: \u001b[37m# <= elem (call_test:18)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 5 \u001b[34mreturn\u001b[39;49;00m n * n \u001b[37m# <= n (4)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "call_deps.dependencies().code()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Diagnostics" ] }, 
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To check the dependencies we obtain, we perform some minimal checks on whether a referenced variable actually also occurs in the source code." ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.045780Z", "iopub.status.busy": "2023-11-12T12:40:29.045642Z", "iopub.status.idle": "2023-11-12T12:40:29.047457Z", "shell.execute_reply": "2023-11-12T12:40:29.047148Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import re" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.049030Z", "iopub.status.busy": "2023-11-12T12:40:29.048924Z", "iopub.status.idle": "2023-11-12T12:40:29.052367Z", "shell.execute_reply": "2023-11-12T12:40:29.052081Z" }, "ipub": { "ignore": true }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Dependencies(Dependencies):\n", " def validate(self) -> None:\n", " \"\"\"Perform a simple syntactic validation of dependencies\"\"\"\n", " super().validate()\n", "\n", " for var in self.all_vars():\n", " source = self.source(var)\n", " if not source:\n", " continue\n", " if source.startswith('<'):\n", " continue # no source\n", "\n", " for dep_var in self.data[var] | self.control[var]:\n", " dep_name, dep_location = dep_var\n", "\n", " if dep_name == DependencyTracker.TEST:\n", " continue # dependency on \n", "\n", " if dep_name.endswith(' value>'):\n", " if source.find('(') < 0:\n", " warnings.warn(f\"Warning: {self.format_var(var)} \"\n", " f\"depends on {self.format_var(dep_var)}, \"\n", " f\"but {repr(source)} does not \"\n", " f\"seem to have a call\")\n", " continue\n", "\n", " if source.startswith('def'):\n", " continue # function call\n", "\n", " rx = re.compile(r'\\b' + dep_name + r'\\b')\n", " if rx.search(source) is None:\n", " 
warnings.warn(f\"{self.format_var(var)} \"\n", " f\"depends on {self.format_var(dep_var)}, \"\n", " f\"but {repr(dep_name)} does not occur \"\n", " f\"in {repr(source)}\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`validate()` is called automatically whenever dependencies are output; if you see any of its warnings, the recorded dependencies may be wrong." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At this point, `DependencyTracker` is complete; we have everything in place to track even complex dependencies in instrumented code." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Slicing Code" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now have a means to instrument source code (our various `NodeTransformer` classes) and a means to track dependencies (the `DependencyTracker` class). It is time to put these pieces together in a single tool, which we call `Slicer`.\n", "\n", "The basic idea of `Slicer` is that you can use it as follows:\n", "\n", "```python\n", "with Slicer(func_1, func_2, ...) as slicer:\n", " func(...)\n", "```\n", "\n", "This first _instruments_ the functions given in the constructor (i.e., replaces their definitions with instrumented counterparts) and then runs the code in the body, which calls the instrumented functions and thus allows the slicer to collect dependencies. When the body returns, the original definitions of the instrumented functions are restored."
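,
"\n",
"\n",
"The instrument-and-restore pattern can be sketched in isolation as follows. This is a minimal sketch, not the `Slicer` implementation: `ToySwapper`, `double()`, and `traced_double()` are hypothetical names, and a plain dictionary stands in for the global namespace.\n",
"\n",
"```python\n",
"from typing import Any, Callable, Dict, List\n",
"\n",
"class ToySwapper:\n",
"    # Hypothetical helper: replace `namespace[name]` during a `with` block\n",
"\n",
"    def __init__(self, namespace: Dict[str, Any], name: str,\n",
"                 replacement: Callable) -> None:\n",
"        self.namespace = namespace\n",
"        self.name = name\n",
"        self.replacement = replacement\n",
"\n",
"    def __enter__(self) -> 'ToySwapper':\n",
"        # Save the original definition and install the replacement\n",
"        self.original = self.namespace[self.name]\n",
"        self.namespace[self.name] = self.replacement\n",
"        return self\n",
"\n",
"    def __exit__(self, exc_type: Any, exc_value: Any, tb: Any) -> None:\n",
"        # Restore the original definition, even if the body raised\n",
"        self.namespace[self.name] = self.original\n",
"\n",
"def double(x: int) -> int:\n",
"    return 2 * x\n",
"\n",
"calls: List[int] = []\n",
"\n",
"def traced_double(x: int) -> int:\n",
"    calls.append(x)  # stands in for the tracking code\n",
"    return double(x)\n",
"\n",
"namespace: Dict[str, Any] = {'double': double}\n",
"with ToySwapper(namespace, 'double', traced_double):\n",
"    assert namespace['double'](21) == 42\n",
"\n",
"assert calls == [21]\n",
"assert namespace['double'] is double  # original restored\n",
"```\n",
"\n",
"Restoring in `__exit__()` means the original definition comes back even when the body raises an exception."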
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### An Instrumenter Base Class" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The basic functionality of instrumenting a number of functions (and restoring them at the end of the `with` block) comes in an `Instrumenter` base class. Its `__enter__()` method invokes `instrument()` on every item to instrument; subclasses are expected to override `instrument()`." ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.054057Z", "iopub.status.busy": "2023-11-12T12:40:29.053960Z", "iopub.status.idle": "2023-11-12T12:40:29.057650Z", "shell.execute_reply": "2023-11-12T12:40:29.057253Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Instrumenter(StackInspector):\n", " \"\"\"Instrument functions for dynamic tracking\"\"\"\n", "\n", " def __init__(self, *items_to_instrument: Callable,\n", " globals: Optional[Dict[str, Any]] = None,\n", " log: Union[bool, int] = False) -> None:\n", " \"\"\"\n", " Create an instrumenter.\n", " `items_to_instrument` is a list of items to instrument.\n", " `globals` is a namespace to use (default: caller's globals())\n", " \"\"\"\n", "\n", " self.log = log\n", " self.items_to_instrument: List[Callable] = list(items_to_instrument)\n", " self.instrumented_items: Set[Any] = set()\n", "\n", " if globals is None:\n", " globals = self.caller_globals()\n", " self.globals = globals\n", "\n", " def __enter__(self) -> Any:\n", " \"\"\"Instrument sources\"\"\"\n", " items = self.items_to_instrument\n", " if not items:\n", " items = self.default_items_to_instrument()\n", "\n", " for item in items:\n", " self.instrument(item)\n", "\n", " return self\n", "\n", " def default_items_to_instrument(self) -> List[Callable]:\n", " return []\n", "\n", " def instrument(self, item: Any) -> Any:\n", " \"\"\"Instrument `item`. 
To be overridden in subclasses.\"\"\"\n", "        if self.log:\n", "            print(\"Instrumenting\", item)\n", "        self.instrumented_items.add(item)\n", "        return item" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "At the end of the `with` block, we restore the given functions." ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.059539Z", "iopub.status.busy": "2023-11-12T12:40:29.059294Z", "iopub.status.idle": "2023-11-12T12:40:29.061720Z", "shell.execute_reply": "2023-11-12T12:40:29.061395Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Instrumenter(Instrumenter):\n", "    def __exit__(self, exc_type: Type, exc_value: BaseException,\n", "                 traceback: TracebackType) -> Optional[bool]:\n", "        \"\"\"Restore sources\"\"\"\n", "        self.restore()\n", "        return None\n", "\n", "    def restore(self) -> None:\n", "        for item in self.instrumented_items:\n", "            self.globals[item.__name__] = item" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By itself, an `Instrumenter` only outputs a log message (when `log` is set):" ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.063235Z", "iopub.status.busy": "2023-11-12T12:40:29.063118Z", "iopub.status.idle": "2023-11-12T12:40:29.065026Z", "shell.execute_reply": "2023-11-12T12:40:29.064755Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Instrumenting \n" ] } ], "source": [ "with Instrumenter(middle, log=True) as ins:\n", "    pass" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The Slicer Class" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `Slicer` class comes as a subclass of `Instrumenter`. 
It creates its own dependency tracker by default (which can be overridden by passing the `dependency_tracker` keyword argument)." ] }, { "cell_type": "code", "execution_count": 187, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.066865Z", "iopub.status.busy": "2023-11-12T12:40:29.066647Z", "iopub.status.idle": "2023-11-12T12:40:29.069770Z", "shell.execute_reply": "2023-11-12T12:40:29.069485Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Instrumenter):\n", "    \"\"\"Track dependencies in an execution\"\"\"\n", "\n", "    def __init__(self, *items_to_instrument: Any,\n", "                 dependency_tracker: Optional[DependencyTracker] = None,\n", "                 globals: Optional[Dict[str, Any]] = None,\n", "                 log: Union[bool, int] = False):\n", "        \"\"\"Create a slicer.\n", "        `items_to_instrument` are Python functions or modules with source code.\n", "        `dependency_tracker` is the tracker to be used (default: DependencyTracker).\n", "        `globals` is the namespace to be used (default: caller's `globals()`)\n", "        `log`=True or `log` > 0 turns on logging\n", "        \"\"\"\n", "        super().__init__(*items_to_instrument, globals=globals, log=log)\n", "\n", "        if dependency_tracker is None:\n", "            dependency_tracker = DependencyTracker(log=(log > 1))\n", "        self.dependency_tracker = dependency_tracker\n", "\n", "        self.saved_dependencies = None\n", "\n", "    def default_items_to_instrument(self) -> List[Callable]:\n", "        raise ValueError(\"Need one or more items to instrument\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `parse()` method parses a given item, returning its AST."
] }, { "cell_type": "code", "execution_count": 188, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.071367Z", "iopub.status.busy": "2023-11-12T12:40:29.071253Z", "iopub.status.idle": "2023-11-12T12:40:29.073708Z", "shell.execute_reply": "2023-11-12T12:40:29.073395Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def parse(self, item: Any) -> AST:\n", "        \"\"\"Parse `item`, returning its AST\"\"\"\n", "        source_lines, lineno = inspect.getsourcelines(item)\n", "        source = \"\".join(source_lines)\n", "\n", "        if self.log >= 2:\n", "            print_content(source, '.py', start_line_number=lineno)\n", "            print()\n", "            print()\n", "\n", "        tree = ast.parse(source)\n", "        ast.increment_lineno(tree, lineno - 1)\n", "        return tree" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `transform()` method applies the list of transformers defined earlier in this chapter." ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.075799Z", "iopub.status.busy": "2023-11-12T12:40:29.075645Z", "iopub.status.idle": "2023-11-12T12:40:29.078886Z", "shell.execute_reply": "2023-11-12T12:40:29.078581Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def transformers(self) -> List[NodeTransformer]:\n", "        \"\"\"List of transformers to apply. To be extended in subclasses.\"\"\"\n", "        return [\n", "            TrackCallTransformer(),\n", "            TrackSetTransformer(),\n", "            TrackGetTransformer(),\n", "            TrackControlTransformer(),\n", "            TrackReturnTransformer(),\n", "            TrackParamsTransformer()\n", "        ]\n", "\n", "    def transform(self, tree: AST) -> AST:\n", "        \"\"\"Apply transformers on `tree`. 
May be extended in subclasses.\"\"\"\n", "        # Apply transformers\n", "        for transformer in self.transformers():\n", "            if self.log >= 3:\n", "                print(transformer.__class__.__name__ + ':')\n", "\n", "            transformer.visit(tree)\n", "            ast.fix_missing_locations(tree)\n", "            if self.log >= 3:\n", "                print_content(ast.unparse(tree), '.py')\n", "                print()\n", "                print()\n", "\n", "        if 0 < self.log < 3:\n", "            print_content(ast.unparse(tree), '.py')\n", "            print()\n", "            print()\n", "\n", "        return tree" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `execute()` method executes the transformed tree (such that we get the new definitions). We also make the dependency tracker available for the code in the `with` block." ] }, { "cell_type": "code", "execution_count": 190, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.080511Z", "iopub.status.busy": "2023-11-12T12:40:29.080384Z", "iopub.status.idle": "2023-11-12T12:40:29.082864Z", "shell.execute_reply": "2023-11-12T12:40:29.082608Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def execute(self, tree: AST, item: Any) -> None:\n", "        \"\"\"Compile and execute `tree`. May be extended in subclasses.\"\"\"\n", "\n", "        # We pass the source file of `item` such that we can retrieve it\n", "        # when accessing the location of the new compiled code\n", "        source = cast(str, inspect.getsourcefile(item))\n", "        code = compile(cast(ast.Module, tree), source, 'exec')\n", "\n", "        # Enable dependency tracker\n", "        self.globals[DATA_TRACKER] = self.dependency_tracker\n", "\n", "        # Execute the code, resulting in a redefinition of item\n", "        exec(code, self.globals)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `instrument()` method puts all these together, first parsing the item into a tree, then transforming and executing the tree."
] }, { "cell_type": "code", "execution_count": 191, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.084328Z", "iopub.status.busy": "2023-11-12T12:40:29.084200Z", "iopub.status.idle": "2023-11-12T12:40:29.086680Z", "shell.execute_reply": "2023-11-12T12:40:29.086398Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def instrument(self, item: Any) -> Any:\n", "        \"\"\"Instrument `item`, transforming its source code, and re-defining it.\"\"\"\n", "        if is_internal(item.__name__):\n", "            return item  # Do not instrument `print()` and the like\n", "\n", "        if inspect.isbuiltin(item):\n", "            return item  # No source code\n", "\n", "        item = super().instrument(item)\n", "        tree = self.parse(item)\n", "        tree = self.transform(tree)\n", "        self.execute(tree, item)\n", "\n", "        new_item = self.globals[item.__name__]\n", "        return new_item" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When we restore the original definitions (after the `with` block), we also save the dependency tracker, such that the collected dependencies remain accessible." ] }, { "cell_type": "code", "execution_count": 192, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.088191Z", "iopub.status.busy": "2023-11-12T12:40:29.088086Z", "iopub.status.idle": "2023-11-12T12:40:29.090401Z", "shell.execute_reply": "2023-11-12T12:40:29.090091Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def restore(self) -> None:\n", "        \"\"\"Restore original code.\"\"\"\n", "        if DATA_TRACKER in self.globals:\n", "            self.saved_dependencies = self.globals[DATA_TRACKER]\n", "            del self.globals[DATA_TRACKER]\n", "\n", "        super().restore()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Three convenience methods allow us to see the dependencies as (well) dependencies, as code, and as a graph. 
These simply invoke the respective methods on the saved dependencies." ] }, { "cell_type": "code", "execution_count": 193, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.092169Z", "iopub.status.busy": "2023-11-12T12:40:29.092037Z", "iopub.status.idle": "2023-11-12T12:40:29.095157Z", "shell.execute_reply": "2023-11-12T12:40:29.094854Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def dependencies(self) -> Dependencies:\n", "        \"\"\"Return collected dependencies.\"\"\"\n", "        if self.saved_dependencies is None:\n", "            return Dependencies({}, {})\n", "        return self.saved_dependencies.dependencies()\n", "\n", "    def code(self, *args: Any, **kwargs: Any) -> None:\n", "        \"\"\"Show code of instrumented items, annotated with dependencies.\"\"\"\n", "        first = True\n", "        for item in self.instrumented_items:\n", "            if not first:\n", "                print()\n", "            self.dependencies().code(item, *args, **kwargs)  # type: ignore\n", "            first = False\n", "\n", "    def graph(self, *args: Any, **kwargs: Any) -> Digraph:\n", "        \"\"\"Show dependency graph.\"\"\"\n", "        return self.dependencies().graph(*args, **kwargs)  # type: ignore\n", "\n", "    def _repr_mimebundle_(self, include: Any = None, exclude: Any = None) -> Any:\n", "        \"\"\"If the object is output in Jupyter, render dependencies as an SVG graph\"\"\"\n", "        return self.graph()._repr_mimebundle_(include, exclude)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us put `Slicer` into action. 
We track our `middle()` function:" ] }, { "cell_type": "code", "execution_count": 194, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.096654Z", "iopub.status.busy": "2023-11-12T12:40:29.096533Z", "iopub.status.idle": "2023-11-12T12:40:29.102618Z", "shell.execute_reply": "2023-11-12T12:40:29.102337Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with Slicer(middle) as slicer:\n", "    m = middle(2, 1, 3)\n", "m" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "These are the dependencies in string form (used when printed):" ] }, { "cell_type": "code", "execution_count": 195, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.104213Z", "iopub.status.busy": "2023-11-12T12:40:29.104098Z", "iopub.status.idle": "2023-11-12T12:40:29.108294Z", "shell.execute_reply": "2023-11-12T12:40:29.108003Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "middle():\n", "    <test> (2) <= y (1), z (1)\n", "    <test> (3) <= x (1), y (1); <- <test> (2)\n", "    <test> (5) <= x (1), z (1); <- <test> (3)\n", "    <middle() return value> (6) <= y (1); <- <test> (5)\n", "\n" ] } ], "source": [ "print(slicer.dependencies())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is the code form:" ] }, { "cell_type": "code", "execution_count": 196, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.110090Z", "iopub.status.busy": "2023-11-12T12:40:29.109966Z", "iopub.status.idle": "2023-11-12T12:40:29.468080Z", "shell.execute_reply": "2023-11-12T12:40:29.467781Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mmiddle\u001b[39;49;00m(x, y, z): \u001b[37m# type: 
ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 2 \u001b[34mif\u001b[39;49;00m y < z: \u001b[37m# <= y (1), z (1)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 3 \u001b[34mif\u001b[39;49;00m x < y: \u001b[37m# <= x (1), y (1); <- (2)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 4 \u001b[34mreturn\u001b[39;49;00m y\u001b[37m\u001b[39;49;00m\n", "* 5 \u001b[34melif\u001b[39;49;00m x < z: \u001b[37m# <= x (1), z (1); <- (3)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 6 \u001b[34mreturn\u001b[39;49;00m y \u001b[37m# <= y (1); <- (5)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 7 \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " 8 \u001b[34mif\u001b[39;49;00m x > y:\u001b[37m\u001b[39;49;00m\n", " 9 \u001b[34mreturn\u001b[39;49;00m y\u001b[37m\u001b[39;49;00m\n", " 10 \u001b[34melif\u001b[39;49;00m x > z:\u001b[37m\u001b[39;49;00m\n", " 11 \u001b[34mreturn\u001b[39;49;00m x\u001b[37m\u001b[39;49;00m\n", " 12 \u001b[34mreturn\u001b[39;49;00m z\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "slicer.code()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And this is the graph form:" ] }, { "cell_type": "code", "execution_count": 197, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.469835Z", "iopub.status.busy": "2023-11-12T12:40:29.469716Z", "iopub.status.idle": "2023-11-12T12:40:29.896014Z", "shell.execute_reply": "2023-11-12T12:40:29.895616Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_5\n", "\n", "\n", "<test>\n", "elif x < z:\n", "\n", "\n", "\n", "\n", "\n", "middlereturnvalue_functionmiddleat0x105d9a5f0_6\n", "\n", "\n", "<middle() return value>\n", "return y\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_5->middlereturnvalue_functionmiddleat0x105d9a5f0_6\n", "\n", "\n", "\n", 
"\n", "\n", "\n", "x_functionmiddleat0x105d9a5f0_1\n", "\n", "\n", "x\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_5\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_3\n", "\n", "\n", "<test>\n", "if x < y:\n", "\n", "\n", "\n", "\n", "\n", "x_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_3\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x105d9a5f0_1\n", "\n", "\n", "z\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_5\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_2\n", "\n", "\n", "<test>\n", "if y < z:\n", "\n", "\n", "\n", "\n", "\n", "z_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_3->test_functionmiddleat0x105d9a5f0_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x105d9a5f0_1\n", "\n", "\n", "y\n", "def middle(x, y, z):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_3\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x105d9a5f0_1->middlereturnvalue_functionmiddleat0x105d9a5f0_6\n", "\n", "\n", "\n", "\n", "\n", "y_functionmiddleat0x105d9a5f0_1->test_functionmiddleat0x105d9a5f0_2\n", "\n", "\n", "\n", "\n", "\n", "test_functionmiddleat0x105d9a5f0_2->test_functionmiddleat0x105d9a5f0_3\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x107471d50>" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "You can also access the raw `repr()` form, which allows you to reconstruct dependencies at any time. 
(This is how we showed dependencies at the beginning of this chapter, before even introducing the code that computes them.)" ] }, { "cell_type": "code", "execution_count": 198, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.897862Z", "iopub.status.busy": "2023-11-12T12:40:29.897695Z", "iopub.status.idle": "2023-11-12T12:40:29.901481Z", "shell.execute_reply": "2023-11-12T12:40:29.901203Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dependencies(\n", "    data={\n", "        ('x', (middle, 1)): set(),\n", "        ('y', (middle, 1)): set(),\n", "        ('z', (middle, 1)): set(),\n", "        ('<test>', (middle, 2)): {('y', (middle, 1)), ('z', (middle, 1))},\n", "        ('<test>', (middle, 3)): {('x', (middle, 1)), ('y', (middle, 1))},\n", "        ('<test>', (middle, 5)): {('x', (middle, 1)), ('z', (middle, 1))},\n", "        ('<middle() return value>', (middle, 6)): {('y', (middle, 1))}},\n", "    control={\n", "        ('x', (middle, 1)): set(),\n", "        ('y', (middle, 1)): set(),\n", "        ('z', (middle, 1)): set(),\n", "        ('<test>', (middle, 2)): set(),\n", "        ('<test>', (middle, 3)): {('<test>', (middle, 2))},\n", "        ('<test>', (middle, 5)): {('<test>', (middle, 3))},\n", "        ('<middle() return value>', (middle, 6)): {('<test>', (middle, 5))}})\n" ] } ], "source": [ "print(repr(slicer.dependencies()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Diagnostics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `Slicer` constructor accepts a `log` argument (default: `False`), which can be set to show various intermediate results:\n", "\n", "* `log=True` (or `log=1`): Show instrumented source code\n", "* `log=2`: Also log execution\n", "* `log=3`: Also log individual transformer steps\n", "* `log=4`: Also log source line numbers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## More Examples" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "fragment" } }, "source": [ "Let us demonstrate our `Slicer` class on a few more examples." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Square Root" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `square_root()` function from [the chapter on assertions](Assertions.ipynb) demonstrates a nice interplay between data and control dependencies." ] }, { "cell_type": "code", "execution_count": 199, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.903110Z", "iopub.status.busy": "2023-11-12T12:40:29.903021Z", "iopub.status.idle": "2023-11-12T12:40:29.904693Z", "shell.execute_reply": "2023-11-12T12:40:29.904397Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import math" ] }, { "cell_type": "code", "execution_count": 200, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.905981Z", "iopub.status.busy": "2023-11-12T12:40:29.905898Z", "iopub.status.idle": "2023-11-12T12:40:29.907719Z", "shell.execute_reply": "2023-11-12T12:40:29.907326Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Assertions import square_root # minor dependency" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is the original source code:" ] }, { "cell_type": "code", "execution_count": 201, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.909568Z", "iopub.status.busy": "2023-11-12T12:40:29.909453Z", "iopub.status.idle": "2023-11-12T12:40:29.943669Z", "shell.execute_reply": "2023-11-12T12:40:29.943298Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msquare_root\u001b[39;49;00m(x): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 
\u001b[34massert\u001b[39;49;00m x >= \u001b[34m0\u001b[39;49;00m \u001b[37m# precondition\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m math.isclose(approx * approx, x)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(inspect.getsource(square_root), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Turning on logging shows the instrumented version:" ] }, { "cell_type": "code", "execution_count": 202, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.945525Z", "iopub.status.busy": "2023-11-12T12:40:29.945360Z", "iopub.status.idle": "2023-11-12T12:40:29.985726Z", "shell.execute_reply": "2023-11-12T12:40:29.985429Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Instrumenting \n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msquare_root\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) >= \u001b[34m0\u001b[39;49;00m, 
loads=(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x),))\u001b[37m\u001b[39;49;00m\n", " approx = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " guess = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mguess\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) / \u001b[34m2\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m _data.test(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx) != _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mguess\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, guess)):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwith\u001b[39;49;00m _data:\u001b[37m\u001b[39;49;00m\n", " approx = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mguess\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, guess))\u001b[37m\u001b[39;49;00m\n", " guess = _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33mguess\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, (_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx) + _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x) / _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx)) / \u001b[34m2\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mmath\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, 
math).isclose)(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx) * _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx), pos=\u001b[34m1\u001b[39;49;00m), _data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x), pos=\u001b[34m2\u001b[39;49;00m))), loads=(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x), _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mmath\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, math), _data, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx)))\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mapprox\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, approx))\u001b[37m\u001b[39;49;00m\n", "\n" ] } ], "source": [ "with Slicer(square_root, log=True) as root_slicer:\n", " y = square_root(2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The dependency graph shows how `guess` and `approx` flow into each other until they are the same." 
] }, { "cell_type": "code", "execution_count": 203, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:29.987417Z", "iopub.status.busy": "2023-11-12T12:40:29.987277Z", "iopub.status.idle": "2023-11-12T12:40:30.426087Z", "shell.execute_reply": "2023-11-12T12:40:30.425594Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "square_rootreturnvalue_functionsquare_rootat0x105d38160_64\n", "\n", "\n", "<square_root() return value>\n", "return approx\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_60\n", "\n", "\n", "approx\n", "approx = guess\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_60->square_rootreturnvalue_functionsquare_rootat0x105d38160_64\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_61\n", "\n", "\n", "guess\n", "guess = (approx + x / approx) / 2\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_60->guess_functionsquare_rootat0x105d38160_61\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x105d38160_59\n", "\n", "\n", "<test>\n", "while approx != guess:\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_60->test_functionsquare_rootat0x105d38160_59\n", "\n", "\n", "\n", "\n", "\n", "assertion_functionsquare_rootat0x105d38160_63\n", "\n", "\n", "<assertion>\n", "assert math.isclose(approx * approx, x)\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_60->assertion_functionsquare_rootat0x105d38160_63\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x105d38160_54\n", "\n", "\n", "x\n", "def square_root(x):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x105d38160_54->guess_functionsquare_rootat0x105d38160_61\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_58\n", "\n", "\n", "guess\n", "guess = x 
/ 2\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x105d38160_54->guess_functionsquare_rootat0x105d38160_58\n", "\n", "\n", "\n", "\n", "\n", "assertion_functionsquare_rootat0x105d38160_55\n", "\n", "\n", "<assertion>\n", "assert x >= 0  # precondition\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x105d38160_54->assertion_functionsquare_rootat0x105d38160_55\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x105d38160_54->assertion_functionsquare_rootat0x105d38160_63\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_61->approx_functionsquare_rootat0x105d38160_60\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_61->test_functionsquare_rootat0x105d38160_59\n", "\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_58->approx_functionsquare_rootat0x105d38160_60\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x105d38160_58->test_functionsquare_rootat0x105d38160_59\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x105d38160_59->approx_functionsquare_rootat0x105d38160_60\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x105d38160_59->guess_functionsquare_rootat0x105d38160_61\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_57\n", "\n", "\n", "approx\n", "approx = None\n", "\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x105d38160_57->test_functionsquare_rootat0x105d38160_59\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105de88e0>" ] }, "execution_count": 203, "metadata": {}, "output_type": "execute_result" } ], "source": [ "root_slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we can show the code annotated with dependencies:" ] }, { "cell_type": "code", "execution_count": 204, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:30.427994Z", 
"iopub.status.busy": "2023-11-12T12:40:30.427867Z", "iopub.status.idle": "2023-11-12T12:40:30.764670Z", "shell.execute_reply": "2023-11-12T12:40:30.764224Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 54 \u001b[34mdef\u001b[39;49;00m \u001b[32msquare_root\u001b[39;49;00m(x): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 55 \u001b[34massert\u001b[39;49;00m x >= \u001b[34m0\u001b[39;49;00m \u001b[37m# precondition # <= x (54)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 56 \u001b[37m\u001b[39;49;00m\n", "* 57 approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 58 guess = x / \u001b[34m2\u001b[39;49;00m \u001b[37m# <= x (54)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 59 \u001b[34mwhile\u001b[39;49;00m approx != guess: \u001b[37m# <= guess (61), approx (57), guess (58), approx (60)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 60 approx = guess \u001b[37m# <= guess (61), guess (58); <- (59)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 61 guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m \u001b[37m# <= x (54), approx (60); <- (59)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 62 \u001b[37m\u001b[39;49;00m\n", "* 63 \u001b[34massert\u001b[39;49;00m math.isclose(approx * approx, x) \u001b[37m# <= x (54), approx (60)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 64 \u001b[34mreturn\u001b[39;49;00m approx \u001b[37m# <= approx (60)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "root_slicer.code()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The astute reader may find that a statement `assert p` does not control the following code, although it would be equivalent to `if not p: raise Exception`. Why is that?" 
] }, { "cell_type": "code", "execution_count": 205, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:30.767001Z", "iopub.status.busy": "2023-11-12T12:40:30.766849Z", "iopub.status.idle": "2023-11-12T12:40:30.771871Z", "shell.execute_reply": "2023-11-12T12:40:30.771560Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "Quiz\n", "Why don't assert statements induce control dependencies?\n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 205, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Why don't `assert` statements induce control dependencies?\",\n", " [\n", " \"We have no special handling of `raise` statements\",\n", " \"We have no special handling of exceptions\",\n", " \"Assertions are not supposed to act as controlling mechanisms\",\n", " \"All of the above\",\n", " ], '(1 * 1 << 1 * 1 << 1 * 1)')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Indeed: we treat assertions as \"neutral\" in the sense that they do not affect the remainder of the program – if they are turned off, they have no effect; and if they are turned on, the remaining program logic should not depend on them. (Our instrumentation also has no special treatment of `raise` or even `return` statements; they should be handled by our `with` blocks, though.)" ] }, { "cell_type": "code", "execution_count": 206, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:30.773423Z", "iopub.status.busy": "2023-11-12T12:40:30.773297Z", "iopub.status.idle": "2023-11-12T12:40:30.774933Z", "shell.execute_reply": "2023-11-12T12:40:30.774645Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# print(repr(root_slicer))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Removing HTML Markup" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us come to our ongoing example, `remove_html_markup()`. 
We run it under `Slicer` to collect its dependencies:" ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:30.776764Z", "iopub.status.busy": "2023-11-12T12:40:30.776633Z", "iopub.status.idle": "2023-11-12T12:40:30.786841Z", "shell.execute_reply": "2023-11-12T12:40:30.786486Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with Slicer(remove_html_markup) as rhm_slicer:\n", "    s = remove_html_markup(\"bar\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The graph is as discussed in the introduction to this chapter:" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:30.788862Z", "iopub.status.busy": "2023-11-12T12:40:30.788574Z", "iopub.status.idle": "2023-11-12T12:40:31.210345Z", "shell.execute_reply": "2023-11-12T12:40:31.209886Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x105d9b2e0_238\n", "\n", "\n", "s\n", "def remove_html_markup(s):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243\n", "\n", "\n", "c\n", "for c in s:\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x105d9b2e0_238->c_functionremove_html_markupat0x105d9b2e0_243\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_239\n", "\n", "\n", "tag\n", "tag = False\n", "\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_247\n", "\n", "\n", "tag\n", "tag = True\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_248\n", "\n", "\n", "<test>\n", "elif c == '>' and not quote:\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_252\n", "\n", "\n", "<test>\n", "elif 
not tag:\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_247->test_functionremove_html_markupat0x105d9b2e0_252\n", "\n", "\n", "\n", "\n", "\n", "assertion_functionremove_html_markupat0x105d9b2e0_244\n", "\n", "\n", "<assertion>\n", "assert tag or not quote\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_247->assertion_functionremove_html_markupat0x105d9b2e0_244\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "<test>\n", "if c == '<' and not quote:\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_246->tag_functionremove_html_markupat0x105d9b2e0_247\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_246->test_functionremove_html_markupat0x105d9b2e0_248\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243->test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243->test_functionremove_html_markupat0x105d9b2e0_248\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x105d9b2e0_253\n", "\n", "\n", "out\n", "out = out + c\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243->out_functionremove_html_markupat0x105d9b2e0_253\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_250\n", "\n", "\n", "<test>\n", "elif (c == '"' or c == "'") and tag:\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243->test_functionremove_html_markupat0x105d9b2e0_250\n", "\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x105d9b2e0_240\n", "\n", "\n", "quote\n", "quote = False\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x105d9b2e0_240->test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x105d9b2e0_241\n", "\n", 
"\n", "out\n", "out = ""\n", "\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x105d9b2e0_240->test_functionremove_html_markupat0x105d9b2e0_248\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x105d9b2e0_240->assertion_functionremove_html_markupat0x105d9b2e0_244\n", "\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x105d9b2e0_241->out_functionremove_html_markupat0x105d9b2e0_253\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_249\n", "\n", "\n", "tag\n", "tag = False\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_248->tag_functionremove_html_markupat0x105d9b2e0_249\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_248->test_functionremove_html_markupat0x105d9b2e0_250\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_249->test_functionremove_html_markupat0x105d9b2e0_252\n", "\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_249->assertion_functionremove_html_markupat0x105d9b2e0_244\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x105d9b2e0_253->out_functionremove_html_markupat0x105d9b2e0_253\n", "\n", "\n", "\n", "\n", "\n", "remove_html_markupreturnvalue_functionremove_html_markupat0x105d9b2e0_255\n", "\n", "\n", "<remove_html_markup() return value>\n", "return out\n", "\n", "\n", "\n", "\n", "\n", "out_functionremove_html_markupat0x105d9b2e0_253->remove_html_markupreturnvalue_functionremove_html_markupat0x105d9b2e0_255\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_252->out_functionremove_html_markupat0x105d9b2e0_253\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_250->test_functionremove_html_markupat0x105d9b2e0_252\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"tag_functionremove_html_markupat0x105d9b2e0_239->assertion_functionremove_html_markupat0x105d9b2e0_244\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105deb0d0>" ] }, "execution_count": 208, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rhm_slicer" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:31.212477Z", "iopub.status.busy": "2023-11-12T12:40:31.212235Z", "iopub.status.idle": "2023-11-12T12:40:31.214184Z", "shell.execute_reply": "2023-11-12T12:40:31.213875Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# print(repr(rhm_slicer.dependencies()))" ] }, { "cell_type": "code", "execution_count": 210, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:31.215930Z", "iopub.status.busy": "2023-11-12T12:40:31.215773Z", "iopub.status.idle": "2023-11-12T12:40:31.764425Z", "shell.execute_reply": "2023-11-12T12:40:31.764148Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 238 \u001b[34mdef\u001b[39;49;00m \u001b[32mremove_html_markup\u001b[39;49;00m(s): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 239 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 240 quote = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 241 out = \u001b[33m\"\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 242 \u001b[37m\u001b[39;49;00m\n", "* 243 \u001b[34mfor\u001b[39;49;00m c \u001b[35min\u001b[39;49;00m s: \u001b[37m# <= s (238)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 244 \u001b[34massert\u001b[39;49;00m tag \u001b[35mor\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote \u001b[37m# <= tag (239), quote (240), tag (247), tag (249)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 245 \u001b[37m\u001b[39;49;00m\n", "* 246 \u001b[34mif\u001b[39;49;00m c 
== \u001b[33m'\u001b[39;49;00m\u001b[33m<\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote: \u001b[37m# <= c (243), quote (240)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 247 tag = \u001b[34mTrue\u001b[39;49;00m \u001b[37m# <- (246)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 248 \u001b[34melif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m>\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote: \u001b[37m# <= c (243), quote (240); <- (246)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 249 tag = \u001b[34mFalse\u001b[39;49;00m \u001b[37m# <- (248)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 250 \u001b[34melif\u001b[39;49;00m (c == \u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m c == \u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m tag: \u001b[37m# <= c (243); <- (248)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 251 quote = \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "* 252 \u001b[34melif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m tag: \u001b[37m# <= tag (247), tag (249); <- (250)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 253 out = out + c \u001b[37m# <= out (241), c (243), out (253); <- (252)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 254 \u001b[37m\u001b[39;49;00m\n", "* 255 \u001b[34mreturn\u001b[39;49;00m out \u001b[37m# <= out (253)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "rhm_slicer.code()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can also compute slices over the dependencies:" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:31.766159Z", "iopub.status.busy": "2023-11-12T12:40:31.766028Z", 
"iopub.status.idle": "2023-11-12T12:40:31.768740Z", "shell.execute_reply": "2023-11-12T12:40:31.768454Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "238" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, start_remove_html_markup = inspect.getsourcelines(remove_html_markup)\n", "start_remove_html_markup" ] }, { "cell_type": "code", "execution_count": 212, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:31.770406Z", "iopub.status.busy": "2023-11-12T12:40:31.770282Z", "iopub.status.idle": "2023-11-12T12:40:32.197448Z", "shell.execute_reply": "2023-11-12T12:40:32.197077Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x105d9b2e0_238\n", "\n", "\n", "s\n", "def remove_html_markup(s):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243\n", "\n", "\n", "c\n", "for c in s:\n", "\n", "\n", "\n", "\n", "\n", "s_functionremove_html_markupat0x105d9b2e0_238->c_functionremove_html_markupat0x105d9b2e0_243\n", "\n", "\n", "\n", "\n", "\n", "quote_functionremove_html_markupat0x105d9b2e0_240\n", "\n", "\n", "quote\n", "quote = False\n", "\n", "\n", "\n", "\n", "\n", "\n", "tag_functionremove_html_markupat0x105d9b2e0_247\n", "\n", "\n", "tag\n", "tag = True\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "<test>\n", "if c == '<' and not quote:\n", "\n", "\n", "\n", "\n", "\n", "test_functionremove_html_markupat0x105d9b2e0_246->tag_functionremove_html_markupat0x105d9b2e0_247\n", "\n", "\n", "\n", "\n", "\n", "\n", "c_functionremove_html_markupat0x105d9b2e0_243->test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"quote_functionremove_html_markupat0x105d9b2e0_240->test_functionremove_html_markupat0x105d9b2e0_246\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " ('tag', (remove_html_markup, 247)): set(),\n", " ('', (remove_html_markup, 246)): {('c', (remove_html_markup, 243)), ('quote', (remove_html_markup, 240))},\n", " ('c', (remove_html_markup, 243)): {('s', (remove_html_markup, 238))},\n", " ('quote', (remove_html_markup, 240)): set(),\n", " ('s', (remove_html_markup, 238)): set()},\n", " control={\n", " ('tag', (remove_html_markup, 247)): {('', (remove_html_markup, 246))},\n", " ('', (remove_html_markup, 246)): set(),\n", " ('c', (remove_html_markup, 243)): set(),\n", " ('quote', (remove_html_markup, 240)): set(),\n", " ('s', (remove_html_markup, 238)): set()})" ] }, "execution_count": 212, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicing_criterion = ('tag', (remove_html_markup,\n", " start_remove_html_markup + 9))\n", "tag_deps = rhm_slicer.dependencies().backward_slice(slicing_criterion) # type: ignore\n", "tag_deps" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.199410Z", "iopub.status.busy": "2023-11-12T12:40:32.199284Z", "iopub.status.idle": "2023-11-12T12:40:32.201001Z", "shell.execute_reply": "2023-11-12T12:40:32.200752Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# repr(tag_deps)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Calls and Augmented Assign" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our last example covers augmented assigns and data flow across function calls. 
We introduce two simple functions `add_to()` and `mul_with()`:" ] }, { "cell_type": "code", "execution_count": 214, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.202441Z", "iopub.status.busy": "2023-11-12T12:40:32.202344Z", "iopub.status.idle": "2023-11-12T12:40:32.204031Z", "shell.execute_reply": "2023-11-12T12:40:32.203775Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def add_to(n, m):  # type: ignore\n", "    n += m\n", "    return n" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.205483Z", "iopub.status.busy": "2023-11-12T12:40:32.205365Z", "iopub.status.idle": "2023-11-12T12:40:32.207085Z", "shell.execute_reply": "2023-11-12T12:40:32.206803Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def mul_with(x, y):  # type: ignore\n", "    x *= y\n", "    return x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And we put these two together in a single call:" ] }, { "cell_type": "code", "execution_count": 216, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.208859Z", "iopub.status.busy": "2023-11-12T12:40:32.208678Z", "iopub.status.idle": "2023-11-12T12:40:32.210766Z", "shell.execute_reply": "2023-11-12T12:40:32.210471Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def test_math() -> None:\n", "    return mul_with(1, add_to(2, 3))" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.212225Z", "iopub.status.busy": "2023-11-12T12:40:32.212118Z", "iopub.status.idle": "2023-11-12T12:40:32.217243Z", "shell.execute_reply": "2023-11-12T12:40:32.216985Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with Slicer(add_to, mul_with, test_math) as math_slicer:\n", "    test_math()" ] }, { "cell_type": "markdown", "metadata": 
{ "slideshow": { "slide_type": "subslide" } }, "source": [ "The resulting dependence graph nicely captures the data flow between these calls, notably arguments and parameters:" ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.218688Z", "iopub.status.busy": "2023-11-12T12:40:32.218598Z", "iopub.status.idle": "2023-11-12T12:40:32.629216Z", "shell.execute_reply": "2023-11-12T12:40:32.628842Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "x_functionmul_withat0x105d39d80_1\n", "\n", "\n", "x\n", "def mul_with(x, y):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionmul_withat0x105d39d80_2\n", "\n", "\n", "x\n", "x *= y\n", "\n", "\n", "\n", "\n", "\n", "x_functionmul_withat0x105d39d80_1->x_functionmul_withat0x105d39d80_2\n", "\n", "\n", "\n", "\n", "\n", "n_functionadd_toat0x105d39990_1\n", "\n", "\n", "n\n", "def add_to(n, m):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "n_functionadd_toat0x105d39990_2\n", "\n", "\n", "n\n", "n += m\n", "\n", "\n", "\n", "\n", "\n", "n_functionadd_toat0x105d39990_1->n_functionadd_toat0x105d39990_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "mul_withreturnvalue_functionmul_withat0x105d39d80_3\n", "\n", "\n", "<mul_with() return value>\n", "return x\n", "\n", "\n", "\n", "\n", "\n", "test_mathreturnvalue_functiontest_mathat0x105d3a170_2\n", "\n", "\n", "<test_math() return value>\n", "return mul_with(1, add_to(2, 3))\n", "\n", "\n", "\n", "\n", "\n", "mul_withreturnvalue_functionmul_withat0x105d39d80_3->test_mathreturnvalue_functiontest_mathat0x105d3a170_2\n", "\n", "\n", "\n", "\n", "\n", "x_functionmul_withat0x105d39d80_2->mul_withreturnvalue_functionmul_withat0x105d39d80_3\n", "\n", "\n", "\n", "\n", "\n", "\n", "y_functionmul_withat0x105d39d80_1\n", "\n", "\n", "y\n", "def mul_with(x, y):  # type: ignore\n", "\n", "\n", "\n", 
"\n", "\n", "y_functionmul_withat0x105d39d80_1->x_functionmul_withat0x105d39d80_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "add_toreturnvalue_functionadd_toat0x105d39990_3\n", "\n", "\n", "<add_to() return value>\n", "return n\n", "\n", "\n", "\n", "\n", "\n", "add_toreturnvalue_functionadd_toat0x105d39990_3->y_functionmul_withat0x105d39d80_1\n", "\n", "\n", "\n", "\n", "\n", "n_functionadd_toat0x105d39990_2->add_toreturnvalue_functionadd_toat0x105d39990_3\n", "\n", "\n", "\n", "\n", "\n", "\n", "m_functionadd_toat0x105d39990_1\n", "\n", "\n", "m\n", "def add_to(n, m):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "m_functionadd_toat0x105d39990_1->n_functionadd_toat0x105d39990_2\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105dea140>" ] }, "execution_count": 218, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math_slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "These are also reflected in the code view:" ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.631090Z", "iopub.status.busy": "2023-11-12T12:40:32.630868Z", "iopub.status.idle": "2023-11-12T12:40:32.972851Z", "shell.execute_reply": "2023-11-12T12:40:32.966966Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mtest_math\u001b[39;49;00m() -> \u001b[34mNone\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "* 2 \u001b[34mreturn\u001b[39;49;00m mul_with(\u001b[34m1\u001b[39;49;00m, add_to(\u001b[34m2\u001b[39;49;00m, \u001b[34m3\u001b[39;49;00m)) \u001b[37m# <= (mul_with:3)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n", "* 1 \u001b[34mdef\u001b[39;49;00m \u001b[32madd_to\u001b[39;49;00m(n, m): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 2 n += m \u001b[37m# <= n (1), m 
(1)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 3 \u001b[34mreturn\u001b[39;49;00m n \u001b[37m# <= n (2)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\n", "* 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mmul_with\u001b[39;49;00m(x, y): \u001b[37m# type: ignore # <= (add_to:3)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 2 x *= y \u001b[37m# <= x (1), y (1)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 3 \u001b[34mreturn\u001b[39;49;00m x \u001b[37m# <= x (2)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "math_slicer.code()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Dynamic Instrumentation\n", "\n", "When initializing `Slicer()`, one has to provide the set of functions to be instrumented. This is because the instrumentation has to take place _before_ the code in the `with` block is executed. Can we determine this list on the fly – while `Slicer()` is executed? \n", "\n", "The answer is: Yes, but the solution is a bit hackish – even more so than what we have seen above. In essence, we proceed in two steps:\n", "\n", "1. When `Slicer.__init__()` is called:\n", "    * Use the `inspect` module to determine the source code of the call\n", "    * Analyze the enclosed `with` block for function calls\n", "    * Instrument these functions\n", "2. Whenever a function is about to be called (`DataTracker.call()`):\n", "    * Create an instrumented version of that function\n", "    * Have the `call()` method return the instrumented function instead" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Implementing Dynamic Instrumentation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We start with the aim of determining the `with` block to which our slicer is applied. The 
The `our_with_block()` method inspects the source code of the `Slicer()` caller and returns the one `with` block (as an AST) whose source line is currently active." ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.976764Z", "iopub.status.busy": "2023-11-12T12:40:32.976605Z", "iopub.status.idle": "2023-11-12T12:40:32.980554Z", "shell.execute_reply": "2023-11-12T12:40:32.980085Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class WithVisitor(NodeVisitor):\n", "    def __init__(self) -> None:\n", "        self.withs: List[ast.With] = []\n", "\n", "    def visit_With(self, node: ast.With) -> AST:\n", "        self.withs.append(node)\n", "        return self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 221, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.983713Z", "iopub.status.busy": "2023-11-12T12:40:32.983561Z", "iopub.status.idle": "2023-11-12T12:40:32.986663Z", "shell.execute_reply": "2023-11-12T12:40:32.986177Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def our_with_block(self) -> ast.With:\n", "        \"\"\"Return the currently active `with` block.\"\"\"\n", "        frame = self.caller_frame()\n", "        source_lines, starting_lineno = inspect.getsourcelines(frame)\n", "        starting_lineno = max(starting_lineno, 1)\n", "        if len(source_lines) == 1:\n", "            # We only get one `with` line, rather than the full block\n", "            # This happens in Jupyter notebooks with IPython 8.1.0 and later.\n", "            # Here's a hacky workaround to get the cell contents:\n", "            # https://stackoverflow.com/questions/51566497/getting-the-source-of-an-object-defined-in-a-jupyter-notebook\n", "            source_lines = inspect.linecache.getlines(inspect.getfile(frame))  # type: ignore\n", "            starting_lineno = 1\n", "\n", "        source_ast = ast.parse(''.join(source_lines))\n", "        wv = WithVisitor()\n", "        wv.visit(source_ast)\n", "\n", "        for with_ast in wv.withs:\n", "            if starting_lineno + (with_ast.lineno - 1) == frame.f_lineno:\n", "                return with_ast\n", "\n", "        raise ValueError(\"Cannot find 'with' block\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Within the `with` AST, we can identify all calls:" ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.988517Z", "iopub.status.busy": "2023-11-12T12:40:32.988423Z", "iopub.status.idle": "2023-11-12T12:40:32.991425Z", "shell.execute_reply": "2023-11-12T12:40:32.991102Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class CallCollector(NodeVisitor):\n", "    def __init__(self) -> None:\n", "        self.calls: Set[str] = set()\n", "\n", "    def visit_Call(self, node: ast.Call) -> AST:\n", "        caller_id = ast.unparse(node.func).strip()\n", "        self.calls.add(caller_id)\n", "        return self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.993540Z", "iopub.status.busy": "2023-11-12T12:40:32.993413Z", "iopub.status.idle": "2023-11-12T12:40:32.995646Z", "shell.execute_reply": "2023-11-12T12:40:32.995328Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def calls_in_our_with_block(self) -> Set[str]:\n", "        \"\"\"Return a set of function names called in the `with` block.\"\"\"\n", "        block_ast = self.our_with_block()\n", "        cc = CallCollector()\n", "        for stmt in block_ast.body:\n", "            cc.visit(stmt)\n", "        return cc.calls" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `funcs_in_our_with_block()` finally returns a list of all functions used in the `with` block. This is the list of functions we will instrument upfront." 
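, "\n",
"\n",
"The lookup step can be illustrated in isolation. The following is a minimal standalone sketch, not the chapter's implementation: `call_names()` and `resolve()` are hypothetical helpers, and a plain dictionary stands in for the namespace that `search_func()` consults. It assumes Python 3.9 or later for `ast.unparse()`.\n",
"\n",
"```python\n",
"import ast\n",
"from typing import Any, Callable, Dict, List, Set\n",
"\n",
"\n",
"def call_names(source: str) -> Set[str]:\n",
"    # Collect the textual callee of every call, e.g. 'f' or 'obj.method'\n",
"    return {ast.unparse(node.func).strip()\n",
"            for node in ast.walk(ast.parse(source))\n",
"            if isinstance(node, ast.Call)}\n",
"\n",
"\n",
"def resolve(names: Set[str], namespace: Dict[str, Any]) -> List[Callable]:\n",
"    # Keep only names that map to a callable in the given namespace\n",
"    funcs = []\n",
"    for name in sorted(names):\n",
"        obj = namespace.get(name)\n",
"        if callable(obj):\n",
"            funcs.append(obj)\n",
"    return funcs\n",
"\n",
"\n",
"def double(x):\n",
"    return 2 * x\n",
"\n",
"\n",
"names = call_names('double(len([1, 2])); obj.method()')\n",
"funcs = resolve(names, {'double': double})\n",
"print(sorted(names))  # ['double', 'len', 'obj.method']\n",
"print(funcs == [double])  # True\n",
"```\n",
"\n",
"In this sketch, names that are missing from the namespace (here `len` and `obj.method`) are silently skipped, much like the `if func:` guard in `funcs_in_our_with_block()`."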
] }, { "cell_type": "code", "execution_count": 224, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:32.997276Z", "iopub.status.busy": "2023-11-12T12:40:32.997158Z", "iopub.status.idle": "2023-11-12T12:40:32.999512Z", "shell.execute_reply": "2023-11-12T12:40:32.999162Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def funcs_in_our_with_block(self) -> List[Callable]:\n", "        funcs = []\n", "        for id in self.calls_in_our_with_block():\n", "            func = self.search_func(id)\n", "            if func:\n", "                funcs.append(func)\n", "\n", "        return funcs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `default_items_to_instrument()` is already provided for dynamic instrumentation – it is invoked whenever no list of functions to instrument is provided. So we have it return the list of functions as computed above.\n", "\n", "However, `default_items_to_instrument()` does one more thing: Using the (also provided) `instrument_call()` hook in our dependency tracker, we ensure that all (later) calls are redirected to instrumented functions! This way, if further functions are called, these will be instrumented on the fly." 
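, "\n",
"\n",
"The redirection idea can be sketched independently of the slicer. In this minimal sketch, `LazyInstrumenter` is a hypothetical stand-in for the dependency tracker: its `call()` method receives the function about to be called and returns a wrapped version, creating the wrapper on first use. The cache and the logging wrapper are assumptions made for illustration only; the chapter's instrumentation transforms the function's code rather than wrapping it.\n",
"\n",
"```python\n",
"from typing import Any, Callable, Dict, List\n",
"\n",
"\n",
"class LazyInstrumenter:\n",
"    # Hypothetical stand-in for a tracker with a call() hook\n",
"    def __init__(self) -> None:\n",
"        self.cache: Dict[Callable, Callable] = {}\n",
"        self.log: List[str] = []\n",
"\n",
"    def instrument(self, func: Callable) -> Callable:\n",
"        # Here, instrumenting merely records each invocation by name\n",
"        def wrapper(*args: Any, **kwargs: Any) -> Any:\n",
"            self.log.append(func.__name__)\n",
"            return func(*args, **kwargs)\n",
"        return wrapper\n",
"\n",
"    def call(self, func: Callable) -> Callable:\n",
"        # Instrument on first use; reuse the cached version afterwards\n",
"        if func not in self.cache:\n",
"            self.cache[func] = self.instrument(func)\n",
"        return self.cache[func]\n",
"\n",
"\n",
"def inc(x):\n",
"    return x + 1\n",
"\n",
"\n",
"tracker = LazyInstrumenter()\n",
"result = tracker.call(inc)(tracker.call(inc)(1))\n",
"print(result, tracker.log)  # 3 ['inc', 'inc']\n",
"```\n",
"\n",
"Because every call site goes through `call()`, a function never has to be listed upfront: it is instrumented the first time it is about to run."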
] }, { "cell_type": "code", "execution_count": 225, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.001396Z", "iopub.status.busy": "2023-11-12T12:40:33.001267Z", "iopub.status.idle": "2023-11-12T12:40:33.003472Z", "shell.execute_reply": "2023-11-12T12:40:33.003124Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Slicer(Slicer):\n", "    def default_items_to_instrument(self) -> List[Callable]:\n", "        # In _data.call(), return instrumented function\n", "        self.dependency_tracker.instrument_call = self.instrument  # type: ignore\n", "\n", "        # Start instrumenting the functions in our `with` block\n", "        return self.funcs_in_our_with_block()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Both of these hacks are effective, as shown in the following example. We use the `Slicer()` constructor without listing any functions; it automatically identifies `fun_2()` as a function in the `with` block. As the instrumented `fun_2()` is invoked, its `_data.call()` method instruments the call to `fun_1()` (and ensures the instrumented version is called)." 
] }, { "cell_type": "code", "execution_count": 226, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.005048Z", "iopub.status.busy": "2023-11-12T12:40:33.004929Z", "iopub.status.idle": "2023-11-12T12:40:33.006674Z", "shell.execute_reply": "2023-11-12T12:40:33.006380Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fun_1(x: int) -> int:\n", "    return x" ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.008274Z", "iopub.status.busy": "2023-11-12T12:40:33.008150Z", "iopub.status.idle": "2023-11-12T12:40:33.010310Z", "shell.execute_reply": "2023-11-12T12:40:33.009947Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fun_2(x: int) -> int:\n", "    return fun_1(x)" ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.012138Z", "iopub.status.busy": "2023-11-12T12:40:33.012020Z", "iopub.status.idle": "2023-11-12T12:40:33.083214Z", "shell.execute_reply": "2023-11-12T12:40:33.082770Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Instrumenting \n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mfun_2\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.ret(_data.call(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mfun_1\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, fun_1))(_data.arg(_data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x), 
pos=\u001b[34m1\u001b[39;49;00m))))\u001b[37m\u001b[39;49;00m\n", "\n", "Instrumenting \n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mfun_1\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " _data.param(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x, pos=\u001b[34m1\u001b[39;49;00m, last=\u001b[34mTrue\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m _data.set(\u001b[33m'\u001b[39;49;00m\u001b[33m\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, _data.get(\u001b[33m'\u001b[39;49;00m\u001b[33mx\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, x))\u001b[37m\u001b[39;49;00m\n", "\n" ] } ], "source": [ "with Slicer(log=True) as slicer:\n", " fun_2(10)" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.085221Z", "iopub.status.busy": "2023-11-12T12:40:33.085103Z", "iopub.status.idle": "2023-11-12T12:40:33.524800Z", "shell.execute_reply": "2023-11-12T12:40:33.524418Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "fun_1returnvalue_functionfun_1at0x104c68dc0_2\n", "\n", "\n", "<fun_1() return value>\n", "return x\n", "\n", "\n", "\n", "\n", "\n", "fun_2returnvalue_functionfun_2at0x105d9b520_2\n", "\n", "\n", "<fun_2() return value>\n", "return fun_1(x)\n", "\n", "\n", "\n", "\n", "\n", "fun_1returnvalue_functionfun_1at0x104c68dc0_2->fun_2returnvalue_functionfun_2at0x105d9b520_2\n", "\n", "\n", "\n", "\n", "\n", "x_functionfun_1at0x104c68dc0_1\n", "\n", "\n", "x\n", "def fun_1(x: int) -> int:\n", "\n", "\n", "\n", "\n", "\n", "x_functionfun_1at0x104c68dc0_1->fun_1returnvalue_functionfun_1at0x104c68dc0_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionfun_2at0x105d9b520_1\n", "\n", "\n", "x\n", "def fun_2(x: int) -> int:\n", "\n", "\n", "\n", "\n", "\n", 
"x_functionfun_2at0x105d9b520_1->x_functionfun_1at0x104c68dc0_1\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105f4d510>" ] }, "execution_count": 229, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## More Applications\n", "\n", "The main use of dynamic slices is for _debugging tools_, where they show the origins of individual values. However, beyond facilitating debugging, tracking information flows has a number of additional applications, some of which we briefly sketch here." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Verifying Information Flows\n", "\n", "Using dynamic slices, we can check all the locations where (potentially sensitive) information is used. As an example, consider the following function `password_checker()`, which requests a password from the user and returns `True` if it is the correct one:" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.526742Z", "iopub.status.busy": "2023-11-12T12:40:33.526558Z", "iopub.status.idle": "2023-11-12T12:40:33.528607Z", "shell.execute_reply": "2023-11-12T12:40:33.528301Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import hashlib" ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.530524Z", "iopub.status.busy": "2023-11-12T12:40:33.530335Z", "iopub.status.idle": "2023-11-12T12:40:33.532679Z", "shell.execute_reply": "2023-11-12T12:40:33.532180Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import input, next_inputs" ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.534655Z", "iopub.status.busy": 
"2023-11-12T12:40:33.534481Z", "iopub.status.idle": "2023-11-12T12:40:33.536302Z", "shell.execute_reply": "2023-11-12T12:40:33.536027Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "SECRET_HASH_DIGEST = '59f2da35bcc39525b87932b4cc1f3d68'" ] }, { "cell_type": "code", "execution_count": 233, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.537785Z", "iopub.status.busy": "2023-11-12T12:40:33.537664Z", "iopub.status.idle": "2023-11-12T12:40:33.539731Z", "shell.execute_reply": "2023-11-12T12:40:33.539475Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def password_checker() -> bool:\n", " \"\"\"Request a password. Return True if correct.\"\"\"\n", " secret_password = input(\"Enter secret password: \")\n", " password_digest = hashlib.md5(secret_password.encode('utf-8')).hexdigest()\n", "\n", " if password_digest == SECRET_HASH_DIGEST:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "(Note that this is a very naive implementation: A true password checker would use the Python `getpass` module to read in a password without echoing it in the clear, and possibly also use a more sophisticated hash function than `md5`.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "From a security perspective, the interesting question we can ask using slicing is: _Is the entered password stored in the clear somewhere_? 
For this, we can simply run our slicer to see where the inputs are going:" ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.541205Z", "iopub.status.busy": "2023-11-12T12:40:33.541101Z", "iopub.status.idle": "2023-11-12T12:40:33.543132Z", "shell.execute_reply": "2023-11-12T12:40:33.542881Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['secret123']" ] }, "execution_count": 234, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "next_inputs(['secret123'])" ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.544573Z", "iopub.status.busy": "2023-11-12T12:40:33.544460Z", "iopub.status.idle": "2023-11-12T12:40:33.551976Z", "shell.execute_reply": "2023-11-12T12:40:33.551721Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "Enter secret password: secret123" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with Slicer() as slicer:\n", " valid_pwd = password_checker()" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.553435Z", "iopub.status.busy": "2023-11-12T12:40:33.553324Z", "iopub.status.idle": "2023-11-12T12:40:33.960010Z", "shell.execute_reply": "2023-11-12T12:40:33.959555Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "password_checkerreturnvalue_functionpassword_checkerat0x10582eb00_9\n", "\n", "\n", "<password_checker() return value>\n", "return False\n", "\n", "\n", "\n", "\n", "\n", "test_functionpassword_checkerat0x10582eb00_6\n", "\n", "\n", "<test>\n", "if password_digest == SECRET_HASH_DIGEST:\n", "\n", "\n", "\n", "\n", "\n", 
"test_functionpassword_checkerat0x10582eb00_6->password_checkerreturnvalue_functionpassword_checkerat0x10582eb00_9\n", "\n", "\n", "\n", "\n", "\n", "\n", "password_digest_functionpassword_checkerat0x10582eb00_4\n", "\n", "\n", "password_digest\n", "password_digest = hashlib.md5(secret_password.encode('utf-8')).hexdigest()\n", "\n", "\n", "\n", "\n", "\n", "password_digest_functionpassword_checkerat0x10582eb00_4->test_functionpassword_checkerat0x10582eb00_6\n", "\n", "\n", "\n", "\n", "\n", "\n", "secret_password_functionpassword_checkerat0x10582eb00_3\n", "\n", "\n", "secret_password\n", "secret_password = input("Enter secret password: ")\n", "\n", "\n", "\n", "\n", "\n", "secret_password_functionpassword_checkerat0x10582eb00_3->password_digest_functionpassword_checkerat0x10582eb00_4\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105de8eb0>" ] }, "execution_count": 236, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the password only flows into `password_digest`, where it is already encrypted. If the password were flowing into some other function or variable, we would see this in our slice." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "(Note that an attacker may still be able to find out which password was entered, for instance, by checking memory contents.)" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.962111Z", "iopub.status.busy": "2023-11-12T12:40:33.961930Z", "iopub.status.idle": "2023-11-12T12:40:33.967814Z", "shell.execute_reply": "2023-11-12T12:40:33.967520Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "secret_answers = [\n", " 'automated',\n", " 'debugging',\n", " 'is',\n", " 'fun'\n", "]\n", "\n", "quiz(\"What is the secret password, actually?\", \n", " [f\"`{repr(s)}`\" for s in secret_answers],\n", " min([i + 1 for i, ans in enumerate(secret_answers) \n", " if hashlib.md5(ans.encode('utf-8')).hexdigest() == \n", " SECRET_HASH_DIGEST])\n", " )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assessing Test Quality" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Another interesting usage of dynamic slices is to _assess test quality_. With our `square_root()` function, we have seen that the included assertions well test the arguments and the result for correctness:" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.969369Z", "iopub.status.busy": "2023-11-12T12:40:33.969255Z", "iopub.status.idle": "2023-11-12T12:40:33.971213Z", "shell.execute_reply": "2023-11-12T12:40:33.970946Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "_, start_square_root = inspect.getsourcelines(square_root)" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:33.972591Z", "iopub.status.busy": "2023-11-12T12:40:33.972484Z", "iopub.status.idle": "2023-11-12T12:40:34.006133Z", "shell.execute_reply": "2023-11-12T12:40:34.005863Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "54 \u001b[34mdef\u001b[39;49;00m \u001b[32msquare_root\u001b[39;49;00m(x): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "55 \u001b[34massert\u001b[39;49;00m x >= 
\u001b[34m0\u001b[39;49;00m \u001b[37m# precondition\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "56 \u001b[37m\u001b[39;49;00m\n", "57 approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "58 guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "59 \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", "60 approx = guess\u001b[37m\u001b[39;49;00m\n", "61 guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "62 \u001b[37m\u001b[39;49;00m\n", "63 \u001b[34massert\u001b[39;49;00m math.isclose(approx * approx, x)\u001b[37m\u001b[39;49;00m\n", "64 \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "# ignore\n", "print_content(inspect.getsource(square_root), '.py',\n", " start_line_number=start_square_root)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, a lazy programmer could also omit these tests – or worse yet, include tests that always pass:" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.007959Z", "iopub.status.busy": "2023-11-12T12:40:34.007819Z", "iopub.status.idle": "2023-11-12T12:40:34.010216Z", "shell.execute_reply": "2023-11-12T12:40:34.009822Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def square_root_unchecked(x): # type: ignore\n", " assert True # <-- new \"precondition\"\n", "\n", " approx = None\n", " guess = x / 2\n", " while approx != guess:\n", " approx = guess\n", " guess = (approx + x / approx) / 2\n", "\n", " assert True # <-- new \"postcondition\"\n", " return approx" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How can one check that the tests supplied actually are effective? 
This is a problem of \"Who watches the watchmen\" – we need to find a way to ensure that the tests do their job." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The \"classical\" way of testing tests is so-called *mutation testing* – that is, introducing _artificial errors_ into the code to see whether the tests catch them. Mutation testing is effective: The above \"weak\" tests would not catch any change to the `square_root()` computation code, and would hence quickly be identified as ineffective. However, mutation testing is also _costly_, as tests have to be run again and again for every small code mutation." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Slices offer a cost-effective alternative to determine the quality of tests. The idea is that if there are statements in the code whose result does _not_ flow into an assertion, then any errors in these statements will go unnoticed. In consequence, the larger the backward slice of an assertion, the higher its ability to catch errors." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can easily validate this assumption using the two examples above. Here is the backward slice for the \"full\" postcondition in `square_root()`. 
We see that the entire computation code flows into the final postcondition:" ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.012084Z", "iopub.status.busy": "2023-11-12T12:40:34.011967Z", "iopub.status.idle": "2023-11-12T12:40:34.014255Z", "shell.execute_reply": "2023-11-12T12:40:34.013886Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "63" ] }, "execution_count": 241, "metadata": {}, "output_type": "execute_result" } ], "source": [ "postcondition_lineno = start_square_root + 9\n", "postcondition_lineno" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.015864Z", "iopub.status.busy": "2023-11-12T12:40:34.015747Z", "iopub.status.idle": "2023-11-12T12:40:34.480976Z", "shell.execute_reply": "2023-11-12T12:40:34.480643Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "assertion_functionsquare_rootat0x10582f400_63\n", "\n", "\n", "<assertion>\n", "assert math.isclose(approx * approx, x)\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_60\n", "\n", "\n", "approx\n", "approx = guess\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_60->assertion_functionsquare_rootat0x10582f400_63\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x10582f400_59\n", "\n", "\n", "<test>\n", "while approx != guess:\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_60->test_functionsquare_rootat0x10582f400_59\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x10582f400_61\n", "\n", "\n", "guess\n", "guess = (approx + x / approx) / 2\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_60->guess_functionsquare_rootat0x10582f400_61\n", "\n", "\n", "\n", "\n", "\n", 
"\n", "x_functionsquare_rootat0x10582f400_54\n", "\n", "\n", "x\n", "def square_root(x):  # type: ignore\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x10582f400_54->assertion_functionsquare_rootat0x10582f400_63\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x10582f400_58\n", "\n", "\n", "guess\n", "guess = x / 2\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x10582f400_54->guess_functionsquare_rootat0x10582f400_58\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_57\n", "\n", "\n", "approx\n", "approx = None\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functionsquare_rootat0x10582f400_54->guess_functionsquare_rootat0x10582f400_61\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x10582f400_59->approx_functionsquare_rootat0x10582f400_60\n", "\n", "\n", "\n", "\n", "\n", "\n", "test_functionsquare_rootat0x10582f400_59->guess_functionsquare_rootat0x10582f400_61\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x10582f400_58->approx_functionsquare_rootat0x10582f400_60\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x10582f400_58->test_functionsquare_rootat0x10582f400_59\n", "\n", "\n", "\n", "\n", "\n", "\n", "approx_functionsquare_rootat0x10582f400_57->test_functionsquare_rootat0x10582f400_59\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "guess_functionsquare_rootat0x10582f400_61->test_functionsquare_rootat0x10582f400_59\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " ('', (square_root, 63)): {('approx', (square_root, 60)), ('x', (square_root, 54))},\n", " ('approx', (square_root, 60)): {('guess', (square_root, 58))},\n", " ('x', (square_root, 54)): set(),\n", " ('guess', (square_root, 58)): {('x', (square_root, 54))},\n", " ('', (square_root, 59)): {('guess', (square_root, 58)), ('approx', (square_root, 60)), ('approx', (square_root, 57)), ('guess', (square_root, 61))},\n", " ('approx', (square_root, 57)): set(),\n", " 
('guess', (square_root, 61)): {('approx', (square_root, 60)), ('x', (square_root, 54))}},\n", " control={\n", " ('', (square_root, 63)): set(),\n", " ('approx', (square_root, 60)): {('', (square_root, 59))},\n", " ('x', (square_root, 54)): set(),\n", " ('guess', (square_root, 58)): set(),\n", " ('', (square_root, 59)): set(),\n", " ('approx', (square_root, 57)): set(),\n", " ('guess', (square_root, 61)): {('', (square_root, 59))}})" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with Slicer() as slicer:\n", " y = square_root(4)\n", "\n", "slicer.dependencies().backward_slice((square_root, postcondition_lineno))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In contrast, the \"lazy\" assertion in `square_root_unchecked()` has an empty backward slice, showing that it depends on no other value at all:" ] }, { "cell_type": "code", "execution_count": 243, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.482862Z", "iopub.status.busy": "2023-11-12T12:40:34.482615Z", "iopub.status.idle": "2023-11-12T12:40:34.512622Z", "shell.execute_reply": "2023-11-12T12:40:34.512068Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n" ], "text/plain": [ "Dependencies(\n", " data={\n", " },\n", " control={\n", " })" ] }, "execution_count": 243, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with Slicer() as slicer:\n", " y = square_root_unchecked(4)\n", "\n", "slicer.dependencies().backward_slice((square_root, postcondition_lineno))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In \\cite{Schuler2011}, Schuler et al. have tried out this technique and found their \"checked coverage\" to be a sure indicator for the quality of the checks in tests. 
Using our dynamic slices, you may wish to try this out on Python code." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Use in Statistical Debugging" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Collecting dynamic slices over several runs allows for _correlating dependencies with other execution features_, notably _failures_: \"The program fails whenever the value of `weekday` comes from `calendar()`.\" We will revisit this idea in [the chapter on statistical debugging](StatisticalDebugger.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chapter provides a `Slicer` class to automatically determine and visualize dynamic flows and dependencies. When we say that a variable $x$ _depends_ on a variable $y$ (and that $y$ _flows_ into $x$), we distinguish two kinds of dependencies:\n", "\n", "* **Data dependency**: $x$ is assigned a value computed from $y$.\n", "* **Control dependency**: A statement involving $x$ is executed _only_ because a _condition_ involving $y$ was evaluated, influencing the execution path.\n", "\n", "Such dependencies are crucial for debugging, as they allow determining the origins of individual values (and notably incorrect values)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To determine dynamic dependencies in a function `func`, use\n", "\n", "```python\n", "with Slicer() as slicer:\n", "    <some call to func()>\n", "```\n", "\n", "and then `slicer.graph()` or `slicer.code()` to examine dependencies." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "You can also explicitly specify the functions to be instrumented, as in \n", "\n", "```python\n", "with Slicer(func, func_1, func_2) as slicer:\n", "    <some call to func()>\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example. The `demo()` function computes some number from `x`:" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.515589Z", "iopub.status.busy": "2023-11-12T12:40:34.515372Z", "iopub.status.idle": "2023-11-12T12:40:34.517869Z", "shell.execute_reply": "2023-11-12T12:40:34.517488Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def demo(x: int) -> int:\n", "    z = x\n", "    while x <= z <= 64:\n", "        z *= 2\n", "    return z" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "By using `with Slicer()`, we first instrument `demo()` and then execute it:" ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.519491Z", "iopub.status.busy": "2023-11-12T12:40:34.519375Z", "iopub.status.idle": "2023-11-12T12:40:34.525224Z", "shell.execute_reply": "2023-11-12T12:40:34.524933Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with Slicer() as slicer:\n", "    demo(10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After execution is complete, you can output `slicer` to visualize the dependencies and flows as a graph. Data dependencies are shown as black solid edges; control dependencies are shown as grey dashed edges. 
The arrows indicate influence: If $y$ depends on $x$ (and thus $x$ flows into $y$), then we have an arrow $x \\rightarrow y$.\n", "We see how the parameter `x` flows into `z`, which is returned after some computation that is control dependent on a `<test>` involving `z`." ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.527197Z", "iopub.status.busy": "2023-11-12T12:40:34.527037Z", "iopub.status.idle": "2023-11-12T12:40:34.986988Z", "shell.execute_reply": "2023-11-12T12:40:34.986614Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "demoreturnvalue_functiondemoat0x105d9bd90_5\n", "\n", "\n", "<demo() return value>\n", "return z\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_4\n", "\n", "\n", "z\n", "z *= 2\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_4->demoreturnvalue_functiondemoat0x105d9bd90_5\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_4->z_functiondemoat0x105d9bd90_4\n", "\n", "\n", "\n", "\n", "\n", "test_functiondemoat0x105d9bd90_3\n", "\n", "\n", "<test>\n", "while x <= z <= 64:\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_4->test_functiondemoat0x105d9bd90_3\n", "\n", "\n", "\n", "\n", "\n", "test_functiondemoat0x105d9bd90_3->z_functiondemoat0x105d9bd90_4\n", "\n", "\n", "\n", "\n", "\n", "\n", "x_functiondemoat0x105d9bd90_1\n", "\n", "\n", "x\n", "def demo(x: int) -> int:\n", "\n", "\n", "\n", "\n", "\n", "x_functiondemoat0x105d9bd90_1->test_functiondemoat0x105d9bd90_3\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_2\n", "\n", "\n", "z\n", "z = x\n", "\n", "\n", "\n", "\n", "\n", "x_functiondemoat0x105d9bd90_1->z_functiondemoat0x105d9bd90_2\n", "\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_2->z_functiondemoat0x105d9bd90_4\n", "\n", "\n", "\n", "\n", "\n", 
"z_functiondemoat0x105d9bd90_2->test_functiondemoat0x105d9bd90_3\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "<__main__.Slicer at 0x105debeb0>" ] }, "execution_count": 246, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An alternate representation is `slicer.code()`, annotating the instrumented source code with (backward) dependencies. Data dependencies are shown with `<=`, control dependencies with `<-`; locations (lines) are shown in parentheses." ] }, { "cell_type": "code", "execution_count": 247, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:34.988719Z", "iopub.status.busy": "2023-11-12T12:40:34.988593Z", "iopub.status.idle": "2023-11-12T12:40:35.145352Z", "shell.execute_reply": "2023-11-12T12:40:35.145041Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "* 1 \u001b[34mdef\u001b[39;49;00m \u001b[32mdemo\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "* 2 z = x \u001b[37m# <= x (1)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 3 \u001b[34mwhile\u001b[39;49;00m x <= z <= \u001b[34m64\u001b[39;49;00m: \u001b[37m# <= x (1), z (4), z (2)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 4 z *= \u001b[34m2\u001b[39;49;00m \u001b[37m# <= z (4), z (2); <- (3)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "* 5 \u001b[34mreturn\u001b[39;49;00m z \u001b[37m# <= z (4)\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n" ] } ], "source": [ "slicer.code()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Dependencies can also be retrieved programmatically. The `dependencies()` method returns a `Dependencies` object encapsulating the dependency graph." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `all_vars()` returns all variables in the dependency graph. Each variable is encoded as a pair (_name_, _location_) where _location_ is a pair (_codename_, _lineno_)." ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.147211Z", "iopub.status.busy": "2023-11-12T12:40:35.147098Z", "iopub.status.idle": "2023-11-12T12:40:35.150592Z", "shell.execute_reply": "2023-11-12T12:40:35.150163Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{('<demo() return value>', (<function __main__.demo(x: int) -> int>, 5)),\n", " ('<test>', (<function __main__.demo(x: int) -> int>, 3)),\n", " ('x', (<function __main__.demo(x: int) -> int>, 1)),\n", " ('z', (<function __main__.demo(x: int) -> int>, 2)),\n", " ('z', (<function __main__.demo(x: int) -> int>, 4))}" ] }, "execution_count": 248, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer.dependencies().all_vars()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `code()` and `graph()` methods can also be applied to dependencies. The method `backward_slice(var)` returns a backward slice for the given variable (again given as a pair (_name_, _location_)). 
To retrieve where `z` in Line 2 came from, use:" ] }, { "cell_type": "code", "execution_count": 249, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.152337Z", "iopub.status.busy": "2023-11-12T12:40:35.152239Z", "iopub.status.idle": "2023-11-12T12:40:35.154891Z", "shell.execute_reply": "2023-11-12T12:40:35.154622Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 249, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_, start_demo = inspect.getsourcelines(demo)\n", "start_demo" ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.156362Z", "iopub.status.busy": "2023-11-12T12:40:35.156270Z", "iopub.status.idle": "2023-11-12T12:40:35.565871Z", "shell.execute_reply": "2023-11-12T12:40:35.565514Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "dependencies\n", "\n", "\n", "\n", "x_functiondemoat0x105d9bd90_1\n", "\n", "\n", "x\n", "def demo(x: int) -> int:\n", "\n", "\n", "\n", "\n", "\n", "z_functiondemoat0x105d9bd90_2\n", "\n", "\n", "z\n", "z = x\n", "\n", "\n", "\n", "\n", "\n", "x_functiondemoat0x105d9bd90_1->z_functiondemoat0x105d9bd90_2\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 250, "metadata": {}, "output_type": "execute_result" } ], "source": [ "slicer.dependencies().backward_slice(('z', (demo, start_demo + 1))).graph() # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here are the classes defined in this chapter. A `Slicer` instruments a program, using a `DependencyTracker` at run time to collect `Dependencies`." 
] }, { "cell_type": "code", "execution_count": 251, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.567709Z", "iopub.status.busy": "2023-11-12T12:40:35.567590Z", "iopub.status.idle": "2023-11-12T12:40:35.569460Z", "shell.execute_reply": "2023-11-12T12:40:35.569174Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from ClassDiagram import display_class_hierarchy, class_tree" ] }, { "cell_type": "code", "execution_count": 252, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.570895Z", "iopub.status.busy": "2023-11-12T12:40:35.570790Z", "iopub.status.idle": "2023-11-12T12:40:35.572485Z", "shell.execute_reply": "2023-11-12T12:40:35.572184Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "assert class_tree(Slicer)[0][0] == Slicer" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:35.574350Z", "iopub.status.busy": "2023-11-12T12:40:35.573878Z", "iopub.status.idle": "2023-11-12T12:40:36.067255Z", "shell.execute_reply": "2023-11-12T12:40:36.066884Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Slicer\n", "\n", "\n", "Slicer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "_repr_mimebundle_()\n", "\n", "\n", "\n", "code()\n", "\n", "\n", "\n", "dependencies()\n", "\n", "\n", "\n", "graph()\n", "\n", "\n", "\n", "calls_in_our_with_block()\n", "\n", "\n", "\n", "default_items_to_instrument()\n", "\n", "\n", "\n", "execute()\n", "\n", "\n", "\n", "funcs_in_our_with_block()\n", "\n", "\n", "\n", "instrument()\n", "\n", "\n", "\n", "our_with_block()\n", "\n", "\n", "\n", "parse()\n", "\n", "\n", "\n", "restore()\n", "\n", "\n", "\n", "transform()\n", "\n", "\n", "\n", "transformers()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", 
"Instrumenter\n", "\n", "\n", "Instrumenter\n", "\n", "\n", "\n", "__enter__()\n", "\n", "\n", "\n", "__exit__()\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "instrument()\n", "\n", "\n", "\n", "default_items_to_instrument()\n", "\n", "\n", "\n", "restore()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Slicer->Instrumenter\n", "\n", "\n", "\n", "\n", "\n", "StackInspector\n", "\n", "\n", "StackInspector\n", "\n", "\n", "\n", "_generated_function_cache\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "caller_frame()\n", "\n", "\n", "\n", "caller_function()\n", "\n", "\n", "\n", "caller_globals()\n", "\n", "\n", "\n", "caller_locals()\n", "\n", "\n", "\n", "caller_location()\n", "\n", "\n", "\n", "search_frame()\n", "\n", "\n", "\n", "search_func()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Instrumenter->StackInspector\n", "\n", "\n", "\n", "\n", "\n", "DependencyTracker\n", "\n", "\n", "DependencyTracker\n", "\n", "\n", "\n", "TEST\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__enter__()\n", "\n", "\n", "\n", "__exit__()\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "arg()\n", "\n", "\n", "\n", "call()\n", "\n", "\n", "\n", "get()\n", "\n", "\n", "\n", "param()\n", "\n", "\n", "\n", "ret()\n", "\n", "\n", "\n", "set()\n", "\n", "\n", "\n", "test()\n", "\n", "\n", "\n", "call_generator()\n", "\n", "\n", "\n", "check_location()\n", "\n", "\n", "\n", "clear_read()\n", "\n", "\n", "\n", "dependencies()\n", "\n", "\n", "\n", "ignore_location_change()\n", "\n", "\n", "\n", "ignore_next_location_change()\n", "\n", "\n", "\n", "in_generator()\n", "\n", "\n", "\n", "ret_generator()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "DataTracker\n", "\n", "\n", "DataTracker\n", "\n", "\n", "\n", "__enter__()\n", "\n", "\n", "\n", "__exit__()\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "arg()\n", "\n", "\n", "\n", "augment()\n", "\n", "\n", "\n", "call()\n", "\n", "\n", "\n", "get()\n", "\n", "\n", "\n", 
"param()\n", "\n", "\n", "\n", "ret()\n", "\n", "\n", "\n", "set()\n", "\n", "\n", "\n", "test()\n", "\n", "\n", "\n", "instrument_call()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "DependencyTracker->DataTracker\n", "\n", "\n", "\n", "\n", "\n", "DataTracker->StackInspector\n", "\n", "\n", "\n", "\n", "\n", "Dependencies\n", "\n", "\n", "Dependencies\n", "\n", "\n", "\n", "FONT_NAME\n", "\n", "\n", "\n", "NODE_COLOR\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "__repr__()\n", "\n", "\n", "\n", "__str__()\n", "\n", "\n", "\n", "_repr_mimebundle_()\n", "\n", "\n", "\n", "all_functions()\n", "\n", "\n", "\n", "all_vars()\n", "\n", "\n", "\n", "backward_slice()\n", "\n", "\n", "\n", "code()\n", "\n", "\n", "\n", "graph()\n", "\n", "\n", "\n", "_code()\n", "\n", "\n", "\n", "_source()\n", "\n", "\n", "\n", "add_hierarchy()\n", "\n", "\n", "\n", "draw_dependencies()\n", "\n", "\n", "\n", "draw_edge()\n", "\n", "\n", "\n", "expand_criteria()\n", "\n", "\n", "\n", "format_var()\n", "\n", "\n", "\n", "id()\n", "\n", "\n", "\n", "label()\n", "\n", "\n", "\n", "make_graph()\n", "\n", "\n", "\n", "repr_dependencies()\n", "\n", "\n", "\n", "repr_deps()\n", "\n", "\n", "\n", "repr_var()\n", "\n", "\n", "\n", "source()\n", "\n", "\n", "\n", "tooltip()\n", "\n", "\n", "\n", "validate()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Dependencies->StackInspector\n", "\n", "\n", "\n", "\n", "\n", "Legend\n", "Legend\n", "• \n", "public_method()\n", "• \n", "private_method()\n", "• \n", "overloaded_method()\n", "Hover over names to see doc\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy([Slicer, DependencyTracker, \n", " StackInspector, Dependencies],\n", " abstract_classes=[\n", " StackInspector,\n", " Instrumenter\n", " ],\n", " public_methods=[\n", " StackInspector.caller_frame,\n", " 
StackInspector.caller_function,\n", " StackInspector.caller_globals,\n", " StackInspector.caller_locals,\n", " StackInspector.caller_location,\n", " StackInspector.search_frame,\n", " StackInspector.search_func,\n", " Instrumenter.__init__,\n", " Instrumenter.__enter__,\n", " Instrumenter.__exit__,\n", " Instrumenter.instrument,\n", " Slicer.__init__,\n", " Slicer.code,\n", " Slicer.dependencies,\n", " Slicer.graph,\n", " Slicer._repr_mimebundle_,\n", " DataTracker.__init__,\n", " DataTracker.__enter__,\n", " DataTracker.__exit__,\n", " DataTracker.arg,\n", " DataTracker.augment,\n", " DataTracker.call,\n", " DataTracker.get,\n", " DataTracker.param,\n", " DataTracker.ret,\n", " DataTracker.set,\n", " DataTracker.test,\n", " DataTracker.__repr__,\n", " DependencyTracker.__init__,\n", " DependencyTracker.__enter__,\n", " DependencyTracker.__exit__,\n", " DependencyTracker.arg,\n", " # DependencyTracker.augment,\n", " DependencyTracker.call,\n", " DependencyTracker.get,\n", " DependencyTracker.param,\n", " DependencyTracker.ret,\n", " DependencyTracker.set,\n", " DependencyTracker.test,\n", " DependencyTracker.__repr__,\n", " Dependencies.__init__,\n", " Dependencies.__repr__,\n", " Dependencies.__str__,\n", " Dependencies._repr_mimebundle_,\n", " Dependencies.code,\n", " Dependencies.graph,\n", " Dependencies.backward_slice,\n", " Dependencies.all_functions,\n", " Dependencies.all_vars,\n", " ],\n", " project='debuggingbook')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Things that do not Work" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our slicer (and especially the underlying dependency tracker) is still a proof of concept. A number of Python features are not or only partially supported, and/or hardly tested:\n", "\n", "* __Exceptions__ can lead to missing or erroneous dependencies. 
The code assumes that for every `call()`, there is a matching `ret()`; when exceptions break this, dependencies across function calls and arguments may be missing or be assigned incorrectly.\n", "* __Multiple definitions on a single line__ as in `x = y; x = 1` can lead to missing or erroneous dependencies. Our implementation assumes that there is one statement per line.\n", "* __If-Expressions__ (`y = 1 if x else 0`) do not create control dependencies, as there are no statements to control. Neither do `if` clauses in comprehensions.\n", "* __Asynchronous functions__ (`async`, `await`) are not tested.\n", "\n", "In these cases, the instrumentation and the underlying dependency tracker may fail to identify control and/or data flows. The semantics of the code, however, should always stay unchanged." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* To track the origin of some incorrect value, follow back its _dependencies_:\n", " * _Data dependencies_ indicate where the value came from.\n", " * _Control dependencies_ show why a statement was executed.\n", "* A _slice_ is a subset of the code that could have influenced a specific value. It can be computed by transitively following all dependencies.\n", "* _Instrument code_ to automatically determine and visualize dependencies." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "In the [next chapter](StatisticalDebugger.ipynb), we will explore how to make use of _multiple_ passing and failing executions." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "Slicing, that is, computing a subset of a program by following its data and control dependencies, was invented by Mark Weiser \\cite{Weiser1981}. In his seminal work \"Programmers use Slices when Debugging\" \\cite{Weiser1982}, Weiser demonstrated how such dependencies are crucial for systematic debugging:\n", "\n", "> When debugging unfamiliar programs programmers use program pieces called _slices_ which are sets of statements related by their flow of data. The statements in a slice are not necessarily textually contiguous, but may be scattered through a program." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Weiser's slices (and dependencies) were determined _statically_ from program code. Korel and Laski \\cite{Korel1988} as well as Agrawal and Horgan \\cite{Agrawal1990} introduced _dynamic_ program slicing, which builds on _dynamic_ dependencies and is thus more specific to a given (failing) run. (The `Slicer` we implement in this chapter is a dynamic slicer.) Tip \\cite{Tip1995} surveys program slicing techniques. Chen et al. \\cite{Chen2014} describe and evaluate the first dynamic slicer for Python programs (which is independent of our implementation)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "One exemplary application of program slices is [the Whyline](https://github.com/amyjko/whyline) by Ko and Myers \\cite{Ko2004}. The Whyline is a debugging interface for asking questions about program behavior. It lets users ask interactively where the value of a particular variable came from (a data dependency) and why specific things did or did not take place (control dependencies)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Soremekun et al. \\cite{Soremekun2021} evaluated the performance of slicing as a fault localization mechanism and found that following dependencies was one of the most successful strategies to determine fault locations. Notably, if programmers first examine at most the top five most suspicious locations from [statistical debugging](StatisticalDebugger.ipynb) and then switch to dynamic slices, they need to examine only 15% (12 lines) of the code on average." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1: Control Slices\n", "\n", "Augment the `Slicer` class with two keyword arguments, `include` and `exclude`, each taking a list of functions to instrument or to leave uninstrumented, respectively. These can be helpful when using \"automatic\" instrumentation." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 2: Incremental Exploration\n", "\n", "This is more of a programming project than a simple exercise. Rather than showing all dependencies at once, as we do, build a system that allows the user to _explore_ dependencies interactively." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 3: Forward Slicing\n", "\n", "Extend `Dependencies` with a variant of `backward_slice()` named `forward_slice()` that, instead of computing the dependencies that go _into_ a location, computes the dependencies that go _out_ of a location." 
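, "\n", "\n", "As a starting point, the idea can be sketched on a simplified dependency graph. The sketch below is not the chapter's `Dependencies` class: it assumes a plain dictionary that maps each node to the set of nodes it depends on, and the names `Graph`, `invert()`, and the free function `forward_slice()` are hypothetical. It inverts the edges and then collects everything reachable from the starting node, including the start itself.\n", "
```python
from collections import deque
from typing import Dict, Set

# Hypothetical, simplified stand-in for the dependency data:
# each node maps to the set of nodes it depends on.
Graph = Dict[str, Set[str]]


def invert(deps: Graph) -> Graph:
    # Turn 'node -> what it depends on' into 'node -> what it influences'.
    inverted: Graph = {node: set() for node in deps}
    for node, sources in deps.items():
        for source in sources:
            inverted.setdefault(source, set()).add(node)
    return inverted


def forward_slice(deps: Graph, start: str) -> Set[str]:
    # Breadth-first search along the inverted edges, beginning at 'start'.
    influenced = invert(deps)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in influenced.get(node, set()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Toy example: y is computed from x, z from y; w is unrelated.
deps = {'x': set(), 'y': {'x'}, 'z': {'y'}, 'w': set()}
print(sorted(forward_slice(deps, 'x')))
```
", "\n", "A `forward_slice()` method on `Dependencies` could follow the same pattern, inverting the recorded data and control dependencies before traversing them."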
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 4: Code with Forward Dependencies\n", "\n", "Create a variant of `Dependencies.code()` that, for each statement `s`, instead of showing a \"passive\" view (which variables and locations influenced `s`?), shows an \"active\" view (which variables and locations were influenced by `s`?). For `middle()`, for instance, the first line should show which lines are influenced by `x`, `y`, and `z`, respectively. Use `->` for control flows and `=>` for data flows." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 5: Flow Assertions\n", "\n", "In line with [Verifying Flows at Runtime](#Verifying-Flows-at-Runtime), above, implement a function `assert_flow(target, source)` that checks at runtime that the data flowing into `target` only comes from the variables named in `source`. 
" ] }, { "cell_type": "code", "execution_count": 254, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:36.069634Z", "iopub.status.busy": "2023-11-12T12:40:36.069363Z", "iopub.status.idle": "2023-11-12T12:40:36.071548Z", "shell.execute_reply": "2023-11-12T12:40:36.071262Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def assert_flow(target: Any, source: List[Any]) -> bool:\n", " \"\"\"\n", " Raise an `AssertionError` if the dependencies of `target`\n", " are not equal to `source`.\n", " \"\"\"\n", " ...\n", " return True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`assert_flow()` would be used in conjunction with `Slicer()` as follows:" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:36.073039Z", "iopub.status.busy": "2023-11-12T12:40:36.072935Z", "iopub.status.idle": "2023-11-12T12:40:36.074820Z", "shell.execute_reply": "2023-11-12T12:40:36.074524Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def demo4() -> int:\n", " x = 25\n", " y = 26\n", " assert_flow(y, [x]) # ensures that `y` depends on `x` only\n", " return y" ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:36.076711Z", "iopub.status.busy": "2023-11-12T12:40:36.076476Z", "iopub.status.idle": "2023-11-12T12:40:36.083635Z", "shell.execute_reply": "2023-11-12T12:40:36.083303Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with Slicer() as slicer:\n", " demo4()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To check dependencies, have `assert_flow()` check the contents of the `_data` dependency collector as set up by the slicer." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 6: Checked Coverage\n", "\n", "Implement checked coverage, as sketched in [Assessing Test Quality](#Assessing-Test-Quality) above. For every `assert` statement encountered during a run, produce the number of statements it depends upon." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false, "vscode": { "interpreter": { "hash": "0af4f07dd039d1b4e562c7a7d0340393b1c66f50605ac6af30beb81aa23b7ef5" } } }, "nbformat": 4, "nbformat_minor": 4 }