{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Tracing Executions\n", "\n", "In this chapter, we show how to _observe program state during an execution_ – a prerequisite for logging and interactive debugging. Thanks to the power of Python, we can do this in a few lines of code." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.772902Z", "iopub.status.busy": "2023-11-12T12:39:40.772793Z", "iopub.status.idle": "2023-11-12T12:39:40.810306Z", "shell.execute_reply": "2023-11-12T12:39:40.810048Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo(\"UYAvCl-5NGY\")" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [Introduction to Debugging](Intro_Debugging.ipynb).\n", "* Knowing a bit of _Python_ is helpful for understanding the code examples in the book." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:39:40.830406Z", "iopub.status.busy": "2023-11-12T12:39:40.830251Z", "iopub.status.idle": "2023-11-12T12:39:40.832242Z", "shell.execute_reply": "2023-11-12T12:39:40.832008Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.833757Z", "iopub.status.busy": "2023-11-12T12:39:40.833657Z", "iopub.status.idle": "2023-11-12T12:39:40.835143Z", "shell.execute_reply": "2023-11-12T12:39:40.834920Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import quiz" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.836537Z", "iopub.status.busy": "2023-11-12T12:39:40.836462Z", "iopub.status.idle": "2023-11-12T12:39:40.893574Z", "shell.execute_reply": "2023-11-12T12:39:40.893220Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import Intro_Debugging" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from debuggingbook.Tracer import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides a `Tracer` class that allows logging events during program execution. The advanced subclass `EventTracer` allows restricting logs to specific conditions. Logs are shown only while the given `condition` holds:\n", "\n", "```python\n", ">>> with EventTracer(condition='line == 223 or len(out) >= 6'):\n", ">>> remove_html_markup('foobar')\n", "...\n", " # s = 'foobar', function = 'remove_html_markup', line = 243, tag = False, quote = False, out = 'foobar', c = 'r'\n", "243 for c in s:\n", " # line = 255\n", "255 return out\n", "remove_html_markup() returns 'foobar'\n", "\n", "```\n", "It also allows restricting logs to specific events. Log entries are shown only if one of the given `events` changes its value:\n", "\n", "```python\n", ">>> with EventTracer(events=[\"c == '/'\"]):\n", ">>> remove_html_markup('foobar')\n", "...\n", "Calling remove_html_markup(s = 'foobar', function = 'remove_html_markup', line = 238)\n", "...\n", " # line = 244, tag = False, quote = False, out = '', c = '<'\n", "244 assert tag or not quote\n", "...\n", " # tag = True, out = 'foo', c = '/'\n", "244 assert tag or not quote\n", "...\n", " # c = 'b'\n", "244 assert tag or not quote\n", "\n", "```\n", "`Tracer` and `EventTracer` classes allow for subclassing and further customization.\n", "\n", "![](PICS/Tracer-synopsis-1.svg)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracing Python Programs\n", "\n", "How do debugging tools access the state of a program during execution? For _interpreted_ languages such as Python, this is a simple task. If a language is interpreted, it is typically fairly easy to control execution and to inspect state – since this is what the interpreter is doing already anyway. Debuggers are then implemented on top of _hooks_ that allow interrupting execution and accessing program state." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Python makes such a hook available in the function `sys.settrace()`. You invoke it with a *tracing function* that will be called at every line executed, as in\n", "\n", "```python\n", "sys.settrace(traceit)\n", "```\n", "\n", "Such a tracing function is convenient, as it simply traces _everything_. In contrast to an interactive debugger, where you have to select which aspect of the execution you're interested in, you can just print out a long trace into an *execution log*, to examine it later.\n", "\n", "This tracing function takes the format" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.895436Z", "iopub.status.busy": "2023-11-12T12:39:40.895318Z", "iopub.status.idle": "2023-11-12T12:39:40.897015Z", "shell.execute_reply": "2023-11-12T12:39:40.896770Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from types import FrameType, TracebackType" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.898440Z", "iopub.status.busy": "2023-11-12T12:39:40.898344Z", "iopub.status.idle": "2023-11-12T12:39:40.900030Z", "shell.execute_reply": "2023-11-12T12:39:40.899745Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Any, Optional, Callable, Dict, List, Type, TextIO, cast" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.901397Z", "iopub.status.busy": "2023-11-12T12:39:40.901312Z", "iopub.status.idle": "2023-11-12T12:39:40.903062Z", "shell.execute_reply": "2023-11-12T12:39:40.902786Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:\n", " ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here, `event` is a string telling what has happened in the program – for instance,\n", "\n", "* `'line'` – a new line is executed\n", "* `'call'` – a function just has been called\n", "* `'return'` – a function returns" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `frame` argument holds the current execution frame – that is, the function and its local variables:\n", "\n", "* `frame.f_lineno` – the current line\n", "* `frame.f_locals` – the current variables (as a Python dictionary)\n", "* `frame.f_code` – the current code (as a Code object), with attributes such as\n", " * `frame.f_code.co_name` – the name of the current function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can thus get a *trace* of the program by simply printing out these values:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.904576Z", "iopub.status.busy": "2023-11-12T12:39:40.904474Z", "iopub.status.idle": "2023-11-12T12:39:40.906107Z", "shell.execute_reply": "2023-11-12T12:39:40.905870Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]: # type: ignore\n", " print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The return value of the trace function is the function to be executed at the next event – typically, this is the function itself:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.907609Z", "iopub.status.busy": "2023-11-12T12:39:40.907492Z", "iopub.status.idle": "2023-11-12T12:39:40.909206Z", "shell.execute_reply": "2023-11-12T12:39:40.908969Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]: # type: ignore\n", " print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)\n", " return traceit" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try this out on the `remove_html_markup()` function introduced in the [Introduction to Debugging](Intro_Debugging.ipynb):" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.910716Z", "iopub.status.busy": "2023-11-12T12:39:40.910613Z", "iopub.status.idle": "2023-11-12T12:39:40.912162Z", "shell.execute_reply": "2023-11-12T12:39:40.911938Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Intro_Debugging import remove_html_markup" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.913494Z", "iopub.status.busy": "2023-11-12T12:39:40.913409Z", "iopub.status.idle": "2023-11-12T12:39:40.915009Z", "shell.execute_reply": "2023-11-12T12:39:40.914767Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.916470Z", "iopub.status.busy": "2023-11-12T12:39:40.916389Z", "iopub.status.idle": "2023-11-12T12:39:40.917974Z", "shell.execute_reply": "2023-11-12T12:39:40.917749Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import print_content" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.919435Z", "iopub.status.busy": "2023-11-12T12:39:40.919356Z", "iopub.status.idle": "2023-11-12T12:39:40.989359Z", "shell.execute_reply": "2023-11-12T12:39:40.989084Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "238 \u001b[34mdef\u001b[39;49;00m \u001b[32mremove_html_markup\u001b[39;49;00m(s): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "239 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "240 quote = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "241 out = \u001b[33m\"\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "242 \u001b[37m\u001b[39;49;00m\n", "243 \u001b[34mfor\u001b[39;49;00m c \u001b[35min\u001b[39;49;00m s:\u001b[37m\u001b[39;49;00m\n", "244 \u001b[34massert\u001b[39;49;00m tag \u001b[35mor\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "245 \u001b[37m\u001b[39;49;00m\n", "246 \u001b[34mif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m<\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", "247 tag = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "248 \u001b[34melif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m>\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", "249 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "250 \u001b[34melif\u001b[39;49;00m (c == \u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m c == \u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", "251 quote = \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "252 \u001b[34melif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", "253 out = out + c\u001b[37m\u001b[39;49;00m\n", "254 \u001b[37m\u001b[39;49;00m\n", "255 \u001b[34mreturn\u001b[39;49;00m out\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "content, start_line_number = inspect.getsourcelines(remove_html_markup)\n", "print_content(content=\"\".join(content).strip(), filename='.py', start_line_number=start_line_number)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We define a variant `remove_html_markup_traced()` which turns on tracing, invokes `remove_html_markup()`, and turns tracing off again." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.990904Z", "iopub.status.busy": "2023-11-12T12:39:40.990789Z", "iopub.status.idle": "2023-11-12T12:39:40.992414Z", "shell.execute_reply": "2023-11-12T12:39:40.992174Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import sys" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.993825Z", "iopub.status.busy": "2023-11-12T12:39:40.993721Z", "iopub.status.idle": "2023-11-12T12:39:40.995490Z", "shell.execute_reply": "2023-11-12T12:39:40.995256Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def remove_html_markup_traced(s): # type: ignore\n", " sys.settrace(traceit)\n", " ret = remove_html_markup(s)\n", " sys.settrace(None)\n", " return ret" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is what we get when we run `remove_html_markup_traced()`:\n", "* We first get a `call` event (showing the call of `remove_html_markup()`)\n", "* We then get various `line` events (for each line of `remove_html_markup()`)\n", "* In the end, we get a `return` event (showing the return from `remove_html_markup()`)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:40.996946Z", "iopub.status.busy": "2023-11-12T12:39:40.996860Z", "iopub.status.idle": "2023-11-12T12:39:40.999717Z", "shell.execute_reply": "2023-11-12T12:39:40.999503Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "call 238 remove_html_markup {'s': 'xyz'}\n", "line 239 remove_html_markup {'s': 'xyz'}\n", "line 240 remove_html_markup {'s': 'xyz', 'tag': False}\n", "line 241 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False}\n", "line 243 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': ''}\n", "line 244 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 246 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 248 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 250 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 252 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 253 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}\n", "line 243 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'x'}\n", "line 244 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 246 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 248 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 250 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 252 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 253 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'y'}\n", "line 243 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'y'}\n", "line 244 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 246 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 248 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 250 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 252 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 253 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xy', 'c': 'z'}\n", "line 243 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xyz', 'c': 'z'}\n", "line 255 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xyz', 'c': 'z'}\n", "return 255 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'xyz', 'c': 'z'}\n" ] }, { "data": { "text/plain": [ "'xyz'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup_traced('xyz')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "During the execution, we also see all local _variables_. As `remove_html_markup()` is called at the very beginning, the parameter `s` holds the argument `\"xyz\"`. As more local variables are being assigned, these show up in our dictionary of local variables." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see how the variable `c` takes one character of the input string at a time; the `out` variable accumulates them. and the `tag` and `quote` flags stay unchanged throughout the execution." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "An interesting aspect is that we can actually _access_ all these local variables as regular Python objects. We can, for instance, separately access the value of `c` by looking up `frame.f_locals['c']`:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.001560Z", "iopub.status.busy": "2023-11-12T12:39:41.001279Z", "iopub.status.idle": "2023-11-12T12:39:41.003225Z", "shell.execute_reply": "2023-11-12T12:39:41.002999Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]: # type: ignore\n", " if 'c' in frame.f_locals:\n", " value_of_c = frame.f_locals['c']\n", " print(f\"{frame.f_lineno:} c = {repr(value_of_c)}\")\n", " else:\n", " print(f\"{frame.f_lineno:} c is undefined\")\n", "\n", " return traceit" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This allows us to specifically monitor individual variables:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.004930Z", "iopub.status.busy": "2023-11-12T12:39:41.004683Z", "iopub.status.idle": "2023-11-12T12:39:41.007092Z", "shell.execute_reply": "2023-11-12T12:39:41.006847Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "238 c is undefined\n", "239 c is undefined\n", "240 c is undefined\n", "241 c is undefined\n", "243 c is undefined\n", "244 c = 'x'\n", "246 c = 'x'\n", "248 c = 'x'\n", "250 c = 'x'\n", "252 c = 'x'\n", "253 c = 'x'\n", "243 c = 'x'\n", "244 c = 'y'\n", "246 c = 'y'\n", "248 c = 'y'\n", "250 c = 'y'\n", "252 c = 'y'\n", "253 c = 'y'\n", "243 c = 'y'\n", "244 c = 'z'\n", "246 c = 'z'\n", "248 c = 'z'\n", "250 c = 'z'\n", "252 c = 'z'\n", "253 c = 'z'\n", "243 c = 'z'\n", "255 c = 'z'\n", "255 c = 'z'\n" ] }, { "data": { "text/plain": [ "'xyz'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup_traced('xyz')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This tracing capability is tremendously powerful – actually, it is one of the reasons this book uses Python all over the place. In most other languages, inspecting the program state during execution is much more complex than the handful of lines we have needed so far." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To learn more about `sys.settrace()`, spend a moment to look up [its documentation in the Python reference](https://docs.python.org/3/library/sys.html)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.008807Z", "iopub.status.busy": "2023-11-12T12:39:41.008565Z", "iopub.status.idle": "2023-11-12T12:39:41.010071Z", "shell.execute_reply": "2023-11-12T12:39:41.009847Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "import math" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.011475Z", "iopub.status.busy": "2023-11-12T12:39:41.011397Z", "iopub.status.idle": "2023-11-12T12:39:41.015910Z", "shell.execute_reply": "2023-11-12T12:39:41.015519Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

What happens if the tracing function returns None while tracing function f()? (You can also try this out yourself.)
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"What happens if the tracing function returns `None` while tracing function `f()`?\"\n", " \" (You can also try this out yourself.)\",\n", " [\n", " 'Tracing stops for all functions;'\n", " ' the tracing function is no longer called',\n", " 'Tracing stops for `f()`: the tracing function is called when `f()` returns',\n", " 'Tracing stops for `f()` the rest of the execution: the tracing function'\n", " ' is no longer called for calls to `f()`',\n", " 'Nothing changes'\n", " ], \"int(math.log(7.38905609893065))\", globals())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, as listed in the documentation: if `sys.settrace()` returns `None`, then tracing stops for the current scope; tracing will resume when the current function returns. This can be helpful for momentarily disable (expensive) tracing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A Tracer Class" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us refine our tracing function a bit. First, it would be nice if one could actually _customize_ tracing just as needed. To this end, we introduce a `Tracer` class that does all the formatting for us, and which can be _subclassed_ to allow for different output formats." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `traceit()` method in `Tracer` is the same as above, and again is set up via `sys.settrace()`. It uses a `log()` method after the Python `print()` function." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The typical usage of `Tracer`, however, is as follows:\n", "\n", "```python\n", "with Tracer():\n", " # Code to be traced\n", " ...\n", "\n", "# Code no longer traced\n", "...\n", "```\n", "\n", "When the `with` statement is encountered, the `__enter__()` method is called, which starts tracing. When the `with` block ends, the `__exit__()` method is called, and tracing is turned off. We take special care that the internal `__exit__()` method is not part of the trace, and that any other tracing function that was active before is being restored." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We build `Tracer` on top of a class named `StackInspector`, whose `our_frame()` and `is_internal_error()` methods us with providing better diagnostics in case of error." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.017888Z", "iopub.status.busy": "2023-11-12T12:39:41.017727Z", "iopub.status.idle": "2023-11-12T12:39:41.038242Z", "shell.execute_reply": "2023-11-12T12:39:41.037961Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from StackInspector import StackInspector" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.039981Z", "iopub.status.busy": "2023-11-12T12:39:41.039892Z", "iopub.status.idle": "2023-11-12T12:39:41.043677Z", "shell.execute_reply": "2023-11-12T12:39:41.043453Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Tracer(StackInspector):\n", " \"\"\"A class for tracing a piece of code. Use as `with Tracer(): block()`\"\"\"\n", "\n", " def __init__(self, *, file: TextIO = sys.stdout) -> None:\n", " \"\"\"Trace a block of code, sending logs to `file` (default: stdout)\"\"\"\n", " self.original_trace_function: Optional[Callable] = None\n", " self.file = file\n", "\n", " def traceit(self, frame: FrameType, event: str, arg: Any) -> None:\n", " \"\"\"Tracing function. To be overridden in subclasses.\"\"\"\n", " self.log(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)\n", "\n", " def _traceit(self, frame: FrameType, event: str, arg: Any) -> Optional[Callable]:\n", " \"\"\"Internal tracing function.\"\"\"\n", " if self.our_frame(frame):\n", " # Do not trace our own methods\n", " pass\n", " else:\n", " self.traceit(frame, event, arg)\n", " return self._traceit\n", "\n", " def log(self, *objects: Any, \n", " sep: str = ' ', end: str = '\\n', \n", " flush: bool = True) -> None:\n", " \"\"\"\n", " Like `print()`, but always sending to `file` given at initialization,\n", " and flushing by default.\n", " \"\"\"\n", " print(*objects, sep=sep, end=end, file=self.file, flush=flush)\n", "\n", " def __enter__(self) -> Any:\n", " \"\"\"Called at begin of `with` block. Turn tracing on.\"\"\"\n", " self.original_trace_function = sys.gettrace()\n", " sys.settrace(self._traceit)\n", "\n", " # This extra line also enables tracing for the current block\n", " # inspect.currentframe().f_back.f_trace = self._traceit\n", " return self\n", "\n", " def __exit__(self, exc_tp: Type, exc_value: BaseException, \n", " exc_traceback: TracebackType) -> Optional[bool]:\n", " \"\"\"\n", " Called at end of `with` block. Turn tracing off.\n", " Return `None` if ok, not `None` if internal error.\n", " \"\"\"\n", " sys.settrace(self.original_trace_function)\n", "\n", " # Note: we must return a non-True value here,\n", " # such that we re-raise all exceptions\n", " if self.is_internal_error(exc_tp, exc_value, exc_traceback):\n", " return False # internal error\n", " else:\n", " return None # all ok" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's how we use the `Tracer` class. You see that everything works as before, except that it is nicer to use:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.045339Z", "iopub.status.busy": "2023-11-12T12:39:41.045250Z", "iopub.status.idle": "2023-11-12T12:39:41.052898Z", "shell.execute_reply": "2023-11-12T12:39:41.052630Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "call 238 remove_html_markup {'s': 'abc'}\n", "line 239 remove_html_markup {'s': 'abc'}\n", "line 240 remove_html_markup {'s': 'abc', 'tag': False}\n", "line 241 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False}\n", "line 243 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': ''}\n", "line 244 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 246 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 248 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 250 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 252 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 253 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}\n", "line 243 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'a'}\n", "line 244 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 246 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 248 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 250 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 252 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 253 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'b'}\n", "line 243 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'b'}\n", "line 244 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 246 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 248 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 250 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 252 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 253 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'ab', 'c': 'c'}\n", "line 243 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'abc', 'c': 'c'}\n", "line 255 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'abc', 'c': 'c'}\n", "return 255 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'abc', 'c': 'c'}\n" ] } ], "source": [ "with Tracer():\n", " remove_html_markup(\"abc\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Accessing Source Code\n", "\n", "We can now go and _extend_ the class with additional features. It would be nice if it could actually display the source code of the function being tracked, such that we know where we are. In Python, the function `inspect.getsource()` returns the source code of a function or module. Looking up\n", "\n", "```python\n", "module = inspect.getmodule(frame.f_code)\n", "```\n", "\n", "gives us the current module, and\n", "\n", "```python\n", "inspect.getsource(module)\n", "```\n", "\n", "gives us its source code. All we then have to do is to retrieve the current line." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To implement our extended `traceit()` method, we use a bit of a hack. The Python language requires us to define an entire class with all methods as a single, continuous unit; however, we would like to introduce one method after another. To avoid this problem, we use a special hack: Whenever we want to introduce a new method to some class `C`, we use the construct\n", "\n", "```python\n", "class C(C):\n", " def new_method(self, args):\n", " pass\n", "```\n", "\n", "This seems to define `C` as a subclass of itself, which would make no sense – but actually, it introduces a new `C` class as a subclass of the _old_ `C` class, and then shadowing the old `C` definition. What this gets us is a `C` class with `new_method()` as a method, which is just what we want. (`C` objects defined earlier will retain the earlier `C` definition, though, and thus must be rebuilt.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Using this hack, we can now redefine the `traceit()` method. Our new tracer shows the current line as it is executed." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.059573Z", "iopub.status.busy": "2023-11-12T12:39:41.059437Z", "iopub.status.idle": "2023-11-12T12:39:41.060966Z", "shell.execute_reply": "2023-11-12T12:39:41.060717Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.062456Z", "iopub.status.busy": "2023-11-12T12:39:41.062370Z", "iopub.status.idle": "2023-11-12T12:39:41.064782Z", "shell.execute_reply": "2023-11-12T12:39:41.064536Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Tracer(Tracer):\n", " def traceit(self, frame: FrameType, event: str, arg: Any) -> None:\n", " \"\"\"Tracing function; called at every line. To be overloaded in subclasses.\"\"\"\n", "\n", " if event == 'line':\n", " module = inspect.getmodule(frame.f_code)\n", " if module is None:\n", " source = inspect.getsource(frame.f_code)\n", " else:\n", " source = inspect.getsource(module)\n", " current_line = source.split('\\n')[frame.f_lineno - 1]\n", " self.log(frame.f_lineno, current_line)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.066859Z", "iopub.status.busy": "2023-11-12T12:39:41.066623Z", "iopub.status.idle": "2023-11-12T12:39:41.129658Z", "shell.execute_reply": "2023-11-12T12:39:41.129341Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "239 tag = False\n", "240 quote = False\n", "241 out = \"\"\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "255 return out\n" ] } ], "source": [ "with Tracer():\n", " remove_html_markup(\"abc\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracing Calls and Returns\n", "\n", "Next, we'd like to report calling and returning from functions. For the `return` event, `arg` holds the value being returned." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.135318Z", "iopub.status.busy": "2023-11-12T12:39:41.135129Z", "iopub.status.idle": "2023-11-12T12:39:41.137989Z", "shell.execute_reply": "2023-11-12T12:39:41.137748Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Tracer(Tracer):\n", " def traceit(self, frame: FrameType, event: str, arg: Any) -> None:\n", " \"\"\"Tracing function. To be overridden in subclasses.\"\"\"\n", "\n", " if event == 'call':\n", " self.log(f\"Calling {frame.f_code.co_name}()\")\n", "\n", " if event == 'line':\n", " module = inspect.getmodule(frame.f_code)\n", " if module:\n", " source = inspect.getsource(module)\n", " if source:\n", " current_line = source.split('\\n')[frame.f_lineno - 1]\n", " self.log(frame.f_lineno, current_line)\n", "\n", " if event == 'return':\n", " self.log(f\"{frame.f_code.co_name}() returns {repr(arg)}\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.139591Z", "iopub.status.busy": "2023-11-12T12:39:41.139469Z", "iopub.status.idle": "2023-11-12T12:39:41.148483Z", "shell.execute_reply": "2023-11-12T12:39:41.148217Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calling remove_html_markup()\n", "239 tag = False\n", "240 quote = False\n", "241 out = \"\"\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", "243 for c in s:\n", "255 return out\n", "remove_html_markup() returns 'abc'\n" ] } ], "source": [ "with Tracer():\n", " remove_html_markup(\"abc\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracing Variable Changes\n", "\n", "Finally, we'd again like to report variables – but only those that have changed. To this end, we save a copy of the last reported variables in the class, reporting only the changed values." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.153777Z", "iopub.status.busy": "2023-11-12T12:39:41.153614Z", "iopub.status.idle": "2023-11-12T12:39:41.156404Z", "shell.execute_reply": "2023-11-12T12:39:41.156145Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Tracer(Tracer):\n", " def __init__(self, file: TextIO = sys.stdout) -> None:\n", " \"\"\"\n", " Create a new tracer.\n", " If `file` is given, output to `file` instead of stdout.\n", " \"\"\"\n", "\n", " self.last_vars: Dict[str, Any] = {}\n", " super().__init__(file=file)\n", "\n", " def changed_vars(self, new_vars: Dict[str, Any]) -> Dict[str, Any]:\n", " \"\"\"Track changed variables, based on `new_vars` observed.\"\"\"\n", " changed = {}\n", " for var_name, var_value in new_vars.items():\n", " if (var_name not in self.last_vars or\n", " self.last_vars[var_name] != var_value):\n", " changed[var_name] = var_value\n", " self.last_vars = new_vars.copy()\n", " return changed" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's how this works: If variable `a` is set to 10 (and we didn't have it so far), it is marked as changed:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.158144Z", "iopub.status.busy": "2023-11-12T12:39:41.158001Z", "iopub.status.idle": "2023-11-12T12:39:41.159610Z", "shell.execute_reply": "2023-11-12T12:39:41.159367Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "tracer = Tracer()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.161169Z", "iopub.status.busy": "2023-11-12T12:39:41.161047Z", "iopub.status.idle": "2023-11-12T12:39:41.163231Z", "shell.execute_reply": "2023-11-12T12:39:41.162993Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'a': 10}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tracer.changed_vars({'a': 10})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If another variable `b` is added, and only `b` is changed, then only `b` is marked as changed:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.164892Z", "iopub.status.busy": "2023-11-12T12:39:41.164749Z", "iopub.status.idle": "2023-11-12T12:39:41.167218Z", "shell.execute_reply": "2023-11-12T12:39:41.166960Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'b': 25}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tracer.changed_vars({'a': 10, 'b': 25})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If both variables keep their values, nothing changes:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.168907Z", "iopub.status.busy": "2023-11-12T12:39:41.168759Z", "iopub.status.idle": "2023-11-12T12:39:41.170988Z", "shell.execute_reply": "2023-11-12T12:39:41.170627Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tracer.changed_vars({'a': 10, 'b': 25})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But if new variables come along, they are listed again." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.172753Z", "iopub.status.busy": "2023-11-12T12:39:41.172602Z", "iopub.status.idle": "2023-11-12T12:39:41.175011Z", "shell.execute_reply": "2023-11-12T12:39:41.174704Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'c': 10, 'd': 25}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "changes = tracer.changed_vars({'c': 10, 'd': 25})\n", "changes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The following expression creates a comma-separated list of variables and values:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.176887Z", "iopub.status.busy": "2023-11-12T12:39:41.176732Z", "iopub.status.idle": "2023-11-12T12:39:41.179153Z", "shell.execute_reply": "2023-11-12T12:39:41.178836Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'c = 10, d = 25'" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\", \".join([var + \" = \" + repr(changes[var]) for var in changes])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now put all of this together in our tracing function, reporting any variable changes as we see them. Note how we exploit the fact that in a call, all variables have a \"new\" value; and when we return from a function, we explicitly delete the \"last\" variables." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.181047Z", "iopub.status.busy": "2023-11-12T12:39:41.180912Z", "iopub.status.idle": "2023-11-12T12:39:41.184743Z", "shell.execute_reply": "2023-11-12T12:39:41.184437Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Tracer(Tracer):\n", " def print_debugger_status(self, frame: FrameType, event: str, arg: Any) -> None:\n", " \"\"\"Show current source line and changed vars\"\"\"\n", " changes = self.changed_vars(frame.f_locals)\n", " changes_s = \", \".join([var + \" = \" + repr(changes[var])\n", " for var in changes])\n", "\n", " if event == 'call':\n", " self.log(\"Calling \" + frame.f_code.co_name + '(' + changes_s + ')')\n", " elif changes:\n", " self.log(' ' * 40, '#', changes_s)\n", "\n", " if event == 'line':\n", " try:\n", " module = inspect.getmodule(frame.f_code)\n", " if module is None:\n", " source = inspect.getsource(frame.f_code)\n", " else:\n", " source = inspect.getsource(module)\n", " current_line = source.split('\\n')[frame.f_lineno - 1]\n", "\n", " except OSError as err:\n", " self.log(f\"{err.__class__.__name__}: {err}\")\n", " current_line = \"\"\n", "\n", " self.log(repr(frame.f_lineno) + ' ' + current_line)\n", "\n", " if event == 'return':\n", " self.log(frame.f_code.co_name + '()' + \" returns \" + repr(arg))\n", " self.last_vars = {} # Delete 'last' variables\n", "\n", " def traceit(self, frame: FrameType, event: str, arg: Any) -> None:\n", " \"\"\"Tracing function; called at every line. To be overloaded in subclasses.\"\"\"\n", " self.print_debugger_status(frame, event, arg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's the resulting trace of `remove_html_markup()` for a more complex input. You can see that the tracing output allows us to see which lines are executed as well as the variables whose value changes." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.186431Z", "iopub.status.busy": "2023-11-12T12:39:41.186336Z", "iopub.status.idle": "2023-11-12T12:39:41.206194Z", "shell.execute_reply": "2023-11-12T12:39:41.205924Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calling remove_html_markup(s = 'x')\n", "239 tag = False\n", " # tag = False\n", "240 quote = False\n", " # quote = False\n", "241 out = \"\"\n", " # out = ''\n", "243 for c in s:\n", " # c = '<'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "247 tag = True\n", " # tag = True\n", "243 for c in s:\n", " # c = 'b'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = '>'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "249 tag = False\n", " # tag = False\n", "243 for c in s:\n", " # c = 'x'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "253 out = out + c\n", " # out = 'x'\n", "243 for c in s:\n", " # c = '<'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "247 tag = True\n", " # tag = True\n", "243 for c in s:\n", " # c = '/'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = 'b'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = '>'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "249 tag = False\n", " # tag = False\n", "243 for c in s:\n", "255 return out\n", "remove_html_markup() returns 'x'\n" ] } ], "source": [ "with Tracer():\n", " remove_html_markup('x')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As you see, even a simple function can create a long execution log. Hence, we will now explore how to focus tracing on particular _events_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Conditional Tracing\n", "\n", "A log such as the above can very quickly become very messy – notably if executions take a long time, or if data structures become very complex. If one of our local variables were a list with 1,000 entries for instance, and were changed with each line, we'd be printing out the entire list with 1,000 entries for each step. \n", "\n", "We could still load the log into, say, a text editor or a database and then search for specific values, but this is still cumbersome – and expensive. A better alternative, however, is to have our tracer only log while specific _conditions_ hold." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To this end, we introduce a class `ConditionalTracer`, which gets a _conditional expression_ to be checked during executions. Only if this condition holds do we list the current status. With\n", "\n", "```python\n", "with ConditionalTracer(condition='c == \"z\"'):\n", " remove_html_markup(...)\n", "```\n", "\n", "we would obtain only the lines executed while `c` gets a value of `'z'`, and with\n", "\n", "```python\n", "with ConditionalTracer(condition='quote'):\n", " remove_html_markup(...)\n", "```\n", "\n", "we would obtain only the lines executed while `quote` is True. If we have multiple conditions, we can combine them into one using `and`, `or`, or `not`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our `ConditionalTracer` class stores the condition in its `condition` attribute:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.217969Z", "iopub.status.busy": "2023-11-12T12:39:41.217780Z", "iopub.status.idle": "2023-11-12T12:39:41.220015Z", "shell.execute_reply": "2023-11-12T12:39:41.219723Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConditionalTracer(Tracer):\n", " def __init__(self, *, condition: Optional[str] = None, file: TextIO = sys.stdout) -> None:\n", " \"\"\"Constructor. Trace all events for which `condition` (a Python expr) holds.\"\"\"\n", "\n", " if condition is None:\n", " condition = 'False'\n", "\n", " self.condition: str = condition\n", " self.last_report: Optional[bool] = None\n", " super().__init__(file=file)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Its `traceit()` function _evaluates_ `condition` and reports the current line only if it holds. To this end, it uses the Python `eval()` function which evaluates the condition in the context of the local variables of the program under test. If the condition gets set, we print out three dots to indicate the elapsed time." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.221902Z", "iopub.status.busy": "2023-11-12T12:39:41.221712Z", "iopub.status.idle": "2023-11-12T12:39:41.223924Z", "shell.execute_reply": "2023-11-12T12:39:41.223594Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConditionalTracer(ConditionalTracer):\n", " def eval_in_context(self, expr: str, frame: FrameType) -> Optional[bool]:\n", " try:\n", " cond = eval(expr, None, frame.f_locals)\n", " except NameError: # (yet) undefined variable\n", " cond = None\n", " return cond" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `do_report()` function returns True if the status is to be reported:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.225856Z", "iopub.status.busy": "2023-11-12T12:39:41.225702Z", "iopub.status.idle": "2023-11-12T12:39:41.227685Z", "shell.execute_reply": "2023-11-12T12:39:41.227421Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConditionalTracer(ConditionalTracer):\n", " def do_report(self, frame: FrameType, event: str, arg: Any) -> Optional[bool]:\n", " return self.eval_in_context(self.condition, frame)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We put everything together in our `traceit()` function:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.229458Z", "iopub.status.busy": "2023-11-12T12:39:41.229327Z", "iopub.status.idle": "2023-11-12T12:39:41.231509Z", "shell.execute_reply": "2023-11-12T12:39:41.231278Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConditionalTracer(ConditionalTracer):\n", " def traceit(self, frame: FrameType, event: str, arg: Any) -> None:\n", " report = self.do_report(frame, event, arg)\n", " if report != self.last_report:\n", " if report:\n", " self.log(\"...\")\n", " self.last_report = report\n", "\n", " if report:\n", " self.print_debugger_status(frame, event, arg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's an example. We see that `quote` is set only while the three characters `b`, `a`, and `r` are processed (as should be)." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.233249Z", "iopub.status.busy": "2023-11-12T12:39:41.233122Z", "iopub.status.idle": "2023-11-12T12:39:41.245041Z", "shell.execute_reply": "2023-11-12T12:39:41.244670Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", " # s = '\"foo\"', tag = True, quote = True, out = '', c = '\"'\n", "243 for c in s:\n", " # c = 'b'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = 'a'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = 'r'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "252 elif not tag:\n", "243 for c in s:\n", " # c = '\"'\n", "244 assert tag or not quote\n", "246 if c == '<' and not quote:\n", "248 elif c == '>' and not quote:\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", "251 quote = not quote\n" ] } ], "source": [ "with ConditionalTracer(condition='quote'):\n", " remove_html_markup('\"foo\"')" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.249172Z", "iopub.status.busy": "2023-11-12T12:39:41.249005Z", "iopub.status.idle": "2023-11-12T12:39:41.252538Z", "shell.execute_reply": "2023-11-12T12:39:41.252192Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

What happens if the condition contains a syntax error?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"What happens if the condition contains a syntax error?\",\n", " [\n", " \"The tracer stops, raising an exception\",\n", " \"The tracer continues as if the condition were `True`\",\n", " \"The tracer continues as if the condition were `False`\",\n", " ], '393 % 7')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's the answer, illustrated in two examples. For syntax errors, we indeed get an exception:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.254273Z", "iopub.status.busy": "2023-11-12T12:39:41.254174Z", "iopub.status.idle": "2023-11-12T12:39:41.255941Z", "shell.execute_reply": "2023-11-12T12:39:41.255567Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.257713Z", "iopub.status.busy": "2023-11-12T12:39:41.257622Z", "iopub.status.idle": "2023-11-12T12:39:41.259988Z", "shell.execute_reply": "2023-11-12T12:39:41.259744Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_21446/3292487498.py\", line 3, in \n", " remove_html_markup('\"foo\"')\n", " File \"/Users/zeller/Projects/debuggingbook/notebooks/Intro_Debugging.ipynb\", line 238, in remove_html_markup\n", " def remove_html_markup(s): # type: ignore\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_21446/3738699336.py\", line 19, in _traceit\n", " self.traceit(frame, event, arg)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_21446/2841885016.py\", line 3, in traceit\n", " report = self.do_report(frame, event, arg)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_21446/3084505080.py\", line 3, in do_report\n", " return self.eval_in_context(self.condition, frame)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_21446/1027065478.py\", line 4, in eval_in_context\n", " cond = eval(expr, None, frame.f_locals)\n", " File \"\", line 1\n", " 2 +\n", "SyntaxError: invalid syntax (expected)\n" ] } ], "source": [ "with ExpectError(SyntaxError):\n", " with ConditionalTracer(condition='2 +'):\n", " remove_html_markup('\"foo\"')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If a variable is undefined, though, the condition evaluates to False:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.261894Z", "iopub.status.busy": "2023-11-12T12:39:41.261647Z", "iopub.status.idle": "2023-11-12T12:39:41.264178Z", "shell.execute_reply": "2023-11-12T12:39:41.263912Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ExpectError():\n", " with ConditionalTracer(condition='undefined_variable'):\n", " remove_html_markup('\"foo\"')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can also have the log focus on _particular code locations_ only. To this end, we add the pseudo-variables `function` and `line` to our evaluation context, which can be used within our condition to refer to the current function name or line. Then, we invoke the original `eval_cond()` as above." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.265963Z", "iopub.status.busy": "2023-11-12T12:39:41.265747Z", "iopub.status.idle": "2023-11-12T12:39:41.267962Z", "shell.execute_reply": "2023-11-12T12:39:41.267671Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConditionalTracer(ConditionalTracer):\n", " def eval_in_context(self, expr: str, frame: FrameType) -> Any:\n", " frame.f_locals['function'] = frame.f_code.co_name\n", " frame.f_locals['line'] = frame.f_lineno\n", "\n", " return super().eval_in_context(expr, frame)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Again, here is an example. We focus on the parts of the function where the `out` variable is being set:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.269928Z", "iopub.status.busy": "2023-11-12T12:39:41.269783Z", "iopub.status.idle": "2023-11-12T12:39:41.284933Z", "shell.execute_reply": "2023-11-12T12:39:41.284550Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", "Calling remove_html_markup(s = 'xyz', function = 'remove_html_markup', line = 238)\n", " # line = 239\n", "239 tag = False\n", " # line = 240, tag = False\n", "240 quote = False\n", " # line = 241, quote = False\n", "241 out = \"\"\n", " # line = 243, out = ''\n", "243 for c in s:\n", " # line = 244, c = 'x'\n", "244 assert tag or not quote\n", " # line = 246\n", "246 if c == '<' and not quote:\n", " # line = 248\n", "248 elif c == '>' and not quote:\n", " # line = 250\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", " # line = 252\n", "252 elif not tag:\n", " # line = 253\n", "253 out = out + c\n", " # line = 243, out = 'x'\n", "243 for c in s:\n", " # line = 244, c = 'y'\n", "244 assert tag or not quote\n", " # line = 246\n", "246 if c == '<' and not quote:\n", " # line = 248\n", "248 elif c == '>' and not quote:\n", " # line = 250\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", " # line = 252\n", "252 elif not tag:\n", " # line = 253\n", "253 out = out + c\n", " # line = 243, out = 'xy'\n", "243 for c in s:\n", " # line = 244, c = 'z'\n", "244 assert tag or not quote\n", " # line = 246\n", "246 if c == '<' and not quote:\n", " # line = 248\n", "248 elif c == '>' and not quote:\n", " # line = 250\n", "250 elif (c == '\"' or c == \"'\") and tag:\n", " # line = 252\n", "252 elif not tag:\n", " # line = 253\n", "253 out = out + c\n", " # line = 243, out = 'xyz'\n", "243 for c in s:\n", " # line = 255\n", "255 return out\n", "remove_html_markup() returns 'xyz'\n" ] } ], "source": [ "with ConditionalTracer(condition=\"function == 'remove_html_markup' and line >= 237\"):\n", " remove_html_markup('xyz')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Using `line` and `function` in conditions is equivalent to conventional _breakpoints_ in interactive debuggers. We will encounter them again in the next chapter." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.294419Z", "iopub.status.busy": "2023-11-12T12:39:41.294263Z", "iopub.status.idle": "2023-11-12T12:39:41.297753Z", "shell.execute_reply": "2023-11-12T12:39:41.297467Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

If the program under test contains a variable named line, which line does the condition refer to?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"If the program under test contains a variable named `line`, \"\n", " \"which `line` does the condition refer to?\",\n", " [\n", " \"`line` as in the debugger\",\n", " \"`line` as in the program\"\n", " ], '(326 * 27 == 8888) + 1')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Watching Events\n", "\n", "As an alternative to conditional logging, we may also be interested to exactly trace when a variable not only _has_ a particular value, but also when it _changes_ its value.\n", "\n", "To this end, we set up an `EventTracer` class that _watches_ when some event takes place. It takes a list of expressions (\"events\") and evaluates them for each line; if any event changes its value, we log the status." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With\n", "\n", "```python\n", "with EventTracer(events=['tag', 'quote']):\n", " remove_html_markup(...)\n", "```\n", "\n", "for instance, we would get a listing of all lines where `tag` or `quote` change their value; and with\n", "\n", "```python\n", "with EventTracer(events=['function']):\n", " remove_html_markup(...)\n", "```\n", "\n", "we would obtain a listing of all lines where the current function changes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our `EventTracer` class stores the list of events in its `events` attribute:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.299518Z", "iopub.status.busy": "2023-11-12T12:39:41.299422Z", "iopub.status.idle": "2023-11-12T12:39:41.301869Z", "shell.execute_reply": "2023-11-12T12:39:41.301605Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class EventTracer(ConditionalTracer):\n", " \"\"\"Log when a given event expression changes its value\"\"\"\n", "\n", " def __init__(self, *, condition: Optional[str] = None,\n", " events: List[str] = [], file: TextIO = sys.stdout) -> None:\n", " \"\"\"Constructor. `events` is a list of expressions to watch.\"\"\"\n", " self.events = events\n", " self.last_event_values: Dict[str, Any] = {}\n", " super().__init__(file=file, condition=condition)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Its `events_changed()` function _evaluates_ the individual events and checks if they change." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.303576Z", "iopub.status.busy": "2023-11-12T12:39:41.303474Z", "iopub.status.idle": "2023-11-12T12:39:41.305892Z", "shell.execute_reply": "2023-11-12T12:39:41.305630Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class EventTracer(EventTracer):\n", " def events_changed(self, events: List[str], frame: FrameType) -> bool:\n", " \"\"\"Return True if any of the observed `events` has changed\"\"\"\n", " change = False\n", " for event in events:\n", " value = self.eval_in_context(event, frame)\n", "\n", " if (event not in self.last_event_values or\n", " value != self.last_event_values[event]):\n", " self.last_event_values[event] = value\n", " change = True\n", "\n", " return change" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We hook this into `do_report()`, the method that determines whether a line should be shown." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.307582Z", "iopub.status.busy": "2023-11-12T12:39:41.307488Z", "iopub.status.idle": "2023-11-12T12:39:41.309710Z", "shell.execute_reply": "2023-11-12T12:39:41.309394Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class EventTracer(EventTracer):\n", " def do_report(self, frame: FrameType, event: str, arg: Any) -> bool:\n", " \"\"\"Return True if a line should be shown\"\"\"\n", " return (self.eval_in_context(self.condition, frame) or\n", " self.events_changed(self.events, frame))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This allows us to track, for instance, how `quote` and `tag` change their values over time." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.311592Z", "iopub.status.busy": "2023-11-12T12:39:41.311432Z", "iopub.status.idle": "2023-11-12T12:39:41.320946Z", "shell.execute_reply": "2023-11-12T12:39:41.320642Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", "Calling remove_html_markup(s = '\"foo\"', function = 'remove_html_markup', line = 238)\n", "...\n", " # line = 240, tag = False\n", "240 quote = False\n", " # line = 241, quote = False\n", "241 out = \"\"\n", "...\n", " # line = 243, tag = True, out = '', c = '<'\n", "243 for c in s:\n", "...\n", " # quote = True, c = '\"'\n", "243 for c in s:\n", "...\n", " # quote = False\n", "243 for c in s:\n", "...\n", " # tag = False, c = '>'\n", "243 for c in s:\n", "...\n", " # tag = True, out = '\"foo\"', c = '<'\n", "243 for c in s:\n", "...\n", " # tag = False, c = '>'\n", "243 for c in s:\n" ] } ], "source": [ "with EventTracer(events=['quote', 'tag']):\n", " remove_html_markup('\"foo\"')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Continuously monitoring variable values at execution time is equivalent to the concept of *watchpoints* in interactive debuggers." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this, we have all we need for observing what happens during execution: We can explore the entire state, and we can evaluate conditions and events we are interested in. In the next chapter, we will see how to turn these capabilities into an interactive debugger, where we can query all these things interactively." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Efficient Tracing\n", "\n", "While our framework is very flexible (and can still be extended further), it also is _slow_, since we have to evaluate all conditions and events for every single line of the program. Just how slow are things? We can easily measure this." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.324411Z", "iopub.status.busy": "2023-11-12T12:39:41.324272Z", "iopub.status.idle": "2023-11-12T12:39:41.328512Z", "shell.execute_reply": "2023-11-12T12:39:41.328248Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Timer import Timer" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.330317Z", "iopub.status.busy": "2023-11-12T12:39:41.330212Z", "iopub.status.idle": "2023-11-12T12:39:41.331872Z", "shell.execute_reply": "2023-11-12T12:39:41.331617Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "runs = 1000" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's the untraced execution time in seconds:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.333824Z", "iopub.status.busy": "2023-11-12T12:39:41.333664Z", "iopub.status.idle": "2023-11-12T12:39:41.337884Z", "shell.execute_reply": "2023-11-12T12:39:41.337578Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.0016191670001717284" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with Timer() as t:\n", " for i in range(runs):\n", " remove_html_markup('\"foo\"')\n", "untraced_execution_time = t.elapsed_time()\n", "untraced_execution_time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And here's the _traced_ execution time:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.339705Z", "iopub.status.busy": "2023-11-12T12:39:41.339585Z", "iopub.status.idle": "2023-11-12T12:39:41.938043Z", "shell.execute_reply": "2023-11-12T12:39:41.937739Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.5955859589994361" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with Timer() as t:\n", " for i in range(runs):\n", " with EventTracer():\n", " remove_html_markup('\"foo\"')\n", "traced_execution_time = t.elapsed_time()\n", "traced_execution_time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the _traced_ execution time is several hundred times slower:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.940060Z", "iopub.status.busy": "2023-11-12T12:39:41.939904Z", "iopub.status.idle": "2023-11-12T12:39:41.942179Z", "shell.execute_reply": "2023-11-12T12:39:41.941903Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "367.834793406899" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "traced_execution_time / untraced_execution_time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can still speed up our implementation somewhat, but still will get nowhere near the untraced execution time." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "There is a trick, though, that allows us to execute programs at full speed while being traced. Rather than _dynamically_ checking at run time whether a condition is met, we can also _statically_ inject appropriate code into the program under test. This way, the non-traced code is executed at normal speed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "There is a downside, though: This only works if the condition to be checked is limited to specific _locations_ – because it is precisely these locations where we insert our tracing code. With this limitation, though, _static_ tracing can speed up things significantly." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "How does static code injection work? The trick involves _rewriting_ the program code to insert special _debugging statements_ at the given locations. This way, we do not need to use the tracing function at all." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The following `insert_tracer()` function demonstrates this. It takes a function as well as a list of _breakpoint_ lines where to insert tracing statements. At each given line, it injects the code" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.944284Z", "iopub.status.busy": "2023-11-12T12:39:41.944145Z", "iopub.status.idle": "2023-11-12T12:39:41.945906Z", "shell.execute_reply": "2023-11-12T12:39:41.945652Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TRACER_CODE = \\\n", " \"TRACER.print_debugger_status(inspect.currentframe(), 'line', None); \"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "into the function definition, which calls into this tracer:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.947667Z", "iopub.status.busy": "2023-11-12T12:39:41.947500Z", "iopub.status.idle": "2023-11-12T12:39:41.949401Z", "shell.execute_reply": "2023-11-12T12:39:41.949093Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TRACER = Tracer()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`insert_tracer()` then _compiles_ the resulting code into a new \"traced\" function, which it then returns." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.950991Z", "iopub.status.busy": "2023-11-12T12:39:41.950884Z", "iopub.status.idle": "2023-11-12T12:39:41.953843Z", "shell.execute_reply": "2023-11-12T12:39:41.953607Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def insert_tracer(function: Callable, breakpoints: List[int] = []) -> Callable:\n", " \"\"\"Return a variant of `function` with tracing code `TRACER_CODE` inserted\n", " at each line given by `breakpoints`.\"\"\"\n", "\n", " source_lines, starting_line_number = inspect.getsourcelines(function)\n", "\n", " breakpoints.sort(reverse=True)\n", " for given_line in breakpoints:\n", " # Set new source line\n", " relative_line = given_line - starting_line_number + 1\n", " inject_line = source_lines[relative_line - 1]\n", " indent = len(inject_line) - len(inject_line.lstrip())\n", " source_lines[relative_line - 1] = ' ' * indent + TRACER_CODE + inject_line.lstrip()\n", "\n", " # Rename function\n", " new_function_name = function.__name__ + \"_traced\"\n", " source_lines[0] = source_lines[0].replace(function.__name__, new_function_name)\n", " new_def = \"\".join(source_lines)\n", "\n", " # For debugging\n", " print_content(new_def, '.py', start_line_number=starting_line_number)\n", "\n", " # We keep original source and filename to ease debugging\n", " prefix = '\\n' * starting_line_number # Get line number right\n", " new_function_code = compile(prefix + new_def, function.__code__.co_filename, 'exec')\n", " exec(new_function_code)\n", " new_function = eval(new_function_name)\n", " return new_function" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's an example: inserting two breakpoints in (relative) Lines 7 and 18 of `remove_html_markup()` results in the following (rewritten) definition of `remove_html_markup_traced()`:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.955443Z", "iopub.status.busy": "2023-11-12T12:39:41.955331Z", "iopub.status.idle": "2023-11-12T12:39:41.957725Z", "shell.execute_reply": "2023-11-12T12:39:41.957448Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "_, remove_html_markup_starting_line_number = inspect.getsourcelines(remove_html_markup)\n", "breakpoints = [(remove_html_markup_starting_line_number - 1) + 7, \n", " (remove_html_markup_starting_line_number - 1) + 18]" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.959324Z", "iopub.status.busy": "2023-11-12T12:39:41.959181Z", "iopub.status.idle": "2023-11-12T12:39:41.992111Z", "shell.execute_reply": "2023-11-12T12:39:41.991825Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "238 \u001b[34mdef\u001b[39;49;00m \u001b[32mremove_html_markup_traced\u001b[39;49;00m(s): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "239 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "240 quote = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "241 out = \u001b[33m\"\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "242 \u001b[37m\u001b[39;49;00m\n", "243 \u001b[34mfor\u001b[39;49;00m c \u001b[35min\u001b[39;49;00m s:\u001b[37m\u001b[39;49;00m\n", "244 TRACER.print_debugger_status(inspect.currentframe(), \u001b[33m'\u001b[39;49;00m\u001b[33mline\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m); \u001b[34massert\u001b[39;49;00m tag \u001b[35mor\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "245 \u001b[37m\u001b[39;49;00m\n", "246 \u001b[34mif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m<\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", "247 tag = \u001b[34mTrue\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "248 \u001b[34melif\u001b[39;49;00m c == \u001b[33m'\u001b[39;49;00m\u001b[33m>\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mand\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m quote:\u001b[37m\u001b[39;49;00m\n", "249 tag = \u001b[34mFalse\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "250 \u001b[34melif\u001b[39;49;00m (c == \u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m \u001b[35mor\u001b[39;49;00m c == \u001b[33m\"\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m) \u001b[35mand\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", "251 quote = \u001b[35mnot\u001b[39;49;00m quote\u001b[37m\u001b[39;49;00m\n", "252 \u001b[34melif\u001b[39;49;00m \u001b[35mnot\u001b[39;49;00m tag:\u001b[37m\u001b[39;49;00m\n", "253 out = out + c\u001b[37m\u001b[39;49;00m\n", "254 \u001b[37m\u001b[39;49;00m\n", "255 TRACER.print_debugger_status(inspect.currentframe(), \u001b[33m'\u001b[39;49;00m\u001b[33mline\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, \u001b[34mNone\u001b[39;49;00m); \u001b[34mreturn\u001b[39;49;00m out\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "remove_html_markup_traced = insert_tracer(remove_html_markup, breakpoints)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we execute the statically instrumented `remove_html_markup_traced()`, we obtain the same output as when using a dynamic tracer. Note that the source code listed shows the original code; the injected calls into `TRACER` do not show up." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:41.993760Z", "iopub.status.busy": "2023-11-12T12:39:41.993649Z", "iopub.status.idle": "2023-11-12T12:39:42.006372Z", "shell.execute_reply": "2023-11-12T12:39:42.006080Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " # s = '\"foo\"', tag = False, quote = False, out = '', c = '<'\n", "245 \n", " # tag = True, c = 'b'\n", "245 \n", " # c = ' '\n", "245 \n", " # c = 't'\n", "245 \n", " # c = 'i'\n", "245 \n", " # c = 't'\n", "245 \n", " # c = 'l'\n", "245 \n", " # c = 'e'\n", "245 \n", " # c = '='\n", "245 \n", " # c = '\"'\n", "245 \n", " # quote = True, c = 'b'\n", "245 \n", " # c = 'a'\n", "245 \n", " # c = 'r'\n", "245 \n", " # c = '\"'\n", "245 \n", " # quote = False, c = '>'\n", "245 \n", " # tag = False, c = '\"'\n", "245 \n", " # out = '\"', c = 'f'\n", "245 \n", " # out = '\"f', c = 'o'\n", "245 \n", " # out = '\"fo'\n", "245 \n", " # out = '\"foo', c = '\"'\n", "245 \n", " # out = '\"foo\"', c = '<'\n", "245 \n", " # tag = True, c = '/'\n", "245 \n", " # c = 'b'\n", "245 \n", " # c = '>'\n", "245 \n", " # tag = False\n", "256 \n" ] } ], "source": [ "with Timer() as t:\n", " remove_html_markup_traced('\"foo\"')\n", "static_tracer_execution_time = t.elapsed_time()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How fast is the static tracer compared with the dynamic tracer? This is the execution time of the above code:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.016244Z", "iopub.status.busy": "2023-11-12T12:39:42.016099Z", "iopub.status.idle": "2023-11-12T12:39:42.018279Z", "shell.execute_reply": "2023-11-12T12:39:42.018024Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.011141417000544607" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "static_tracer_execution_time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Compare this with the equivalent dynamic tracer:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.020105Z", "iopub.status.busy": "2023-11-12T12:39:42.019953Z", "iopub.status.idle": "2023-11-12T12:39:42.039651Z", "shell.execute_reply": "2023-11-12T12:39:42.039381Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", " # s = '\"foo\"', function = 'remove_html_markup', line = 244, tag = False, quote = False, out = '', c = '<'\n", "244 assert tag or not quote\n", "...\n", " # tag = True, c = 'b'\n", "244 assert tag or not quote\n", "...\n", " # c = ' '\n", "244 assert tag or not quote\n", "...\n", " # c = 't'\n", "244 assert tag or not quote\n", "...\n", " # c = 'i'\n", "244 assert tag or not quote\n", "...\n", " # c = 't'\n", "244 assert tag or not quote\n", "...\n", " # c = 'l'\n", "244 assert tag or not quote\n", "...\n", " # c = 'e'\n", "244 assert tag or not quote\n", "...\n", " # c = '='\n", "244 assert tag or not quote\n", "...\n", " # c = '\"'\n", "244 assert tag or not quote\n", "...\n", " # quote = True, c = 'b'\n", "244 assert tag or not quote\n", "...\n", " # c = 'a'\n", "244 assert tag or not quote\n", "...\n", " # c = 'r'\n", "244 assert tag or not quote\n", "...\n", " # c = '\"'\n", "244 assert tag or not quote\n", "...\n", " # quote = False, c = '>'\n", "244 assert tag or not quote\n", "...\n", " # tag = False, c = '\"'\n", "244 assert tag or not quote\n", "...\n", " # out = '\"', c = 'f'\n", "244 assert tag or not quote\n", "...\n", " # out = '\"f', c = 'o'\n", "244 assert tag or not quote\n", "...\n", " # out = '\"fo'\n", "244 assert tag or not quote\n", "...\n", " # out = '\"foo', c = '\"'\n", "244 assert tag or not quote\n", "...\n", " # out = '\"foo\"', c = '<'\n", "244 assert tag or not quote\n", "...\n", " # tag = True, c = '/'\n", "244 assert tag or not quote\n", "...\n", " # c = 'b'\n", "244 assert tag or not quote\n", "...\n", " # c = '>'\n", "244 assert tag or not quote\n", "...\n", " # line = 255, tag = False\n", "255 return out\n", "remove_html_markup() returns '\"foo\"'\n" ] }, { "data": { "text/plain": [ "0.017035457999554637" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "line7 = (remove_html_markup_starting_line_number - 1) + 7\n", "line18 = (remove_html_markup_starting_line_number - 1) + 18\n", "\n", "with Timer() as t:\n", " with EventTracer(condition=f'line == {line7} or line == {line18}'):\n", " remove_html_markup('\"foo\"')\n", "\n", "dynamic_tracer_execution_time = t.elapsed_time()\n", "dynamic_tracer_execution_time" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.053117Z", "iopub.status.busy": "2023-11-12T12:39:42.052977Z", "iopub.status.idle": "2023-11-12T12:39:42.055381Z", "shell.execute_reply": "2023-11-12T12:39:42.055077Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1.5290207698645444" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_tracer_execution_time / static_tracer_execution_time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the static tracker is several times faster – an advantage that will only increase further as more non-traced code is executed. If our code looks like this:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.057179Z", "iopub.status.busy": "2023-11-12T12:39:42.057018Z", "iopub.status.idle": "2023-11-12T12:39:42.058686Z", "shell.execute_reply": "2023-11-12T12:39:42.058434Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def some_extreme_function(s: str) -> None:\n", " ... # Long-running function\n", " remove_html_markup(s)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and we then execute it with" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.060324Z", "iopub.status.busy": "2023-11-12T12:39:42.060199Z", "iopub.status.idle": "2023-11-12T12:39:42.063018Z", "shell.execute_reply": "2023-11-12T12:39:42.062717Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", " # s = 'foo', function = 'remove_html_markup', line = 255, tag = False, quote = False, out = 'foo', c = 'o'\n", "255 return out\n", "remove_html_markup() returns 'foo'\n" ] } ], "source": [ "with EventTracer(condition=f\"function=='remove_html_markup' and line == {line18}\"):\n", " some_extreme_function(\"foo\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "we will spend quite some time." ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.065275Z", "iopub.status.busy": "2023-11-12T12:39:42.065118Z", "iopub.status.idle": "2023-11-12T12:39:42.069744Z", "shell.execute_reply": "2023-11-12T12:39:42.069488Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

In the above example, where is the EventTracer.traceit() function called?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"In the above example, \"\n", " \"where is the `EventTracer.traceit()` function called?\",\n", " [\n", " \"When `some_extreme_function()` returns\",\n", " \"For each line of `some_extreme_function()`\",\n", " \"When `remove_html_markup()` returns\",\n", " \"For each line of `remove_html_markup()`\"\n", " ], \"[ord(c) - 100 for c in 'efgh']\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed: Stepping line by line through some function can be pretty expensive, as every call, line, and return of `some_extreme_function()` and `remove_html_markup()` is tracked." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On the other hand, the static tracker is limited to conditions that refer to a _specific location in the code._ If we want to check whether some variable changes, for instance, we have to perform a (nontrivial) static analysis of the code to determine possible locations for a change. If a variable is changed indirectly through references or pointers (a common risk in system-level languages like C), there is no alternative to actually watching its value after each instruction." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Tracing Binary Executables\n", "\n", "Debuggers that act on binary code (say, code compiled from C) operate similarly as our \"static\" tracer: They take a location in the binary code and replace its instruction with a _break instruction_ that interrupts execution, returning control to the debugger. The debugger then replaces the break instruction with the original instruction before resuming execution.\n", "\n", "If the code cannot be altered (for instance, because it is in read-only memory), however, then debuggers resort to the \"dynamic\" tracing method, executing one instruction at a time and checking the program counter for its current value after each step.\n", "\n", "To provide a minimum of efficient support, some processor architectures, such as x86, provide *hardware breakpoints*. Programmers (or more precisely, debugging tools) can define a set of specific values for the program counter to watch, and if the program counter reaches one of these values, execution is interrupted to return to the debugger. Likewise, *hardware watchpoints* will check specific memory locations at run time for changes and given values. There are also hardware watchpoints that break when a specific memory location is read from. Both hardware watchpoints and hardware breakpoints allow a limited tracking of stopping conditions while still maintaining original execution speed – and the best debugging tools will use a mix of static tracing, dynamic tracing, and hardware tracing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chapter provides a `Tracer` class that allows logging events during program execution. The advanced subclass `EventTracer` allows restricting logs to specific conditions. Logs are shown only while the given `condition` holds:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.071785Z", "iopub.status.busy": "2023-11-12T12:39:42.071550Z", "iopub.status.idle": "2023-11-12T12:39:42.075758Z", "shell.execute_reply": "2023-11-12T12:39:42.075513Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", " # s = 'foobar', function = 'remove_html_markup', line = 243, tag = False, quote = False, out = 'foobar', c = 'r'\n", "243 for c in s:\n", " # line = 255\n", "255 return out\n", "remove_html_markup() returns 'foobar'\n" ] } ], "source": [ "with EventTracer(condition='line == 223 or len(out) >= 6'):\n", " remove_html_markup('foobar')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "It also allows restricting logs to specific events. Log entries are shown only if one of the given `events` changes its value:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.077280Z", "iopub.status.busy": "2023-11-12T12:39:42.077200Z", "iopub.status.idle": "2023-11-12T12:39:42.082364Z", "shell.execute_reply": "2023-11-12T12:39:42.082110Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "...\n", "Calling remove_html_markup(s = 'foobar', function = 'remove_html_markup', line = 238)\n", "...\n", " # line = 244, tag = False, quote = False, out = '', c = '<'\n", "244 assert tag or not quote\n", "...\n", " # tag = True, out = 'foo', c = '/'\n", "244 assert tag or not quote\n", "...\n", " # c = 'b'\n", "244 assert tag or not quote\n" ] } ], "source": [ "with EventTracer(events=[\"c == '/'\"]):\n", " remove_html_markup('foobar')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`Tracer` and `EventTracer` classes allow for subclassing and further customization." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.084420Z", "iopub.status.busy": "2023-11-12T12:39:42.084253Z", "iopub.status.idle": "2023-11-12T12:39:42.085768Z", "shell.execute_reply": "2023-11-12T12:39:42.085527Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from ClassDiagram import display_class_hierarchy" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.087479Z", "iopub.status.busy": "2023-11-12T12:39:42.087316Z", "iopub.status.idle": "2023-11-12T12:39:42.610023Z", "shell.execute_reply": "2023-11-12T12:39:42.609546Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "EventTracer\n", "\n", "\n", "EventTracer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "do_report()\n", "\n", "\n", "\n", "events_changed()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "ConditionalTracer\n", "\n", "\n", "ConditionalTracer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "do_report()\n", "\n", "\n", "\n", "eval_in_context()\n", "\n", "\n", "\n", "traceit()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "EventTracer->ConditionalTracer\n", "\n", "\n", "\n", "\n", "\n", "Tracer\n", "\n", "\n", "Tracer\n", "\n", "\n", "\n", "__enter__()\n", "\n", "\n", "\n", "__exit__()\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "changed_vars()\n", "\n", "\n", "\n", "print_debugger_status()\n", "\n", "\n", "\n", "_traceit()\n", "\n", "\n", "\n", "log()\n", "\n", "\n", "\n", "traceit()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "ConditionalTracer->Tracer\n", "\n", "\n", "\n", "\n", "\n", "StackInspector\n", "\n", "\n", "StackInspector\n", "\n", "\n", "\n", "_generated_function_cache\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "Tracer->StackInspector\n", "\n", "\n", "\n", "\n", "\n", "Legend\n", "Legend\n", "• \n", "public_method()\n", "• \n", "private_method()\n", "• \n", "overloaded_method()\n", "Hover over names to see doc\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy(EventTracer,\n", " public_methods=[\n", " Tracer.__init__,\n", " Tracer.__enter__,\n", " Tracer.__exit__,\n", " Tracer.changed_vars,\n", " Tracer.print_debugger_status,\n", " ConditionalTracer.__init__,\n", " EventTracer.__init__,\n", " ],\n", " project='debuggingbook')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* Interpreted languages can provide _debugging hooks_ that allow controlling program execution and access program state.\n", "* Tracing can be limited to specific conditions and events:\n", " * A _breakpoint_ is a condition referring to a particular location in the code.\n", " * A _watchpoint_ is an event referring to a particular state change.\n", "* Compiled languages allow _instrumenting_ code at compile time, injecting code that allows handing over control to a tracing or debugging tool." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "In the next chapter, we will see how to\n", "\n", "* [leverage our tracing infrastructure for interactive debugging](Debugger.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "Debugging interfaces like Python `sys.settrace()` are common in all programming languages that provide support for interactive debugging, providing support for executing programs step by step and inspecting program state along the way." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Low-Level Debugging Interfaces\n", "\n", "The first set of interfaces considered takes place at a _low level_, allowing access to _machine level_ features. On Linux and other UNIX-like systems, the [ptrace()](https://en.wikipedia.org/wiki/Ptrace) system call provides a means by which one process (the 'tracer') may observe and control the execution of another process (the 'tracee'), and examine and change the tracee's memory and registers.\n", "\n", "`ptrace()` is a low-level interface, which allows stepping over individual machine instructions and to read raw memory. In order to map instructions back to original statements and translate memory contents to variable values, compilers can include *debugging information* in the produced binaries, which debuggers then read out during a debugging session." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### High-Level Debugging Interfaces\n", "\n", "The second set of interfaces allows accessing the program's internals using the concepts of the program – i.e. variables and code locations, as Python does. The [Java Debug Interface](https://docs.oracle.com/javase/8/docs/jdk/api/jpda/jdi/) (JDI) is a _high-level interface_ for implementing a debugger (or tracer) on top of Java. [This introduction to JDI](https://www.baeldung.com/java-debug-interface) shows how to build a debugger using this interface in a few steps.\n", "\n", "For JavaScript, Mozilla's [Debugger API](https://developer.mozilla.org/en-US/docs/Tools/Debugger-API) and Google's [chrome.debugger API](https://developer.chrome.com/docs/extensions/reference/debugger/) similarly allow tracing and inspecting program execution." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1: Exception Handling\n", "\n", "So far, we have only seen execution of lines in individual functions. But if a function raises an exception, we also may want to catch and report this. Right now, an exception is being raised right through our tracer, interrupting the trace. " ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.612296Z", "iopub.status.busy": "2023-11-12T12:39:42.612145Z", "iopub.status.idle": "2023-11-12T12:39:42.614323Z", "shell.execute_reply": "2023-11-12T12:39:42.613962Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fail() -> float:\n", " return 2 / 0" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.615916Z", "iopub.status.busy": "2023-11-12T12:39:42.615787Z", "iopub.status.idle": "2023-11-12T12:39:42.621022Z", "shell.execute_reply": "2023-11-12T12:39:42.620734Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calling fail()\n", "2 return 2 / 0\n", "fail() returns None\n" ] } ], "source": [ "with Tracer():\n", " try:\n", " fail()\n", " except Exception:\n", " pass" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "Extend the `Tracer` class (or the `EventTracer` subclasses) such that exceptions (event type `'exception'`) are properly traced, too, say as\n", "\n", "```\n", "fail() raises ZeroDivisionError: division by zero\n", "```\n", "\n", "See the Python documentation for `sys.settrace()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Simply extend `print_debugger_status()`:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "cell_style": "split", "execution": { "iopub.execute_input": "2023-11-12T12:39:42.622954Z", "iopub.status.busy": "2023-11-12T12:39:42.622829Z", "iopub.status.idle": "2023-11-12T12:39:42.625115Z", "shell.execute_reply": "2023-11-12T12:39:42.624823Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class Tracer(Tracer):\n", " def print_debugger_status(self, frame: FrameType, event: str, arg: Any) -> None:\n", " if event == 'exception':\n", " exception, value, tb = arg\n", " self.log(f\"{frame.f_code.co_name}() \"\n", " f\"raises {exception.__name__}: {value}\")\n", " else:\n", " super().print_debugger_status(frame, event, arg)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.626962Z", "iopub.status.busy": "2023-11-12T12:39:42.626802Z", "iopub.status.idle": "2023-11-12T12:39:42.631600Z", "shell.execute_reply": "2023-11-12T12:39:42.631250Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Calling fail()\n", "2 return 2 / 0\n", "fail() raises ZeroDivisionError: division by zero\n", "fail() returns None\n" ] } ], "source": [ "with Tracer():\n", " try:\n", " fail()\n", " except Exception:\n", " pass" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 2: Syntax-Based Instrumentation\n", "\n", "Adding instrumentation to source code is a complicated business, notably because it is not always easy to determine where and how to instrument. If a Python line starts with \n", "\n", "```python\n", "if condition:\n", "```\n", "\n", "where should one insert code to instrument it?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A much more elegant way to instrument code is to add instrumentation _after the code has already been parsed_. Python code, like most other code, is first _parsed_ into an intermediate tree-like structure (called an *abstract syntax tree*, or *AST*). This AST can then be inspected and manipulated, before a second step compiles it into low-level instruction sequences to be executed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us start with an example. Here is an AST resulting from parsing a very simple piece of code:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.633506Z", "iopub.status.busy": "2023-11-12T12:39:42.633263Z", "iopub.status.idle": "2023-11-12T12:39:42.635197Z", "shell.execute_reply": "2023-11-12T12:39:42.634907Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def foo(): # type: ignore\n", " ret = 2 * 2\n", " return ret" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.636600Z", "iopub.status.busy": "2023-11-12T12:39:42.636481Z", "iopub.status.idle": "2023-11-12T12:39:42.668621Z", "shell.execute_reply": "2023-11-12T12:39:42.668330Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mfoo\u001b[39;49;00m(): \u001b[37m# type: ignore\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " ret = \u001b[34m2\u001b[39;49;00m * \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m ret\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "source = inspect.getsource(foo)\n", "print_content(source, '.py')" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.670160Z", "iopub.status.busy": "2023-11-12T12:39:42.670064Z", "iopub.status.idle": "2023-11-12T12:39:42.671789Z", "shell.execute_reply": "2023-11-12T12:39:42.671527Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import ast" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.673315Z", "iopub.status.busy": "2023-11-12T12:39:42.673195Z", "iopub.status.idle": "2023-11-12T12:39:42.674988Z", "shell.execute_reply": "2023-11-12T12:39:42.674623Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import show_ast" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.677046Z", "iopub.status.busy": "2023-11-12T12:39:42.676787Z", "iopub.status.idle": "2023-11-12T12:39:42.678583Z", "shell.execute_reply": "2023-11-12T12:39:42.678319Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "tree = ast.parse(source)" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:42.680169Z", "iopub.status.busy": "2023-11-12T12:39:42.680048Z", "iopub.status.idle": "2023-11-12T12:39:43.098916Z", "shell.execute_reply": "2023-11-12T12:39:43.098539Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""foo"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "3\n", "Assign\n", "\n", "\n", "\n", "0--3\n", "\n", "\n", "\n", "\n", "13\n", "Return\n", "\n", "\n", "\n", "0--13\n", "\n", "\n", "\n", "\n", "4\n", "Name\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "7\n", "BinOp\n", "\n", "\n", "\n", "3--7\n", "\n", "\n", "\n", "\n", "5\n", ""ret"\n", "\n", "\n", "\n", "4--5\n", "\n", "\n", "\n", "\n", "6\n", "Store\n", "\n", "\n", "\n", "4--6\n", "\n", "\n", "\n", "\n", "8\n", "Constant\n", "\n", "\n", "\n", "7--8\n", "\n", "\n", "\n", "\n", "10\n", "Mult\n", "\n", "\n", "\n", "7--10\n", "\n", "\n", "\n", "\n", "11\n", "Constant\n", "\n", "\n", "\n", "7--11\n", "\n", "\n", "\n", "\n", "9\n", "2\n", "\n", "\n", "\n", "8--9\n", "\n", "\n", "\n", "\n", "12\n", "2\n", "\n", "\n", "\n", "11--12\n", "\n", "\n", "\n", "\n", "14\n", "Name\n", "\n", "\n", "\n", "13--14\n", "\n", "\n", "\n", "\n", "15\n", ""ret"\n", "\n", "\n", "\n", "14--15\n", "\n", "\n", "\n", "\n", "16\n", "Load\n", "\n", "\n", "\n", "14--16\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_ast(tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "You see that the function `foo()` has a `FunctionDef` node with four children: The function name (`\"foo\"`), its arguments (`arguments`; currently empty), followed by the statements that make the function body – `Assign` for the assignment, `Return` for the `return` statement." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We obtain and manipulate the AST through the Python `ast` module. The [official Python `ast` reference](http://docs.python.org/3/library/ast) is complete, but a bit brief; the documentation [\"Green Tree Snakes - the missing Python AST docs\"](https://greentreesnakes.readthedocs.io/en/latest/) provides an excellent introduction." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To instrument the above code, we need to insert a new statement as a child to `FunctionDef` node." ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.100988Z", "iopub.status.busy": "2023-11-12T12:39:43.100855Z", "iopub.status.idle": "2023-11-12T12:39:43.102813Z", "shell.execute_reply": "2023-11-12T12:39:43.102555Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ast import NodeTransformer, FunctionDef, fix_missing_locations, AST, Module" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's the code we want to inject:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.104498Z", "iopub.status.busy": "2023-11-12T12:39:43.104373Z", "iopub.status.idle": "2023-11-12T12:39:43.106337Z", "shell.execute_reply": "2023-11-12T12:39:43.106075Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "subtree_to_be_injected: AST = ast.parse(\"print('entering function')\")" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.107874Z", "iopub.status.busy": "2023-11-12T12:39:43.107770Z", "iopub.status.idle": "2023-11-12T12:39:43.494688Z", "shell.execute_reply": "2023-11-12T12:39:43.494357Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "Expr\n", "\n", "\n", "\n", "1\n", "Call\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "Name\n", "\n", "\n", "\n", "1--2\n", "\n", "\n", "\n", "\n", "5\n", "Constant\n", "\n", "\n", "\n", "1--5\n", "\n", "\n", "\n", "\n", "3\n", ""print"\n", "\n", "\n", "\n", "2--3\n", "\n", "\n", "\n", "\n", "4\n", "Load\n", "\n", "\n", "\n", "2--4\n", "\n", "\n", "\n", "\n", "6\n", ""entering function"\n", "\n", "\n", "\n", "5--6\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_ast(subtree_to_be_injected)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The root of an `ast.parse()` tree actually is a `Module` node; we go directly to its child, which is the `Expr` node we want to inject." ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.496539Z", "iopub.status.busy": "2023-11-12T12:39:43.496406Z", "iopub.status.idle": "2023-11-12T12:39:43.498262Z", "shell.execute_reply": "2023-11-12T12:39:43.498006Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "subtree_to_be_injected = cast(Module, subtree_to_be_injected).body[0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To inject the code, we use the `NodeTransformer` class as described in the Python `ast` documentation. We vist all function definitions (`FunctionDef`) and replace them with a new function definition in which the `body` gets an additional child – namely our subtree to be injected." ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.499687Z", "iopub.status.busy": "2023-11-12T12:39:43.499586Z", "iopub.status.idle": "2023-11-12T12:39:43.501549Z", "shell.execute_reply": "2023-11-12T12:39:43.501305Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class InjectPass(NodeTransformer):\n", " def visit_FunctionDef(self, node: FunctionDef) -> AST:\n", " return FunctionDef(\n", " name=node.name,\n", " args=node.args,\n", " body=[subtree_to_be_injected] + node.body,\n", " decorator_list=node.decorator_list,\n", " returns=node.returns\n", " )" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.502951Z", "iopub.status.busy": "2023-11-12T12:39:43.502844Z", "iopub.status.idle": "2023-11-12T12:39:43.504561Z", "shell.execute_reply": "2023-11-12T12:39:43.504317Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "new_tree = fix_missing_locations(InjectPass().visit(tree))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is what our new tree looks like:" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.505972Z", "iopub.status.busy": "2023-11-12T12:39:43.505872Z", "iopub.status.idle": "2023-11-12T12:39:43.899124Z", "shell.execute_reply": "2023-11-12T12:39:43.898726Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""foo"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "3\n", "Expr\n", "\n", "\n", "\n", "0--3\n", "\n", "\n", "\n", "\n", "10\n", "Assign\n", "\n", "\n", "\n", "0--10\n", "\n", "\n", "\n", "\n", "20\n", "Return\n", "\n", "\n", "\n", "0--20\n", "\n", "\n", "\n", "\n", "4\n", "Call\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "5\n", "Name\n", "\n", "\n", "\n", "4--5\n", "\n", "\n", "\n", "\n", "8\n", "Constant\n", "\n", "\n", "\n", "4--8\n", "\n", "\n", "\n", "\n", "6\n", ""print"\n", "\n", "\n", "\n", "5--6\n", "\n", "\n", "\n", "\n", "7\n", "Load\n", "\n", "\n", "\n", "5--7\n", "\n", "\n", "\n", "\n", "9\n", ""entering function"\n", "\n", "\n", "\n", "8--9\n", "\n", "\n", "\n", "\n", "11\n", "Name\n", "\n", "\n", "\n", "10--11\n", "\n", "\n", "\n", "\n", "14\n", "BinOp\n", "\n", "\n", "\n", "10--14\n", "\n", "\n", "\n", "\n", "12\n", ""ret"\n", "\n", "\n", "\n", "11--12\n", "\n", "\n", "\n", "\n", "13\n", "Store\n", "\n", "\n", "\n", "11--13\n", "\n", "\n", "\n", "\n", "15\n", "Constant\n", "\n", "\n", "\n", "14--15\n", "\n", "\n", "\n", "\n", "17\n", "Mult\n", "\n", "\n", "\n", "14--17\n", "\n", "\n", "\n", "\n", "18\n", "Constant\n", "\n", "\n", "\n", "14--18\n", "\n", "\n", "\n", "\n", "16\n", "2\n", "\n", "\n", "\n", "15--16\n", "\n", "\n", "\n", "\n", "19\n", "2\n", "\n", "\n", "\n", "18--19\n", "\n", "\n", "\n", "\n", "21\n", "Name\n", "\n", "\n", "\n", "20--21\n", "\n", "\n", "\n", "\n", "22\n", ""ret"\n", "\n", "\n", "\n", "21--22\n", "\n", "\n", "\n", "\n", "23\n", "Load\n", "\n", "\n", "\n", "21--23\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_ast(new_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is what the tree looks like when converted back to source code:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.901049Z", "iopub.status.busy": "2023-11-12T12:39:43.900904Z", "iopub.status.idle": "2023-11-12T12:39:43.935349Z", "shell.execute_reply": "2023-11-12T12:39:43.935048Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mfoo\u001b[39;49;00m():\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mentering function\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", " ret = \u001b[34m2\u001b[39;49;00m * \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m ret\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "new_source = ast.unparse(new_tree)\n", "print_content(new_source, '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now compile the new source into a function:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.937273Z", "iopub.status.busy": "2023-11-12T12:39:43.937145Z", "iopub.status.idle": "2023-11-12T12:39:43.939035Z", "shell.execute_reply": "2023-11-12T12:39:43.938686Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "exec(new_source)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... and happily invoke our instrumented function." ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:39:43.940837Z", "iopub.status.busy": "2023-11-12T12:39:43.940633Z", "iopub.status.idle": "2023-11-12T12:39:43.942864Z", "shell.execute_reply": "2023-11-12T12:39:43.942594Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "entering function\n" ] }, { "data": { "text/plain": [ "4" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "foo()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Your task is to implement a function `insert_tracer_ast(function, breakpoints)` that works like `insert_tracer()`, above, except that it uses this AST-based mechanism to inject debugging code into the given function." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }