{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Generalizing Failure Circumstances\n", "\n", "One central question in debugging is: _Does this bug occur in other situations, too?_ In this chapter, we present a technique that is set to _generalize_ the circumstances under which a failure occurs. The DDSET algorithm takes a failure-inducing input, breaks it into individual elements. For each element, it tries to find whether it can be replaced by others in the same category, and if so, it _generalizes_ the concrete element to the very category. The result is a _pattern_ that characterizes the failure condition: \"The failure occurs for all inputs of the form `( * )`\"." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:46.845859Z", "iopub.status.busy": "2023-11-12T12:41:46.845748Z", "iopub.status.idle": "2023-11-12T12:41:46.883486Z", "shell.execute_reply": "2023-11-12T12:41:46.883191Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo(\"PV22XtIQU1s\")" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [chapter on _delta debugging_](DeltaDebugger.ipynb)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:46.904662Z", "iopub.status.busy": "2023-11-12T12:41:46.904435Z", "iopub.status.idle": "2023-11-12T12:41:46.906739Z", "shell.execute_reply": "2023-11-12T12:41:46.906369Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:46.908587Z", "iopub.status.busy": "2023-11-12T12:41:46.908462Z", "iopub.status.idle": "2023-11-12T12:41:47.066481Z", "shell.execute_reply": "2023-11-12T12:41:47.066176Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import DeltaDebugger" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from debuggingbook.DDSetDebugger import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides a class `DDSetDebugger`, implementing the DDSET algorithm for generalizing failure-inducing inputs. The `DDSetDebugger` is used as follows:\n", "\n", "```python\n", "with DDSetDebugger(grammar) as dd:\n", " function(args...)\n", "dd\n", "```\n", "\n", "Here, `function(args...)` is a failing function call (= raises an execption) that takes at least one string argument; `grammar` is an [input grammar in fuzzingbook format](https://www.fuzzingbook.org/html/Grammars.html) that matches the format of this argument.\n", "\n", "The result is a call of `function()` with an _abstract failure-inducing input_ – a variant of the conrete input in which parts are replaced by placeholders in the form ``, where `` is a nonterminal in the grammar. The failure has been verified to occur for a number of instantiations of ``.\n", "\n", "Here is an example of how `DDSetDebugger` works. The concrete failing input `\"bar` is generalized to an _abstract failure-inducing input_:\n", "\n", "```python\n", ">>> with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:\n", ">>> remove_html_markup('\"bar')\n", ">>> dd\n", "remove_html_markup(s='\"')\n", "```\n", "The abstract input tells us that the failure occurs for whatever opening and closing HTML tags as long as there is a double quote between them.\n", "\n", "A programmatic interface is available as well. `generalize()` returns a mapping of argument names to (generalized) values:\n", "\n", "```python\n", ">>> dd.generalize()\n", "{'s': '\"'}\n", "```\n", "Using `fuzz()`, the abstract input can be instantiated to further concrete inputs, all set to produce the failure again:\n", "\n", "```python\n", ">>> for i in range(10):\n", ">>> print(dd.fuzz())\n", "remove_html_markup(s='\"1')\n", "remove_html_markup(s='\"c*C')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\")')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"\\t7')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"2')\n", "remove_html_markup(s='\"\\r~\\t\\r')\n", "\n", "```\n", "`DDSetDebugger` can be customized by passing a subclass of `TreeGeneralizer`, which does the gist of the work; for details, see its constructor.\n", "The full class hierarchy is shown below.\n", "\n", "![](PICS/DDSetDebugger-synopsis-1.svg)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A Failing Program\n", "\n", "As with previous chapters, we use `remove_html_markup()` as our ongoing example. This function is set to remove HTML markup tags (like ``) from a given string `s`, returning the plain text only. We use the version from [the chapter on asssertions](Assertions.ipynb), using an assertion as postcondition checker." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.068440Z", "iopub.status.busy": "2023-11-12T12:41:47.068298Z", "iopub.status.idle": "2023-11-12T12:41:47.070708Z", "shell.execute_reply": "2023-11-12T12:41:47.070438Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def remove_html_markup(s): # type: ignore\n", " tag = False\n", " quote = False\n", " out = \"\"\n", "\n", " for c in s:\n", " if c == '<' and not quote:\n", " tag = True\n", " elif c == '>' and not quote:\n", " tag = False\n", " elif c == '\"' or c == \"'\" and tag:\n", " quote = not quote\n", " elif not tag:\n", " out = out + c\n", "\n", " # postcondition\n", " assert '<' not in out and '>' not in out\n", "\n", " return out" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For the most inputs, `remove_html_markup()` works just as expected:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.072265Z", "iopub.status.busy": "2023-11-12T12:41:47.072108Z", "iopub.status.idle": "2023-11-12T12:41:47.074232Z", "shell.execute_reply": "2023-11-12T12:41:47.073982Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Be quiet, he said'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup(\"Be quiet, he said\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "There are inputs, however, for which it fails:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.075836Z", "iopub.status.busy": "2023-11-12T12:41:47.075728Z", "iopub.status.idle": "2023-11-12T12:41:47.077323Z", "shell.execute_reply": "2023-11-12T12:41:47.077057Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "BAD_INPUT = '\"bar'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.078851Z", "iopub.status.busy": "2023-11-12T12:41:47.078747Z", "iopub.status.idle": "2023-11-12T12:41:47.080258Z", "shell.execute_reply": "2023-11-12T12:41:47.079977Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.081907Z", "iopub.status.busy": "2023-11-12T12:41:47.081786Z", "iopub.status.idle": "2023-11-12T12:41:47.083786Z", "shell.execute_reply": "2023-11-12T12:41:47.083538Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/3787550988.py\", line 2, in \n", " remove_html_markup(BAD_INPUT)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/2717035104.py\", line 17, in remove_html_markup\n", " assert '<' not in out and '>' not in out\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError(AssertionError):\n", " remove_html_markup(BAD_INPUT)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.085306Z", "iopub.status.busy": "2023-11-12T12:41:47.085198Z", "iopub.status.idle": "2023-11-12T12:41:47.086761Z", "shell.execute_reply": "2023-11-12T12:41:47.086522Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import quiz" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In contrast to the other chapters, our aim now is not to immediately go and debug `remove_html_markup()`. Instead, we focus on another important question: \n", "\n", "> Under which conditions precisely does `remove_html_markup()` fail?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This question can be generalized to\n", "\n", "> What is the set of inputs for which `remove_html_markup()` fails?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our plan for this is to _generalize_ concrete inputs (such as `BAD_INPUTS`) into an *abstract failure-inducing inputs*. These are patterns formed from a concrete input, but in which specific _placeholders_ indicate sets of inputs that are permitted. In the abstract failure-inducing input\n", "\n", "```html\n", "\"bar\n", "```\n", "\n", "for instance, `` and `` are placeholders for opening and closing HTML tags, respectively. The pattern indicates that any opening HTML tag and closing HTML tag can be present in the input, as long as the enclosed text reads `\"bar`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Given a concrete failure-inducing input, our aim is to _generalize_ it as much as possible to such an abstract failure-inducing input. The resulting pattern should then\n", "\n", "* capture the _circumstances_ under which the program fails;\n", "* allow for _test generation_ by instantiating the placeholders;\n", "* help ensuring our fix is as _general as possible_." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.088429Z", "iopub.status.busy": "2023-11-12T12:41:47.088321Z", "iopub.status.idle": "2023-11-12T12:41:47.092989Z", "shell.execute_reply": "2023-11-12T12:41:47.092720Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

If s = '<foo>\"bar</foo>' (i.e., BAD_INPUT), what is the value of out such that the assertion fails?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"If `s = '\\\"bar'` (i.e., `BAD_INPUT`), \"\n", " \"what is the value of `out` such that the assertion fails?\",\n", " [\n", " '`bar`',\n", " '`bar`',\n", " '`\"bar`',\n", " '`\"bar`',\n", " ], '9999999 // 4999999')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Grammars\n", "\n", "To determine abstract failure-inducing inputs, we need means to determine and characterize _sets of inputs_ – known in computer science as _languages_. To formally describe languages, the field of *formal languages* has devised a number of *language specifications* that describe a language. *Regular expressions* represent the simplest class of these languages to denote sets of strings: The regular expression `[a-z]*`, for instance, denotes a (possibly empty) sequence of lowercase letters. *Automata theory* connects these languages to automata that accept these inputs; *finite state machines*, for instance, can be used to specify the language of regular expressions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Regular expressions are great for not-too-complex input formats, and the associated finite state machines have many properties that make them great for reasoning. To specify more complex inputs, though, they quickly encounter limitations. At the other end of the language spectrum, we have *universal grammars* that denote the language accepted by *Turing machines*. A Turing machine can compute anything that can be computed; and with Python being Turing-complete, this means that we can also use a Python program $p$ to specify or even enumerate legal inputs. But then, computer science theory also tells us that each such program has to be written specifically for the input to be considered, which is not the level of automation we want." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The middle ground between regular expressions and Turing machines is covered by *grammars*. Grammars are among the most popular (and best understood) formalisms to formally specify input languages. Using a grammar, one can express a wide range of the properties of an input language. Grammars are particularly great for expressing the *syntactical structure* of an input, and are the formalism of choice to express nested or recursive inputs. The grammars we use are so-called *context-free grammars*, one of the easiest and most popular grammar formalisms." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A grammar is defined as a mapping of _nonterminal_ symbols (denoted in ``) to lists of alternative _expansions_, which are strings containing _terminal_ symbols and possibly more _nonterminal_ symbols. To make the writing of grammars as simple as possible, we adopt the [fuzzingbook](https://www.fuzzingbook.org/) format that is based on strings and lists." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.094710Z", "iopub.status.busy": "2023-11-12T12:41:47.094608Z", "iopub.status.idle": "2023-11-12T12:41:47.098033Z", "shell.execute_reply": "2023-11-12T12:41:47.097772Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import fuzzingbook" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Fuzzingbook grammars take the format of a _mapping_ between symbol names and expansions, where expansions are _lists_ of alternatives." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.099695Z", "iopub.status.busy": "2023-11-12T12:41:47.099598Z", "iopub.status.idle": "2023-11-12T12:41:47.101354Z", "shell.execute_reply": "2023-11-12T12:41:47.101100Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Any, Callable, Optional, Type, Tuple\n", "from typing import Dict, Union, List, cast, Generator" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.102874Z", "iopub.status.busy": "2023-11-12T12:41:47.102786Z", "iopub.status.idle": "2023-11-12T12:41:47.104723Z", "shell.execute_reply": "2023-11-12T12:41:47.104415Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Grammar = Dict[str, # A grammar maps strings...\n", " List[\n", " Union[str, # to list of strings...\n", " Tuple[str, Dict[str, Any]] # or to pairs of strings and attributes.\n", " ]\n", " ]\n", " ]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A one-rule grammar for digits thus takes the form" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.106255Z", "iopub.status.busy": "2023-11-12T12:41:47.106167Z", "iopub.status.idle": "2023-11-12T12:41:47.107841Z", "shell.execute_reply": "2023-11-12T12:41:47.107547Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "DIGIT_GRAMMAR: Grammar = {\n", " \"\":\n", " [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"]\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "which means that the `` symbol can be expanded into any of the digits listed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A full grammar for arithmetic expressions looks like this:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.109580Z", "iopub.status.busy": "2023-11-12T12:41:47.109484Z", "iopub.status.idle": "2023-11-12T12:41:47.111683Z", "shell.execute_reply": "2023-11-12T12:41:47.111433Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "EXPR_GRAMMAR: Grammar = {\n", " \"\":\n", " [\"\"],\n", "\n", " \"\":\n", " [\" + \", \" - \", \"\"],\n", "\n", " \"\":\n", " [\" * \", \" / \", \"\"],\n", "\n", " \"\":\n", " [\"+\",\n", " \"-\",\n", " \"()\",\n", " \".\",\n", " \"\"],\n", "\n", " \"\":\n", " [\"\", \"\"],\n", "\n", " \"\":\n", " [\"0\", \"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\"]\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "From such a grammar, one can easily generate inputs that conform to the grammar." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.113255Z", "iopub.status.busy": "2023-11-12T12:41:47.113167Z", "iopub.status.idle": "2023-11-12T12:41:47.122158Z", "shell.execute_reply": "2023-11-12T12:41:47.121855Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from fuzzingbook.GrammarFuzzer import GrammarFuzzer" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.123696Z", "iopub.status.busy": "2023-11-12T12:41:47.123594Z", "iopub.status.idle": "2023-11-12T12:41:47.125378Z", "shell.execute_reply": "2023-11-12T12:41:47.125110Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "simple_expr_fuzzer = GrammarFuzzer(EXPR_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.126876Z", "iopub.status.busy": "2023-11-12T12:41:47.126771Z", "iopub.status.idle": "2023-11-12T12:41:47.171623Z", "shell.execute_reply": "2023-11-12T12:41:47.171213Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.8 + --62.912 - ++4 - +5 * 3.0 * 4\n", "7 * (75.5 - -6 + 5 - 4) + -(8 - 1) / 5 * 2\n", "(-(9) * +6 + 9 / 3 * 8 - 9 * 8 / 7) / -+-65\n", "(9 + 8) * 2 * (6 + 6 + 9) * 0 * 1.9 * 0\n", "(1 * 7 - 9 + 5) * 5 / 0 * 5 + 7 * 5 * 7\n", "-(6 / 9 - 5 - 3 - 1) - -1 / +1 + (9) / (8) * 6\n", "(+-(0 - (1) * 7 / 3)) / ((1 * 3 + 8) + 9 - +1 / --0) - 5 * (-+939.491)\n", "+2.9 * 0 / 501.19814 / --+--(6.05002)\n", "+-8.8 / (1) * -+1 + -8 + 9 - 3 / 8 * 6 + 4 * 3 * 5\n", "(+(8 / 9 - 1 - 7)) + ---06.30 / +4.39\n" ] } ], "source": [ "for i in range(10):\n", " fuzz_expr = simple_expr_fuzzer.fuzz()\n", " print(fuzz_expr)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Nonterminals as found in the grammar make natural _placeholders_ in abstract failure-inducing inputs. If we know, for instance, that it is not just the concrete failure-inducing input\n", "\n", "```python\n", "(2 * 3)\n", "```\n", "\n", "but the abstract failure-inducing input\n", "\n", "```html\n", "( * )\n", "```\n", "\n", "that causes the failure, we immediately see that the error is due to the multiplication operator rather than its operands." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Coming back to our `remove_html_markup()` example, let us create a simple grammar for HTML expressions. A `` element is either plain text or tagged text." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:47.173621Z", "iopub.status.busy": "2023-11-12T12:41:47.173492Z", "iopub.status.idle": "2023-11-12T12:41:47.175787Z", "shell.execute_reply": "2023-11-12T12:41:47.175449Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "SIMPLE_HTML_GRAMMAR: Grammar = {\n", " \"\":\n", " [\"\"],\n", "\n", " \"\":\n", " [\"\", \"\"],\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Plain text is a simple (possibly empty) sequence of letter, digits, punctuation, and whitespace. (Note how `` is either empty or some character followed by more plain text.) The characters `<` and `>` are not allowed, though." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.177441Z", "iopub.status.busy": "2023-11-12T12:41:47.177308Z", "iopub.status.idle": "2023-11-12T12:41:47.179279Z", "shell.execute_reply": "2023-11-12T12:41:47.178978Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:47.180739Z", "iopub.status.busy": "2023-11-12T12:41:47.180619Z", "iopub.status.idle": "2023-11-12T12:41:47.182839Z", "shell.execute_reply": "2023-11-12T12:41:47.182531Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "SIMPLE_HTML_GRAMMAR.update({\n", " \"\":\n", " [\"\", \"\"],\n", "\n", " \"\":\n", " [\"\", \"\", \"\", \"\"],\n", "\n", " \"\": list(string.ascii_letters),\n", " \"\": list(string.digits),\n", " \"\": list(string.punctuation.replace('<', '').replace('>', '')),\n", " \"\": list(string.whitespace)\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Tagged text is a bit more complicated. We have opening tags ``, followed by some more HTML material, and then closed by a closing tag ``. (We do not insist that the two tags match.) A self-closing tag has the form `
`. For compatibility reasons, we also allow just opening tags without closing tags, as in ``." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:47.184379Z", "iopub.status.busy": "2023-11-12T12:41:47.184271Z", "iopub.status.idle": "2023-11-12T12:41:47.186021Z", "shell.execute_reply": "2023-11-12T12:41:47.185765Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "SIMPLE_HTML_GRAMMAR.update({\n", " \"\":\n", " [\"\",\n", " \"\",\n", " \"\"],\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Since the characters `<` and `>` are already reserved for denoting nonterminal symbols, we use the special nonterminal symbols `` and `` that expand into `<` and `>`, respectively," ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:47.187477Z", "iopub.status.busy": "2023-11-12T12:41:47.187375Z", "iopub.status.idle": "2023-11-12T12:41:47.189265Z", "shell.execute_reply": "2023-11-12T12:41:47.189001Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "SIMPLE_HTML_GRAMMAR.update({\n", " \"\":\n", " [\"\",\n", " \"\"],\n", "\n", " \"\": [\"<\"],\n", " \"\": [\">\"],\n", "\n", " \"\":\n", " [\"\", \"\", \"\"],\n", "\n", " \"\":\n", " [\"/\"],\n", "\n", " \"\":\n", " [\"/\"],\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Finally, HTML tags can have attributes, which are enclosed in quotes." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:47.190623Z", "iopub.status.busy": "2023-11-12T12:41:47.190519Z", "iopub.status.idle": "2023-11-12T12:41:47.192383Z", "shell.execute_reply": "2023-11-12T12:41:47.192085Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "SIMPLE_HTML_GRAMMAR.update({\n", " \"\":\n", " [\"\", \"\" ],\n", "\n", " \"\":\n", " [\" =''\",\n", " ' =\"\"'],\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we can generate inputs from the grammar." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.194283Z", "iopub.status.busy": "2023-11-12T12:41:47.194123Z", "iopub.status.idle": "2023-11-12T12:41:47.196232Z", "shell.execute_reply": "2023-11-12T12:41:47.195808Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "simple_html_fuzzer = GrammarFuzzer(SIMPLE_HTML_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.198041Z", "iopub.status.busy": "2023-11-12T12:41:47.197927Z", "iopub.status.idle": "2023-11-12T12:41:47.217544Z", "shell.execute_reply": "2023-11-12T12:41:47.217314Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "''\n", "'9'\n", "'\\x0b'\n", "' ea\\\\\\\\'\n", "'&7'\n", "\"\"\n", "''\n", "''\n", "''\n", "''\n" ] } ], "source": [ "for i in range(10):\n", " fuzz_html = simple_html_fuzzer.fuzz()\n", " print(repr(fuzz_html))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Such inputs, of course, are great for systematic testing. Our sister book, [the fuzzing book](https://www.fuzzingbook.org/), covers these and more." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Derivation Trees\n", "\n", "To produce inputs from a grammar, the fuzzingbook `GrammarFuzzer` makes use of a structure called a *derivation tree* (also known as *syntax tree*). A derivation tree encodes the individual expansion steps undertaken while producing the output." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.219097Z", "iopub.status.busy": "2023-11-12T12:41:47.218975Z", "iopub.status.idle": "2023-11-12T12:41:47.220580Z", "shell.execute_reply": "2023-11-12T12:41:47.220343Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "DerivationTree = Tuple[str, Optional[List[Any]]]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us illustrate derivation trees by example, using the last HTML output we produced." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.222020Z", "iopub.status.busy": "2023-11-12T12:41:47.221926Z", "iopub.status.idle": "2023-11-12T12:41:47.223946Z", "shell.execute_reply": "2023-11-12T12:41:47.223692Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fuzz_html" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `GrammarFuzzer` attribute `derivation_tree` holds the last tree used to produce this input. We can visualize the tree as follows:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.225515Z", "iopub.status.busy": "2023-11-12T12:41:47.225407Z", "iopub.status.idle": "2023-11-12T12:41:47.226994Z", "shell.execute_reply": "2023-11-12T12:41:47.226718Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from graphviz import Digraph" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.228378Z", "iopub.status.busy": "2023-11-12T12:41:47.228290Z", "iopub.status.idle": "2023-11-12T12:41:47.231133Z", "shell.execute_reply": "2023-11-12T12:41:47.230875Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "def display_tree(tree: DerivationTree) -> Digraph:\n", " def graph_attr(dot: Digraph) -> None:\n", " dot.attr('node', shape='box', color='white', margin='0.0,0.0')\n", " dot.attr('node',\n", " fontname=\"'Fira Mono', 'Source Code Pro', 'Courier', monospace\")\n", "\n", " def node_attr(dot: Digraph, nid: str, symbol: str, ann: str) -> None:\n", " fuzzingbook.GrammarFuzzer.default_node_attr(dot, nid, symbol, ann)\n", " if symbol.startswith('<'):\n", " dot.node(repr(nid), fontcolor='#0060a0')\n", " else:\n", " dot.node(repr(nid), fontcolor='#00a060')\n", " dot.node(repr(nid), scale='2')\n", "\n", " return fuzzingbook.GrammarFuzzer.display_tree(tree,\n", " node_attr=node_attr,\n", " graph_attr=graph_attr)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.232588Z", "iopub.status.busy": "2023-11-12T12:41:47.232499Z", "iopub.status.idle": "2023-11-12T12:41:47.644798Z", "shell.execute_reply": "2023-11-12T12:41:47.644426Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "<start>\n", "\n", "\n", "\n", "1\n", "\n", "<html>\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "\n", "\n", "\n", "2\n", "\n", "<tagged-text>\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "<self-closing-tag>\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "<lt>\n", "\n", "\n", "\n", "3->4\n", "\n", "\n", "\n", "\n", "\n", "6\n", "\n", "<id>\n", "\n", "\n", "\n", "3->6\n", "\n", "\n", "\n", "\n", "\n", "21\n", "\n", "<attrs>\n", "\n", "\n", "\n", "3->21\n", "\n", "\n", "\n", "\n", "\n", "34\n", "\n", "/ (47)\n", "\n", "\n", "\n", "3->34\n", "\n", "\n", "\n", "\n", "\n", "35\n", "\n", "<gt>\n", "\n", "\n", "\n", "3->35\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "< (60)\n", "\n", "\n", "\n", "4->5\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "<id>\n", "\n", "\n", "\n", "6->7\n", "\n", "\n", "\n", "\n", "\n", "19\n", "\n", "<digit>\n", "\n", "\n", "\n", "6->19\n", "\n", "\n", "\n", "\n", "\n", "8\n", "\n", "<id>\n", "\n", "\n", "\n", "7->8\n", "\n", "\n", "\n", "\n", "\n", "17\n", "\n", "<digit>\n", "\n", "\n", "\n", "7->17\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "<id>\n", "\n", "\n", "\n", "8->9\n", "\n", "\n", "\n", "\n", "\n", "15\n", "\n", "<letter>\n", "\n", "\n", "\n", "8->15\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "<id>\n", "\n", "\n", "\n", "9->10\n", "\n", "\n", "\n", "\n", "\n", "13\n", "\n", "<letter>\n", "\n", "\n", "\n", "9->13\n", "\n", "\n", "\n", "\n", "\n", "11\n", "\n", "<letter>\n", "\n", "\n", "\n", "10->11\n", "\n", "\n", "\n", "\n", "\n", "12\n", "\n", "F (70)\n", "\n", "\n", "\n", "11->12\n", "\n", "\n", "\n", "\n", "\n", "14\n", "\n", "Q (81)\n", "\n", "\n", "\n", "13->14\n", "\n", "\n", "\n", "\n", "\n", "16\n", "\n", "c (99)\n", "\n", "\n", "\n", "15->16\n", "\n", "\n", "\n", "\n", "\n", "18\n", "\n", "9 (57)\n", "\n", "\n", "\n", "17->18\n", "\n", "\n", "\n", "\n", "\n", "20\n", "\n", "0 (48)\n", "\n", "\n", "\n", "19->20\n", "\n", "\n", "\n", "\n", "\n", "22\n", "\n", "<attr>\n", "\n", "\n", "\n", "21->22\n", "\n", "\n", "\n", "\n", "\n", "23\n", "\n", "  (32)\n", "\n", "\n", "\n", "22->23\n", "\n", "\n", "\n", "\n", "\n", "24\n", "\n", "<id>\n", "\n", "\n", "\n", "22->24\n", "\n", "\n", "\n", "\n", "\n", "30\n", "\n", "="\n", "\n", "\n", "\n", "22->30\n", "\n", "\n", "\n", "\n", "\n", "31\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "22->31\n", "\n", "\n", "\n", "\n", "\n", "33\n", "\n", "" (34)\n", "\n", "\n", "\n", "22->33\n", "\n", "\n", "\n", "\n", "\n", "25\n", "\n", "<id>\n", "\n", "\n", "\n", "24->25\n", "\n", "\n", "\n", "\n", "\n", "28\n", "\n", "<letter>\n", "\n", "\n", "\n", "24->28\n", "\n", "\n", "\n", "\n", "\n", "26\n", "\n", "<letter>\n", "\n", "\n", "\n", "25->26\n", "\n", "\n", "\n", "\n", "\n", "27\n", "\n", "W (87)\n", "\n", "\n", "\n", "26->27\n", "\n", "\n", "\n", "\n", "\n", "29\n", "\n", "t (116)\n", "\n", "\n", "\n", "28->29\n", "\n", "\n", "\n", "\n", "\n", "32\n", "\n", "\n", "\n", "\n", "31->32\n", "\n", "\n", "\n", "\n", "\n", "36\n", "\n", "> (62)\n", "\n", "\n", "\n", "35->36\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_tree(simple_html_fuzzer.derivation_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "From top to bottom, we see that the input was constructed from a `` symbol, which then expanded into `html`, which then expanded into HTML text, and so on. Multiple children in a tree stand for a concatenation of individual symbols." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Internally, these trees come as pairs `(symbol, children)`, where `symbol` is the name of a node (say, ``), and `children` is a (possibly empty) list of subtrees. Here are the topmost nodes of the above tree:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.647021Z", "iopub.status.busy": "2023-11-12T12:41:47.646829Z", "iopub.status.idle": "2023-11-12T12:41:47.648824Z", "shell.execute_reply": "2023-11-12T12:41:47.648533Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import pprint" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.650348Z", "iopub.status.busy": "2023-11-12T12:41:47.650194Z", "iopub.status.idle": "2023-11-12T12:41:47.652254Z", "shell.execute_reply": "2023-11-12T12:41:47.651950Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('', [('', [('', [('', [...])])])])\n" ] } ], "source": [ "pp = pprint.PrettyPrinter(depth=7)\n", "pp.pprint(simple_html_fuzzer.derivation_tree)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To produce abstract failure-inducing patterns, we will work on this very structure. The idea is to\n", "\n", "1. systematically replace subtrees by other, generated, compatible subtrees (e.g. replace one `` subtree in the concrete input by some other generated `` subtree);\n", "2. see whether these subtrees also result in failures; and\n", "3. if they do, use the nonterminal (``) as a placeholder in the pattern.\n", "\n", "This will involve some subtree manipulation, construction, and finally testing. First, though, we need to be able to turn an _existing input_ into a derivation tree." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Parsing\n", "\n", "The activity of creating a structure out of an unstructured input is called _parsing_. Generally speaking, a _parser_ uses a _grammar_ to create a _derivation tree_ (also called *parse tree* in parsing contexts) from a string input." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, there's a whole body of theory (and practice!) around constructing parsers. We make our life simple by using an existing parser (again, from [the fuzzing book](https://www.fuzzingbook.org/Parser.html)), which does just what we want. The `EarleyParser` is instantiated with a grammar such as `SIMPLE_HTML_GRAMMAR`:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.653866Z", "iopub.status.busy": "2023-11-12T12:41:47.653745Z", "iopub.status.idle": "2023-11-12T12:41:47.657771Z", "shell.execute_reply": "2023-11-12T12:41:47.657494Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from fuzzingbook.Parser import Parser, EarleyParser # minor dependency" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.659289Z", "iopub.status.busy": "2023-11-12T12:41:47.659194Z", "iopub.status.idle": "2023-11-12T12:41:47.661313Z", "shell.execute_reply": "2023-11-12T12:41:47.661052Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "simple_html_parser = EarleyParser(SIMPLE_HTML_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Its method `parse()` returns an iterator over multiple possible derivation trees. (There can be multiple trees because the grammar could be ambiguous). We are only interested in the first such tree. Let us parse `BAD_INPUT` and inspect the resulting ~parse tree~ ~syntax tree~ derivation tree:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.662724Z", "iopub.status.busy": "2023-11-12T12:41:47.662630Z", "iopub.status.idle": "2023-11-12T12:41:47.668172Z", "shell.execute_reply": "2023-11-12T12:41:47.667884Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "bad_input_tree = list(simple_html_parser.parse(BAD_INPUT))[0]" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:47.669799Z", "iopub.status.busy": "2023-11-12T12:41:47.669669Z", "iopub.status.idle": "2023-11-12T12:41:48.066653Z", "shell.execute_reply": "2023-11-12T12:41:48.066241Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "<start>\n", "\n", "\n", "\n", "1\n", "\n", "<html>\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "\n", "\n", "\n", "2\n", "\n", "<tagged-text>\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "<opening-tag>\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "17\n", "\n", "<html>\n", "\n", "\n", "\n", "2->17\n", "\n", "\n", "\n", "\n", "\n", "35\n", "\n", "<closing-tag>\n", "\n", "\n", "\n", "2->35\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "<lt>\n", "\n", "\n", "\n", "3->4\n", "\n", "\n", "\n", "\n", "\n", "6\n", "\n", "<id>\n", "\n", "\n", "\n", "3->6\n", "\n", "\n", "\n", "\n", "\n", "15\n", "\n", "<gt>\n", "\n", "\n", "\n", "3->15\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "< (60)\n", "\n", "\n", "\n", "4->5\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "<id>\n", "\n", "\n", "\n", "6->7\n", "\n", "\n", "\n", "\n", "\n", "13\n", "\n", "<letter>\n", "\n", "\n", "\n", "6->13\n", "\n", "\n", "\n", "\n", "\n", "8\n", "\n", "<id>\n", "\n", "\n", "\n", "7->8\n", "\n", "\n", "\n", "\n", "\n", "11\n", "\n", "<letter>\n", "\n", "\n", "\n", "7->11\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "<letter>\n", "\n", "\n", "\n", "8->9\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "f (102)\n", "\n", "\n", "\n", "9->10\n", "\n", "\n", "\n", "\n", "\n", "12\n", "\n", "o (111)\n", "\n", "\n", "\n", "11->12\n", "\n", "\n", "\n", "\n", "\n", "14\n", "\n", "o (111)\n", "\n", "\n", "\n", "13->14\n", "\n", "\n", "\n", "\n", "\n", "16\n", "\n", "> (62)\n", "\n", "\n", "\n", "15->16\n", "\n", "\n", "\n", "\n", "\n", "18\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "17->18\n", "\n", "\n", "\n", "\n", "\n", "19\n", "\n", "<plain-char>\n", "\n", "\n", "\n", "18->19\n", "\n", "\n", "\n", "\n", "\n", "22\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "18->22\n", "\n", "\n", "\n", "\n", "\n", "20\n", "\n", "<other>\n", "\n", "\n", "\n", "19->20\n", "\n", "\n", "\n", "\n", "\n", "21\n", "\n", "" (34)\n", "\n", "\n", "\n", "20->21\n", "\n", "\n", "\n", "\n", "\n", "23\n", "\n", "<plain-char>\n", "\n", "\n", "\n", "22->23\n", "\n", "\n", "\n", "\n", "\n", "26\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "22->26\n", "\n", "\n", "\n", "\n", "\n", "24\n", "\n", "<letter>\n", "\n", "\n", "\n", "23->24\n", "\n", "\n", "\n", "\n", "\n", "25\n", "\n", "b (98)\n", "\n", "\n", "\n", "24->25\n", "\n", "\n", "\n", "\n", "\n", "27\n", "\n", "<plain-char>\n", "\n", "\n", "\n", "26->27\n", "\n", "\n", "\n", "\n", "\n", "30\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "26->30\n", "\n", "\n", "\n", "\n", "\n", "28\n", "\n", "<letter>\n", "\n", "\n", "\n", "27->28\n", "\n", "\n", "\n", "\n", "\n", "29\n", "\n", "a (97)\n", "\n", "\n", "\n", "28->29\n", "\n", "\n", "\n", "\n", "\n", "31\n", "\n", "<plain-char>\n", "\n", "\n", "\n", "30->31\n", "\n", "\n", "\n", "\n", "\n", "34\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "30->34\n", "\n", "\n", "\n", "\n", "\n", "32\n", "\n", "<letter>\n", "\n", "\n", "\n", "31->32\n", "\n", "\n", "\n", "\n", "\n", "33\n", "\n", "r (114)\n", "\n", "\n", "\n", "32->33\n", "\n", "\n", "\n", "\n", "\n", "36\n", "\n", "<lt>\n", "\n", "\n", "\n", "35->36\n", "\n", "\n", "\n", "\n", "\n", "38\n", "\n", "/ (47)\n", "\n", "\n", "\n", "35->38\n", "\n", "\n", "\n", "\n", "\n", "39\n", "\n", "<id>\n", "\n", "\n", "\n", "35->39\n", "\n", "\n", "\n", "\n", "\n", "48\n", "\n", "<gt>\n", "\n", "\n", "\n", "35->48\n", "\n", "\n", "\n", "\n", "\n", "37\n", "\n", "< (60)\n", "\n", "\n", "\n", "36->37\n", "\n", "\n", "\n", "\n", "\n", "40\n", "\n", "<id>\n", "\n", "\n", "\n", "39->40\n", "\n", "\n", "\n", "\n", "\n", "46\n", "\n", "<letter>\n", "\n", "\n", "\n", "39->46\n", "\n", "\n", "\n", "\n", "\n", "41\n", "\n", "<id>\n", "\n", "\n", "\n", "40->41\n", "\n", "\n", "\n", "\n", "\n", "44\n", "\n", "<letter>\n", "\n", "\n", "\n", "40->44\n", "\n", "\n", "\n", "\n", "\n", "42\n", "\n", "<letter>\n", "\n", "\n", "\n", "41->42\n", "\n", "\n", "\n", "\n", "\n", "43\n", "\n", "f (102)\n", "\n", "\n", "\n", "42->43\n", "\n", "\n", "\n", "\n", "\n", "45\n", "\n", "o (111)\n", "\n", "\n", "\n", "44->45\n", "\n", "\n", "\n", "\n", "\n", "47\n", "\n", "o (111)\n", "\n", "\n", "\n", "46->47\n", "\n", "\n", "\n", "\n", "\n", "49\n", "\n", "> (62)\n", "\n", "\n", "\n", "48->49\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_tree(bad_input_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This derivation tree has the same structure as the one created from our `GrammarFuzzer` above. We see how the `` is composed of three elements:\n", "\n", "1. an`` (``);\n", "2. a `` element which becomes `` (`\"bar`); and\n", "3. a `` (``)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can easily turn the tree back into a string. The method `tree_to_string()` traverses the tree left-to-right and joins all nonterminal symbols." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.068652Z", "iopub.status.busy": "2023-11-12T12:41:48.068522Z", "iopub.status.idle": "2023-11-12T12:41:48.070565Z", "shell.execute_reply": "2023-11-12T12:41:48.070301Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from fuzzingbook.GrammarFuzzer import tree_to_string, all_terminals" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.072173Z", "iopub.status.busy": "2023-11-12T12:41:48.072057Z", "iopub.status.idle": "2023-11-12T12:41:48.074300Z", "shell.execute_reply": "2023-11-12T12:41:48.074000Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'\"bar'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_to_string(bad_input_tree)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.075703Z", "iopub.status.busy": "2023-11-12T12:41:48.075595Z", "iopub.status.idle": "2023-11-12T12:41:48.077201Z", "shell.execute_reply": "2023-11-12T12:41:48.076967Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert tree_to_string(bad_input_tree) == BAD_INPUT" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With this, we can now\n", "\n", "* parse an input into a tree structure;\n", "* (re-)create parts of the tree structure; and\n", "* turn the tree back into an input string." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mutating the Tree\n", "\n", "We introduce a class `TreeMutator` that is set to mutate a tree. Its constructor takes a grammar and a tree." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.078902Z", "iopub.status.busy": "2023-11-12T12:41:48.078782Z", "iopub.status.idle": "2023-11-12T12:41:48.080424Z", "shell.execute_reply": "2023-11-12T12:41:48.080169Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from fuzzingbook.Grammars import is_valid_grammar" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.081919Z", "iopub.status.busy": "2023-11-12T12:41:48.081811Z", "iopub.status.idle": "2023-11-12T12:41:48.084301Z", "shell.execute_reply": "2023-11-12T12:41:48.084039Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeMutator:\n", " \"\"\"Grammar-based mutations of derivation trees.\"\"\"\n", "\n", " def __init__(self, grammar: Grammar, tree: DerivationTree,\n", " fuzzer: Optional[GrammarFuzzer] = None, log: Union[bool, int] = False):\n", " \"\"\"\n", " Constructor. \n", " `grammar` is the underlying grammar; \n", " `tree` is the tree to work on.\n", " `fuzzer` is the grammar fuzzer to use (default: `GrammarFuzzer`)\n", " \"\"\"\n", "\n", " assert is_valid_grammar(grammar)\n", " self.grammar = grammar\n", " self.tree = tree\n", " self.log = log\n", "\n", " if fuzzer is None:\n", " fuzzer = GrammarFuzzer(grammar)\n", "\n", " self.fuzzer = fuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Referencing Subtrees" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To reference individual elements in the tree, we introduce the concept of a _path_. A path is a list of numbers indicating the children (starting with 0) we should follow. A path `[0, 0, 0, ..., 0]` stands for the leftmost child in a tree." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.085982Z", "iopub.status.busy": "2023-11-12T12:41:48.085866Z", "iopub.status.idle": "2023-11-12T12:41:48.087610Z", "shell.execute_reply": "2023-11-12T12:41:48.087334Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TreePath = List[int]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `get_subtree()` returns the subtree for a given path." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.089248Z", "iopub.status.busy": "2023-11-12T12:41:48.089093Z", "iopub.status.idle": "2023-11-12T12:41:48.091456Z", "shell.execute_reply": "2023-11-12T12:41:48.091171Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeMutator(TreeMutator):\n", " def get_subtree(self, path: TreePath, tree: Optional[DerivationTree] = None) -> DerivationTree:\n", " \"\"\"Access a subtree based on `path` (a list of children numbers)\"\"\"\n", " if tree is None:\n", " tree = self.tree\n", "\n", " symbol, children = tree\n", "\n", " if not path or children is None:\n", " return tree\n", "\n", " return self.get_subtree(path[1:], children[path[0]])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's `get_subtree()` in action. We instantiate a `TreeMutator` on the `BAD_INPUT` tree as shown above and return the element at the path `[0, 0, 1, 0]` – i.e. follow the leftmost edge twice, than the second-to-leftmost edge, then the leftmost edge again. This gives us the `` subtree representing the string `\"bar`:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.093045Z", "iopub.status.busy": "2023-11-12T12:41:48.092900Z", "iopub.status.idle": "2023-11-12T12:41:48.094730Z", "shell.execute_reply": "2023-11-12T12:41:48.094435Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def bad_input_tree_mutator() -> TreeMutator:\n", " return TreeMutator(SIMPLE_HTML_GRAMMAR, bad_input_tree, log=2) " ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.096447Z", "iopub.status.busy": "2023-11-12T12:41:48.096329Z", "iopub.status.idle": "2023-11-12T12:41:48.098655Z", "shell.execute_reply": "2023-11-12T12:41:48.098395Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('',\n", " [('', [('', [('\"', [])])]),\n", " ('',\n", " [('', [('', [...])]),\n", " ('', [('', [...]), ('', [...])])])])\n" ] } ], "source": [ "plain_text_subtree = bad_input_tree_mutator().get_subtree([0, 0, 1, 0])\n", "pp.pprint(plain_text_subtree)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.100439Z", "iopub.status.busy": "2023-11-12T12:41:48.100246Z", "iopub.status.idle": "2023-11-12T12:41:48.102589Z", "shell.execute_reply": "2023-11-12T12:41:48.102294Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'\"bar'" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_to_string(plain_text_subtree)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.104348Z", "iopub.status.busy": "2023-11-12T12:41:48.104205Z", "iopub.status.idle": "2023-11-12T12:41:48.106571Z", "shell.execute_reply": "2023-11-12T12:41:48.106291Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "def primes_generator() -> Generator[int, None, None]:\n", " # Adapted from https://www.python.org/ftp/python/doc/nluug-paper.ps\n", " primes = [2]\n", " yield 2\n", " i = 3\n", " while True:\n", " for p in primes:\n", " if i % p == 0 or p * p > i:\n", " break\n", "\n", " if i % p != 0:\n", " primes.append(i)\n", " yield i\n", "\n", " i += 2" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.108257Z", "iopub.status.busy": "2023-11-12T12:41:48.108105Z", "iopub.status.idle": "2023-11-12T12:41:48.109942Z", "shell.execute_reply": "2023-11-12T12:41:48.109639Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "prime_numbers = primes_generator()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.111690Z", "iopub.status.busy": "2023-11-12T12:41:48.111531Z", "iopub.status.idle": "2023-11-12T12:41:48.116657Z", "shell.execute_reply": "2023-11-12T12:41:48.116354Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

In bad_input_tree, what is the subtree at the path [0, 0, 2, 1] as string?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"In `bad_input_tree`, what is \"\n", " \" the subtree at the path `[0, 0, 2, 1]` as string?\", \n", " [\n", " f\"`{tree_to_string(bad_input_tree_mutator().get_subtree([0, 0, 2, 0]))}`\",\n", " f\"`{tree_to_string(bad_input_tree_mutator().get_subtree([0, 0, 2, 1]))}`\",\n", " f\"`{tree_to_string(bad_input_tree_mutator().get_subtree([0, 0, 2]))}`\",\n", " f\"`{tree_to_string(bad_input_tree_mutator().get_subtree([0, 0, 0]))}`\",\n", " ], 'next(prime_numbers)', globals()\n", " )" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Creating new Subtrees\n", "\n", "The method `new_tree()` creates a new subtree for the given `` according to the rules of the grammar. It invokes `expand_tree()` on the given `GrammarFuzzer` – a method that takes an initial (empty) tree and expands it until no more expansions are left." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.118284Z", "iopub.status.busy": "2023-11-12T12:41:48.118162Z", "iopub.status.idle": "2023-11-12T12:41:48.120395Z", "shell.execute_reply": "2023-11-12T12:41:48.120118Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeMutator(TreeMutator):\n", " def new_tree(self, start_symbol: str) -> DerivationTree:\n", " \"\"\"Create a new subtree for .\"\"\"\n", "\n", " if self.log >= 2:\n", " print(f\"Creating new tree for {start_symbol}\")\n", "\n", " tree = (start_symbol, None)\n", " return self.fuzzer.expand_tree(tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example of `new_tree()`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.122086Z", "iopub.status.busy": "2023-11-12T12:41:48.121968Z", "iopub.status.idle": "2023-11-12T12:41:48.524847Z", "shell.execute_reply": "2023-11-12T12:41:48.524448Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating new tree for \n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "1\n", "\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plain_text_tree = cast(TreeMutator, bad_input_tree_mutator()).new_tree('')\n", "display_tree(plain_text_tree)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.526881Z", "iopub.status.busy": "2023-11-12T12:41:48.526745Z", "iopub.status.idle": "2023-11-12T12:41:48.529363Z", "shell.execute_reply": "2023-11-12T12:41:48.529020Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_to_string(plain_text_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Mutating the Tree\n", "\n", "With us now being able to \n", "* access a particular path in the tree (`get_subtree()`) and\n", "* create a new subtree (`new_tree()`),\n", "\n", "we can mutate the tree at a given path. This is the task of `mutate()`." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.531091Z", "iopub.status.busy": "2023-11-12T12:41:48.530947Z", "iopub.status.idle": "2023-11-12T12:41:48.533953Z", "shell.execute_reply": "2023-11-12T12:41:48.533672Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeMutator(TreeMutator):\n", " def mutate(self, path: TreePath, tree: Optional[DerivationTree] = None) -> DerivationTree:\n", " \"\"\"Return a new tree mutated at `path`\"\"\"\n", " if tree is None:\n", " tree = self.tree\n", " assert tree is not None\n", "\n", " symbol, children = tree\n", "\n", " if not path or children is None:\n", " return self.new_tree(symbol)\n", "\n", " head = path[0]\n", " new_children = (children[:head] +\n", " [self.mutate(path[1:], children[head])] +\n", " children[head + 1:])\n", " return symbol, new_children" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is an example of `mutate()` in action. We mutate `bad_input_tree` at the path `[0, 0, 1, 0]` – that is, ``:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.535592Z", "iopub.status.busy": "2023-11-12T12:41:48.535467Z", "iopub.status.idle": "2023-11-12T12:41:48.924144Z", "shell.execute_reply": "2023-11-12T12:41:48.923729Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating new tree for \n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "<start>\n", "\n", "\n", "\n", "1\n", "\n", "<html>\n", "\n", "\n", "\n", "0->1\n", "\n", "\n", "\n", "\n", "\n", "2\n", "\n", "<tagged-text>\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "<opening-tag>\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "17\n", "\n", "<html>\n", "\n", "\n", "\n", "2->17\n", "\n", "\n", "\n", "\n", "\n", "24\n", "\n", "<closing-tag>\n", "\n", "\n", "\n", "2->24\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "<lt>\n", "\n", "\n", "\n", "3->4\n", "\n", "\n", "\n", "\n", "\n", "6\n", "\n", "<id>\n", "\n", "\n", "\n", "3->6\n", "\n", "\n", "\n", "\n", "\n", "15\n", "\n", "<gt>\n", "\n", "\n", "\n", "3->15\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "< (60)\n", "\n", "\n", "\n", "4->5\n", "\n", "\n", "\n", "\n", "\n", "7\n", "\n", "<id>\n", "\n", "\n", "\n", "6->7\n", "\n", "\n", "\n", "\n", "\n", "13\n", "\n", "<letter>\n", "\n", "\n", "\n", "6->13\n", "\n", "\n", "\n", "\n", "\n", "8\n", "\n", "<id>\n", "\n", "\n", "\n", "7->8\n", "\n", "\n", "\n", "\n", "\n", "11\n", "\n", "<letter>\n", "\n", "\n", "\n", "7->11\n", "\n", "\n", "\n", "\n", "\n", "9\n", "\n", "<letter>\n", "\n", "\n", "\n", "8->9\n", "\n", "\n", "\n", "\n", "\n", "10\n", "\n", "f (102)\n", "\n", "\n", "\n", "9->10\n", "\n", "\n", "\n", "\n", "\n", "12\n", "\n", "o (111)\n", "\n", "\n", "\n", "11->12\n", "\n", "\n", "\n", "\n", "\n", "14\n", "\n", "o (111)\n", "\n", "\n", "\n", "13->14\n", "\n", "\n", "\n", "\n", "\n", "16\n", "\n", "> (62)\n", "\n", "\n", "\n", "15->16\n", "\n", "\n", "\n", "\n", "\n", "18\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "17->18\n", "\n", "\n", "\n", "\n", "\n", "19\n", "\n", "<plain-char>\n", "\n", "\n", "\n", "18->19\n", "\n", "\n", "\n", "\n", "\n", "22\n", "\n", "<plain-text>\n", "\n", "\n", "\n", "18->22\n", "\n", "\n", "\n", "\n", "\n", "20\n", "\n", "<whitespace>\n", "\n", "\n", "\n", "19->20\n", "\n", "\n", "\n", "\n", "\n", "21\n", "\n", "  (32)\n", "\n", "\n", "\n", "20->21\n", "\n", "\n", "\n", "\n", "\n", "23\n", "\n", "\n", "\n", "\n", "22->23\n", "\n", "\n", "\n", "\n", "\n", "25\n", "\n", "<lt>\n", "\n", "\n", "\n", "24->25\n", "\n", "\n", "\n", "\n", "\n", "27\n", "\n", "/ (47)\n", "\n", "\n", "\n", "24->27\n", "\n", "\n", "\n", "\n", "\n", "28\n", "\n", "<id>\n", "\n", "\n", "\n", "24->28\n", "\n", "\n", "\n", "\n", "\n", "37\n", "\n", "<gt>\n", "\n", "\n", "\n", "24->37\n", "\n", "\n", "\n", "\n", "\n", "26\n", "\n", "< (60)\n", "\n", "\n", "\n", "25->26\n", "\n", "\n", "\n", "\n", "\n", "29\n", "\n", "<id>\n", "\n", "\n", "\n", "28->29\n", "\n", "\n", "\n", "\n", "\n", "35\n", "\n", "<letter>\n", "\n", "\n", "\n", "28->35\n", "\n", "\n", "\n", "\n", "\n", "30\n", "\n", "<id>\n", "\n", "\n", "\n", "29->30\n", "\n", "\n", "\n", "\n", "\n", "33\n", "\n", "<letter>\n", "\n", "\n", "\n", "29->33\n", "\n", "\n", "\n", "\n", "\n", "31\n", "\n", "<letter>\n", "\n", "\n", "\n", "30->31\n", "\n", "\n", "\n", "\n", "\n", "32\n", "\n", "f (102)\n", "\n", "\n", "\n", "31->32\n", "\n", "\n", "\n", "\n", "\n", "34\n", "\n", "o (111)\n", "\n", "\n", "\n", "33->34\n", "\n", "\n", "\n", "\n", "\n", "36\n", "\n", "o (111)\n", "\n", "\n", "\n", "35->36\n", "\n", "\n", "\n", "\n", "\n", "38\n", "\n", "> (62)\n", "\n", "\n", "\n", "37->38\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mutated_tree = cast(TreeMutator, bad_input_tree_mutator()).mutate([0, 0, 1, 0])\n", "display_tree(mutated_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the `` subtree is now different, which also becomes evident in the string representation." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.925960Z", "iopub.status.busy": "2023-11-12T12:41:48.925840Z", "iopub.status.idle": "2023-11-12T12:41:48.928239Z", "shell.execute_reply": "2023-11-12T12:41:48.927940Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "' '" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree_to_string(mutated_tree)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Generalizing Trees\n", "\n", "Now for the main part – finding out which parts of a tree can be generalized. Our idea is to _test_ a finite number of mutations to a subtree (say, 10). If all of these tests fail as well, then we assume we can generalize the subtree to a placeholder." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We introduce a class `TreeGeneralizer` for this purpose. On top of `grammar` and `tree` already used for the `TreeMutator` constructor, the `TreeGeneralizer` also takes a `test` function." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.929836Z", "iopub.status.busy": "2023-11-12T12:41:48.929720Z", "iopub.status.idle": "2023-11-12T12:41:48.932014Z", "shell.execute_reply": "2023-11-12T12:41:48.931757Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeMutator):\n", " \"\"\"Determine which parts of a derivation tree can be generalized.\"\"\"\n", "\n", " def __init__(self, grammar: Grammar, tree: DerivationTree, test: Callable,\n", " max_tries_for_generalization: int = 10, **kwargs: Any) -> None:\n", " \"\"\"\n", " Constructor. `grammar` and `tree` are as in `TreeMutator`.\n", " `test` is a function taking a string that either\n", " * raises an exception, indicating test failure;\n", " * or not, indicating test success.\n", " `max_tries_for_generalization` is the number of times\n", " an instantiation has to fail before it is generalized.\n", " \"\"\"\n", "\n", " super().__init__(grammar, tree, **kwargs)\n", " self.test = test\n", " self.max_tries_for_generalization = max_tries_for_generalization" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `test` function is used in `test_tree()`, returning `False` if the test fails (raising an exception), and `True` if the test passes (no exception)." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.933443Z", "iopub.status.busy": "2023-11-12T12:41:48.933338Z", "iopub.status.idle": "2023-11-12T12:41:48.935675Z", "shell.execute_reply": "2023-11-12T12:41:48.935419Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def test_tree(self, tree: DerivationTree) -> bool:\n", " \"\"\"Return True if testing `tree` passes, else False\"\"\"\n", " s = tree_to_string(tree)\n", " if self.log:\n", " print(f\"Testing {repr(s)}...\", end=\"\")\n", " try:\n", " self.test(s)\n", " except Exception as exc:\n", " if self.log:\n", " print(f\"FAIL ({type(exc).__name__})\")\n", " ret = False\n", " else:\n", " if self.log:\n", " print(f\"PASS\")\n", " ret = True\n", "\n", " return ret" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Testing for Generalization\n", "\n", "The `can_generalize()` method brings the above methods together. It creates a number of tree mutations at the given path, and returns True if all of them produce a failure. (Note that this is not as sophisticated as our [delta debugger](DeltaDebugger.ipynb) implementation, which also checks that the _same_ error occurs.)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.937144Z", "iopub.status.busy": "2023-11-12T12:41:48.937040Z", "iopub.status.idle": "2023-11-12T12:41:48.939115Z", "shell.execute_reply": "2023-11-12T12:41:48.938864Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def can_generalize(self, path: TreePath, tree: Optional[DerivationTree] = None) -> bool:\n", " \"\"\"Return True if the subtree at `path` can be generalized.\"\"\"\n", " for i in range(self.max_tries_for_generalization):\n", " mutated_tree = self.mutate(path, tree)\n", " if self.test_tree(mutated_tree):\n", " # Failure no longer occurs; cannot abstract\n", " return False\n", "\n", " return True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us put `TreeGeneralizer` into action. We can directly use `remove_html_markup()` as test function." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.940535Z", "iopub.status.busy": "2023-11-12T12:41:48.940431Z", "iopub.status.idle": "2023-11-12T12:41:48.942192Z", "shell.execute_reply": "2023-11-12T12:41:48.941925Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def bad_input_tree_generalizer(**kwargs: Any) -> TreeGeneralizer:\n", " return TreeGeneralizer(SIMPLE_HTML_GRAMMAR, bad_input_tree,\n", " remove_html_markup, **kwargs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "On our `BAD_INPUT` (and its tree), can we generalize the root `` node? In other words, does the failure occur for all possible `` inputs?" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.943725Z", "iopub.status.busy": "2023-11-12T12:41:48.943605Z", "iopub.status.idle": "2023-11-12T12:41:48.947847Z", "shell.execute_reply": "2023-11-12T12:41:48.947591Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing \"\"...PASS\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_tree_generalizer(log=True).can_generalize([0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The answer is no. The first alternative passes the test; hence no generalization." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "How about the middle `` part? Can we generalize this?" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.949283Z", "iopub.status.busy": "2023-11-12T12:41:48.949189Z", "iopub.status.idle": "2023-11-12T12:41:48.951780Z", "shell.execute_reply": "2023-11-12T12:41:48.951537Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing '8'...PASS\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_tree_generalizer(log=True).can_generalize([0, 0, 1, 0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The answer is no – just as above." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How about the closing tag? Can we generalize this one?" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.953280Z", "iopub.status.busy": "2023-11-12T12:41:48.953176Z", "iopub.status.idle": "2023-11-12T12:41:48.959047Z", "shell.execute_reply": "2023-11-12T12:41:48.958809Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_tree_generalizer(log=True).can_generalize([0, 0, 2])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Yes, we can! All alternate instantiations of `` result in a failure." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.960698Z", "iopub.status.busy": "2023-11-12T12:41:48.960560Z", "iopub.status.idle": "2023-11-12T12:41:48.964087Z", "shell.execute_reply": "2023-11-12T12:41:48.963777Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

Is this also true for <opening-tag>?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Is this also true for ``?\",\n", " [\n", " \"Yes\",\n", " \"No\"\n", " ], '(\"Yes\" == \"Yes\") + (\"No\" == \"No\")')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that the above does not hold for ``. If the attribute value contains a quote character, it will extend to the end of the input. This is another error, but not caught by our assertion; hence, the input will be flagged as passing:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.965618Z", "iopub.status.busy": "2023-11-12T12:41:48.965536Z", "iopub.status.idle": "2023-11-12T12:41:48.967070Z", "shell.execute_reply": "2023-11-12T12:41:48.966771Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "BAD_ATTR_INPUT = 'bar'" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.968444Z", "iopub.status.busy": "2023-11-12T12:41:48.968351Z", "iopub.status.idle": "2023-11-12T12:41:48.970443Z", "shell.execute_reply": "2023-11-12T12:41:48.970168Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "''" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup(BAD_ATTR_INPUT)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The effect of this is that there are patterns for `` which do not cause the failure to occur; hence, `` is not a fully valid generalization." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This, however, becomes apparent only if one of our generated tests includes a quote character in the attribute value. Since quote characters are as likely (or as unlikely) to appear as other characters, this effect may not become apparent in our default 10 tests:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.971926Z", "iopub.status.busy": "2023-11-12T12:41:48.971841Z", "iopub.status.idle": "2023-11-12T12:41:48.989894Z", "shell.execute_reply": "2023-11-12T12:41:48.989619Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_tree_generalizer().can_generalize([0, 0, 0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It will become apparent, however, as we increase the number of tests:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:48.991342Z", "iopub.status.busy": "2023-11-12T12:41:48.991233Z", "iopub.status.idle": "2023-11-12T12:41:49.018489Z", "shell.execute_reply": "2023-11-12T12:41:49.018156Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar
'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...FAIL (AssertionError)\n", "Testing '\"bar'...PASS\n" ] }, { "data": { "text/plain": [ "False" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_tree_generalizer(max_tries_for_generalization=100, log=True).can_generalize([0, 0, 0])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see that our approach may _overgeneralize_ – producing a generalization that may be too lenient. In practice, this is not too much of a problem, as we would be interested in characterizing cases that trigger the failure, rather than characterizing a small subset that does not trigger the failure." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Generalizable Paths\n", "\n", "Using `can_generalize()`, we can devise a method `generalizable_paths()` that returns all paths in the tree that can be generalized." ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.020391Z", "iopub.status.busy": "2023-11-12T12:41:49.020299Z", "iopub.status.idle": "2023-11-12T12:41:49.023727Z", "shell.execute_reply": "2023-11-12T12:41:49.023415Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def find_paths(self, \n", " predicate: Callable[[TreePath, DerivationTree], bool], \n", " path: Optional[TreePath] = None, \n", " tree: Optional[DerivationTree] = None) -> List[TreePath]:\n", " \"\"\"\n", " Return a list of all paths for which `predicate` holds.\n", " `predicate` is a function `predicate`(`path`, `tree`), where\n", " `path` denotes a subtree in `tree`. If `predicate()` returns\n", " True, `path` is included in the returned list.\n", " \"\"\"\n", "\n", " if path is None:\n", " path = []\n", " assert path is not None\n", "\n", " if tree is None:\n", " tree = self.tree\n", " assert tree is not None\n", "\n", " symbol, children = self.get_subtree(path)\n", "\n", " if predicate(path, tree):\n", " return [path]\n", "\n", " paths = []\n", " if children is not None:\n", " for i, child in enumerate(children):\n", " child_symbol, _ = child\n", " if child_symbol in self.grammar:\n", " paths += self.find_paths(predicate, path + [i])\n", "\n", " return paths\n", "\n", " def generalizable_paths(self) -> List[TreePath]:\n", " \"\"\"Return a list of all paths whose subtrees can be generalized.\"\"\"\n", " return self.find_paths(self.can_generalize)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is `generalizable_paths()` in action. We obtain all (paths to) subtrees that can be generalized:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.025318Z", "iopub.status.busy": "2023-11-12T12:41:49.025231Z", "iopub.status.idle": "2023-11-12T12:41:49.046698Z", "shell.execute_reply": "2023-11-12T12:41:49.046436Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[[0, 0, 0], [0, 0, 1, 0, 1, 0], [0, 0, 1, 0, 1, 1], [0, 0, 2]]" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bad_input_generalizable_paths = \\\n", " cast(TreeGeneralizer, bad_input_tree_generalizer()).generalizable_paths()\n", "bad_input_generalizable_paths" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To convert these subtrees into abstract failure-inducing patterns, the method `generalize_path()` returns a copy of the tree with the subtree replaced by a nonterminal without children:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.048501Z", "iopub.status.busy": "2023-11-12T12:41:49.048320Z", "iopub.status.idle": "2023-11-12T12:41:49.050885Z", "shell.execute_reply": "2023-11-12T12:41:49.050641Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def generalize_path(self, path: TreePath, \n", " tree: Optional[DerivationTree] = None) -> DerivationTree:\n", " \"\"\"Return a copy of the tree in which the subtree at `path`\n", " is generalized (= replaced by a nonterminal without children)\"\"\"\n", "\n", " if tree is None:\n", " tree = self.tree\n", " assert tree is not None\n", "\n", " symbol, children = tree\n", "\n", " if not path or children is None:\n", " return symbol, None # Nonterminal without children\n", "\n", " head = path[0]\n", " new_children = (children[:head] +\n", " [self.generalize_path(path[1:], children[head])] +\n", " children[head + 1:])\n", " return symbol, new_children" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The function `all_terminals()` expands these placeholders:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.053152Z", "iopub.status.busy": "2023-11-12T12:41:49.052658Z", "iopub.status.idle": "2023-11-12T12:41:49.055669Z", "shell.execute_reply": "2023-11-12T12:41:49.055378Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'\"bar'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_terminals(cast(TreeGeneralizer, bad_input_tree_generalizer()).generalize_path([0, 0, 0]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, the method `generalize()` obtains a tree in which all generalizable paths actually are generalized:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.057291Z", "iopub.status.busy": "2023-11-12T12:41:49.057189Z", "iopub.status.idle": "2023-11-12T12:41:49.059308Z", "shell.execute_reply": "2023-11-12T12:41:49.059037Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def generalize(self) -> DerivationTree:\n", " \"\"\"Returns a copy of the tree in which all generalizable subtrees\n", " are generalized (= replaced by nonterminals without children)\"\"\"\n", " tree = self.tree\n", " assert tree is not None\n", "\n", " for path in self.generalizable_paths():\n", " tree = self.generalize_path(path, tree)\n", "\n", " return tree" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.060828Z", "iopub.status.busy": "2023-11-12T12:41:49.060686Z", "iopub.status.idle": "2023-11-12T12:41:49.079179Z", "shell.execute_reply": "2023-11-12T12:41:49.078859Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "abstract_failure_inducing_input = cast(TreeGeneralizer, bad_input_tree_generalizer()).generalize()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This gives us the final generalization of `BAD_INPUT`. In the abstract failure-inducing input, all generalizable elements are generalized." ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.080666Z", "iopub.status.busy": "2023-11-12T12:41:49.080560Z", "iopub.status.idle": "2023-11-12T12:41:49.082728Z", "shell.execute_reply": "2023-11-12T12:41:49.082460Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'\"'" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_terminals(abstract_failure_inducing_input)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that to obtain the failure, it suffices to have an ``, followed by a quote and (any) `` and (any) ``. Clearly, all that it takes to produce the failure is to have a double quote in the plain text." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Also note how this diagnosis was reached through _experiments_ only – just as with [delta debugging](DeltaDebugger.ipynb), we could treat the program under test as a black box. In contrast to delta debugging, however, we obtain an _abstraction_ that generalizes the circumstances under which a given failure occurs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Fuzzing with Patterns\n", "\n", "One neat side effect of abstract failure-inducing patterns is that they can be easily instantiated into further test cases, all set to reproduce the failure in question. This gives us a test suite we can later test our fix against." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `fuzz_tree()` takes a tree representing an abstract failure-inducing input and instantiates all missing subtrees." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.084376Z", "iopub.status.busy": "2023-11-12T12:41:49.084270Z", "iopub.status.idle": "2023-11-12T12:41:49.085915Z", "shell.execute_reply": "2023-11-12T12:41:49.085643Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import copy" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.087312Z", "iopub.status.busy": "2023-11-12T12:41:49.087228Z", "iopub.status.idle": "2023-11-12T12:41:49.089204Z", "shell.execute_reply": "2023-11-12T12:41:49.088912Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TreeGeneralizer(TreeGeneralizer):\n", " def fuzz_tree(self, tree: DerivationTree) -> DerivationTree:\n", " \"\"\"Return an instantiated copy of `tree`.\"\"\"\n", " tree = copy.deepcopy(tree)\n", " return self.fuzzer.expand_tree(tree)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.090733Z", "iopub.status.busy": "2023-11-12T12:41:49.090611Z", "iopub.status.idle": "2023-11-12T12:41:49.107677Z", "shell.execute_reply": "2023-11-12T12:41:49.107404Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\"

\n", "\"
\n", "\"\n", "\"\n", "\"\n", "\"\n", "\"\n", "\"\n", "\"\n", "\"\t\n" ] } ], "source": [ "bitg = cast(TreeGeneralizer, bad_input_tree_generalizer())\n", "for i in range(10):\n", " print(all_terminals(bitg.fuzz_tree(abstract_failure_inducing_input)))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can take these inputs and see whether they reproduce the failure in question:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:49.109192Z", "iopub.status.busy": "2023-11-12T12:41:49.109096Z", "iopub.status.idle": "2023-11-12T12:41:50.984997Z", "shell.execute_reply": "2023-11-12T12:41:50.984598Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "successes = 0\n", "failures = 0\n", "trials = 1000\n", "\n", "for i in range(trials):\n", " test_input = all_terminals(\n", " bitg.fuzz_tree(abstract_failure_inducing_input))\n", " try:\n", " remove_html_markup(test_input)\n", " except AssertionError:\n", " successes += 1\n", " else:\n", " failures += 1" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:50.986904Z", "iopub.status.busy": "2023-11-12T12:41:50.986774Z", "iopub.status.idle": "2023-11-12T12:41:50.989250Z", "shell.execute_reply": "2023-11-12T12:41:50.988941Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "(982, 18)" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "successes, failures" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We get an overall failure rate of ~98%, which is not bad at all." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:50.991009Z", "iopub.status.busy": "2023-11-12T12:41:50.990868Z", "iopub.status.idle": "2023-11-12T12:41:50.993192Z", "shell.execute_reply": "2023-11-12T12:41:50.992859Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.018" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "failures / 1000" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In our case, it is _overgeneralization_ (as discussed above) that is responsible for not reaching a 100% rate. (In all generality, we are trying to approximate the behavior of a Turing machine with a context free grammar, which is, well, always an approximation.) However, even a lower rate would still be useful, as any additional test case that reproduces a failure helps in ensuring the final fix is complete." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Putting it all Together\n", "\n", "Let us now put together all this in a more convenient package that does not require the user to parse and unparse derivation trees." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our `DDSetDebugger` is modeled after the `DeltaDebugger` from [the chapter on delta debugging](DeltaDebugger.ipynb). It is to be used as\n", "\n", "```python\n", "with DDSetDebugger(grammar) as dd:\n", " some_failing_function(...)\n", "```\n", "\n", "After that, evaluating `dd` yields a generalized abstract failure-inducing input as a string." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Since `DDSetDebugger` accepts only one grammar, the function to be debugged should have exactly one string argument (besides other arguments); this string must fit the grammar." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Constructor" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The constructor puts together the various components. It allows for customization by subclassing." ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:50.994955Z", "iopub.status.busy": "2023-11-12T12:41:50.994808Z", "iopub.status.idle": "2023-11-12T12:41:50.996642Z", "shell.execute_reply": "2023-11-12T12:41:50.996355Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from DeltaDebugger import CallCollector" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:50.998370Z", "iopub.status.busy": "2023-11-12T12:41:50.998233Z", "iopub.status.idle": "2023-11-12T12:41:51.001513Z", "shell.execute_reply": "2023-11-12T12:41:51.001179Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DDSetDebugger(CallCollector):\n", " \"\"\"\n", " Debugger implementing the DDSET algorithm for abstracting failure-inducing inputs.\n", " \"\"\"\n", "\n", " def __init__(self, grammar: Grammar, \n", " generalizer_class: Type = TreeGeneralizer,\n", " parser: Optional[Parser] = None,\n", " **kwargs: Any) -> None:\n", " \"\"\"Constructor.\n", " `grammar` is an input grammar in fuzzingbook format.\n", " `generalizer_class` is the tree generalizer class to use\n", " (default: `TreeGeneralizer`)\n", " `parser` is the parser to use (default: `EarleyParser(grammar)`).\n", " All other keyword args are passed to the tree generalizer, notably:\n", " `fuzzer` - the fuzzer to use (default: `GrammarFuzzer`), and\n", " `log` - enables debugging output if True.\n", " \"\"\"\n", " super().__init__()\n", " self.grammar = grammar\n", " assert is_valid_grammar(grammar)\n", "\n", " self.generalizer_class = generalizer_class\n", "\n", " if parser is None:\n", " parser = EarleyParser(grammar)\n", " self.parser = parser\n", " self.kwargs = kwargs\n", "\n", " # These save state for further fuzz() calls\n", " self.generalized_args: Dict[str, Any] = {}\n", " self.generalized_trees: Dict[str, DerivationTree] = {}\n", " self.generalizers: Dict[str, TreeGeneralizer] = {}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Generalizing Arguments" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `generalize()` is the main entry point. For all string arguments collected in the first function call, it generalizes the arguments and returns an abstract failure-inducing string." ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.003220Z", "iopub.status.busy": "2023-11-12T12:41:51.003119Z", "iopub.status.idle": "2023-11-12T12:41:51.006435Z", "shell.execute_reply": "2023-11-12T12:41:51.006114Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DDSetDebugger(DDSetDebugger):\n", " def generalize(self) -> Dict[str, Any]:\n", " \"\"\"\n", " Generalize arguments seen. For each function argument,\n", " produce an abstract failure-inducing input that characterizes\n", " the set of inputs for which the function fails.\n", " \"\"\"\n", " if self.generalized_args:\n", " return self.generalized_args\n", "\n", " self.generalized_args = copy.deepcopy(self.args())\n", " self.generalized_trees = {}\n", " self.generalizers = {}\n", "\n", " for arg in self.args():\n", " def test(value: Any) -> Any:\n", " return self.call({arg: value})\n", "\n", " value = self.args()[arg]\n", " if isinstance(value, str):\n", " tree = list(self.parser.parse(value))[0]\n", " gen = self.generalizer_class(self.grammar, tree, test, \n", " **self.kwargs)\n", " generalized_tree = gen.generalize()\n", "\n", " self.generalizers[arg] = gen\n", " self.generalized_trees[arg] = generalized_tree\n", " self.generalized_args[arg] = all_terminals(generalized_tree)\n", "\n", " return self.generalized_args" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.007989Z", "iopub.status.busy": "2023-11-12T12:41:51.007879Z", "iopub.status.idle": "2023-11-12T12:41:51.010108Z", "shell.execute_reply": "2023-11-12T12:41:51.009701Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DDSetDebugger(DDSetDebugger):\n", " def __repr__(self) -> str:\n", " \"\"\"Return a string representation of the generalized call.\"\"\"\n", " return self.format_call(self.generalize())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example of how `DDSetDebugger` would be used on our `BAD_INPUT` example. Simply evaluating the debugger yields a call with a generalized input." ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.011842Z", "iopub.status.busy": "2023-11-12T12:41:51.011711Z", "iopub.status.idle": "2023-11-12T12:41:51.036962Z", "shell.execute_reply": "2023-11-12T12:41:51.036516Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "remove_html_markup(s='\"')" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:\n", " remove_html_markup(BAD_INPUT)\n", "dd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Fuzzing\n", "\n", "The `fuzz()` method produces instantiations of the abstract failure-inducing pattern." ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.038847Z", "iopub.status.busy": "2023-11-12T12:41:51.038622Z", "iopub.status.idle": "2023-11-12T12:41:51.041551Z", "shell.execute_reply": "2023-11-12T12:41:51.041248Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DDSetDebugger(DDSetDebugger):\n", " def fuzz_args(self) -> Dict[str, Any]:\n", " \"\"\"\n", " Return arguments randomly instantiated\n", " from the abstract failure-inducing pattern.\n", " \"\"\"\n", " if not self.generalized_trees:\n", " self.generalize()\n", "\n", " args = copy.deepcopy(self.generalized_args)\n", " for arg in args:\n", " if arg not in self.generalized_trees:\n", " continue\n", "\n", " tree = self.generalized_trees[arg]\n", " gen = self.generalizers[arg]\n", " instantiated_tree = gen.fuzz_tree(tree)\n", " args[arg] = all_terminals(instantiated_tree)\n", "\n", " return args\n", "\n", " def fuzz(self) -> str:\n", " \"\"\"\n", " Return a call with arguments randomly instantiated\n", " from the abstract failure-inducing pattern.\n", " \"\"\"\n", " return self.format_call(self.fuzz_args())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here are some axamples of `fuzz()` in action:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.043205Z", "iopub.status.busy": "2023-11-12T12:41:51.043088Z", "iopub.status.idle": "2023-11-12T12:41:51.045325Z", "shell.execute_reply": "2023-11-12T12:41:51.044986Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:\n", " remove_html_markup(BAD_INPUT)" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.046948Z", "iopub.status.busy": "2023-11-12T12:41:51.046825Z", "iopub.status.idle": "2023-11-12T12:41:51.071379Z", "shell.execute_reply": "2023-11-12T12:41:51.071085Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'remove_html_markup(s=\\'\"
\\')'" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd.fuzz()" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.073099Z", "iopub.status.busy": "2023-11-12T12:41:51.072955Z", "iopub.status.idle": "2023-11-12T12:41:51.077163Z", "shell.execute_reply": "2023-11-12T12:41:51.076897Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'remove_html_markup(s=\\'\"\\\\t\\')'" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd.fuzz()" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.078821Z", "iopub.status.busy": "2023-11-12T12:41:51.078710Z", "iopub.status.idle": "2023-11-12T12:41:51.083679Z", "shell.execute_reply": "2023-11-12T12:41:51.083387Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'remove_html_markup(s=\\'\")\\')'" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " These can be fed into `eval()`, set to produce more failing calls." ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.085190Z", "iopub.status.busy": "2023-11-12T12:41:51.085068Z", "iopub.status.idle": "2023-11-12T12:41:51.088260Z", "shell.execute_reply": "2023-11-12T12:41:51.087918Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/2072246869.py\", line 2, in \n", " eval(dd.fuzz())\n", " File \"\", line 1, in \n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/2717035104.py\", line 17, in remove_html_markup\n", " assert '<' not in out and '>' not in out\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError(AssertionError):\n", " eval(dd.fuzz())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## More Examples\n", "\n", "Let us apply `DDSetDebugger` on more examples." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Square Root\n", "\n", "Our first example is the `square_root()` function from [the chapter on assertions](Assertions.ipynb)." ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.089865Z", "iopub.status.busy": "2023-11-12T12:41:51.089770Z", "iopub.status.idle": "2023-11-12T12:41:51.091538Z", "shell.execute_reply": "2023-11-12T12:41:51.091237Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Assertions import square_root # minor dependency" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `square_root()` function fails on a value of `-1`:" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.093078Z", "iopub.status.busy": "2023-11-12T12:41:51.092962Z", "iopub.status.idle": "2023-11-12T12:41:51.094995Z", "shell.execute_reply": "2023-11-12T12:41:51.094700Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/1205325604.py\", line 2, in \n", " square_root(-1)\n", " File \"/Users/zeller/Projects/debuggingbook/notebooks/Assertions.ipynb\", line 55, in square_root\n", " assert x >= 0 # precondition\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError(AssertionError):\n", " square_root(-1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We define a grammar for its arguments:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:51.096682Z", "iopub.status.busy": "2023-11-12T12:41:51.096506Z", "iopub.status.idle": "2023-11-12T12:41:51.098674Z", "shell.execute_reply": "2023-11-12T12:41:51.098324Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "INT_GRAMMAR: Grammar = {\n", " \"\":\n", " [\"\"],\n", "\n", " \"\":\n", " [\"\", \"-\"],\n", "\n", " \"\":\n", " [\"\", \"\"],\n", "\n", " \"\": list(\"123456789\"),\n", " \n", " \"\": list(string.digits),\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The test function takes a string and converts it into an integer:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.100253Z", "iopub.status.busy": "2023-11-12T12:41:51.100161Z", "iopub.status.idle": "2023-11-12T12:41:51.101780Z", "shell.execute_reply": "2023-11-12T12:41:51.101509Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def square_root_test(s: str) -> None:\n", " return square_root(int(s))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this, we can go and see whether we can generalize a failing input:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.103145Z", "iopub.status.busy": "2023-11-12T12:41:51.103064Z", "iopub.status.idle": "2023-11-12T12:41:51.104845Z", "shell.execute_reply": "2023-11-12T12:41:51.104574Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with DDSetDebugger(INT_GRAMMAR, log=True) as dd_square_root:\n", " square_root_test(\"-1\")" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.106399Z", "iopub.status.busy": "2023-11-12T12:41:51.106300Z", "iopub.status.idle": "2023-11-12T12:41:51.110473Z", "shell.execute_reply": "2023-11-12T12:41:51.110213Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing '-8'...FAIL (AssertionError)\n", "Testing '-316'...FAIL (AssertionError)\n", "Testing '8'...PASS\n", "Testing '684'...PASS\n", "Testing '-3'...FAIL (AssertionError)\n", "Testing '-870'...FAIL (AssertionError)\n", "Testing '-3'...FAIL (AssertionError)\n", "Testing '-3451'...FAIL (AssertionError)\n", "Testing '-8'...FAIL (AssertionError)\n", "Testing '-63213'...FAIL (AssertionError)\n", "Testing '-26'...FAIL (AssertionError)\n", "Testing '-4'...FAIL (AssertionError)\n", "Testing '-6'...FAIL (AssertionError)\n", "Testing '-8'...FAIL (AssertionError)\n" ] }, { "data": { "text/plain": [ "square_root_test(s='-')" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd_square_root" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Success! Using `DDSetDebugger`, we have nicely generalized the failure-inducing input to a pattern `-` that translates into \"any negative number\"." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Middle\n", "\n", "The `middle()` function from [the chapter on statistical debugging](StatisticalDebugger.ipynb) returns the middle of three numerical values `x`, `y`, and `z`." ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.112315Z", "iopub.status.busy": "2023-11-12T12:41:51.112139Z", "iopub.status.idle": "2023-11-12T12:41:51.651389Z", "shell.execute_reply": "2023-11-12T12:41:51.651075Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from StatisticalDebugger import middle # minor dependency" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We set up a test function that evaluates a string – a tuple of three arguments – and then tests `middle()`:" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.653397Z", "iopub.status.busy": "2023-11-12T12:41:51.653220Z", "iopub.status.idle": "2023-11-12T12:41:51.655303Z", "shell.execute_reply": "2023-11-12T12:41:51.655033Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def middle_test(s: str) -> None:\n", " x, y, z = eval(s)\n", " assert middle(x, y, z) == sorted([x, y, z])[1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The grammar for the three numbers simply puts three integers together:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:41:51.656863Z", "iopub.status.busy": "2023-11-12T12:41:51.656735Z", "iopub.status.idle": "2023-11-12T12:41:51.658665Z", "shell.execute_reply": "2023-11-12T12:41:51.658425Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "XYZ_GRAMMAR: Grammar = {\n", " \"\":\n", " [\", , \"],\n", "\n", " \"\":\n", " [\"\", \"-\"],\n", "\n", " \"\":\n", " [\"\", \"\"],\n", "\n", " \"\": list(\"123456789\"),\n", " \n", " \"\": list(string.digits),\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example of `middle()` failing:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.660149Z", "iopub.status.busy": "2023-11-12T12:41:51.660039Z", "iopub.status.idle": "2023-11-12T12:41:51.661887Z", "shell.execute_reply": "2023-11-12T12:41:51.661641Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/982110330.py\", line 2, in \n", " middle_test(\"2, 1, 3\")\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_27613/3079946275.py\", line 3, in middle_test\n", " assert middle(x, y, z) == sorted([x, y, z])[1]\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError(AssertionError):\n", " middle_test(\"2, 1, 3\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What happens if we debug this with `DDSetDebugger`? We see that there is no abstraction at the syntax level that could characterize this failure:" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.663514Z", "iopub.status.busy": "2023-11-12T12:41:51.663400Z", "iopub.status.idle": "2023-11-12T12:41:51.665206Z", "shell.execute_reply": "2023-11-12T12:41:51.664954Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with DDSetDebugger(XYZ_GRAMMAR, log=True) as dd_middle:\n", " middle_test(\"2, 1, 3\")" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.666650Z", "iopub.status.busy": "2023-11-12T12:41:51.666546Z", "iopub.status.idle": "2023-11-12T12:41:51.671477Z", "shell.execute_reply": "2023-11-12T12:41:51.671183Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing '7, 4591, -0'...PASS\n", "Testing '6, 1, 3'...PASS\n", "Testing '7, 1, 3'...PASS\n", "Testing '7, 1, 3'...PASS\n", "Testing '2, -7, 3'...FAIL (AssertionError)\n", "Testing '2, 0, 3'...FAIL (AssertionError)\n", "Testing '2, -89, 3'...FAIL (AssertionError)\n", "Testing '2, 973, 3'...PASS\n", "Testing '2, 11, 3'...PASS\n", "Testing '2, 8, 3'...PASS\n", "Testing '2, 1, 9'...FAIL (AssertionError)\n", "Testing '2, 1, -16'...PASS\n", "Testing '2, 1, 35'...FAIL (AssertionError)\n", "Testing '2, 1, 6'...FAIL (AssertionError)\n", "Testing '2, 1, 53'...FAIL (AssertionError)\n", "Testing '2, 1, 5'...FAIL (AssertionError)\n", "Testing '2, 1, 737'...FAIL (AssertionError)\n", "Testing '2, 1, 28'...FAIL (AssertionError)\n", "Testing '2, 1, 3'...FAIL (AssertionError)\n", "Testing '2, 1, 5'...FAIL (AssertionError)\n", "Testing '2, 1, 5'...FAIL (AssertionError)\n", "Testing '2, 1, 56'...FAIL (AssertionError)\n" ] }, { "data": { "text/plain": [ "middle_test(s='2, 1, ')" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd_middle" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So, while there are failures that can be nicely characterized using abstractions of input elements, `middle()` is not one of them. Which is good, because this means that all our other techniques such as [statistical debugging](StatisticalDebugger.ipynb) and [dynamic invariants](DynamicInvariants.ipynb) still have a use case :-)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This chapter provides a class `DDSetDebugger`, implementing the DDSET algorithm for generalizing failure-inducing inputs. The `DDSetDebugger` is used as follows:\n", "\n", "```python\n", "with DDSetDebugger(grammar) as dd:\n", " function(args...)\n", "dd\n", "```\n", "\n", "Here, `function(args...)` is a failing function call (= raises an execption) that takes at least one string argument; `grammar` is an [input grammar in fuzzingbook format](https://www.fuzzingbook.org/html/Grammars.html) that matches the format of this argument.\n", "\n", "The result is a call of `function()` with an _abstract failure-inducing input_ – a variant of the conrete input in which parts are replaced by placeholders in the form ``, where `` is a nonterminal in the grammar. The failure has been verified to occur for a number of instantiations of ``." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is an example of how `DDSetDebugger` works. The concrete failing input `\"bar` is generalized to an _abstract failure-inducing input_:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.673445Z", "iopub.status.busy": "2023-11-12T12:41:51.673259Z", "iopub.status.idle": "2023-11-12T12:41:51.688668Z", "shell.execute_reply": "2023-11-12T12:41:51.688389Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "remove_html_markup(s='\"')" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with DDSetDebugger(SIMPLE_HTML_GRAMMAR) as dd:\n", " remove_html_markup('\"bar')\n", "dd" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The abstract input tells us that the failure occurs for whatever opening and closing HTML tags as long as there is a double quote between them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A programmatic interface is available as well. `generalize()` returns a mapping of argument names to (generalized) values:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.690221Z", "iopub.status.busy": "2023-11-12T12:41:51.690094Z", "iopub.status.idle": "2023-11-12T12:41:51.692359Z", "shell.execute_reply": "2023-11-12T12:41:51.692117Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'s': '\"'}" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dd.generalize()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Using `fuzz()`, the abstract input can be instantiated to further concrete inputs, all set to produce the failure again:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.693916Z", "iopub.status.busy": "2023-11-12T12:41:51.693748Z", "iopub.status.idle": "2023-11-12T12:41:51.705838Z", "shell.execute_reply": "2023-11-12T12:41:51.705582Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "remove_html_markup(s='\"1')\n", "remove_html_markup(s='\"c*C')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\")')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"\\t7')\n", "remove_html_markup(s='\"')\n", "remove_html_markup(s='\"2')\n", "remove_html_markup(s='\"\\r~\\t\\r')\n" ] } ], "source": [ "for i in range(10):\n", " print(dd.fuzz())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`DDSetDebugger` can be customized by passing a subclass of `TreeGeneralizer`, which does the gist of the work; for details, see its constructor.\n", "The full class hierarchy is shown below." ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.707506Z", "iopub.status.busy": "2023-11-12T12:41:51.707406Z", "iopub.status.idle": "2023-11-12T12:41:51.709102Z", "shell.execute_reply": "2023-11-12T12:41:51.708853Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from ClassDiagram import display_class_hierarchy" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:51.710742Z", "iopub.status.busy": "2023-11-12T12:41:51.710583Z", "iopub.status.idle": "2023-11-12T12:41:52.241034Z", "shell.execute_reply": "2023-11-12T12:41:52.240534Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "DDSetDebugger\n", "\n", "\n", "DDSetDebugger\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "__repr__()\n", "\n", "\n", "\n", "fuzz()\n", "\n", "\n", "\n", "fuzz_args()\n", "\n", "\n", "\n", "generalize()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "CallCollector\n", "\n", "\n", "CallCollector\n", "\n", "\n", "\n", "__enter__()\n", "\n", "\n", "\n", "__exit__()\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "args()\n", "\n", "\n", "\n", "call()\n", "\n", "\n", "\n", "exception()\n", "\n", "\n", "\n", "function()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "DDSetDebugger->CallCollector\n", "\n", "\n", "\n", "\n", "\n", "StackInspector\n", "\n", "\n", "StackInspector\n", "\n", "\n", "\n", "_generated_function_cache\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "CallCollector->StackInspector\n", "\n", "\n", "\n", "\n", "\n", "TreeGeneralizer\n", "\n", "\n", "TreeGeneralizer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "can_generalize()\n", "\n", "\n", "\n", "find_paths()\n", "\n", "\n", "\n", "fuzz_tree()\n", "\n", "\n", "\n", "generalizable_paths()\n", "\n", "\n", "\n", "generalize()\n", "\n", "\n", "\n", "generalize_path()\n", "\n", "\n", "\n", "test_tree()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "TreeMutator\n", "\n", "\n", "TreeMutator\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "get_subtree()\n", "\n", "\n", "\n", "mutate()\n", "\n", "\n", "\n", "new_tree()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "TreeGeneralizer->TreeMutator\n", "\n", "\n", "\n", "\n", "\n", "Legend\n", "Legend\n", "• \n", "public_method()\n", "• \n", "private_method()\n", "• \n", "overloaded_method()\n", "Hover over names to see doc\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy([DDSetDebugger, TreeGeneralizer],\n", " public_methods=[\n", " CallCollector.__init__,\n", " CallCollector.__enter__,\n", " CallCollector.__exit__,\n", " CallCollector.function,\n", " CallCollector.args,\n", " CallCollector.exception,\n", " CallCollector.call, # type: ignore\n", " DDSetDebugger.__init__,\n", " DDSetDebugger.__repr__,\n", " DDSetDebugger.fuzz,\n", " DDSetDebugger.fuzz_args,\n", " DDSetDebugger.generalize,\n", " ], project='debuggingbook')" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* Generalizing failure-inducing inputs can yield important information for which inputs and under which circumstances a failure occurs.\n", "* Generalizing failure-inducing inputs is most useful if the input can be split into multiple elements, of which only a part are relevant for producing the error.\n", "* As they help in _parsing_ and _producing_ input, _grammars_ can play an important role in testing and debugging." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "Our [next chapter](Repairer.ipynb) introduces _automated repair_ of programs, building on the fault localization and generalization mechanisms introduced so far." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "Our `DDSetDebugger` class implements the DDSET algorithm as introduced by Gopinath et al. in \\cite{Gopinath2020}. A [full-fledged implementation of DDSET](https://rahul.gopinath.org/post/2020/07/15/ddset/) with plenty of details and experiments is available as a Jupyter Notebook. Our implementation follows the [simplified implementation of DDSET, as described by Gopinath](https://rahul.gopinath.org/post/2020/08/03/simple-ddset/).\n", "\n", "The potential for determining how input features relate to bugs is not nearly explored yet. \n", "The ALHAZEN work by Kampmann et al. \\cite{Kampmann2020} generalizes over DDSET differently, by investigating _semantic_ features of input elements such as their numeric interpretation or length and their correlation with failures. Like DDSET, ALHAZEN also uses a feedback loop to strengthen or refute its hypotheses.\n", "\n", "In recent work \\cite{Gopinath2021}, Gopinath has extended the concept of DDSET further. His work on _evocative expressions_ introduces a _pattern language_ in which arbitrary DDSET-like patterns can be combined into Boolean formula that even more precisely capture and produce failure circumstances. In particular, evocative expressions can _specialize_ grammars towards Boolean pattern combinations, thus allowing for great flexibility in testing and debugging." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1: Generalization and Specialization\n", "\n", "Consider the abstract failure-inducing input for `BAD_INPUT` we determined:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:52.243363Z", "iopub.status.busy": "2023-11-12T12:41:52.243207Z", "iopub.status.idle": "2023-11-12T12:41:52.245668Z", "shell.execute_reply": "2023-11-12T12:41:52.245399Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'\"'" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_terminals(abstract_failure_inducing_input)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" }, "solution2": "hidden", "solution2_first": true }, "source": [ "1. How does it change if you increase the number of test runs, using `max_tries_for_generalization`?\n", "2. What is the success rate of the new pattern?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** We can compute this by increasing `max_tries_for_generalization`:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:52.247450Z", "iopub.status.busy": "2023-11-12T12:41:52.247260Z", "iopub.status.idle": "2023-11-12T12:41:52.368015Z", "shell.execute_reply": "2023-11-12T12:41:52.367691Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "more_precise_bitg = \\\n", " cast(TreeGeneralizer, bad_input_tree_generalizer(max_tries_for_generalization=100))\n", "\n", "more_precise_abstract_failure_inducing_input = \\\n", " more_precise_bitg.generalize()" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "cell_style": "split", "execution": { "iopub.execute_input": "2023-11-12T12:41:52.369886Z", "iopub.status.busy": "2023-11-12T12:41:52.369778Z", "iopub.status.idle": "2023-11-12T12:41:52.372323Z", "shell.execute_reply": "2023-11-12T12:41:52.372033Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "data": { "text/plain": [ "'\"'" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_terminals(more_precise_abstract_failure_inducing_input)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "We see that we still have an opening tag; however, it no longer assumes attributes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "The success rate can be computed as before:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:52.373842Z", "iopub.status.busy": "2023-11-12T12:41:52.373732Z", "iopub.status.idle": "2023-11-12T12:41:53.545004Z", "shell.execute_reply": "2023-11-12T12:41:53.544686Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "successes = 0\n", "failures = 0\n", "trials = 1000\n", "\n", "for i in range(trials):\n", " test_input = all_terminals(\n", " more_precise_bitg.fuzz_tree(\n", " more_precise_abstract_failure_inducing_input))\n", " try:\n", " remove_html_markup(test_input)\n", " except AssertionError:\n", " successes += 1\n", " else:\n", " failures += 1" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:53.546837Z", "iopub.status.busy": "2023-11-12T12:41:53.546706Z", "iopub.status.idle": "2023-11-12T12:41:53.548939Z", "shell.execute_reply": "2023-11-12T12:41:53.548697Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "data": { "text/plain": [ "(993, 7)" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "successes, failures" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:41:53.550457Z", "iopub.status.busy": "2023-11-12T12:41:53.550335Z", "iopub.status.idle": "2023-11-12T12:41:53.552420Z", "shell.execute_reply": "2023-11-12T12:41:53.552160Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "data": { "text/plain": [ "0.007" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "failures / 1000" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "We see that the success rate is now more than 99%, which is better than before. On the other hand, the pattern is now overly _special_, since there are `` with attributes such that the failure occurs (but also some that cancel out the error)." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false, "vscode": { "interpreter": { "hash": "0af4f07dd039d1b4e562c7a7d0340393b1c66f50605ac6af30beb81aa23b7ef5" } } }, "nbformat": 4, "nbformat_minor": 4 }