{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "# Fuzzing with Generators\n", "\n", "In this chapter, we show how to extend grammars with _functions_ – pieces of code that get executed during grammar expansion, and that can generate, check, or change elements produced. Adding functions to a grammar allows for very versatile test generation, bringing together the best of grammar generation and programming." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:02.514314Z", "iopub.status.busy": "2024-01-18T17:17:02.513772Z", "iopub.status.idle": "2024-01-18T17:17:02.581721Z", "shell.execute_reply": "2024-01-18T17:17:02.581204Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo('6Z35ChunpLY')" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* As this chapter deeply interacts with the techniques discussed in the [chapter on efficient grammar fuzzing](GrammarFuzzer.ipynb), a good understanding of the techniques is recommended." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from fuzzingbook.GeneratorGrammarFuzzer import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter introduces the ability to attach _functions_ to individual production rules:\n", "\n", "* A `pre` function is executed _before_ the expansion takes place. Its result (typically a string) can _replace_ the actual expansion.\n", "* A `post` function is executed _after_ the expansion has taken place. If it returns a string, the string replaces the expansion; if it returns `False`, it triggers a new expansion.\n", "\n", "Both functions can return `None` to not interfere with grammar production at all.\n", "\n", "To attach a function `F` to an individual expansion `S` in a grammar, replace `S` with a pair\n", "\n", "```python\n", "(S, opts(pre=F)) # Set a function to be executed before expansion\n", "```\n", "or\n", "```python\n", "(S, opts(post=F)) # Set a function to be executed after expansion\n", "```\n", "\n", "Here is an example. To take an area code from a list that is given programmatically, we can write:\n", "\n", "```python\n", ">>> import random\n", ">>> from Grammars import US_PHONE_GRAMMAR, extend_grammar, opts\n", ">>> def pick_area_code():\n", ">>>     return random.choice(['555', '554', '553'])\n", ">>> PICKED_US_PHONE_GRAMMAR = extend_grammar(US_PHONE_GRAMMAR,\n", ">>> {\n", ">>>     \"\": [(\"\", opts(pre=pick_area_code))]\n", ">>> })\n", "```\n", "A `GeneratorGrammarFuzzer` will extract and interpret these options. 
Here is an example:\n", "\n", "```python\n", ">>> picked_us_phone_fuzzer = GeneratorGrammarFuzzer(PICKED_US_PHONE_GRAMMAR)\n", ">>> [picked_us_phone_fuzzer.fuzz() for i in range(5)]\n", "['(554)732-6097',\n", " '(555)469-0662',\n", " '(553)671-5358',\n", " '(555)686-8011',\n", " '(554)453-4067']\n", "```\n", "As you can see, the area codes now all stem from `pick_area_code()`. Such definitions allow closely tying program code (such as `pick_area_code()`) to grammars.\n", "\n", "The `PGGCFuzzer` class incorporates all features from [the `GrammarFuzzer` class](GrammarFuzzer.ipynb) and its [coverage-based](GrammarCoverageFuzzer.ipynb), [probabilistic-based](ProbabilisticGrammarFuzzer.ipynb), and [generator-based](GeneratorGrammarFuzzer.ipynb) derivatives.\n", "\n", "![](PICS/GeneratorGrammarFuzzer-synopsis-1.svg)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Example: Test a Credit Card System\n", "\n", "Suppose you work with a shopping system that – among several other features – allows customers to pay with a credit card. Your task is to test the payment functionality. " ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "source": [ "To make things simple, we will assume that we need only two pieces of data – a 16-digit credit card number and an amount to be charged. 
Both pieces can be easily generated with grammars, as in the following:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:17:02.604220Z", "iopub.status.busy": "2024-01-18T17:17:02.604037Z", "iopub.status.idle": "2024-01-18T17:17:02.606111Z", "shell.execute_reply": "2024-01-18T17:17:02.605881Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:02.607716Z", "iopub.status.busy": "2024-01-18T17:17:02.607597Z", "iopub.status.idle": "2024-01-18T17:17:02.609298Z", "shell.execute_reply": "2024-01-18T17:17:02.609015Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from typing import Callable, Set, List, Dict, Optional, Iterator, Any, Union, Tuple, cast" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:02.610879Z", "iopub.status.busy": "2024-01-18T17:17:02.610775Z", "iopub.status.idle": "2024-01-18T17:17:02.688976Z", "shell.execute_reply": "2024-01-18T17:17:02.688699Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Fuzzer import Fuzzer" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:02.690952Z", "iopub.status.busy": "2024-01-18T17:17:02.690819Z", "iopub.status.idle": "2024-01-18T17:17:03.027772Z", "shell.execute_reply": "2024-01-18T17:17:03.026687Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Grammars import EXPR_GRAMMAR, is_valid_grammar, is_nonterminal, extend_grammar\n", "from Grammars import opts, exp_opt, exp_string, crange, Grammar, Expansion" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { 
"iopub.execute_input": "2024-01-18T17:17:03.032923Z", "iopub.status.busy": "2024-01-18T17:17:03.032489Z", "iopub.status.idle": "2024-01-18T17:17:03.068817Z", "shell.execute_reply": "2024-01-18T17:17:03.068530Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import DerivationTree" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.070604Z", "iopub.status.busy": "2024-01-18T17:17:03.070509Z", "iopub.status.idle": "2024-01-18T17:17:03.072570Z", "shell.execute_reply": "2024-01-18T17:17:03.072276Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "CHARGE_GRAMMAR: Grammar = {\n", " \"\": [\"Charge to my credit card \"],\n", " \"\": [\"$\"],\n", " \"\": [\".\"],\n", " \"\": [\"\", \"\"],\n", " \"\": crange('0', '9'),\n", "\n", " \"\": [\"\"],\n", " \"\": [\"\"],\n", " \"\": [\"\"],\n", "}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.074241Z", "iopub.status.busy": "2024-01-18T17:17:03.074137Z", "iopub.status.idle": "2024-01-18T17:17:03.076220Z", "shell.execute_reply": "2024-01-18T17:17:03.075907Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(CHARGE_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "All of this works neatly – we can generate arbitrary amounts and credit card numbers:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.077818Z", "iopub.status.busy": "2024-01-18T17:17:03.077709Z", "iopub.status.idle": "2024-01-18T17:17:03.079312Z", "shell.execute_reply": "2024-01-18T17:17:03.079056Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import GrammarFuzzer, all_terminals" ] }, { "cell_type": "code", "execution_count": 
10, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.080648Z", "iopub.status.busy": "2024-01-18T17:17:03.080562Z", "iopub.status.idle": "2024-01-18T17:17:03.088043Z", "shell.execute_reply": "2024-01-18T17:17:03.087737Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['Charge $9.40 to my credit card 7166898575638313',\n", " 'Charge $8.79 to my credit card 6845418694643271',\n", " 'Charge $5.64 to my credit card 6655894657077388',\n", " 'Charge $0.60 to my credit card 2596728464872261',\n", " 'Charge $8.90 to my credit card 2363769342732142']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = GrammarFuzzer(CHARGE_GRAMMAR)\n", "[g.fuzz() for i in range(5)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, when actually testing our system with this data, we find two problems:\n", "\n", "1. We'd like to test _specific_ amounts being charged – for instance, amounts that would exceed the credit card limit.\n", "2. We find that 9 out of 10 credit card numbers are rejected because their checksum is incorrect. This is fine if we want to test rejection of credit card numbers – but if we want to test the actual functionality of processing a charge, we need _valid_ numbers." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We could simply ignore these issues; after all, it is only a matter of time until large amounts and valid numbers are generated. As for the first concern, we could also address it by changing the grammar appropriately – say, to only produce charges that have at least six leading digits. 
However, generalizing this to arbitrary ranges of values will be cumbersome.\n", "\n", "The second concern, the checksums of credit card numbers, runs deeper, however: a complex arithmetic operation like a checksum cannot be expressed in a grammar alone – at least not in the _context-free grammars_ we use here. (In principle, one _could_ do this in a _context-sensitive_ grammar, but specifying this would be no fun at all.) What we want is a mechanism that allows us to _attach programmatic computations_ to our grammars, bringing together the best of both worlds." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" }, "toc-hr-collapsed": true }, "source": [ "## Attaching Functions to Expansions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The key idea of this chapter is to _extend_ grammars such that one can _attach Python functions_ to individual expansions. These functions can be executed \n", "\n", "1. _before_ expansion, _replacing_ the element to be expanded by a computed value; or\n", "2. _after_ expansion, _checking_ generated elements, and possibly also replacing them.\n", "\n", "In both cases, functions are specified using the `opts()` expansion mechanism introduced in the [chapter on grammars](Grammars.ipynb). They are thus tied to a specific expansion $e$ of a symbol $s$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Functions Called Before Expansion\n", "\n", "A function defined using the `pre` option is invoked _before_ expansion of $s$ into $e$. Its return value _replaces_ the expansion $e$ to be produced. 
To generate a value for the credit card example above, we could define a _pre-expansion_ generator function:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.089682Z", "iopub.status.busy": "2024-01-18T17:17:03.089580Z", "iopub.status.idle": "2024-01-18T17:17:03.091336Z", "shell.execute_reply": "2024-01-18T17:17:03.090992Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import random" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.093036Z", "iopub.status.busy": "2024-01-18T17:17:03.092915Z", "iopub.status.idle": "2024-01-18T17:17:03.094692Z", "shell.execute_reply": "2024-01-18T17:17:03.094441Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def high_charge() -> float:\n", "    \"\"\"Return a random amount between 100,000.00 and 900,000.00\"\"\"\n", "    return random.randint(10000000, 90000000) / 100.0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With `opts()`, we could attach this function to the grammar:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.096327Z", "iopub.status.busy": "2024-01-18T17:17:03.096227Z", "iopub.status.idle": "2024-01-18T17:17:03.097847Z", "shell.execute_reply": "2024-01-18T17:17:03.097605Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CHARGE_GRAMMAR.update({\n", "    \"\": [(\".\", opts(pre=high_charge))],\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "with the intention that whenever `` is expanded, the function `high_charge` would be invoked to generate a value for ``. (The actual expansion in the grammar would still be present for fuzzers that ignore functions, such as `GrammarFuzzer`)." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Since functions tied to a grammar are frequently very simple, we can also _inline_ them using a *lambda* expression. A _lambda expression_ defines an _anonymous_ function whose body is limited to a single expression. Here's an example:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.099367Z", "iopub.status.busy": "2024-01-18T17:17:03.099260Z", "iopub.status.idle": "2024-01-18T17:17:03.100853Z", "shell.execute_reply": "2024-01-18T17:17:03.100615Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def apply_twice(function, x):\n", "    return function(function(x))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.102235Z", "iopub.status.busy": "2024-01-18T17:17:03.102150Z", "iopub.status.idle": "2024-01-18T17:17:03.104460Z", "shell.execute_reply": "2024-01-18T17:17:03.104210Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "apply_twice(lambda x: x * x, 2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here, we don't have to give the `function` to be applied twice a name (say, `square()`); instead, we define it inline within the invocation." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using `lambda`, this is what our grammar looks like:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.106033Z", "iopub.status.busy": "2024-01-18T17:17:03.105926Z", "iopub.status.idle": "2024-01-18T17:17:03.107689Z", "shell.execute_reply": "2024-01-18T17:17:03.107409Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "CHARGE_GRAMMAR.update({\n", " \"\": [(\".\",\n", " opts(pre=lambda: random.randint(10000000, 90000000) / 100.0))]\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Functions Called After Expansion\n", "\n", "A function defined using the `post` option is invoked _after_ expansion of $s$ into $e$, passing the expanded values of the symbols in $e$ as arguments. A post-expansion function can serve in two ways:\n", "\n", "1. It can serve as a *constraint* or _filter_ on the expanded values, returning `True` if the expansion is valid, and `False` if not; if it returns `False`, another expansion is attempted.\n", "2. It can also serve as a *repair*, returning a string value; like pre-expansion functions, the returned value replaces the expansion.\n", "\n", "For our credit card example, we can choose both ways. 
If we have a function `check_credit_card(s)` which returns `True` for a valid number `s` and `False` for an invalid one, we would go for the first option:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.109495Z", "iopub.status.busy": "2024-01-18T17:17:03.109343Z", "iopub.status.idle": "2024-01-18T17:17:03.111209Z", "shell.execute_reply": "2024-01-18T17:17:03.110966Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "CHARGE_GRAMMAR.update({\n", "    \"\": [(\"\", opts(post=lambda digits: check_credit_card(digits)))]\n", "})" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With such a filter, only valid credit card numbers will be produced. On average, however, it will still take 10 attempts until `check_credit_card()` is satisfied, since 9 out of 10 random numbers have an incorrect checksum." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we have a function `fix_credit_card(s)` which changes the number such that the checksum is valid and returns the \"fixed\" number, we can make use of this one instead:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.112759Z", "iopub.status.busy": "2024-01-18T17:17:03.112655Z", "iopub.status.idle": "2024-01-18T17:17:03.114320Z", "shell.execute_reply": "2024-01-18T17:17:03.114063Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CHARGE_GRAMMAR.update({\n", "    \"\": [(\"\", opts(post=lambda digits: fix_credit_card(digits)))]\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here, each number is generated only once and then repaired. This is far more efficient than filtering." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The checksum function used for credit cards is the [Luhn algorithm](https://en.wikipedia.org/wiki/Luhn_algorithm), a simple yet effective formula." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.115938Z", "iopub.status.busy": "2024-01-18T17:17:03.115832Z", "iopub.status.idle": "2024-01-18T17:17:03.117926Z", "shell.execute_reply": "2024-01-18T17:17:03.117681Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def luhn_checksum(s: str) -> int:\n", "    \"\"\"Compute Luhn's check digit over a string of digits\"\"\"\n", "    # LUHN_ODD_LOOKUP[d] is the sum of the digits of 2 * d\n", "    LUHN_ODD_LOOKUP = (0, 2, 4, 6, 8, 1, 3, 5, 7, 9)\n", "\n", "    evens = sum(int(p) for p in s[-1::-2])\n", "    odds = sum(LUHN_ODD_LOOKUP[int(p)] for p in s[-2::-2])\n", "    return (evens + odds) % 10" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.119312Z", "iopub.status.busy": "2024-01-18T17:17:03.119215Z", "iopub.status.idle": "2024-01-18T17:17:03.120897Z", "shell.execute_reply": "2024-01-18T17:17:03.120665Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def valid_luhn_checksum(s: str) -> bool:\n", "    \"\"\"Check whether the last digit is Luhn's checksum over the earlier digits\"\"\"\n", "    return luhn_checksum(s[:-1]) == int(s[-1])" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.122344Z", "iopub.status.busy": "2024-01-18T17:17:03.122236Z", "iopub.status.idle": "2024-01-18T17:17:03.123895Z", "shell.execute_reply": "2024-01-18T17:17:03.123643Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def fix_luhn_checksum(s: str) -> str:\n", "    \"\"\"Return the given string of digits, with a fixed check digit\"\"\"\n", "    return s[:-1] + repr(luhn_checksum(s[:-1]))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.125618Z", "iopub.status.busy": "2024-01-18T17:17:03.125487Z", "iopub.status.idle": "2024-01-18T17:17:03.127561Z", "shell.execute_reply": "2024-01-18T17:17:03.127290Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "luhn_checksum(\"123\")" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.129061Z", "iopub.status.busy": "2024-01-18T17:17:03.128957Z", "iopub.status.idle": "2024-01-18T17:17:03.130924Z", "shell.execute_reply": "2024-01-18T17:17:03.130663Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'1238'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fix_luhn_checksum(\"123x\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can make use of these functions in our credit card grammar:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.132547Z", "iopub.status.busy": "2024-01-18T17:17:03.132440Z", "iopub.status.idle": "2024-01-18T17:17:03.134103Z", "shell.execute_reply": "2024-01-18T17:17:03.133847Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "check_credit_card: Callable[[str], bool] = valid_luhn_checksum\n", "fix_credit_card: Callable[[str], str] = fix_luhn_checksum" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.135446Z", "iopub.status.busy": "2024-01-18T17:17:03.135361Z", "iopub.status.idle": "2024-01-18T17:17:03.137697Z", "shell.execute_reply": 
"2024-01-18T17:17:03.137404Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'1234567890123458'" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fix_credit_card(\"1234567890123456\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A Class for Integrating Constraints" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "While it is easy to specify functions, our grammar fuzzer will simply ignore them just as it ignores all extensions. It will issue a warning, though:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.139312Z", "iopub.status.busy": "2024-01-18T17:17:03.139204Z", "iopub.status.idle": "2024-01-18T17:17:03.142532Z", "shell.execute_reply": "2024-01-18T17:17:03.142176Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Charge $4.05 to my credit card 0637034038177393'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "g = GrammarFuzzer(CHARGE_GRAMMAR)\n", "g.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We need to define a special fuzzer that actually invokes the given `pre` and `post` functions and acts accordingly. 
We call it `GeneratorGrammarFuzzer`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.144128Z", "iopub.status.busy": "2024-01-18T17:17:03.144037Z", "iopub.status.idle": "2024-01-18T17:17:03.145930Z", "shell.execute_reply": "2024-01-18T17:17:03.145690Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GrammarFuzzer):\n", "    def supported_opts(self) -> Set[str]:\n", "        # Accept the new options in addition to those of the superclass\n", "        return super().supported_opts() | {\"pre\", \"post\", \"order\"}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We define custom functions to access the `pre` and `post` options:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.147318Z", "iopub.status.busy": "2024-01-18T17:17:03.147239Z", "iopub.status.idle": "2024-01-18T17:17:03.148933Z", "shell.execute_reply": "2024-01-18T17:17:03.148642Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def exp_pre_expansion_function(expansion: Expansion) -> Optional[Callable]:\n", "    \"\"\"Return the specified pre-expansion function, or None if unspecified\"\"\"\n", "    return exp_opt(expansion, 'pre')" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.150358Z", "iopub.status.busy": "2024-01-18T17:17:03.150278Z", "iopub.status.idle": "2024-01-18T17:17:03.152068Z", "shell.execute_reply": "2024-01-18T17:17:03.151847Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def exp_post_expansion_function(expansion: Expansion) -> Optional[Callable]:\n", "    \"\"\"Return the specified post-expansion function, or None if unspecified\"\"\"\n", "    return exp_opt(expansion, 'post')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `order` 
attribute will be used [later in this chapter](#Ordering-Expansions)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": true }, "source": [ "## Generating Elements before Expansion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our first task will be implementing the pre-expansion functions – that is, the functions that are invoked _before_ expansion to replace the value to be expanded. To this end, we hook into the `process_chosen_children()` method, which gets the selected children before expansion. We set it up such that it invokes the given `pre` function and applies its result to the children, possibly replacing them." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.153529Z", "iopub.status.busy": "2024-01-18T17:17:03.153444Z", "iopub.status.idle": "2024-01-18T17:17:03.155102Z", "shell.execute_reply": "2024-01-18T17:17:03.154838Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.156558Z", "iopub.status.busy": "2024-01-18T17:17:03.156477Z", "iopub.status.idle": "2024-01-18T17:17:03.159288Z", "shell.execute_reply": "2024-01-18T17:17:03.158996Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", "    def process_chosen_children(self, children: List[DerivationTree],\n", "                                expansion: Expansion) -> List[DerivationTree]:\n", "        function = exp_pre_expansion_function(expansion)\n", "        if function is None:\n", "            return children\n", "\n", "        assert callable(function)\n", "        if inspect.isgeneratorfunction(function):\n", "            # See \"generators\", below\n", "            result = self.run_generator(expansion, function)\n", "        else:\n", "            result = function()\n", "\n", "        if self.log:\n", "            print(repr(function) + \"()\", \"=\", repr(result))\n", "        return self.apply_result(result, children)\n", "\n", "    def run_generator(self, expansion: Expansion, function: Callable):\n", "        ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `apply_result()` takes the result from the pre-expansion function and applies it to the children. The exact effect depends on the type of the result:\n", "\n", "* A _string_ $s$ replaces the entire expansion with $s$.\n", "* A _list_ $[x_1, x_2, \\dots, x_n]$ replaces the $i$-th nonterminal symbol with $x_i$ for every $x_i$ that is not `None`. Specifying `None` as a list element $x_i$ is useful to leave that element unchanged. If $x_i$ is not a string, it is converted to a string.\n", "* A value of `None` is ignored. This is useful if one wants to simply call a function upon expansion, with no effect on the expanded strings.\n", "* _Boolean_ values are ignored. This is useful for post-expansion functions, discussed below.\n", "* All _other types_ are converted to strings, replacing the entire expansion." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.160796Z", "iopub.status.busy": "2024-01-18T17:17:03.160713Z", "iopub.status.idle": "2024-01-18T17:17:03.164336Z", "shell.execute_reply": "2024-01-18T17:17:03.164079Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", "    def apply_result(self, result: Any,\n", "                     children: List[DerivationTree]) -> List[DerivationTree]:\n", "        if isinstance(result, str):\n", "            children = [(result, [])]\n", "        elif isinstance(result, list):\n", "            symbol_indexes = [i for i, c in enumerate(children)\n", "                              if is_nonterminal(c[0])]\n", "\n", "            for index, value in enumerate(result):\n", "                if value is not None:\n", "                    child_index = symbol_indexes[index]\n", "                    if not isinstance(value, str):\n", "                        value = repr(value)\n", "                    if self.log:\n", "                        print(\n", "                            \"Replacing\", all_terminals(\n", "                                children[child_index]), \"by\", value)\n", "\n", "                    # children[child_index] = (value, [])\n", "                    child_symbol, _ = children[child_index]\n", "                    children[child_index] = (child_symbol, [(value, [])])\n", "        elif result is None:\n", "            pass\n", "        elif isinstance(result, bool):\n", "            pass\n", "        else:\n", "            if self.log:\n", "                print(\"Replacing\", \"\".join(\n", "                    [all_terminals(c) for c in children]), \"by\", result)\n", "\n", "            children = [(repr(result), [])]\n", "\n", "        return children" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Numeric Ranges\n", "\n", "With the above extensions, we have full support for pre-expansion functions. 
Using the augmented `CHARGE_GRAMMAR`, we find that the pre-expansion `lambda` function is actually used:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.165769Z", "iopub.status.busy": "2024-01-18T17:17:03.165685Z", "iopub.status.idle": "2024-01-18T17:17:03.168527Z", "shell.execute_reply": "2024-01-18T17:17:03.168299Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Charge $439383.87 to my credit card 2433506594138520'" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "charge_fuzzer = GeneratorGrammarFuzzer(CHARGE_GRAMMAR)\n", "charge_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The log reveals a few more details about what happens when the pre-expansion function is called. We see that the expansion `.` is directly replaced by the computed value:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.169955Z", "iopub.status.busy": "2024-01-18T17:17:03.169871Z", "iopub.status.idle": "2024-01-18T17:17:03.172286Z", "shell.execute_reply": "2024-01-18T17:17:03.172042Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tree: \n", "Expanding randomly\n", "Tree: $\n", "Expanding randomly\n", " at 0x118a69630>() = 382087.72\n", "Replacing . 
by 382087.72\n", "Tree: $382087.72\n", "'$382087.72'\n" ] }, { "data": { "text/plain": [ "'$382087.72'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "amount_fuzzer = GeneratorGrammarFuzzer(\n", " CHARGE_GRAMMAR, start_symbol=\"\", log=True)\n", "amount_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: More Numeric Ranges\n", "\n", "We can use such pre-expansion functions in other contexts, too. Suppose we want to generate arithmetic expressions in which each number is between 100 and 200. We can extend `EXPR_GRAMMAR` accordingly:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.173959Z", "iopub.status.busy": "2024-01-18T17:17:03.173819Z", "iopub.status.idle": "2024-01-18T17:17:03.176003Z", "shell.execute_reply": "2024-01-18T17:17:03.175747Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "expr_100_200_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [\n", " \"+\", \"-\", \"()\",\n", "\n", " # Generate only the integer part with a function;\n", " # the fractional part comes from\n", " # the grammar\n", " (\".\", opts(\n", " pre=lambda: [random.randint(100, 200), None])),\n", "\n", " # Generate the entire integer\n", " # from the function\n", " (\"\", opts(\n", " pre=lambda: random.randint(100, 200))),\n", " ],\n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.177523Z", "iopub.status.busy": "2024-01-18T17:17:03.177410Z", "iopub.status.idle": "2024-01-18T17:17:03.181672Z", "shell.execute_reply": "2024-01-18T17:17:03.181378Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'(108.6 / 155 + 177) / 118 * 120 * 107 + 151 + 195 / -200 - 150 * 188 / 147 + 112'" ] }, "execution_count": 36, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "expr_100_200_fuzzer = GeneratorGrammarFuzzer(expr_100_200_grammar)\n", "expr_100_200_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Support for Python Generators\n", "\n", "The Python language has its own concept of generator functions, which we of course want to support as well. A *generator function in Python* is a function that returns a so-called *iterator object* which we can iterate over, one value at a time." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To create a generator function in Python, one defines a normal function, using the `yield` statement instead of a `return` statement. While a `return` statement terminates the function, a `yield` statement pauses its execution, saving all of its state, to be resumed later for the next successive calls." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example of a generator function. 
When first invoked, `iterate()` yields the value 1, followed by 2, 3, and so on:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.183164Z", "iopub.status.busy": "2024-01-18T17:17:03.183083Z", "iopub.status.idle": "2024-01-18T17:17:03.184854Z", "shell.execute_reply": "2024-01-18T17:17:03.184619Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def iterate():\n", " t = 0\n", " while True:\n", " t = t + 1\n", " yield t" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can use `iterate()` in a loop, just like `range()` (which is not a generator function, but likewise returns an object we can iterate over):" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.186300Z", "iopub.status.busy": "2024-01-18T17:17:03.186215Z", "iopub.status.idle": "2024-01-18T17:17:03.188063Z", "shell.execute_reply": "2024-01-18T17:17:03.187841Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 2 3 4 5 6 7 8 9 10 " ] } ], "source": [ "for i in iterate():\n", " if i > 10:\n", " break\n", " print(i, end=\" \")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can also use `iterate()` as a pre-expansion generator function, ensuring it will create one successive integer after another:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.189555Z", "iopub.status.busy": "2024-01-18T17:17:03.189481Z", "iopub.status.idle": "2024-01-18T17:17:03.191463Z", "shell.execute_reply": "2024-01-18T17:17:03.191184Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [\n", " \"+\", \"-\", \"()\",\n", " # \".\",\n", "\n", " # Generate one 
integer after another\n", " # from the function\n", " (\"\", opts(pre=iterate)),\n", " ],\n", " })" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To support generators, our `process_chosen_children()` method, above, checks whether a function is a generator; if so, it invokes the `run_generator()` method. When `run_generator()` sees the function for the first time during a `fuzz_tree()` (or `fuzz()`) call, it invokes the function to create a generator object; this is saved in the `generators` attribute, and then called. Subsequent calls directly go to the generator, preserving state." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.193051Z", "iopub.status.busy": "2024-01-18T17:17:03.192952Z", "iopub.status.idle": "2024-01-18T17:17:03.195505Z", "shell.execute_reply": "2024-01-18T17:17:03.195264Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def fuzz_tree(self) -> DerivationTree:\n", " self.reset_generators()\n", " return super().fuzz_tree()\n", "\n", " def reset_generators(self) -> None:\n", " self.generators: Dict[str, Iterator] = {}\n", "\n", " def run_generator(self, expansion: Expansion,\n", " function: Callable) -> Iterator:\n", " key = repr((expansion, function))\n", " if key not in self.generators:\n", " self.generators[key] = function()\n", " generator = self.generators[key]\n", " return next(generator)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Does this work? 
Let us run our fuzzer on the above grammar, using `iterate()`:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.196964Z", "iopub.status.busy": "2024-01-18T17:17:03.196879Z", "iopub.status.idle": "2024-01-18T17:17:03.202835Z", "shell.execute_reply": "2024-01-18T17:17:03.202582Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'1 * ++++3 / ---+4 - 2 * +--6 / 7 * 10 - (9 - 11) - 5 + (13) * 14 + 8 + 12'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iterate_fuzzer = GeneratorGrammarFuzzer(iterate_grammar)\n", "iterate_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the expression contains all integers starting with 1." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Instead of specifying our own Python generator function such as `iterate()`, we can also use one of Python's built-in iterables such as `range()`. 
This will also generate integers starting with 1:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.204333Z", "iopub.status.busy": "2024-01-18T17:17:03.204230Z", "iopub.status.idle": "2024-01-18T17:17:03.205946Z", "shell.execute_reply": "2024-01-18T17:17:03.205721Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [\n", " \"+\", \"-\", \"()\",\n", " (\"\", opts(pre=range(1, 1000))),\n", " ],\n", " })" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It is also possible to use Python generator expressions, which are written like list comprehensions, but in parentheses:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.207469Z", "iopub.status.busy": "2024-01-18T17:17:03.207341Z", "iopub.status.idle": "2024-01-18T17:17:03.209282Z", "shell.execute_reply": "2024-01-18T17:17:03.209038Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [\n", " \"+\", \"-\", \"()\",\n", " (\"\", opts(\n", " pre=(x for x in range(1, 1000)))),\n", " ],\n", " })" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that both of the above grammars will actually cause the fuzzer to raise an exception when more than 999 integers are requested (`range(1, 1000)` yields only the values 1 to 999), but you will find it very easy to fix this." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, `yield` is actually an expression, not a statement, so it is also possible to have a `lambda` expression `yield` a value. If you find some reasonable use for this, let us know." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": true }, "source": [ "## Checking and Repairing Elements after Expansion\n", "\n", "Let us now turn to our second set of functions to be supported – namely, post-expansion functions. The simplest way of using them is to run them once the entire tree is generated, taking care of replacements as with `pre` functions. If one of them returns `False`, however, we start anew." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.210788Z", "iopub.status.busy": "2024-01-18T17:17:03.210686Z", "iopub.status.idle": "2024-01-18T17:17:03.212868Z", "shell.execute_reply": "2024-01-18T17:17:03.212641Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def fuzz_tree(self) -> DerivationTree:\n", " while True:\n", " tree = super().fuzz_tree()\n", " (symbol, children) = tree\n", " result, new_children = self.run_post_functions(tree)\n", " if not isinstance(result, bool) or result:\n", " return (symbol, new_children)\n", " self.restart_expansion()\n", "\n", " def restart_expansion(self) -> None:\n", " # To be overloaded in subclasses\n", " self.reset_generators()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `run_post_functions()` is applied recursively on all nodes of the derivation tree. For each node, it determines the expansion applied, and then runs the function associated with that expansion." 
] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.214276Z", "iopub.status.busy": "2024-01-18T17:17:03.214182Z", "iopub.status.idle": "2024-01-18T17:17:03.217916Z", "shell.execute_reply": "2024-01-18T17:17:03.217675Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " # Return True iff all constraints of grammar are satisfied in TREE\n", " def run_post_functions(self, tree: DerivationTree,\n", " depth: Union[int, float] = float(\"inf\")) \\\n", " -> Tuple[bool, Optional[List[DerivationTree]]]:\n", " symbol: str = tree[0]\n", " children: List[DerivationTree] = cast(List[DerivationTree], tree[1])\n", "\n", " if children == []:\n", " return True, children # Terminal symbol\n", "\n", " try:\n", " expansion = self.find_expansion(tree)\n", " except KeyError:\n", " # Expansion (no longer) found - ignore\n", " return True, children\n", "\n", " result = True\n", " function = exp_post_expansion_function(expansion)\n", " if function is not None:\n", " result = self.eval_function(tree, function)\n", " if isinstance(result, bool) and not result:\n", " if self.log:\n", " print(\n", " all_terminals(tree),\n", " \"did not satisfy\",\n", " symbol,\n", " \"constraint\")\n", " return False, children\n", "\n", " children = self.apply_result(result, children)\n", "\n", " if depth > 0:\n", " for c in children:\n", " result, _ = self.run_post_functions(c, depth - 1)\n", " if isinstance(result, bool) and not result:\n", " return False, children\n", "\n", " return result, children" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The helper method `find_expansion()` takes a subtree `tree` and determines the expansion from the grammar that was applied to create the children in `tree`." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.219417Z", "iopub.status.busy": "2024-01-18T17:17:03.219333Z", "iopub.status.idle": "2024-01-18T17:17:03.221332Z", "shell.execute_reply": "2024-01-18T17:17:03.221082Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def find_expansion(self, tree):\n", " symbol, children = tree\n", "\n", " applied_expansion = \\\n", " \"\".join([child_symbol for child_symbol, _ in children])\n", "\n", " for expansion in self.grammar[symbol]:\n", " if exp_string(expansion) == applied_expansion:\n", " return expansion\n", "\n", " raise KeyError(\n", " symbol +\n", " \": did not find expansion \" +\n", " repr(applied_expansion))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `eval_function()` is the one that takes care of actually invoking the post-expansion function. It creates an argument list containing the expansions of all nonterminal children – that is, one argument for each symbol in the grammar expansion. It then calls the given function." 
] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.222709Z", "iopub.status.busy": "2024-01-18T17:17:03.222628Z", "iopub.status.idle": "2024-01-18T17:17:03.224983Z", "shell.execute_reply": "2024-01-18T17:17:03.224714Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def eval_function(self, tree, function):\n", " symbol, children = tree\n", "\n", " assert callable(function)\n", "\n", " args = []\n", " for (symbol, exp) in children:\n", " if exp != [] and exp is not None:\n", " symbol_value = all_terminals((symbol, exp))\n", " args.append(symbol_value)\n", "\n", " result = function(*args)\n", " if self.log:\n", " print(repr(function) + repr(tuple(args)), \"=\", repr(result))\n", "\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Note that unlike pre-expansion functions, post-expansion functions typically process the values already produced, so we do not support Python generators here." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Negative Expressions\n", "\n", "Let us try out these post-expansion functions on an example. Suppose we want to produce only arithmetic expressions that evaluate to a negative number – for instance, to feed such generated expressions into a compiler or some other external system. Doing so constructively with `pre` functions would be very difficult. Instead, we can define a constraint that checks for precisely this property, using the Python `eval()` function." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The Python `eval()` function takes a string and evaluates it according to Python rules. 
Since the syntax of our generated expressions is slightly different from Python, and since Python can raise arithmetic exceptions during evaluation, we need a means to handle such errors gracefully. The function `eval_with_exception()` wraps around `eval()`; if an exception occurs during evaluation, it returns False – which causes the production algorithm to produce another value." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.226700Z", "iopub.status.busy": "2024-01-18T17:17:03.226612Z", "iopub.status.idle": "2024-01-18T17:17:03.228159Z", "shell.execute_reply": "2024-01-18T17:17:03.227915Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.229638Z", "iopub.status.busy": "2024-01-18T17:17:03.229561Z", "iopub.status.idle": "2024-01-18T17:17:03.231284Z", "shell.execute_reply": "2024-01-18T17:17:03.231041Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def eval_with_exception(s):\n", " # Use \"mute=True\" to suppress all messages\n", " with ExpectError(print_traceback=False):\n", " return eval(s)\n", " return False" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.232671Z", "iopub.status.busy": "2024-01-18T17:17:03.232592Z", "iopub.status.idle": "2024-01-18T17:17:03.234300Z", "shell.execute_reply": "2024-01-18T17:17:03.234056Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "negative_expr_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [(\"\", opts(post=lambda s: eval_with_exception(s) < 0))]\n", " }\n", " )\n", "\n", "assert is_valid_grammar(negative_expr_grammar)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": 
"2024-01-18T17:17:03.235667Z", "iopub.status.busy": "2024-01-18T17:17:03.235590Z", "iopub.status.idle": "2024-01-18T17:17:03.246073Z", "shell.execute_reply": "2024-01-18T17:17:03.245803Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "ZeroDivisionError: division by zero (expected)\n" ] }, { "data": { "text/plain": [ "'(8.9 / 6 * 4 - 0.2 + -7 - 7 - 8 * 6) * 7 * 15.55 - -945.9'" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "negative_expr_fuzzer = GeneratorGrammarFuzzer(negative_expr_grammar)\n", "expr = negative_expr_fuzzer.fuzz()\n", "expr" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The result is indeed negative:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.247625Z", "iopub.status.busy": "2024-01-18T17:17:03.247491Z", "iopub.status.idle": "2024-01-18T17:17:03.249578Z", "shell.execute_reply": "2024-01-18T17:17:03.249314Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "-5178.726666666667" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "eval(expr)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Matching XML Tags\n", "\n", "Post-expansion functions can not only be used to _check_ expansions, but also to repair them. To this end, we can have them return a string or a list of strings; just like pre-expansion functions, these strings would then replace the entire expansion or individual symbols." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As an example, consider *XML documents*, which are composed of text within matching _XML tags_. 
For instance, consider the following fragment in HTML, a markup language closely related to XML:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.251032Z", "iopub.status.busy": "2024-01-18T17:17:03.250932Z", "iopub.status.idle": "2024-01-18T17:17:03.252380Z", "shell.execute_reply": "2024-01-18T17:17:03.252128Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import HTML" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.253831Z", "iopub.status.busy": "2024-01-18T17:17:03.253749Z", "iopub.status.idle": "2024-01-18T17:17:03.255837Z", "shell.execute_reply": "2024-01-18T17:17:03.255603Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "<strong>A bold text</strong>" ], "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "HTML(\"<strong>A bold text</strong>\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This fragment consists of two HTML (XML) tags that surround the text; the tag name (`strong`) is present both in the opening (`<strong>`) as well as in the closing (`</strong>`) tag." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For a _finite_ set of tags (for instance, the HTML tags ``, ``, ``, `
`, and so on), we could define a context-free grammar that parses it; each pair of tags would make up an individual rule in the grammar. If the set of tags is _infinite_, though, as with general XML, we cannot define an appropriate grammar; that is because the constraint that the closing tag must match the opening tag is context-sensitive and thus does not fit context-free grammars." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "(Incidentally, if the closing tag had the identifier _reversed_ (``), then a context-free grammar could describe it. Make this a programming exercise.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can address this problem by introducing appropriate post-expansion functions that automatically make the closing tag match the opening tag. Let us start with a simple grammar for producing XML trees:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.257535Z", "iopub.status.busy": "2024-01-18T17:17:03.257417Z", "iopub.status.idle": "2024-01-18T17:17:03.259275Z", "shell.execute_reply": "2024-01-18T17:17:03.259002Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "XML_GRAMMAR: Grammar = {\n", " \"\": [\"\"],\n", " \"\": [\"<>>\"],\n", " \"\": [\"Text\", \"\"],\n", " \"\": [\"\", \"\"],\n", " \"\": crange('a', 'z')\n", "}" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.260718Z", "iopub.status.busy": "2024-01-18T17:17:03.260618Z", "iopub.status.idle": "2024-01-18T17:17:03.262109Z", "shell.execute_reply": "2024-01-18T17:17:03.261877Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(XML_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we fuzz 
using this grammar, we get non-matching XML tags, as expected:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.263452Z", "iopub.status.busy": "2024-01-18T17:17:03.263378Z", "iopub.status.idle": "2024-01-18T17:17:03.266036Z", "shell.execute_reply": "2024-01-18T17:17:03.265802Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Text'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xml_fuzzer = GrammarFuzzer(XML_GRAMMAR)\n", "xml_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Setting up a post-expansion function that sets the second identifier to the string found in the first solves the problem:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.267483Z", "iopub.status.busy": "2024-01-18T17:17:03.267399Z", "iopub.status.idle": "2024-01-18T17:17:03.269210Z", "shell.execute_reply": "2024-01-18T17:17:03.268974Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "XML_GRAMMAR.update({\n", " \"\": [(\"<>>\",\n", " opts(post=lambda id1, content, id2: [None, None, id1])\n", " )]\n", "})" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.270731Z", "iopub.status.busy": "2024-01-18T17:17:03.270647Z", "iopub.status.idle": "2024-01-18T17:17:03.272261Z", "shell.execute_reply": "2024-01-18T17:17:03.272036Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(XML_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.273749Z", "iopub.status.busy": "2024-01-18T17:17:03.273667Z", "iopub.status.idle": "2024-01-18T17:17:03.276337Z", "shell.execute_reply": 
"2024-01-18T17:17:03.276073Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Text'" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "xml_fuzzer = GeneratorGrammarFuzzer(XML_GRAMMAR)\n", "xml_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Checksums\n", "\n", "As our last example, let us consider the checksum problem from the introduction. With our newly defined repair mechanisms, we can now generate credit card numbers that are valid: " ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.277875Z", "iopub.status.busy": "2024-01-18T17:17:03.277768Z", "iopub.status.idle": "2024-01-18T17:17:03.280616Z", "shell.execute_reply": "2024-01-18T17:17:03.280360Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'2967308746680770'" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "credit_card_fuzzer = GeneratorGrammarFuzzer(\n", " CHARGE_GRAMMAR, start_symbol=\"\")\n", "credit_card_number = credit_card_fuzzer.fuzz()\n", "credit_card_number" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.282042Z", "iopub.status.busy": "2024-01-18T17:17:03.281937Z", "iopub.status.idle": "2024-01-18T17:17:03.283397Z", "shell.execute_reply": "2024-01-18T17:17:03.283167Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert valid_luhn_checksum(credit_card_number)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The validity extends to the entire grammar:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.284843Z", "iopub.status.busy": 
"2024-01-18T17:17:03.284756Z", "iopub.status.idle": "2024-01-18T17:17:03.287738Z", "shell.execute_reply": "2024-01-18T17:17:03.287513Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'Charge $818819.97 to my credit card 2817984968014288'" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "charge_fuzzer = GeneratorGrammarFuzzer(CHARGE_GRAMMAR)\n", "charge_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Local Checking and Repairing\n", "\n", "So far, we have always first generated an entire expression tree, only to check it later for validity. This can become expensive: If several elements are first generated only to find later that one of them is invalid, we spend a lot of time trying (randomly) to regenerate a matching input." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To demonstrate the issue, let us create an expression grammar in which all digits consist of zeros and ones. 
Rather than doing this constructively, though, we filter out all non-conforming expressions after the fact, using a `post` constraint:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.289270Z", "iopub.status.busy": "2024-01-18T17:17:03.289178Z", "iopub.status.idle": "2024-01-18T17:17:03.291305Z", "shell.execute_reply": "2024-01-18T17:17:03.291001Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "binary_expr_grammar = extend_grammar(EXPR_GRAMMAR,\n", " {\n", " \"\": [(\"\", opts(post=lambda digit, _: digit in [\"0\", \"1\"])),\n", " (\"\", opts(post=lambda digit: digit in [\"0\", \"1\"]))]\n", " }\n", " )" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.292849Z", "iopub.status.busy": "2024-01-18T17:17:03.292744Z", "iopub.status.idle": "2024-01-18T17:17:03.294354Z", "shell.execute_reply": "2024-01-18T17:17:03.294110Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(binary_expr_grammar)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This works, but is very slow; it can take several seconds before a matching expression is found." 
] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.295807Z", "iopub.status.busy": "2024-01-18T17:17:03.295731Z", "iopub.status.idle": "2024-01-18T17:17:03.435710Z", "shell.execute_reply": "2024-01-18T17:17:03.435431Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'(-+0)'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "binary_expr_fuzzer = GeneratorGrammarFuzzer(binary_expr_grammar)\n", "binary_expr_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can address the problem by checking constraints not only for the final subtree, but also for partial subtrees as soon as they are complete. To this end, we extend the method `expand_tree_once()` such that it invokes the post-expansion function as soon as all symbols in a subtree are expanded." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.437360Z", "iopub.status.busy": "2024-01-18T17:17:03.437248Z", "iopub.status.idle": "2024-01-18T17:17:03.438860Z", "shell.execute_reply": "2024-01-18T17:17:03.438626Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class RestartExpansionException(Exception):\n", " pass" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.440253Z", "iopub.status.busy": "2024-01-18T17:17:03.440171Z", "iopub.status.idle": "2024-01-18T17:17:03.442328Z", "shell.execute_reply": "2024-01-18T17:17:03.442072Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def expand_tree_once(self, tree: DerivationTree) -> DerivationTree:\n", " # Apply inherited method. 
This also calls `expand_tree_once()` on all\n", "        # subtrees.\n", "        new_tree: DerivationTree = super().expand_tree_once(tree)\n", "\n", "        (symbol, children) = new_tree\n", "        if all([exp_post_expansion_function(expansion)\n", "                is None for expansion in self.grammar[symbol]]):\n", "            # No constraints for this symbol\n", "            return new_tree\n", "\n", "        if self.any_possible_expansions(tree):\n", "            # Still expanding\n", "            return new_tree\n", "\n", "        return self.run_post_functions_locally(new_tree)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The main work takes place in the helper method `run_post_functions_locally()`. It runs the post-expansion function $f$ with `run_post_functions()` only on the current node by setting `depth` to zero, as any completed subtrees would have had their post-expansion functions run already. If $f$ returns `False`, `run_post_functions_locally()` returns a non-expanded symbol, such that the main driver can try another expansion. It does so up to 10 times (configurable via a `replacement_attempts` parameter during construction); after that, it raises a `RestartExpansionException` to restart creating the tree from scratch." 
] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.443751Z", "iopub.status.busy": "2024-01-18T17:17:03.443653Z", "iopub.status.idle": "2024-01-18T17:17:03.446184Z", "shell.execute_reply": "2024-01-18T17:17:03.445928Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def run_post_functions_locally(self, new_tree: DerivationTree) -> DerivationTree:\n", " symbol, _ = new_tree\n", "\n", " result, children = self.run_post_functions(new_tree, depth=0)\n", " if not isinstance(result, bool) or result:\n", " # No constraints, or constraint satisfied\n", " # children = self.apply_result(result, children)\n", " new_tree = (symbol, children)\n", " return new_tree\n", "\n", " # Replace tree by unexpanded symbol and try again\n", " if self.log:\n", " print(\n", " all_terminals(new_tree),\n", " \"did not satisfy\",\n", " symbol,\n", " \"constraint\")\n", "\n", " if self.replacement_attempts_counter > 0:\n", " if self.log:\n", " print(\"Trying another expansion\")\n", " self.replacement_attempts_counter -= 1\n", " return (symbol, None)\n", "\n", " if self.log:\n", " print(\"Starting from scratch\")\n", " raise RestartExpansionException" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The class constructor method and `fuzz_tree()` are set up to handle the additional functionality: " ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.447638Z", "iopub.status.busy": "2024-01-18T17:17:03.447539Z", "iopub.status.idle": "2024-01-18T17:17:03.450001Z", "shell.execute_reply": "2024-01-18T17:17:03.449754Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def __init__(self, grammar: Grammar, replacement_attempts: int = 
10,\n", " **kwargs) -> None:\n", " super().__init__(grammar, **kwargs)\n", " self.replacement_attempts = replacement_attempts\n", "\n", " def restart_expansion(self) -> None:\n", " super().restart_expansion()\n", " self.replacement_attempts_counter = self.replacement_attempts\n", "\n", " def fuzz_tree(self) -> DerivationTree:\n", " self.replacement_attempts_counter = self.replacement_attempts\n", " while True:\n", " try:\n", " # This is fuzz_tree() as defined above\n", " tree = super().fuzz_tree()\n", " return tree\n", " except RestartExpansionException:\n", " self.restart_expansion()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.451480Z", "iopub.status.busy": "2024-01-18T17:17:03.451378Z", "iopub.status.idle": "2024-01-18T17:17:03.464903Z", "shell.execute_reply": "2024-01-18T17:17:03.464626Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'+0 / +-1 - 1 / +0 * -+0 * 0 * 1 / 1'" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "binary_expr_fuzzer = GeneratorGrammarFuzzer(\n", " binary_expr_grammar, replacement_attempts=100)\n", "binary_expr_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Definitions and Uses\n", "\n", "With the above generators and constraints, we can also address complex examples. The `VAR_GRAMMAR` grammar from [the chapter on parsers](Parser.ipynb) defines a number of variables as arithmetic expressions (which in turn can contain variables, too). Applying a simple `GrammarFuzzer` on the grammar produces plenty of identifiers, but each identifier has a unique name." 
] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.466351Z", "iopub.status.busy": "2024-01-18T17:17:03.466272Z", "iopub.status.idle": "2024-01-18T17:17:03.467862Z", "shell.execute_reply": "2024-01-18T17:17:03.467611Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.469234Z", "iopub.status.busy": "2024-01-18T17:17:03.469149Z", "iopub.status.idle": "2024-01-18T17:17:03.471406Z", "shell.execute_reply": "2024-01-18T17:17:03.471167Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "VAR_GRAMMAR: Grammar = {\n", "    '<start>': ['<statements>'],\n", "    '<statements>': ['<statement>;<statements>', '<statement>'],\n", "    '<statement>': ['<assignment>'],\n", "    '<assignment>': ['<identifier>=<expr>'],\n", "    '<identifier>': ['<word>'],\n", "    '<word>': ['<alpha><word>', '<alpha>'],\n", "    '<alpha>': list(string.ascii_letters),\n", "    '<expr>': ['<term>+<expr>', '<term>-<expr>', '<term>'],\n", "    '<term>': ['<factor>*<term>', '<factor>/<term>', '<factor>'],\n", "    '<factor>':\n", "    ['+<factor>', '-<factor>', '(<expr>)', '<identifier>', '<number>'],\n", "    '<number>': ['<integer>.<integer>', '<integer>'],\n", "    '<integer>': ['<digit><integer>', '<digit>'],\n", "    '<digit>': crange('0', '9')\n", "}" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.472765Z", "iopub.status.busy": "2024-01-18T17:17:03.472684Z", "iopub.status.idle": "2024-01-18T17:17:03.474502Z", "shell.execute_reply": "2024-01-18T17:17:03.474272Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "assert is_valid_grammar(VAR_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.475895Z", "iopub.status.busy": "2024-01-18T17:17:03.475816Z", "iopub.status.idle": "2024-01-18T17:17:03.528934Z", "shell.execute_reply": "2024-01-18T17:17:03.528714Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Gc=F/1*Y+M-D-9;N=n/(m)/m*7\n", "a=79.0;W=o-9;v=2;K=u;D=9\n", "o=y-z+y+4;q=5+W;X=T\n", 
"M=-98.032*5/o\n", "H=IA-5-1;n=3-t;QQ=5-5\n", "Y=-80;d=D-M+M;Z=4.3+1*r-5+b\n", "ZDGSS=(1*Y-4)*54/0*pcO/4;RI=r*5.0\n", "Q=6+z-6;J=6/t/9/i-3-5+k\n", "x=-GT*+-x*6++-93*5\n", "q=da*T/e--v;x=3+g;bk=u\n" ] } ], "source": [ "g = GrammarFuzzer(VAR_GRAMMAR)\n", "for i in range(10):\n", "    print(g.fuzz())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Within expressions, we would like only _previously defined_ identifiers to be used. To this end, we introduce a set of functions around a *symbol table*, which keeps track of all variables already defined." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.530534Z", "iopub.status.busy": "2024-01-18T17:17:03.530455Z", "iopub.status.idle": "2024-01-18T17:17:03.532138Z", "shell.execute_reply": "2024-01-18T17:17:03.531897Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "SYMBOL_TABLE: Set[str] = set()" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.533523Z", "iopub.status.busy": "2024-01-18T17:17:03.533438Z", "iopub.status.idle": "2024-01-18T17:17:03.535106Z", "shell.execute_reply": "2024-01-18T17:17:03.534872Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def define_id(id: str) -> None:\n", "    SYMBOL_TABLE.add(id)" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.536473Z", "iopub.status.busy": "2024-01-18T17:17:03.536394Z", "iopub.status.idle": "2024-01-18T17:17:03.538248Z", "shell.execute_reply": "2024-01-18T17:17:03.538019Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def use_id() -> Union[bool, str]:\n", "    if len(SYMBOL_TABLE) == 0:\n", "        return False\n", "\n", "    id = random.choice(list(SYMBOL_TABLE))\n", "    return id" ] }, { "cell_type": "code", 
"execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.539647Z", "iopub.status.busy": "2024-01-18T17:17:03.539560Z", "iopub.status.idle": "2024-01-18T17:17:03.541192Z", "shell.execute_reply": "2024-01-18T17:17:03.540969Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def clear_symbol_table() -> None:\n", "    global SYMBOL_TABLE\n", "    SYMBOL_TABLE = set()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make use of the symbol table, we attach pre- and post-expansion functions to `VAR_GRAMMAR` that define and look up identifiers in the symbol table. We name our extended grammar `CONSTRAINED_VAR_GRAMMAR`:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.542604Z", "iopub.status.busy": "2024-01-18T17:17:03.542529Z", "iopub.status.idle": "2024-01-18T17:17:03.544160Z", "shell.execute_reply": "2024-01-18T17:17:03.543920Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = extend_grammar(VAR_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, we set up the grammar such that each time an identifier is defined, we store its name in the symbol table:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.545546Z", "iopub.status.busy": "2024-01-18T17:17:03.545467Z", "iopub.status.idle": "2024-01-18T17:17:03.547202Z", "shell.execute_reply": "2024-01-18T17:17:03.546921Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = extend_grammar(CONSTRAINED_VAR_GRAMMAR, {\n", "    \"<assignment>\": [(\"<identifier>=<expr>\",\n", "                      opts(post=lambda id, expr: define_id(id)))]\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, 
"source": [ "Second, we make sure that when an identifier is generated within an expression, we pick it from the symbol table, too. (We use `post` here such that we can return `False` if no identifier is yet available, leading to another expansion being made.)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.548663Z", "iopub.status.busy": "2024-01-18T17:17:03.548584Z", "iopub.status.idle": "2024-01-18T17:17:03.550344Z", "shell.execute_reply": "2024-01-18T17:17:03.550116Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = extend_grammar(CONSTRAINED_VAR_GRAMMAR, {\n", "    \"<factor>\": ['+<factor>', '-<factor>', '(<expr>)',\n", "                 (\"<identifier>\", opts(post=lambda _: use_id())),\n", "                 '<number>']\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, we clear the symbol table each time we (re)start an expansion. This is helpful as we may occasionally have to restart expansions." 
] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.551774Z", "iopub.status.busy": "2024-01-18T17:17:03.551692Z", "iopub.status.idle": "2024-01-18T17:17:03.553337Z", "shell.execute_reply": "2024-01-18T17:17:03.553057Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = extend_grammar(CONSTRAINED_VAR_GRAMMAR, {\n", "    \"<start>\": [(\"<statements>\", opts(pre=clear_symbol_table))]\n", "})" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.554783Z", "iopub.status.busy": "2024-01-18T17:17:03.554702Z", "iopub.status.idle": "2024-01-18T17:17:03.556397Z", "shell.execute_reply": "2024-01-18T17:17:03.556103Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(CONSTRAINED_VAR_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Fuzzing with this grammar ensures that each identifier used is actually defined:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.557857Z", "iopub.status.busy": "2024-01-18T17:17:03.557779Z", "iopub.status.idle": "2024-01-18T17:17:03.693438Z", "shell.execute_reply": "2024-01-18T17:17:03.693133Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DB=+(8/4/7-9+3+3)/2178/+-9\n", "lNIqc=+(1+9-8)/2.9*8/5*0\n", "Sg=(+9/8/6)*++1/(1+7)*8*4\n", "r=+---552\n", "iz=5/7/7;K=1+6*iz*1\n", "q=3-2;MPy=q;p=2*5\n", "zj=+5*-+35.2-+1.5727978+(-(-0/6-7+3))*--+44*1\n", "Tl=((0*9+4-3)-6)/(-3-7*8*8/7)+9\n", "aXZ=-5/-+3*9/3/1-8-+0*0/3+7+4\n", "NA=-(8+g-1)*1.6;g=++7;a=++g*g*g\n" ] } ], "source": [ "var_grammar_fuzzer = GeneratorGrammarFuzzer(CONSTRAINED_VAR_GRAMMAR)\n", "for i in range(10):\n", "    print(var_grammar_fuzzer.fuzz())" ] }, { "attachments": {}, 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Ordering Expansions\n", "\n", "While our previous def/use example ensures that each _used_ variable also is a _defined_ variable, it does not take care of the _order_ in which these definitions are made. In fact, it is possible that first, the term on the right-hand side of a `;` expands, creating an entry in the symbol table, which is then later used in the expression on the left hand side. We can demonstrate this by actually evaluating the produced variable assignments in Python, using `exec()` to execute the sequence of assignments. (Little known fact: Python _does_ support `;` as statement separator.)" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.695099Z", "iopub.status.busy": "2024-01-18T17:17:03.694990Z", "iopub.status.idle": "2024-01-18T17:17:03.755138Z", "shell.execute_reply": "2024-01-18T17:17:03.754865Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "f=(9)*kOj*kOj-6/7;kOj=(9-8)*7*1\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_76616/3970000697.py\", line 6, in \n", " exec(s, {}, {})\n", " File \"\", line 1, in \n", "NameError: name 'kOj' is not defined (expected)\n" ] } ], "source": [ "var_grammar_fuzzer = GeneratorGrammarFuzzer(CONSTRAINED_VAR_GRAMMAR)\n", "with ExpectError():\n", " for i in range(100):\n", " s = var_grammar_fuzzer.fuzz()\n", " try:\n", " exec(s, {}, {})\n", " except SyntaxError:\n", " continue\n", " except ZeroDivisionError:\n", " continue\n", "print(s)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To address this issue, we allow explicitly specifying an *ordering of expansions*. 
For our previous fuzzers, such an ordering was inconsequential, as eventually, all symbols would be expanded; if we have expansion functions with side effects, though, having control over the ordering in which expansions are made (and thus over the ordering in which the associated functions are called) can be important." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To specify orderings, we assign a special attribute `order` to individual expansions. This is a list with one number for each nonterminal symbol in the expansion, stating the order in which these symbols are to be expanded, starting with the smallest number. As an example, the following rule specifies that the left-hand side of a `;` separator should be expanded first:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.756672Z", "iopub.status.busy": "2024-01-18T17:17:03.756582Z", "iopub.status.idle": "2024-01-18T17:17:03.758444Z", "shell.execute_reply": "2024-01-18T17:17:03.758212Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = extend_grammar(CONSTRAINED_VAR_GRAMMAR, {\n", "    \"<statements>\": [(\"<statement>;<statements>\", opts(order=[1, 2])),\n", "                     \"<statement>\"]\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Likewise, we want the definition of a variable to be produced only _after_ the expression is expanded, since otherwise, the expression might already refer to the defined variable:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.759870Z", "iopub.status.busy": "2024-01-18T17:17:03.759788Z", "iopub.status.idle": "2024-01-18T17:17:03.761736Z", "shell.execute_reply": "2024-01-18T17:17:03.761512Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR = 
extend_grammar(CONSTRAINED_VAR_GRAMMAR, {\n", "    \"<assignment>\": [(\"<identifier>=<expr>\", opts(post=lambda id, expr: define_id(id),\n", "                                                 order=[2, 1]))],\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The helper `exp_order()` allows us to retrieve the order:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.763316Z", "iopub.status.busy": "2024-01-18T17:17:03.763238Z", "iopub.status.idle": "2024-01-18T17:17:03.764954Z", "shell.execute_reply": "2024-01-18T17:17:03.764720Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def exp_order(expansion):\n", "    \"\"\"Return the specified expansion ordering, or None if unspecified\"\"\"\n", "    return exp_opt(expansion, 'order')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To control the ordering in which symbols are expanded, we hook into the method `choose_tree_expansion()`, which is specifically designed to be extended in subclasses. It proceeds through the list `expandable_children` of expandable children to choose from and matches them with the nonterminal children from the expansion to determine their order number. The index `min_given_order` of the expandable child with the lowest order number is then returned, choosing this child for expansion." 
] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.766417Z", "iopub.status.busy": "2024-01-18T17:17:03.766340Z", "iopub.status.idle": "2024-01-18T17:17:03.770002Z", "shell.execute_reply": "2024-01-18T17:17:03.769789Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n", " def choose_tree_expansion(self, tree: DerivationTree,\n", " expandable_children: List[DerivationTree]) \\\n", " -> int:\n", " \"\"\"Return index of subtree in `expandable_children`\n", " to be selected for expansion. Defaults to random.\"\"\"\n", " (symbol, tree_children) = tree\n", " assert isinstance(tree_children, list)\n", "\n", " if len(expandable_children) == 1:\n", " # No choice\n", " return super().choose_tree_expansion(tree, expandable_children)\n", "\n", " expansion = self.find_expansion(tree)\n", " given_order = exp_order(expansion)\n", " if given_order is None:\n", " # No order specified\n", " return super().choose_tree_expansion(tree, expandable_children)\n", "\n", " nonterminal_children = [c for c in tree_children if c[1] != []]\n", " assert len(nonterminal_children) == len(given_order), \\\n", " \"Order must have one element for each nonterminal\"\n", "\n", " # Find expandable child with lowest ordering\n", " min_given_order = None\n", " j = 0\n", " for k, expandable_child in enumerate(expandable_children):\n", " while j < len(\n", " nonterminal_children) and expandable_child != nonterminal_children[j]:\n", " j += 1\n", " assert j < len(nonterminal_children), \"Expandable child not found\"\n", " if self.log:\n", " print(\"Expandable child #%d %s has order %d\" %\n", " (k, expandable_child[0], given_order[j]))\n", "\n", " if min_given_order is None or given_order[j] < given_order[min_given_order]:\n", " min_given_order = k\n", "\n", " assert min_given_order is not None\n", "\n", " if self.log:\n", " print(\"Returning expandable 
child #%d %s\" %\n", " (min_given_order, expandable_children[min_given_order][0]))\n", "\n", " return min_given_order" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With this, our fuzzer can now respect orderings, and all variables are properly defined:" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:03.771412Z", "iopub.status.busy": "2024-01-18T17:17:03.771326Z", "iopub.status.idle": "2024-01-18T17:17:05.087439Z", "shell.execute_reply": "2024-01-18T17:17:05.087108Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a=(1)*0*3/8+0/8-4/8-0\n", "r=+0*+8/-4+((9)*2-1-8+6/9)\n", "D=+(2*3+6*0)-(5)/9*0/2;Q=D\n", "C=9*(2-1)*9*0-1.2/6-3*5\n", "G=-25.1\n", "H=+4*4/8.5*4-8*4+(5);D=6\n", "PIF=4841/++(460.1---626)*51755;E=(8)/-PIF+6.8*(7-PIF)*9*PIF;k=8\n", "X=((0)*2/0*6+7*3)/(0-7-9)\n", "x=94.2+25;x=++x/(7)+-9/8/2/x+-1/x;I=x\n", "cBM=51.15;f=81*-+--((2++cBM/cBM*+1*0/0-5+cBM))\n" ] } ], "source": [ "var_grammar_fuzzer = GeneratorGrammarFuzzer(CONSTRAINED_VAR_GRAMMAR)\n", "for i in range(100):\n", " s = var_grammar_fuzzer.fuzz()\n", " if i < 10:\n", " print(s)\n", " try:\n", " exec(s, {}, {})\n", " except SyntaxError:\n", " continue\n", " except ZeroDivisionError:\n", " continue" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Real programming languages not only have one global scope, but multiple local scopes, frequently nested. By carefully organizing global and local symbol tables, we can set up a grammar to handle all of these. However, when fuzzing compilers and interpreters, we typically focus on single functions, for which one single scope is enough to make most inputs valid." 
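, "\n", "\n", "As a rough sketch of what such an organization could look like (this is not part of the chapter's code; the names `SCOPES`, `enter_scope()`, `leave_scope()`, `define_scoped_id()`, and `visible_ids()` are hypothetical), nested scopes can be modeled as a stack of symbol tables, with lookups considering every scope that is currently open:\n", "\n", "```python\n", "from typing import List, Set\n", "\n", "# Hypothetical scope stack; the innermost scope comes last\n", "SCOPES: List[Set[str]] = [set()]\n", "\n", "def enter_scope() -> None:\n", "    SCOPES.append(set())\n", "\n", "def leave_scope() -> None:\n", "    SCOPES.pop()\n", "\n", "def define_scoped_id(id: str) -> None:\n", "    SCOPES[-1].add(id)\n", "\n", "def visible_ids() -> Set[str]:\n", "    # All identifiers defined in any currently open scope\n", "    return set().union(*SCOPES)\n", "```\n", "\n", "Under these assumptions, `enter_scope()` and `leave_scope()` could be called from `pre` and `post` functions of whichever expansion opens a nested block, while `define_scoped_id()` and `visible_ids()` would take the roles of `define_id()` and `use_id()`."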
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## All Together\n", "\n", "Let us close this chapter by integrating our generator features with the other grammar features introduced earlier, in particular [coverage-driven fuzzing](GrammarCoverageFuzzer.ipynb) and [probabilistic grammar fuzzing](ProbabilisticGrammarFuzzer.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The general idea to integrate the individual features is through *multiple inheritance*, which we already used for `ProbabilisticGrammarCoverageFuzzer`, introduced in the [exercises on probabilistic fuzzing](ProbabilisticGrammarFuzzer.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Generators and Probabilistic Fuzzing\n", "\n", "Probabilistic fuzzing integrates very easily with generators, as both extend `GrammarFuzzer` in different ways." ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.089555Z", "iopub.status.busy": "2024-01-18T17:17:05.089428Z", "iopub.status.idle": "2024-01-18T17:17:05.435926Z", "shell.execute_reply": "2024-01-18T17:17:05.435576Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ProbabilisticGrammarFuzzer import ProbabilisticGrammarFuzzer # minor dependency" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.437665Z", "iopub.status.busy": "2024-01-18T17:17:05.437570Z", "iopub.status.idle": "2024-01-18T17:17:05.439230Z", "shell.execute_reply": "2024-01-18T17:17:05.438976Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import inheritance_conflicts" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.440848Z", 
"iopub.status.busy": "2024-01-18T17:17:05.440737Z", "iopub.status.idle": "2024-01-18T17:17:05.445257Z", "shell.execute_reply": "2024-01-18T17:17:05.444955Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['supported_opts']" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inheritance_conflicts(ProbabilisticGrammarFuzzer, GeneratorGrammarFuzzer)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We have to implement `supported_opts()` as the merger of both superclasses. At the same time, we also set up the constructor such that it invokes both." ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.446867Z", "iopub.status.busy": "2024-01-18T17:17:05.446778Z", "iopub.status.idle": "2024-01-18T17:17:05.449251Z", "shell.execute_reply": "2024-01-18T17:17:05.448966Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ProbabilisticGeneratorGrammarFuzzer(GeneratorGrammarFuzzer,\n", " ProbabilisticGrammarFuzzer):\n", " \"\"\"Join the features of `GeneratorGrammarFuzzer` \n", " and `ProbabilisticGrammarFuzzer`\"\"\"\n", "\n", " def supported_opts(self) -> Set[str]:\n", " return (super(GeneratorGrammarFuzzer, self).supported_opts() |\n", " super(ProbabilisticGrammarFuzzer, self).supported_opts())\n", "\n", " def __init__(self, grammar: Grammar, *, replacement_attempts: int = 10,\n", " **kwargs):\n", " \"\"\"Constructor.\n", " `replacement_attempts` - see `GeneratorGrammarFuzzer` constructor.\n", " All other keywords go into `ProbabilisticGrammarFuzzer`.\n", " \"\"\"\n", " super(GeneratorGrammarFuzzer, self).__init__(\n", " grammar,\n", " replacement_attempts=replacement_attempts)\n", " super(ProbabilisticGrammarFuzzer, self).__init__(grammar, **kwargs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" 
} }, "source": [ "Let us give our joint class a simple test, using probabilities to favor long identifiers:" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.450742Z", "iopub.status.busy": "2024-01-18T17:17:05.450641Z", "iopub.status.idle": "2024-01-18T17:17:05.452312Z", "shell.execute_reply": "2024-01-18T17:17:05.452077Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "CONSTRAINED_VAR_GRAMMAR.update({\n", "    '<word>': [('<alpha><word>', opts(prob=0.9)),\n", "               '<alpha>'],\n", "})" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.453719Z", "iopub.status.busy": "2024-01-18T17:17:05.453642Z", "iopub.status.idle": "2024-01-18T17:17:05.456135Z", "shell.execute_reply": "2024-01-18T17:17:05.455874Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'order', 'post', 'pre', 'prob'}" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pgg_fuzzer = ProbabilisticGeneratorGrammarFuzzer(CONSTRAINED_VAR_GRAMMAR)\n", "pgg_fuzzer.supported_opts()" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.457909Z", "iopub.status.busy": "2024-01-18T17:17:05.457758Z", "iopub.status.idle": "2024-01-18T17:17:05.469562Z", "shell.execute_reply": "2024-01-18T17:17:05.469304Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'a=5+3/8/8+-1/6/9/8;E=6'" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pgg_fuzzer.fuzz()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Generators and Grammar Coverage\n", "\n", "Fuzzing based on grammar coverage is a bigger challenge. The difficulty lies not so much in the methods overloaded in both classes; we can resolve these just as above." 
] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.471063Z", "iopub.status.busy": "2024-01-18T17:17:05.470972Z", "iopub.status.idle": "2024-01-18T17:17:05.472638Z", "shell.execute_reply": "2024-01-18T17:17:05.472396Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ProbabilisticGrammarFuzzer import ProbabilisticGrammarCoverageFuzzer # minor dependency" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.474272Z", "iopub.status.busy": "2024-01-18T17:17:05.474179Z", "iopub.status.idle": "2024-01-18T17:17:05.475915Z", "shell.execute_reply": "2024-01-18T17:17:05.475653Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarCoverageFuzzer import GrammarCoverageFuzzer # minor dependency" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.477324Z", "iopub.status.busy": "2024-01-18T17:17:05.477240Z", "iopub.status.idle": "2024-01-18T17:17:05.482666Z", "shell.execute_reply": "2024-01-18T17:17:05.482434Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['__init__', 'supported_opts']" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inheritance_conflicts(ProbabilisticGrammarCoverageFuzzer,\n", " GeneratorGrammarFuzzer)" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.484126Z", "iopub.status.busy": "2024-01-18T17:17:05.484041Z", "iopub.status.idle": "2024-01-18T17:17:05.485572Z", "shell.execute_reply": "2024-01-18T17:17:05.485338Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import copy" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": 
"2024-01-18T17:17:05.487045Z", "iopub.status.busy": "2024-01-18T17:17:05.486958Z", "iopub.status.idle": "2024-01-18T17:17:05.489350Z", "shell.execute_reply": "2024-01-18T17:17:05.489118Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ProbabilisticGeneratorGrammarCoverageFuzzer(GeneratorGrammarFuzzer,\n", " ProbabilisticGrammarCoverageFuzzer):\n", " \"\"\"Join the features of `GeneratorGrammarFuzzer` \n", " and `ProbabilisticGrammarCoverageFuzzer`\"\"\"\n", "\n", " def supported_opts(self) -> Set[str]:\n", " return (super(GeneratorGrammarFuzzer, self).supported_opts() |\n", " super(ProbabilisticGrammarCoverageFuzzer, self).supported_opts())\n", "\n", " def __init__(self, grammar: Grammar, *,\n", " replacement_attempts: int = 10, **kwargs) -> None:\n", " \"\"\"Constructor.\n", " `replacement_attempts` - see `GeneratorGrammarFuzzer` constructor.\n", " All other keywords go into `ProbabilisticGrammarFuzzer`.\n", " \"\"\"\n", " super(GeneratorGrammarFuzzer, self).__init__(\n", " grammar,\n", " replacement_attempts)\n", " super(ProbabilisticGrammarCoverageFuzzer, self).__init__(\n", " grammar,\n", " **kwargs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The problem is that during expansion, we _may_ generate (and cover) expansions that we later drop (for instance, because a `post` function returns `False`). Hence, we have to _remove_ this coverage which is no longer present in the final production." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We resolve the problem by _rebuilding the coverage_ from the final tree after it is produced. To this end, we hook into the `fuzz_tree()` method. We have it save the original coverage before creating the tree, restoring it afterwards. Then we traverse the resulting tree, adding its coverage back again (`add_tree_coverage()`)." 
] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.490979Z", "iopub.status.busy": "2024-01-18T17:17:05.490887Z", "iopub.status.idle": "2024-01-18T17:17:05.493767Z", "shell.execute_reply": "2024-01-18T17:17:05.493509Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ProbabilisticGeneratorGrammarCoverageFuzzer(\n", "        ProbabilisticGeneratorGrammarCoverageFuzzer):\n", "\n", "    def fuzz_tree(self) -> DerivationTree:\n", "        self.orig_covered_expansions = copy.deepcopy(self.covered_expansions)\n", "        tree = super().fuzz_tree()\n", "        self.covered_expansions = self.orig_covered_expansions\n", "        self.add_tree_coverage(tree)\n", "        return tree\n", "\n", "    def add_tree_coverage(self, tree: DerivationTree) -> None:\n", "        (symbol, children) = tree\n", "        assert isinstance(children, list)\n", "        if len(children) > 0:\n", "            flat_children: List[DerivationTree] = [\n", "                (child_symbol, None)\n", "                for (child_symbol, _) in children\n", "            ]\n", "            self.add_coverage(symbol, flat_children)\n", "            for c in children:\n", "                self.add_tree_coverage(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As a final step, we ensure that if we do have to restart an expansion from scratch, we also restore the previous coverage such that we can start fully anew:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.495308Z", "iopub.status.busy": "2024-01-18T17:17:05.495223Z", "iopub.status.idle": "2024-01-18T17:17:05.497164Z", "shell.execute_reply": "2024-01-18T17:17:05.496936Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ProbabilisticGeneratorGrammarCoverageFuzzer(\n", "        ProbabilisticGeneratorGrammarCoverageFuzzer):\n", "\n", "    def restart_expansion(self) -> None:\n", "        super().restart_expansion()\n", "        self.covered_expansions = 
self.orig_covered_expansions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try this out. After we have produced a string, we should see its coverage in `expansion_coverage()`:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.498725Z", "iopub.status.busy": "2024-01-18T17:17:05.498638Z", "iopub.status.idle": "2024-01-18T17:17:05.533285Z", "shell.execute_reply": "2024-01-18T17:17:05.532982Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'H=+-2+7.4*(9)/0-6*8;T=5'" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pggc_fuzzer = ProbabilisticGeneratorGrammarCoverageFuzzer(\n", " CONSTRAINED_VAR_GRAMMAR)\n", "pggc_fuzzer.fuzz()" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.535171Z", "iopub.status.busy": "2024-01-18T17:17:05.534956Z", "iopub.status.idle": "2024-01-18T17:17:05.537445Z", "shell.execute_reply": "2024-01-18T17:17:05.537179Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{' -> H',\n", " ' -> T',\n", " ' -> =',\n", " ' -> 0',\n", " ' -> 2',\n", " ' -> 4',\n", " ' -> 5',\n", " ' -> 6',\n", " ' -> 7',\n", " ' -> 8',\n", " ' -> 9',\n", " ' -> ',\n", " ' -> +',\n", " ' -> -',\n", " ' -> ()',\n", " ' -> +',\n", " ' -> -',\n", " ' -> ',\n", " ' -> ',\n", " ' -> ',\n", " ' -> ',\n", " ' -> .',\n", " ' -> ',\n", " ' -> ',\n", " ' -> ',\n", " ' -> ;',\n", " ' -> ',\n", " ' -> *',\n", " ' -> /',\n", " ' -> '}" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pggc_fuzzer.expansion_coverage()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Fuzzing again would eventually cover all letters in identifiers:" ] }, { "cell_type": 
"code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:05.539134Z", "iopub.status.busy": "2024-01-18T17:17:05.539024Z", "iopub.status.idle": "2024-01-18T17:17:06.235159Z", "shell.execute_reply": "2024-01-18T17:17:06.234869Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['llcyzc=3.0*02.3*1',\n", " 'RfMgRYmd=---2.9',\n", " 'p=+(7+3/4)*+-4-3.2*((2)-4)/2',\n", " 'z=1-2/4-3*9+3+5',\n", " 'v=(2/3)/1/2*8+(3)-7*2-1',\n", " 'L=9.5/9-(7)/8/1+2-2;c=L',\n", " 'U=+-91535-1-9-(9)/1;i=U',\n", " 'g=-8.3*7*5+1*5*9-5;k=1',\n", " 'J=+-8-(5/6-1)/7-6+7',\n", " 'p=053/-(8*0*3*2/1);t=p']" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[pggc_fuzzer.fuzz() for i in range(10)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With `ProbabilisticGeneratorGrammarCoverageFuzzer`, we now have a grammar fuzzer that combines efficient grammar fuzzing with coverage, probabilities, and generator functions. The only thing that is missing is a shorter name. `PGGCFuzzer`, maybe?" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.236794Z", "iopub.status.busy": "2024-01-18T17:17:06.236681Z", "iopub.status.idle": "2024-01-18T17:17:06.238419Z", "shell.execute_reply": "2024-01-18T17:17:06.238168Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class PGGCFuzzer(ProbabilisticGeneratorGrammarCoverageFuzzer):\n", " \"\"\"The one grammar-based fuzzer that supports all fuzzingbook features\"\"\"\n", " pass" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis\n", "\n", "This chapter introduces the ability to attach _functions_ to individual production rules:\n", "\n", "* A `pre` function is executed _before_ the expansion takes place. 
Its result (typically a string) can _replace_ the actual expansion.\n", "* A `post` function is executed _after_ the expansion has taken place. If it returns a string, the string replaces the expansion; if it returns `False`, it triggers a new expansion.\n", "\n", "Both functions can return `None` to not interfere with grammar production at all." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To attach a function `F` to an individual expansion `S` in a grammar, replace `S` with a pair\n", "\n", "```python\n", "(S, opts(pre=F)) # Set a function to be executed before expansion\n", "```\n", "or\n", "```python\n", "(S, opts(post=F)) # Set a function to be executed after expansion\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example. To take an area code from a list that is given programmatically, we can write:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.239999Z", "iopub.status.busy": "2024-01-18T17:17:06.239898Z", "iopub.status.idle": "2024-01-18T17:17:06.241684Z", "shell.execute_reply": "2024-01-18T17:17:06.241244Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Grammars import US_PHONE_GRAMMAR, extend_grammar, opts" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.243554Z", "iopub.status.busy": "2024-01-18T17:17:06.243436Z", "iopub.status.idle": "2024-01-18T17:17:06.245140Z", "shell.execute_reply": "2024-01-18T17:17:06.244898Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def pick_area_code():\n", "    return random.choice(['555', '554', '553'])" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.246583Z", "iopub.status.busy": 
"2024-01-18T17:17:06.246483Z", "iopub.status.idle": "2024-01-18T17:17:06.248161Z", "shell.execute_reply": "2024-01-18T17:17:06.247883Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "PICKED_US_PHONE_GRAMMAR = extend_grammar(US_PHONE_GRAMMAR,\n", "{\n", "    \"<area>\": [(\"<area>\", opts(pre=pick_area_code))]\n", "})" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A `GeneratorGrammarFuzzer` will extract and interpret these options. Here is an example:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.249618Z", "iopub.status.busy": "2024-01-18T17:17:06.249514Z", "iopub.status.idle": "2024-01-18T17:17:06.253676Z", "shell.execute_reply": "2024-01-18T17:17:06.253383Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['(554)732-6097',\n", " '(555)469-0662',\n", " '(553)671-5358',\n", " '(555)686-8011',\n", " '(554)453-4067']" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "picked_us_phone_fuzzer = GeneratorGrammarFuzzer(PICKED_US_PHONE_GRAMMAR)\n", "[picked_us_phone_fuzzer.fuzz() for i in range(5)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As you can see, the area codes now all stem from `pick_area_code()`. Such definitions make it possible to tie program code (such as `pick_area_code()`) closely to grammars." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `PGGCFuzzer` class incorporates all features from [the `GrammarFuzzer` class](GrammarFuzzer.ipynb) and its [coverage-based](GrammarCoverageFuzzer.ipynb), [probabilistic](ProbabilisticGrammarFuzzer.ipynb), and [generator-based](GeneratorGrammarFuzzer.ipynb) derivatives."
] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.255200Z", "iopub.status.busy": "2024-01-18T17:17:06.255117Z", "iopub.status.idle": "2024-01-18T17:17:06.256553Z", "shell.execute_reply": "2024-01-18T17:17:06.256300Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from ClassDiagram import display_class_hierarchy" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.258152Z", "iopub.status.busy": "2024-01-18T17:17:06.258058Z", "iopub.status.idle": "2024-01-18T17:17:06.792633Z", "shell.execute_reply": "2024-01-18T17:17:06.792184Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: class hierarchy of PGGCFuzzer (subclass -> superclass)]\n", "PGGCFuzzer -> ProbabilisticGeneratorGrammarCoverageFuzzer\n", "ProbabilisticGeneratorGrammarCoverageFuzzer -> GeneratorGrammarFuzzer\n", "ProbabilisticGeneratorGrammarCoverageFuzzer -> ProbabilisticGrammarCoverageFuzzer\n", "GeneratorGrammarFuzzer -> GrammarFuzzer\n", "GrammarFuzzer -> Fuzzer\n", "ProbabilisticGrammarCoverageFuzzer -> GrammarCoverageFuzzer\n", "ProbabilisticGrammarCoverageFuzzer -> ProbabilisticGrammarFuzzer\n", "GrammarCoverageFuzzer -> SimpleGrammarCoverageFuzzer\n", "SimpleGrammarCoverageFuzzer -> TrackingGrammarCoverageFuzzer\n", "TrackingGrammarCoverageFuzzer -> GrammarFuzzer\n", "ProbabilisticGrammarFuzzer -> GrammarFuzzer" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy([PGGCFuzzer],\n", "                        public_methods=[\n", "                            Fuzzer.run,\n", "                            Fuzzer.runs,\n", "                            GrammarFuzzer.__init__,\n", "                            GrammarFuzzer.fuzz,\n", "                            GrammarFuzzer.fuzz_tree,\n", "                            GeneratorGrammarFuzzer.__init__,\n", "                            GeneratorGrammarFuzzer.fuzz_tree,\n", "                            GrammarCoverageFuzzer.__init__,\n", "                            ProbabilisticGrammarFuzzer.__init__,\n", "                            ProbabilisticGrammarCoverageFuzzer.__init__,\n", "                            ProbabilisticGeneratorGrammarCoverageFuzzer.__init__,\n", "                            ProbabilisticGeneratorGrammarCoverageFuzzer.fuzz_tree,\n", "                            PGGCFuzzer.__init__,\n", "                        ],\n", "                        types={\n", "                            'DerivationTree': DerivationTree,\n", "                            'Expansion': Expansion,\n", "                            'Grammar': Grammar\n", "                        },\n", "                        project='fuzzingbook')" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "Functions attached to grammar expansions can serve\n", "* as _generators_ to efficiently produce a symbol expansion from a function;\n", "* as _constraints_ to check produced strings against (complex) validity conditions; and\n", "* as _repairs_ to apply changes to produced strings, such as checksums and identifiers."
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "With this chapter, we have powerful grammars which we can use in a number of domains:\n", "\n", "* In the [chapter on fuzzing APIs](APIFuzzer.ipynb), we show how to produce complex data structures for testing, making use of `GeneratorGrammarFuzzer` features to combine grammars and generator functions.\n", "* In the [chapter on fuzzing User Interfaces](WebFuzzer.ipynb), we make use of `GeneratorGrammarFuzzer` to produce complex user interface inputs.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "For fuzzing APIs, generator functions are very common. In the [chapter on API fuzzing](APIFuzzer.ipynb), we show how to combine them with grammars for even richer test generation.\n", "\n", "The combination of generator functions and grammars is mostly possible because we define and make use of grammars in an all-Python environment. We are not aware of another grammar-based fuzzing system that exhibits similar features." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 1: Tree Processing\n", "\n", "So far, our `pre` and `post` processing functions all accept and produce strings. 
In some circumstances, however, it can be useful to access the _derivation trees_ directly – for instance, to access and check some child element.\n", "\n", "Your task is to extend `GeneratorGrammarFuzzer` with pre- and post-processing functions that can accept and return derivation trees. To this end, proceed as follows:\n", "\n", "1. Extend `GeneratorGrammarFuzzer` such that a function can return a derivation tree (a tuple) or a list of derivation trees, which would then replace subtrees in the same way as strings.\n", "2. Extend `GeneratorGrammarFuzzer` with a `post_tree` attribute which takes a function just like `post`, except that its arguments would be derivation trees." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader at this point." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" }, "solution": "hidden", "solution2": "hidden", "solution2_first": true, "solution_first": true }, "source": [ "### Exercise 2: Attribute Grammars\n", "\n", "Set up a mechanism through which it is possible to attach arbitrary _attributes_ to individual elements in the derivation tree. Expansion functions could attach such attributes to individual symbols (say, by returning `opts()`), and also access attributes of symbols in later calls. 
Here is an example:" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:17:06.795830Z", "iopub.status.busy": "2024-01-18T17:17:06.795686Z", "iopub.status.idle": "2024-01-18T17:17:06.798595Z", "shell.execute_reply": "2024-01-18T17:17:06.798146Z" }, "slideshow": { "slide_type": "fragment" }, "solution2": "hidden", "solution2_first": true }, "outputs": [], "source": [ "ATTR_GRAMMAR = {\n", " \"\": [(\"Text\", opts(post=lambda x1, x2: [None, x1.name]))],\n", " \"\": [(\"<>\", opts(post=lambda tag: opts(name=...)))],\n", " \"\": [\">\"]\n", "}" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" }, "solution": "hidden", "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader at this point." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false, "toc-showmarkdowntxt": false, "vscode": { "interpreter": { "hash": "4185989cf89c47c310c2629adcadd634093b57a2c49dffb5ae8d0d14fa302f2b" } } }, "nbformat": 4, "nbformat_minor": 4 }