{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Fuzzing with Generators\n",
"\n",
"In this chapter, we show how to extend grammars with _functions_ – pieces of code that get executed during grammar expansion, and that can generate, check, or change elements produced. Adding functions to a grammar allows for very versatile test generation, bringing together the best of grammar generation and programming."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:24.981480Z",
"iopub.status.busy": "2024-06-30T20:09:24.981388Z",
"iopub.status.idle": "2024-06-30T20:09:25.062218Z",
"shell.execute_reply": "2024-06-30T20:09:25.061930Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from bookutils import YouTubeVideo\n",
"YouTubeVideo('6Z35ChunpLY')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Prerequisites**\n",
"\n",
"* As this chapter deeply interacts with the techniques discussed in the [chapter on efficient grammar fuzzing](GrammarFuzzer.ipynb), a good understanding of the techniques is recommended."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Synopsis\n",
"\n",
"\n",
"To [use the code provided in this chapter](Importing.ipynb), write\n",
"\n",
"```python\n",
">>> from fuzzingbook.GeneratorGrammarFuzzer import \n",
"```\n",
"\n",
"and then make use of the following features.\n",
"\n",
"\n",
"This chapter introduces the ability to attach _functions_ to individual production rules:\n",
"\n",
"* A `pre` function is executed _before_ the expansion takes place. Its result (typically a string) can _replace_ the actual expansion.\n",
"* A `post` function is executed _after_ the expansion has taken place. If it returns a string, the string replaces the expansion; if it returns `False`, it triggers a new expansion.\n",
"\n",
"Both functions can return `None` to not interfere with grammar production at all.\n",
"\n",
"To attach a function `F` to an individual expansion `S` in a grammar, replace `S` with a pair\n",
"\n",
"```python\n",
"(S, opts(pre=F)) # Set a function to be executed before expansion\n",
"```\n",
"or\n",
"```python\n",
"(S, opts(post=F)) # Set a function to be executed after expansion\n",
"```\n",
"\n",
"Here is an example, To take an area code from a list that is given programmatically, we can write:\n",
"\n",
"```python\n",
">>> from Grammars import US_PHONE_GRAMMAR, extend_grammar, opts\n",
">>> def pick_area_code():\n",
">>> return random.choice(['555', '554', '553'])\n",
">>> PICKED_US_PHONE_GRAMMAR = extend_grammar(US_PHONE_GRAMMAR,\n",
">>> {\n",
">>> \"\": [(\"\", opts(pre=pick_area_code))]\n",
">>> })\n",
"```\n",
"A `GeneratorGrammarFuzzer` will extract and interpret these options. Here is an example:\n",
"\n",
"```python\n",
">>> picked_us_phone_fuzzer = GeneratorGrammarFuzzer(PICKED_US_PHONE_GRAMMAR)\n",
">>> [picked_us_phone_fuzzer.fuzz() for i in range(5)]\n",
"['(554)732-6097',\n",
" '(555)469-0662',\n",
" '(553)671-5358',\n",
" '(555)686-8011',\n",
" '(554)453-4067']\n",
"```\n",
"As you can see, the area codes now all stem from `pick_area_code()`. Such definitions allow closely tying program code (such as `pick_area_code()`) to grammars.\n",
"\n",
"The `PGGCFuzzer` class incorporates all features from [the `GrammarFuzzer` class](GrammarFuzzer.ipynb) and its [coverage-based](GrammarCoverageFuzzer.ipynb), [probabilistic-based](ProbabilisticGrammarFuzzer.ipynb), and [generator-based](GeneratorGrammarFuzzer.ipynb) derivatives.\n",
"\n",
"![](PICS/GeneratorGrammarFuzzer-synopsis-1.svg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Example: Test a Credit Card System\n",
"\n",
"Suppose you work with a shopping system that – among several other features – allows customers to pay with a credit card. Your task is to test the payment functionality. "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To make things simple, we will assume that we need only two pieces of data – a 16-digit credit card number and an amount to be charged. Both pieces can be easily generated with grammars, as in the following:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.083337Z",
"iopub.status.busy": "2024-06-30T20:09:25.083198Z",
"iopub.status.idle": "2024-06-30T20:09:25.085537Z",
"shell.execute_reply": "2024-06-30T20:09:25.085246Z"
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import bookutils.setup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.087368Z",
"iopub.status.busy": "2024-06-30T20:09:25.087244Z",
"iopub.status.idle": "2024-06-30T20:09:25.089230Z",
"shell.execute_reply": "2024-06-30T20:09:25.088933Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from typing import Callable, Set, List, Dict, Optional, Iterator, Any, Union, Tuple, cast"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.090938Z",
"iopub.status.busy": "2024-06-30T20:09:25.090817Z",
"iopub.status.idle": "2024-06-30T20:09:25.160040Z",
"shell.execute_reply": "2024-06-30T20:09:25.159701Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Fuzzer import Fuzzer"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.161962Z",
"iopub.status.busy": "2024-06-30T20:09:25.161851Z",
"iopub.status.idle": "2024-06-30T20:09:25.677560Z",
"shell.execute_reply": "2024-06-30T20:09:25.677246Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import EXPR_GRAMMAR, is_valid_grammar, is_nonterminal, extend_grammar\n",
"from Grammars import opts, exp_opt, exp_string, crange, Grammar, Expansion"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.679484Z",
"iopub.status.busy": "2024-06-30T20:09:25.679332Z",
"iopub.status.idle": "2024-06-30T20:09:25.709606Z",
"shell.execute_reply": "2024-06-30T20:09:25.709191Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from GrammarFuzzer import DerivationTree"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.711702Z",
"iopub.status.busy": "2024-06-30T20:09:25.711568Z",
"iopub.status.idle": "2024-06-30T20:09:25.714025Z",
"shell.execute_reply": "2024-06-30T20:09:25.713571Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"CHARGE_GRAMMAR: Grammar = {\n",
" \"\": [\"Charge to my credit card \"],\n",
" \"\": [\"$\"],\n",
" \"\": [\".\"],\n",
" \"\": [\"\", \"\"],\n",
" \"\": crange('0', '9'),\n",
"\n",
" \"\": [\"\"],\n",
" \"\": [\"\"],\n",
" \"\": [\"\"],\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.716420Z",
"iopub.status.busy": "2024-06-30T20:09:25.716291Z",
"iopub.status.idle": "2024-06-30T20:09:25.718223Z",
"shell.execute_reply": "2024-06-30T20:09:25.717886Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"assert is_valid_grammar(CHARGE_GRAMMAR)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"All of this works neatly – we can generate arbitrary amounts and credit card numbers:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.720073Z",
"iopub.status.busy": "2024-06-30T20:09:25.719895Z",
"iopub.status.idle": "2024-06-30T20:09:25.721977Z",
"shell.execute_reply": "2024-06-30T20:09:25.721627Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from GrammarFuzzer import GrammarFuzzer, all_terminals"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.723794Z",
"iopub.status.busy": "2024-06-30T20:09:25.723650Z",
"iopub.status.idle": "2024-06-30T20:09:25.731721Z",
"shell.execute_reply": "2024-06-30T20:09:25.731344Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['Charge $9.40 to my credit card 7166898575638313',\n",
" 'Charge $8.79 to my credit card 6845418694643271',\n",
" 'Charge $5.64 to my credit card 6655894657077388',\n",
" 'Charge $0.60 to my credit card 2596728464872261',\n",
" 'Charge $8.90 to my credit card 2363769342732142']"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"g = GrammarFuzzer(CHARGE_GRAMMAR)\n",
"[g.fuzz() for i in range(5)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"However, when actually testing our system with this data, we find two problems:\n",
"\n",
"1. We'd like to test _specific_ amounts being charged – for instance, amounts that would excess the credit card limit.\n",
"2. We find that 9 out of 10 credit card numbers are rejected because of having an incorrect checksum. This is fine if we want to test rejection of credit card numbers – but if we want to test the actual functionality of processing a charge, we need _valid_ numbers."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We could go and ignore these issues; after all, eventually, it is only a matter of time until large amounts and valid numbers are generated. As it comes to the first concern, we could also address it by changing the grammar appropriately – say, to only produce charges that have at least six leading digits. However, generalizing this to arbitrary ranges of values will be cumbersome.\n",
"\n",
"The second concern, the checksums of credit card numbers, however, runs deeper – at least as far as grammars are concerned, is that a complex arithmetic operation like a checksum cannot be expressed in a grammar alone – at least not in the _context-free grammars_ we use here. (In principle, one _could_ do this in a _context–sensitive_ grammar, but specifying this would be no fun at all.) What we want is a mechanism that allows us to _attach programmatic computations_ to our grammars, bringing together the best of both worlds."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
},
"toc-hr-collapsed": true
},
"source": [
"## Attaching Functions to Expansions"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The key idea of this chapter is to _extend_ grammars such that one can _attach Python functions_ to individual expansions. These functions can be executed \n",
"\n",
"1. _before_ expansion, _replacing_ the element to be expanded by a computed value; or\n",
"2. _after_ expansion, _checking_ generated elements, and possibly also replacing them.\n",
"\n",
"In both cases, functions are specified using the `opts()` expansion mechanism introduced in the [chapter on grammars](Grammars.ipynb). They are thus tied to a specific expansion $e$ of a symbol $s$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Functions Called Before Expansion\n",
"\n",
"A function defined using the `pre` option is invoked _before_ expansion of $s$ into $e$. Its value _replaces_ the expansion $e$ to be produced. To generate a value for the credit card example, above, we could define a _pre-expansion_ generator function"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.733750Z",
"iopub.status.busy": "2024-06-30T20:09:25.733612Z",
"iopub.status.idle": "2024-06-30T20:09:25.735384Z",
"shell.execute_reply": "2024-06-30T20:09:25.735115Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.737021Z",
"iopub.status.busy": "2024-06-30T20:09:25.736911Z",
"iopub.status.idle": "2024-06-30T20:09:25.738843Z",
"shell.execute_reply": "2024-06-30T20:09:25.738437Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def high_charge() -> float:\n",
" return random.randint(10000000, 90000000) / 100.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"With `opts()`, we could attach this function to the grammar:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.740723Z",
"iopub.status.busy": "2024-06-30T20:09:25.740590Z",
"iopub.status.idle": "2024-06-30T20:09:25.742382Z",
"shell.execute_reply": "2024-06-30T20:09:25.742100Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"CHARGE_GRAMMAR.update({\n",
" \"\": [(\".\", opts(pre=high_charge))],\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"with the intention that whenever `` is expanded, the function `high_charge` would be invoked to generate a value for ``. (The actual expansion in the grammar would still be present for fuzzers that ignore functions, such as `GrammarFuzzer`)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Since functions tied to a grammar are frequently very simple, we can also _inline_ them using a *lambda* expression. A _lambda expression_ is used for _anonymous_ functions that are limited in scope and functionality. Here's an example:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.748153Z",
"iopub.status.busy": "2024-06-30T20:09:25.747995Z",
"iopub.status.idle": "2024-06-30T20:09:25.750273Z",
"shell.execute_reply": "2024-06-30T20:09:25.749818Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def apply_twice(function, x):\n",
" return function(function(x))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.752873Z",
"iopub.status.busy": "2024-06-30T20:09:25.752690Z",
"iopub.status.idle": "2024-06-30T20:09:25.755532Z",
"shell.execute_reply": "2024-06-30T20:09:25.755178Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"16"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"apply_twice(lambda x: x * x, 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here, we don't have to give the `function` to be applied twice a name (say, `square()`); instead, we apply it inline within the invocation."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Using `lambda`, this is what our grammar looks like:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.757481Z",
"iopub.status.busy": "2024-06-30T20:09:25.757238Z",
"iopub.status.idle": "2024-06-30T20:09:25.759690Z",
"shell.execute_reply": "2024-06-30T20:09:25.759275Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"CHARGE_GRAMMAR.update({\n",
" \"\": [(\".\",\n",
" opts(pre=lambda: random.randint(10000000, 90000000) / 100.0))]\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Functions Called After Expansion\n",
"\n",
"A function defined using the `post` option is invoked _after_ expansion of $s$ into $e$, passing the expanded values of the symbols in $e$ as arguments. A post-expansion function can serve in two ways:\n",
"\n",
"1. It can serve as a *constraint* or _filter_ on the expanded values, returning `True` if the expansion is valid, and `False` if not; if it returns `False`, another expansion is attempted.\n",
"2. It can also serve as a *repair*, returning a string value; like pre-expansion functions, the returned value replaces the expansion.\n",
"\n",
"For our credit card example, we can choose both ways. If we have a function `check_credit_card(s)` which returns `True` for a valid number `s` and `False` for invalid ones, we would go for the first option:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.761924Z",
"iopub.status.busy": "2024-06-30T20:09:25.761745Z",
"iopub.status.idle": "2024-06-30T20:09:25.764044Z",
"shell.execute_reply": "2024-06-30T20:09:25.763719Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"CHARGE_GRAMMAR.update({\n",
" \"\": [(\"\", opts(post=lambda digits: check_credit_card(digits)))]\n",
"})"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"With such a filter, only valid credit cards will be produced. On average, it will still take 10 attempts for each time `check_credit_card()` is satisfied."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If we have a function `fix_credit_card(s)` which changes the number such that the checksum is valid and returns the \"fixed\" number, we can make use of this one instead:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.765670Z",
"iopub.status.busy": "2024-06-30T20:09:25.765552Z",
"iopub.status.idle": "2024-06-30T20:09:25.767319Z",
"shell.execute_reply": "2024-06-30T20:09:25.767060Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"CHARGE_GRAMMAR.update({\n",
" \"\": [(\"\", opts(post=lambda digits: fix_credit_card(digits)))]\n",
"})"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here, each number is generated only once and then repaired. This is very efficient."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The checksum function used for credit cards is the [Luhn algorithm](https://en.wikipedia.org/wiki/Luhn_algorithm), a simple yet effective formula."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.768829Z",
"iopub.status.busy": "2024-06-30T20:09:25.768707Z",
"iopub.status.idle": "2024-06-30T20:09:25.771324Z",
"shell.execute_reply": "2024-06-30T20:09:25.771027Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def luhn_checksum(s: str) -> int:\n",
" \"\"\"Compute Luhn's check digit over a string of digits\"\"\"\n",
" LUHN_ODD_LOOKUP = (0, 2, 4, 6, 8, 1, 3, 5, 7,\n",
" 9) # sum_of_digits (index * 2)\n",
"\n",
" evens = sum(int(p) for p in s[-1::-2])\n",
" odds = sum(LUHN_ODD_LOOKUP[int(p)] for p in s[-2::-2])\n",
" return (evens + odds) % 10"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.772981Z",
"iopub.status.busy": "2024-06-30T20:09:25.772825Z",
"iopub.status.idle": "2024-06-30T20:09:25.774884Z",
"shell.execute_reply": "2024-06-30T20:09:25.774581Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def valid_luhn_checksum(s: str) -> bool:\n",
" \"\"\"Check whether the last digit is Luhn's checksum over the earlier digits\"\"\"\n",
" return luhn_checksum(s[:-1]) == int(s[-1])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.776424Z",
"iopub.status.busy": "2024-06-30T20:09:25.776308Z",
"iopub.status.idle": "2024-06-30T20:09:25.778405Z",
"shell.execute_reply": "2024-06-30T20:09:25.777978Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def fix_luhn_checksum(s: str) -> str:\n",
" \"\"\"Return the given string of digits, with a fixed check digit\"\"\"\n",
" return s[:-1] + repr(luhn_checksum(s[:-1]))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.779962Z",
"iopub.status.busy": "2024-06-30T20:09:25.779850Z",
"iopub.status.idle": "2024-06-30T20:09:25.782389Z",
"shell.execute_reply": "2024-06-30T20:09:25.782054Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"luhn_checksum(\"123\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.784163Z",
"iopub.status.busy": "2024-06-30T20:09:25.783928Z",
"iopub.status.idle": "2024-06-30T20:09:25.786387Z",
"shell.execute_reply": "2024-06-30T20:09:25.786030Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1238'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fix_luhn_checksum(\"123x\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We can make use of these functions in our credit card grammar:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.788258Z",
"iopub.status.busy": "2024-06-30T20:09:25.788126Z",
"iopub.status.idle": "2024-06-30T20:09:25.790109Z",
"shell.execute_reply": "2024-06-30T20:09:25.789814Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"check_credit_card: Callable[[str], bool] = valid_luhn_checksum\n",
"fix_credit_card: Callable[[str], str] = fix_luhn_checksum"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.791902Z",
"iopub.status.busy": "2024-06-30T20:09:25.791642Z",
"iopub.status.idle": "2024-06-30T20:09:25.793991Z",
"shell.execute_reply": "2024-06-30T20:09:25.793680Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1234567890123458'"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fix_credit_card(\"1234567890123456\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## A Class for Integrating Constraints"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"While it is easy to specify functions, our grammar fuzzer will simply ignore them just as it ignores all extensions. It will issue a warning, though:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.795896Z",
"iopub.status.busy": "2024-06-30T20:09:25.795748Z",
"iopub.status.idle": "2024-06-30T20:09:25.799358Z",
"shell.execute_reply": "2024-06-30T20:09:25.798944Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'Charge $4.05 to my credit card 0637034038177393'"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"g = GrammarFuzzer(CHARGE_GRAMMAR)\n",
"g.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We need to define a special fuzzer that actually invokes the given `pre` and `post` functions and acts accordingly. We name this a `GeneratorGrammarFuzzer`:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.801151Z",
"iopub.status.busy": "2024-06-30T20:09:25.801019Z",
"iopub.status.idle": "2024-06-30T20:09:25.803280Z",
"shell.execute_reply": "2024-06-30T20:09:25.802956Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GrammarFuzzer):\n",
" def supported_opts(self) -> Set[str]:\n",
" return super().supported_opts() | {\"pre\", \"post\", \"order\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We define custom functions to access the `pre` and `post` options:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.805371Z",
"iopub.status.busy": "2024-06-30T20:09:25.805209Z",
"iopub.status.idle": "2024-06-30T20:09:25.807433Z",
"shell.execute_reply": "2024-06-30T20:09:25.806976Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def exp_pre_expansion_function(expansion: Expansion) -> Optional[Callable]:\n",
" \"\"\"Return the specified pre-expansion function, or None if unspecified\"\"\"\n",
" return exp_opt(expansion, 'pre')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.809313Z",
"iopub.status.busy": "2024-06-30T20:09:25.809167Z",
"iopub.status.idle": "2024-06-30T20:09:25.811273Z",
"shell.execute_reply": "2024-06-30T20:09:25.810977Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def exp_post_expansion_function(expansion: Expansion) -> Optional[Callable]:\n",
" \"\"\"Return the specified post-expansion function, or None if unspecified\"\"\"\n",
" return exp_opt(expansion, 'post')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The `order` attribute will be used [later in this chapter](#Ordering-Expansions)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## Generating Elements before Expansion"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Our first task will be implementing the pre-expansion functions – that is, the function that would be invoked _before_ expansion to replace the value to be expanded. To this end, we hook into the `process_chosen_children()` method, which gets the selected children before expansion. We set it up such that it invokes the given `pre` function and applies its result on the children, possibly replacing them."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.813020Z",
"iopub.status.busy": "2024-06-30T20:09:25.812891Z",
"iopub.status.idle": "2024-06-30T20:09:25.814604Z",
"shell.execute_reply": "2024-06-30T20:09:25.814255Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import inspect"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.816313Z",
"iopub.status.busy": "2024-06-30T20:09:25.816174Z",
"iopub.status.idle": "2024-06-30T20:09:25.819130Z",
"shell.execute_reply": "2024-06-30T20:09:25.818793Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def process_chosen_children(self, children: List[DerivationTree],\n",
" expansion: Expansion) -> List[DerivationTree]:\n",
" function = exp_pre_expansion_function(expansion)\n",
" if function is None:\n",
" return children\n",
"\n",
" assert callable(function)\n",
" if inspect.isgeneratorfunction(function):\n",
" # See \"generators\", below\n",
" result = self.run_generator(expansion, function)\n",
" else:\n",
" result = function()\n",
"\n",
" if self.log:\n",
" print(repr(function) + \"()\", \"=\", repr(result))\n",
" return self.apply_result(result, children)\n",
"\n",
" def run_generator(self, expansion: Expansion, function: Callable):\n",
" ..."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `apply_result()` takes the result from the pre-expansion function and applies it on the children. The exact effect depends on the type of the result:\n",
"\n",
"* A _string_ $s$ replaces the entire expansion with $s$.\n",
"* A _list_ $[x_1, x_2, \\dots, x_n]$ replaces the $i$-th symbol with $x_i$ for every $x_i$ that is not `None`. Specifying `None` as a list element $x_i$ is useful to leave that element unchanged. If $x_i$ is not a string, it is converted to a string.\n",
"* A value of `None` is ignored. This is useful if one wants to simply call a function upon expansion, with no effect on the expanded strings.\n",
"* _Boolean_ values are ignored. This is useful for post-expansion functions, discussed below.\n",
"* All _other types_ are converted to strings, replacing the entire expansion."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.820979Z",
"iopub.status.busy": "2024-06-30T20:09:25.820833Z",
"iopub.status.idle": "2024-06-30T20:09:25.824734Z",
"shell.execute_reply": "2024-06-30T20:09:25.824411Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def apply_result(self, result: Any,\n",
" children: List[DerivationTree]) -> List[DerivationTree]:\n",
" if isinstance(result, str):\n",
" children = [(result, [])]\n",
" elif isinstance(result, list):\n",
" symbol_indexes = [i for i, c in enumerate(children)\n",
" if is_nonterminal(c[0])]\n",
"\n",
" for index, value in enumerate(result):\n",
" if value is not None:\n",
" child_index = symbol_indexes[index]\n",
" if not isinstance(value, str):\n",
" value = repr(value)\n",
" if self.log:\n",
" print(\n",
" \"Replacing\", all_terminals(\n",
" children[child_index]), \"by\", value)\n",
"\n",
" # children[child_index] = (value, [])\n",
" child_symbol, _ = children[child_index]\n",
" children[child_index] = (child_symbol, [(value, [])])\n",
" elif result is None:\n",
" pass\n",
" elif isinstance(result, bool):\n",
" pass\n",
" else:\n",
" if self.log:\n",
" print(\"Replacing\", \"\".join(\n",
" [all_terminals(c) for c in children]), \"by\", result)\n",
"\n",
" children = [(repr(result), [])]\n",
"\n",
" return children"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: Numeric Ranges\n",
"\n",
"With the above extensions, we have full support for pre-expansion functions. Using the augmented `CHARGE_GRAMMAR`, we find that the pre-expansion `lambda` function is actually used:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.826290Z",
"iopub.status.busy": "2024-06-30T20:09:25.826187Z",
"iopub.status.idle": "2024-06-30T20:09:25.829837Z",
"shell.execute_reply": "2024-06-30T20:09:25.829514Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'Charge $439383.87 to my credit card 2433506594138520'"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"charge_fuzzer = GeneratorGrammarFuzzer(CHARGE_GRAMMAR)\n",
"charge_fuzzer.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The log reveals a bit more details what happens when the pre-expansion function is called. We see that the expansion `.` is directly replaced by the computed value:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.831433Z",
"iopub.status.busy": "2024-06-30T20:09:25.831307Z",
"iopub.status.idle": "2024-06-30T20:09:25.834324Z",
"shell.execute_reply": "2024-06-30T20:09:25.834024Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Tree: \n",
"Expanding randomly\n",
"Tree: $\n",
"Expanding randomly\n",
" at 0x104432e60>() = 382087.72\n",
"Replacing . by 382087.72\n",
"Tree: $382087.72\n",
"'$382087.72'\n"
]
},
{
"data": {
"text/plain": [
"'$382087.72'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"amount_fuzzer = GeneratorGrammarFuzzer(\n",
" CHARGE_GRAMMAR, start_symbol=\"\", log=True)\n",
"amount_fuzzer.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: More Numeric Ranges\n",
"\n",
"We can use such pre-expansion functions in other contexts, too. Suppose we want to generate arithmetic expressions in which each number is between 100 and 200. We can extend `EXPR_GRAMMAR` accordingly:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.836108Z",
"iopub.status.busy": "2024-06-30T20:09:25.835930Z",
"iopub.status.idle": "2024-06-30T20:09:25.838665Z",
"shell.execute_reply": "2024-06-30T20:09:25.838317Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"expr_100_200_grammar = extend_grammar(EXPR_GRAMMAR,\n",
" {\n",
" \"\": [\n",
" \"+\", \"-\", \"()\",\n",
"\n",
" # Generate only the integer part with a function;\n",
" # the fractional part comes from\n",
" # the grammar\n",
" (\".\", opts(\n",
" pre=lambda: [random.randint(100, 200), None])),\n",
"\n",
" # Generate the entire integer\n",
" # from the function\n",
" (\"\", opts(\n",
" pre=lambda: random.randint(100, 200))),\n",
" ],\n",
" }\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.840527Z",
"iopub.status.busy": "2024-06-30T20:09:25.840398Z",
"iopub.status.idle": "2024-06-30T20:09:25.845432Z",
"shell.execute_reply": "2024-06-30T20:09:25.844967Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'(108.6 / 155 + 177) / 118 * 120 * 107 + 151 + 195 / -200 - 150 * 188 / 147 + 112'"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"expr_100_200_fuzzer = GeneratorGrammarFuzzer(expr_100_200_grammar)\n",
"expr_100_200_fuzzer.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Support for Python Generators\n",
"\n",
"The Python language has its own concept of generator functions, which we of course want to support as well. A *generator function in Python* is a function that returns a so-called *iterator object* which we can iterate over, one value at a time."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"To create a generator function in Python, one defines a normal function, using the `yield` statement instead of a `return` statement. While a `return` statement terminates the function, a `yield` statement pauses its execution, saving all of its state, to be resumed later for the next successive calls."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here is an example of a generator function. When first invoked, `iterate()` yields the value 1, followed by 2, 3, and so on:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.847908Z",
"iopub.status.busy": "2024-06-30T20:09:25.847752Z",
"iopub.status.idle": "2024-06-30T20:09:25.849901Z",
"shell.execute_reply": "2024-06-30T20:09:25.849541Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def iterate():\n",
" t = 0\n",
" while True:\n",
" t = t + 1\n",
" yield t"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We can use `iterate` in a loop, just like the `range()` function (which also is a generator function):"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.851869Z",
"iopub.status.busy": "2024-06-30T20:09:25.851748Z",
"iopub.status.idle": "2024-06-30T20:09:25.853726Z",
"shell.execute_reply": "2024-06-30T20:09:25.853423Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 2 3 4 5 6 7 8 9 10 "
]
}
],
"source": [
"for i in iterate():\n",
" if i > 10:\n",
" break\n",
" print(i, end=\" \")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We can also use `iterate()` as a pre-expansion generator function, ensuring it will create one successive integer after another:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.855806Z",
"iopub.status.busy": "2024-06-30T20:09:25.855644Z",
"iopub.status.idle": "2024-06-30T20:09:25.858023Z",
"shell.execute_reply": "2024-06-30T20:09:25.857608Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n",
" {\n",
" \"\": [\n",
" \"+\", \"-\", \"()\",\n",
" # \".\",\n",
"\n",
" # Generate one integer after another\n",
" # from the function\n",
" (\"\", opts(pre=iterate)),\n",
" ],\n",
" })"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To support generators, our `process_chosen_children()` method, above, checks whether a function is a generator; if so, it invokes the `run_generator()` method. When `run_generator()` sees the function for the first time during a `fuzz_tree()` (or `fuzz()`) call, it invokes the function to create a generator object; this is saved in the `generators` attribute, and then called. Subsequent calls directly go to the generator, preserving state."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.859835Z",
"iopub.status.busy": "2024-06-30T20:09:25.859701Z",
"iopub.status.idle": "2024-06-30T20:09:25.862804Z",
"shell.execute_reply": "2024-06-30T20:09:25.862358Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def fuzz_tree(self) -> DerivationTree:\n",
" self.reset_generators()\n",
" return super().fuzz_tree()\n",
"\n",
" def reset_generators(self) -> None:\n",
" self.generators: Dict[str, Iterator] = {}\n",
"\n",
" def run_generator(self, expansion: Expansion,\n",
" function: Callable) -> Iterator:\n",
" key = repr((expansion, function))\n",
" if key not in self.generators:\n",
" self.generators[key] = function()\n",
" generator = self.generators[key]\n",
" return next(generator)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Does this work? Let us run our fuzzer on the above grammar, using `iterator()`:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.864977Z",
"iopub.status.busy": "2024-06-30T20:09:25.864874Z",
"iopub.status.idle": "2024-06-30T20:09:25.871884Z",
"shell.execute_reply": "2024-06-30T20:09:25.871510Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1 * ++++3 / ---+4 - 2 * +--6 / 7 * 10 - (9 - 11) - 5 + (13) * 14 + 8 + 12'"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"iterate_fuzzer = GeneratorGrammarFuzzer(iterate_grammar)\n",
"iterate_fuzzer.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We see that the expression contains all integers starting with 1."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Instead of specifying our own Python generator function such as `iterate()`, we can also use one of the built-in Python generators such as `range()`. This will also generate integers starting with 1:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.874032Z",
"iopub.status.busy": "2024-06-30T20:09:25.873872Z",
"iopub.status.idle": "2024-06-30T20:09:25.876088Z",
"shell.execute_reply": "2024-06-30T20:09:25.875771Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n",
" {\n",
" \"\": [\n",
" \"+\", \"-\", \"()\",\n",
" (\"\", opts(pre=range(1, 1000))),\n",
" ],\n",
" })"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"It is also possible to use Python list comprehensions, by adding their generator functions in parentheses:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.877791Z",
"iopub.status.busy": "2024-06-30T20:09:25.877643Z",
"iopub.status.idle": "2024-06-30T20:09:25.880039Z",
"shell.execute_reply": "2024-06-30T20:09:25.879605Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"iterate_grammar = extend_grammar(EXPR_GRAMMAR,\n",
" {\n",
" \"\": [\n",
" \"+\", \"-\", \"()\",\n",
" (\"\", opts(\n",
" pre=(x for x in range(1, 1000)))),\n",
" ],\n",
" })"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note that both above grammars will actually cause the fuzzer to raise an exception when more than 1,000 integers are created, but you will find it very easy to fix this."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Finally, `yield` is actually an expression, not a statement, so it is also possible to have a `lambda` expression `yield` a value. If you find some reasonable use for this, let us know."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## Checking and Repairing Elements after Expansion\n",
"\n",
"Let us now turn to our second set of functions to be supported – namely, post-expansion functions. The simplest way of using them is to run them once the entire tree is generated, taking care of replacements as with `pre` functions. If one of them returns `False`, however, we start anew."
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.881849Z",
"iopub.status.busy": "2024-06-30T20:09:25.881722Z",
"iopub.status.idle": "2024-06-30T20:09:25.884184Z",
"shell.execute_reply": "2024-06-30T20:09:25.883932Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def fuzz_tree(self) -> DerivationTree:\n",
" while True:\n",
" tree = super().fuzz_tree()\n",
" (symbol, children) = tree\n",
" result, new_children = self.run_post_functions(tree)\n",
" if not isinstance(result, bool) or result:\n",
" return (symbol, new_children)\n",
" self.restart_expansion()\n",
"\n",
" def restart_expansion(self) -> None:\n",
" # To be overloaded in subclasses\n",
" self.reset_generators()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `run_post_functions()` is applied recursively on all nodes of the derivation tree. For each node, it determines the expansion applied, and then runs the function associated with that expansion."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.885850Z",
"iopub.status.busy": "2024-06-30T20:09:25.885734Z",
"iopub.status.idle": "2024-06-30T20:09:25.889622Z",
"shell.execute_reply": "2024-06-30T20:09:25.889316Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" # Return True iff all constraints of grammar are satisfied in TREE\n",
" def run_post_functions(self, tree: DerivationTree,\n",
" depth: Union[int, float] = float(\"inf\")) \\\n",
" -> Tuple[bool, Optional[List[DerivationTree]]]:\n",
" symbol: str = tree[0]\n",
" children: List[DerivationTree] = cast(List[DerivationTree], tree[1])\n",
"\n",
" if children == []:\n",
" return True, children # Terminal symbol\n",
"\n",
" try:\n",
" expansion = self.find_expansion(tree)\n",
" except KeyError:\n",
" # Expansion (no longer) found - ignore\n",
" return True, children\n",
"\n",
" result = True\n",
" function = exp_post_expansion_function(expansion)\n",
" if function is not None:\n",
" result = self.eval_function(tree, function)\n",
" if isinstance(result, bool) and not result:\n",
" if self.log:\n",
" print(\n",
" all_terminals(tree),\n",
" \"did not satisfy\",\n",
" symbol,\n",
" \"constraint\")\n",
" return False, children\n",
"\n",
" children = self.apply_result(result, children)\n",
"\n",
" if depth > 0:\n",
" for c in children:\n",
" result, _ = self.run_post_functions(c, depth - 1)\n",
" if isinstance(result, bool) and not result:\n",
" return False, children\n",
"\n",
" return result, children"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The helper method `find_expansion()` takes a subtree `tree` and determines the expansion from the grammar that was applied to create the children in `tree`."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.891301Z",
"iopub.status.busy": "2024-06-30T20:09:25.891200Z",
"iopub.status.idle": "2024-06-30T20:09:25.893655Z",
"shell.execute_reply": "2024-06-30T20:09:25.893365Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def find_expansion(self, tree):\n",
" symbol, children = tree\n",
"\n",
" applied_expansion = \\\n",
" \"\".join([child_symbol for child_symbol, _ in children])\n",
"\n",
" for expansion in self.grammar[symbol]:\n",
" if exp_string(expansion) == applied_expansion:\n",
" return expansion\n",
"\n",
" raise KeyError(\n",
" symbol +\n",
" \": did not find expansion \" +\n",
" repr(applied_expansion))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `eval_function()` is the one that takes care of actually invoking the post-expansion function. It creates an argument list containing the expansions of all nonterminal children – that is, one argument for each symbol in the grammar expansion. It then calls the given function."
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.895278Z",
"iopub.status.busy": "2024-06-30T20:09:25.895183Z",
"iopub.status.idle": "2024-06-30T20:09:25.897892Z",
"shell.execute_reply": "2024-06-30T20:09:25.897615Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class GeneratorGrammarFuzzer(GeneratorGrammarFuzzer):\n",
" def eval_function(self, tree, function):\n",
" symbol, children = tree\n",
"\n",
" assert callable(function)\n",
"\n",
" args = []\n",
" for (symbol, exp) in children:\n",
" if exp != [] and exp is not None:\n",
" symbol_value = all_terminals((symbol, exp))\n",
" args.append(symbol_value)\n",
"\n",
" result = function(*args)\n",
" if self.log:\n",
" print(repr(function) + repr(tuple(args)), \"=\", repr(result))\n",
"\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Note that unlike pre-expansion functions, post-expansion functions typically process the values already produced, so we do not support Python generators here."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: Negative Expressions\n",
"\n",
"Let us try out these post-expression functions on an example. Suppose we want to produce only arithmetic expressions that evaluate to a negative number – for instance, to feed such generated expressions into a compiler or some other external system. Doing so constructively with `pre` functions would be very difficult. Instead, we can define a constraint that checks for precisely this property, using the Python `eval()` function."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The Python `eval()` function takes a string and evaluates it according to Python rules. Since the syntax of our generated expressions is slightly different from Python, and since Python can raise arithmetic exceptions during evaluation, we need a means to handle such errors gracefully. The function `eval_with_exception()` wraps around `eval()`; if an exception occurs during evaluation, it returns False – which causes the production algorithm to produce another value."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.899411Z",
"iopub.status.busy": "2024-06-30T20:09:25.899327Z",
"iopub.status.idle": "2024-06-30T20:09:25.901019Z",
"shell.execute_reply": "2024-06-30T20:09:25.900770Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from ExpectError import ExpectError"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.902510Z",
"iopub.status.busy": "2024-06-30T20:09:25.902421Z",
"iopub.status.idle": "2024-06-30T20:09:25.904255Z",
"shell.execute_reply": "2024-06-30T20:09:25.903999Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def eval_with_exception(s):\n",
" # Use \"mute=True\" to suppress all messages\n",
" with ExpectError(print_traceback=False):\n",
" return eval(s)\n",
" return False"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.905762Z",
"iopub.status.busy": "2024-06-30T20:09:25.905666Z",
"iopub.status.idle": "2024-06-30T20:09:25.908062Z",
"shell.execute_reply": "2024-06-30T20:09:25.907783Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"negative_expr_grammar = extend_grammar(EXPR_GRAMMAR,\n",
" {\n",
" \"\": [(\"\", opts(post=lambda s: eval_with_exception(s) < 0))]\n",
" }\n",
" )\n",
"\n",
"assert is_valid_grammar(negative_expr_grammar)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.909784Z",
"iopub.status.busy": "2024-06-30T20:09:25.909656Z",
"iopub.status.idle": "2024-06-30T20:09:25.920814Z",
"shell.execute_reply": "2024-06-30T20:09:25.920507Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"ZeroDivisionError: division by zero (expected)\n"
]
},
{
"data": {
"text/plain": [
"'(8.9 / 6 * 4 - 0.2 + -7 - 7 - 8 * 6) * 7 * 15.55 - -945.9'"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"negative_expr_fuzzer = GeneratorGrammarFuzzer(negative_expr_grammar)\n",
"expr = negative_expr_fuzzer.fuzz()\n",
"expr"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The result is indeed negative:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.922711Z",
"iopub.status.busy": "2024-06-30T20:09:25.922590Z",
"iopub.status.idle": "2024-06-30T20:09:25.924819Z",
"shell.execute_reply": "2024-06-30T20:09:25.924564Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-5178.726666666667"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"eval(expr)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: Matching XML Tags\n",
"\n",
"Post-expansion functions can not only be used to _check_ expansions, but also to repair them. To this end, we can have them return a string or a list of strings; just like pre-expansion functions, these strings would then replace the entire expansion or individual symbols."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"As an example, consider *XML documents*, which are composed of text within matching _XML tags_. For instance, consider the following fragment in HTML, a subset of XML:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.926350Z",
"iopub.status.busy": "2024-06-30T20:09:25.926235Z",
"iopub.status.idle": "2024-06-30T20:09:25.928061Z",
"shell.execute_reply": "2024-06-30T20:09:25.927750Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from bookutils import HTML"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"execution": {
"iopub.execute_input": "2024-06-30T20:09:25.929795Z",
"iopub.status.busy": "2024-06-30T20:09:25.929648Z",
"iopub.status.idle": "2024-06-30T20:09:25.932147Z",
"shell.execute_reply": "2024-06-30T20:09:25.931865Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/html": [
"A bold text"
],
"text/plain": [
""
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"HTML(\"A bold text\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This fragment consists of two HTML (XML) tags that surround the text; the tag name (`strong`) is present both in the opening (``) as well as in the closing (``) tag."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For a _finite_ set of tags (for instance, the HTML tags ``, ``, ``, `