{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Fuzzing APIs\n",
"\n",
"So far, we have always generated _system input_, i.e. data that the program as a whole obtains via its input channels. However, we can also generate inputs that go directly into individual functions, gaining flexibility and speed in the process. In this chapter, we explore the use of grammars to synthesize code for function calls, which allows you to generate _program code that very efficiently invokes functions directly._ "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:06.491411Z",
"iopub.status.busy": "2025-10-26T13:32:06.491115Z",
"iopub.status.idle": "2025-10-26T13:32:07.213325Z",
"shell.execute_reply": "2025-10-26T13:32:07.213025Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from bookutils import YouTubeVideo\n",
"YouTubeVideo('CC1VvOGkzm8')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"**Prerequisites**\n",
"\n",
"* You have to know how grammar fuzzing work, e.g. from the [chapter on grammars](Grammars.ipynb).\n",
"* We make use of _generator functions_, as discussed in the [chapter on fuzzing with generators](GeneratorGrammarFuzzer.ipynb).\n",
"* We make use of probabilities, as discussed in the [chapter on fuzzing with probabilities](ProbabilisticGrammarFuzzer.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Synopsis\n",
"To [use the code provided in this chapter](Importing.ipynb), write\n",
"\n",
"```python\n",
">>> from fuzzingbook.APIFuzzer import \n",
"```\n",
"\n",
"and then make use of the following features.\n",
"\n",
"**Note**: The examples in this section only work after the rest of the cells have been executed.\n"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.727252Z",
"iopub.status.busy": "2025-10-26T13:32:08.727107Z",
"iopub.status.idle": "2025-10-26T13:32:08.728819Z",
"shell.execute_reply": "2025-10-26T13:32:08.728567Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from GeneratorGrammarFuzzer import ProbabilisticGeneratorGrammarFuzzer"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"`INT_GRAMMAR`, `FLOAT_GRAMMAR`, `ASCII_STRING_GRAMMAR` produce integers, floats, and strings, respectively:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.730009Z",
"iopub.status.busy": "2025-10-26T13:32:08.729916Z",
"iopub.status.idle": "2025-10-26T13:32:08.733430Z",
"shell.execute_reply": "2025-10-26T13:32:08.733029Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['-51', '9', '0', '0', '0', '0', '32', '0', '0', '0']"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fuzzer = ProbabilisticGeneratorGrammarFuzzer(INT_GRAMMAR)\n",
"[fuzzer.fuzz() for i in range(10)]"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.735139Z",
"iopub.status.busy": "2025-10-26T13:32:08.735011Z",
"iopub.status.idle": "2025-10-26T13:32:08.743214Z",
"shell.execute_reply": "2025-10-26T13:32:08.742954Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['0e0',\n",
" '-9.43e34',\n",
" '-7.3282e0',\n",
" '-9.5e-9',\n",
" '0',\n",
" '-30.840386e-5',\n",
" '3',\n",
" '-4.1e0',\n",
" '-9.7',\n",
" '413']"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fuzzer = ProbabilisticGeneratorGrammarFuzzer(FLOAT_GRAMMAR)\n",
"[fuzzer.fuzz() for i in range(10)]"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.744676Z",
"iopub.status.busy": "2025-10-26T13:32:08.744580Z",
"iopub.status.idle": "2025-10-26T13:32:08.867313Z",
"shell.execute_reply": "2025-10-26T13:32:08.866995Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['\"#vYV*t@I%KNTT[q~}&-v+[zAzj[X-z|RzC$(g$Br]1tC\\':5!5\":\n",
" ['urlparse(\"\")']\n",
"}\n",
"\n",
"# Import definitions from URL_GRAMMAR\n",
"URLPARSE_GRAMMAR.update(URL_GRAMMAR)\n",
"URLPARSE_GRAMMAR[\"\"] = [\"\"]\n",
"\n",
"assert is_valid_grammar(URLPARSE_GRAMMAR)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This grammar creates calls in the form `urlparse()`, where `` comes from the \"imported\" URL grammar. The idea is to create many of these calls and to feed them into the Python interpreter."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.658410Z",
"iopub.status.busy": "2025-10-26T13:32:07.658318Z",
"iopub.status.idle": "2025-10-26T13:32:07.660723Z",
"shell.execute_reply": "2025-10-26T13:32:07.660477Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'': ['urlparse(\"\")'],\n",
" '': [''],\n",
" '': ['://'],\n",
" '': ['http', 'https', 'ftp', 'ftps'],\n",
" '': ['',\n",
" ':',\n",
" '@',\n",
" '@:'],\n",
" '': ['cispa.saarland', 'www.google.com', 'fuzzingbook.com'],\n",
" '': ['80', '8080', ''],\n",
" '': ['', ''],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': ['user:password'],\n",
" '': ['', '/', '/'],\n",
" '': ['abc', 'def', 'x'],\n",
" '': ['', '?'],\n",
" '': ['', '&'],\n",
" '': ['=', '=']}"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"URLPARSE_GRAMMAR"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can now use this grammar for fuzzing and synthesizing calls to `urlparse)`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.661914Z",
"iopub.status.busy": "2025-10-26T13:32:07.661820Z",
"iopub.status.idle": "2025-10-26T13:32:07.664300Z",
"shell.execute_reply": "2025-10-26T13:32:07.664067Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'urlparse(\"http://user:password@fuzzingbook.com:8080?abc=x29\")'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"urlparse_fuzzer = GrammarFuzzer(URLPARSE_GRAMMAR)\n",
"urlparse_fuzzer.fuzz()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Just as above, we can immediately execute these calls. To better see what is happening, we define a small helper function:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.665498Z",
"iopub.status.busy": "2025-10-26T13:32:07.665412Z",
"iopub.status.idle": "2025-10-26T13:32:07.666966Z",
"shell.execute_reply": "2025-10-26T13:32:07.666761Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# Call function_name(arg[0], arg[1], ...) as a string\n",
"def do_call(call_string):\n",
" print(call_string)\n",
" result = eval(call_string)\n",
" print(\"\\t= \" + repr(result))\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.668047Z",
"iopub.status.busy": "2025-10-26T13:32:07.667953Z",
"iopub.status.idle": "2025-10-26T13:32:07.670234Z",
"shell.execute_reply": "2025-10-26T13:32:07.670002Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"urlparse(\"http://www.google.com?abc=def\")\n",
"\t= ParseResult(scheme='http', netloc='www.google.com', path='', params='', query='abc=def', fragment='')\n"
]
},
{
"data": {
"text/plain": [
"ParseResult(scheme='http', netloc='www.google.com', path='', params='', query='abc=def', fragment='')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"call = urlparse_fuzzer.fuzz()\n",
"do_call(call)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If `urlparse()` were a C function, for instance, we could embed its call into some (also generated) C function:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.671432Z",
"iopub.status.busy": "2025-10-26T13:32:07.671337Z",
"iopub.status.idle": "2025-10-26T13:32:07.672912Z",
"shell.execute_reply": "2025-10-26T13:32:07.672728Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"URLPARSE_C_GRAMMAR: Grammar = {\n",
" \"\": [\"\"],\n",
" \"\": ['#include \"urlparse.h\"\\n\\n'],\n",
" \"\": [\"void test() {\\n}\\n\"],\n",
" \"\": [\"\", \"\"],\n",
" \"\": [' urlparse(\"\");\\n']\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.673912Z",
"iopub.status.busy": "2025-10-26T13:32:07.673828Z",
"iopub.status.idle": "2025-10-26T13:32:07.675413Z",
"shell.execute_reply": "2025-10-26T13:32:07.675062Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"URLPARSE_C_GRAMMAR.update(URL_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.676827Z",
"iopub.status.busy": "2025-10-26T13:32:07.676740Z",
"iopub.status.idle": "2025-10-26T13:32:07.678339Z",
"shell.execute_reply": "2025-10-26T13:32:07.678052Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"URLPARSE_C_GRAMMAR[\"\"] = [\"\"]"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.679464Z",
"iopub.status.busy": "2025-10-26T13:32:07.679392Z",
"iopub.status.idle": "2025-10-26T13:32:07.680854Z",
"shell.execute_reply": "2025-10-26T13:32:07.680660Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"assert is_valid_grammar(URLPARSE_C_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.681943Z",
"iopub.status.busy": "2025-10-26T13:32:07.681858Z",
"iopub.status.idle": "2025-10-26T13:32:07.684677Z",
"shell.execute_reply": "2025-10-26T13:32:07.684439Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"#include \"urlparse.h\"\n",
"\n",
"void test() {\n",
" urlparse(\"http://user:password@cispa.saarland:99/x69?x57=abc\");\n",
"}\n",
"\n"
]
}
],
"source": [
"urlparse_fuzzer = GrammarFuzzer(URLPARSE_C_GRAMMAR)\n",
"print(urlparse_fuzzer.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Synthesizing Oracles\n",
"\n",
"In our `urlparse()` example, both the Python as well as the C variant only check for _generic_ errors in `urlparse()`; that is, they only detect fatal errors and exceptions. For a full test, we need to set up a specific *oracle* as well that checks whether the result is valid."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Our plan is to check whether specific parts of the URL reappear in the result – that is, if the scheme is `http:`, then the `ParseResult` returned should also contain a `http:` scheme. As discussed in the [chapter on fuzzing with generators](GeneratorGrammarFuzzer.ipynb), equalities of strings such as `http:` across two symbols cannot be expressed in a context-free grammar. We can, however, use a _generator function_ (also introduced in the [chapter on fuzzing with generators](GeneratorGrammarFuzzer.ipynb)) to automatically enforce such equalities."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Here is an example. Invoking `geturl()` on a `urlparse()` result should return the URL as originally passed to `urlparse()`."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:07.686077Z",
"iopub.status.busy": "2025-10-26T13:32:07.685970Z",
"iopub.status.idle": "2025-10-26T13:32:08.316173Z",
"shell.execute_reply": "2025-10-26T13:32:08.315818Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from GeneratorGrammarFuzzer import GeneratorGrammarFuzzer, ProbabilisticGeneratorGrammarFuzzer"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.317659Z",
"iopub.status.busy": "2025-10-26T13:32:08.317564Z",
"iopub.status.idle": "2025-10-26T13:32:08.319359Z",
"shell.execute_reply": "2025-10-26T13:32:08.319136Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"URLPARSE_ORACLE_GRAMMAR: Grammar = extend_grammar(URLPARSE_GRAMMAR,\n",
"{\n",
" \"\": [(\"assert urlparse('').geturl() == ''\",\n",
" opts(post=lambda url_1, url_2: [None, url_1]))]\n",
"})"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.320726Z",
"iopub.status.busy": "2025-10-26T13:32:08.320616Z",
"iopub.status.idle": "2025-10-26T13:32:08.323582Z",
"shell.execute_reply": "2025-10-26T13:32:08.323330Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"assert urlparse('https://user:password@cispa.saarland/abc?abc=abc').geturl() == 'https://user:password@cispa.saarland/abc?abc=abc'\n"
]
}
],
"source": [
"urlparse_oracle_fuzzer = GeneratorGrammarFuzzer(URLPARSE_ORACLE_GRAMMAR)\n",
"test = urlparse_oracle_fuzzer.fuzz()\n",
"print(test)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.324728Z",
"iopub.status.busy": "2025-10-26T13:32:08.324644Z",
"iopub.status.idle": "2025-10-26T13:32:08.326268Z",
"shell.execute_reply": "2025-10-26T13:32:08.326019Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"exec(test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"In a similar way, we can also check individual components of the result:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.327483Z",
"iopub.status.busy": "2025-10-26T13:32:08.327398Z",
"iopub.status.idle": "2025-10-26T13:32:08.329603Z",
"shell.execute_reply": "2025-10-26T13:32:08.329384Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"URLPARSE_ORACLE_GRAMMAR: Grammar = extend_grammar(URLPARSE_GRAMMAR,\n",
"{\n",
" \"\": [(\"result = urlparse('://?')\\n\"\n",
" # + \"print(result)\\n\"\n",
" + \"assert result.scheme == ''\\n\"\n",
" + \"assert result.netloc == ''\\n\"\n",
" + \"assert result.path == ''\\n\"\n",
" + \"assert result.query == ''\",\n",
" opts(post=lambda scheme_1, authority_1, path_1, params_1,\n",
" scheme_2, authority_2, path_2, params_2:\n",
" [None, None, None, None,\n",
" scheme_1, authority_1, path_1, params_1]))]\n",
"})\n",
"\n",
"# Get rid of unused symbols\n",
"del URLPARSE_ORACLE_GRAMMAR[\"\"]\n",
"del URLPARSE_ORACLE_GRAMMAR[\"\"]\n",
"del URLPARSE_ORACLE_GRAMMAR[\"\"]\n",
"del URLPARSE_ORACLE_GRAMMAR[\"\"]\n",
"del URLPARSE_ORACLE_GRAMMAR[\"\"]"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.330789Z",
"iopub.status.busy": "2025-10-26T13:32:08.330709Z",
"iopub.status.idle": "2025-10-26T13:32:08.333533Z",
"shell.execute_reply": "2025-10-26T13:32:08.333252Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"result = urlparse('https://www.google.com/?def=18&abc=abc')\n",
"assert result.scheme == 'https'\n",
"assert result.netloc == 'www.google.com'\n",
"assert result.path == '/'\n",
"assert result.query == 'def=18&abc=abc'\n"
]
}
],
"source": [
"urlparse_oracle_fuzzer = GeneratorGrammarFuzzer(URLPARSE_ORACLE_GRAMMAR)\n",
"test = urlparse_oracle_fuzzer.fuzz()\n",
"print(test)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.335118Z",
"iopub.status.busy": "2025-10-26T13:32:08.335013Z",
"iopub.status.idle": "2025-10-26T13:32:08.336674Z",
"shell.execute_reply": "2025-10-26T13:32:08.336434Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"exec(test)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The use of generator functions may feel a bit cumbersome. Indeed, if we uniquely stick to Python, we could also create a _unit test_ that directly invokes the fuzzer to generate individual parts:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.337857Z",
"iopub.status.busy": "2025-10-26T13:32:08.337773Z",
"iopub.status.idle": "2025-10-26T13:32:08.339350Z",
"shell.execute_reply": "2025-10-26T13:32:08.339169Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def fuzzed_url_element(symbol):\n",
" return GrammarFuzzer(URLPARSE_GRAMMAR, start_symbol=symbol).fuzz()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.340427Z",
"iopub.status.busy": "2025-10-26T13:32:08.340359Z",
"iopub.status.idle": "2025-10-26T13:32:08.342884Z",
"shell.execute_reply": "2025-10-26T13:32:08.342527Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"scheme = fuzzed_url_element(\"\")\n",
"authority = fuzzed_url_element(\"\")\n",
"path = fuzzed_url_element(\"\")\n",
"query = fuzzed_url_element(\"\")\n",
"url = \"%s://%s%s?%s\" % (scheme, authority, path, query)\n",
"result = urlparse(url)\n",
"# print(result)\n",
"assert result.geturl() == url\n",
"assert result.scheme == scheme\n",
"assert result.path == path\n",
"assert result.query == query"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Using such a unit test makes it easier to express oracles. However, we lose the ability to systematically cover individual URL elements and alternatives as with [`GrammarCoverageFuzzer`](GrammarCoverageFuzzer.ipynb) as well as the ability to guide generation towards specific elements as with [`ProbabilisticGrammarFuzzer`](ProbabilisticGrammarFuzzer.ipynb). Furthermore, a grammar allows us to generate tests for arbitrary programming languages and APIs."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": false
},
"source": [
"## Synthesizing Data\n",
"\n",
"For `urlparse()`, we have used a very specific grammar for creating a very specific argument. Many functions take basic data types as (some) arguments, though; we therefore define grammars that generate precisely those arguments. Even better, we can define functions that _generate_ grammars tailored towards our specific needs, returning values in a particular range, for instance."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Integers\n",
"\n",
"We introduce a simple grammar to produce integers."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.344741Z",
"iopub.status.busy": "2025-10-26T13:32:08.344582Z",
"iopub.status.idle": "2025-10-26T13:32:08.346252Z",
"shell.execute_reply": "2025-10-26T13:32:08.346012Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import convert_ebnf_grammar, crange"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.347285Z",
"iopub.status.busy": "2025-10-26T13:32:08.347205Z",
"iopub.status.idle": "2025-10-26T13:32:08.348661Z",
"shell.execute_reply": "2025-10-26T13:32:08.348473Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from ProbabilisticGrammarFuzzer import ProbabilisticGrammarFuzzer"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.349835Z",
"iopub.status.busy": "2025-10-26T13:32:08.349759Z",
"iopub.status.idle": "2025-10-26T13:32:08.351406Z",
"shell.execute_reply": "2025-10-26T13:32:08.351199Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"INT_EBNF_GRAMMAR: Grammar = {\n",
" \"\": [\"\"],\n",
" \"\": [\"<_int>\"],\n",
" \"<_int>\": [\"(-)?*\", \"0\"],\n",
" \"\": crange('1', '9'),\n",
" \"\": crange('0', '9')\n",
"}\n",
"\n",
"assert is_valid_grammar(INT_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.352534Z",
"iopub.status.busy": "2025-10-26T13:32:08.352456Z",
"iopub.status.idle": "2025-10-26T13:32:08.354544Z",
"shell.execute_reply": "2025-10-26T13:32:08.354324Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'': [''],\n",
" '': ['<_int>'],\n",
" '<_int>': ['', '0'],\n",
" '': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': ['-'],\n",
" '': ['', ''],\n",
" '': ['', '']}"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"INT_GRAMMAR = convert_ebnf_grammar(INT_EBNF_GRAMMAR)\n",
"INT_GRAMMAR"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.355661Z",
"iopub.status.busy": "2025-10-26T13:32:08.355576Z",
"iopub.status.idle": "2025-10-26T13:32:08.358675Z",
"shell.execute_reply": "2025-10-26T13:32:08.358402Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['699', '-44', '321', '-7', '-6', '67', '0', '0', '57', '0']\n"
]
}
],
"source": [
"int_fuzzer = GrammarFuzzer(INT_GRAMMAR)\n",
"print([int_fuzzer.fuzz() for i in range(10)])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"If we need integers in a specific range, we can add a generator function that does right that:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.359902Z",
"iopub.status.busy": "2025-10-26T13:32:08.359818Z",
"iopub.status.idle": "2025-10-26T13:32:08.361326Z",
"shell.execute_reply": "2025-10-26T13:32:08.361064Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import set_opts"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.362299Z",
"iopub.status.busy": "2025-10-26T13:32:08.362226Z",
"iopub.status.idle": "2025-10-26T13:32:08.363576Z",
"shell.execute_reply": "2025-10-26T13:32:08.363393Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import random"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.364673Z",
"iopub.status.busy": "2025-10-26T13:32:08.364572Z",
"iopub.status.idle": "2025-10-26T13:32:08.366340Z",
"shell.execute_reply": "2025-10-26T13:32:08.366131Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def int_grammar_with_range(start, end):\n",
" int_grammar = extend_grammar(INT_GRAMMAR)\n",
" set_opts(int_grammar, \"\", \"<_int>\",\n",
" opts(pre=lambda: random.randint(start, end)))\n",
" return int_grammar"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.367482Z",
"iopub.status.busy": "2025-10-26T13:32:08.367407Z",
"iopub.status.idle": "2025-10-26T13:32:08.369839Z",
"shell.execute_reply": "2025-10-26T13:32:08.369503Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['942', '955', '997', '967', '939', '923', '984', '914', '991', '982']"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_fuzzer = GeneratorGrammarFuzzer(int_grammar_with_range(900, 1000))\n",
"[int_fuzzer.fuzz() for i in range(10)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Floats\n",
"\n",
"The grammar for floating-point values closely resembles the integer grammar."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.371600Z",
"iopub.status.busy": "2025-10-26T13:32:08.371486Z",
"iopub.status.idle": "2025-10-26T13:32:08.373505Z",
"shell.execute_reply": "2025-10-26T13:32:08.373285Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"FLOAT_EBNF_GRAMMAR: Grammar = {\n",
" \"\": [\"\"],\n",
" \"\": [(\"<_float>\", opts(prob=0.9)), \"inf\", \"NaN\"],\n",
" \"<_float>\": [\"(.+)??\"],\n",
" \"\": [\"e\"]\n",
"}\n",
"FLOAT_EBNF_GRAMMAR.update(INT_EBNF_GRAMMAR)\n",
"FLOAT_EBNF_GRAMMAR[\"\"] = [\"\"]\n",
"\n",
"assert is_valid_grammar(FLOAT_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.374711Z",
"iopub.status.busy": "2025-10-26T13:32:08.374612Z",
"iopub.status.idle": "2025-10-26T13:32:08.377653Z",
"shell.execute_reply": "2025-10-26T13:32:08.377212Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'': [''],\n",
" '': [('<_float>', {'prob': 0.9}), 'inf', 'NaN'],\n",
" '<_float>': [''],\n",
" '': ['e'],\n",
" '': ['<_int>'],\n",
" '<_int>': ['', '0'],\n",
" '': ['1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': ['.'],\n",
" '': ['-'],\n",
" '': ['', ''],\n",
" '': ['', ''],\n",
" '': ['', ''],\n",
" '': ['', ''],\n",
" '': ['', '']}"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"FLOAT_GRAMMAR = convert_ebnf_grammar(FLOAT_EBNF_GRAMMAR)\n",
"FLOAT_GRAMMAR"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.379628Z",
"iopub.status.busy": "2025-10-26T13:32:08.379522Z",
"iopub.status.idle": "2025-10-26T13:32:08.384530Z",
"shell.execute_reply": "2025-10-26T13:32:08.384106Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['0', '-4e0', '-3.3', '0.55e0', '0e2', '0.2', '-48.6e0', '0.216', '-4.844', '-6.100']\n"
]
}
],
"source": [
"float_fuzzer = ProbabilisticGrammarFuzzer(FLOAT_GRAMMAR)\n",
"print([float_fuzzer.fuzz() for i in range(10)])"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.386257Z",
"iopub.status.busy": "2025-10-26T13:32:08.386148Z",
"iopub.status.idle": "2025-10-26T13:32:08.387947Z",
"shell.execute_reply": "2025-10-26T13:32:08.387681Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def float_grammar_with_range(start, end):\n",
" float_grammar = extend_grammar(FLOAT_GRAMMAR)\n",
" set_opts(float_grammar, \"\", \"<_float>\", opts(\n",
" pre=lambda: start + random.random() * (end - start)))\n",
" return float_grammar"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.389291Z",
"iopub.status.busy": "2025-10-26T13:32:08.389195Z",
"iopub.status.idle": "2025-10-26T13:32:08.392173Z",
"shell.execute_reply": "2025-10-26T13:32:08.391889Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['900.1695968039919',\n",
" '900.3273891873373',\n",
" '900.225192820568',\n",
" '900.3231805358258',\n",
" '900.4963527393471',\n",
" 'inf',\n",
" 'inf',\n",
" '900.6037658059212',\n",
" '900.6212350658716',\n",
" '900.3877831415683']"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_fuzzer = ProbabilisticGeneratorGrammarFuzzer(\n",
" float_grammar_with_range(900.0, 900.9))\n",
"[float_fuzzer.fuzz() for i in range(10)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Strings"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Finally, we introduce a grammar for producing strings."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.393573Z",
"iopub.status.busy": "2025-10-26T13:32:08.393478Z",
"iopub.status.idle": "2025-10-26T13:32:08.395581Z",
"shell.execute_reply": "2025-10-26T13:32:08.395121Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"ASCII_STRING_EBNF_GRAMMAR: Grammar = {\n",
" \"\": [\"\"],\n",
" \"\": ['\"\"'],\n",
" \"\": [\n",
" (\"\", opts(prob=0.05)),\n",
" \"\"\n",
" ],\n",
" \"\": crange(\" \", \"!\") + [r'\\\"'] + crange(\"#\", \"~\")\n",
"}\n",
"\n",
"assert is_valid_grammar(ASCII_STRING_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.397370Z",
"iopub.status.busy": "2025-10-26T13:32:08.397190Z",
"iopub.status.idle": "2025-10-26T13:32:08.399364Z",
"shell.execute_reply": "2025-10-26T13:32:08.399062Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"ASCII_STRING_GRAMMAR = convert_ebnf_grammar(ASCII_STRING_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.400613Z",
"iopub.status.busy": "2025-10-26T13:32:08.400509Z",
"iopub.status.idle": "2025-10-26T13:32:08.585248Z",
"shell.execute_reply": "2025-10-26T13:32:08.584842Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['\"BgY)\"', '\"j[-64Big65wso(f:wg|}w&*D9JthLX}0@PT^]mr[`69Cq8H713ITYx<#jpml)\\\\\"\"', '\"{);XWZJ@d`\\'[h#F{1)C9M?%C`=\"', '\"Y\"', '\"C4gh`?uzJzD~$\\\\\\\\\"=|j)jj=SrBLIJ@0IbYiwIvNf5#pT4QUR}[g,35?Wg4i?3TdIsR0|eq3r;ZKuyI\\'<\\\\\"[p/x$<$B!\\\\\"_\"', '\"J0HG33+E(p8JQtKW.;G7 ^?.\"', '\"7r^B:Jf*J.@sqfED|M)3,eJ&OD\"', '\"c3Hcx^&*~3\\\\\"Jvac}cX\"', '\"\\'IHBQ:N+U:w(OAFn0pHLzX\"', '\"x4agH>H-2{Q|\\\\kpYF\"']\n"
]
}
],
"source": [
"string_fuzzer = ProbabilisticGrammarFuzzer(ASCII_STRING_GRAMMAR)\n",
"print([string_fuzzer.fuzz() for i in range(10)])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Synthesizing Composite Data\n",
"\n",
"From basic data, as discussed above, we can also produce _composite data_ in data structures such as sets or lists. We illustrate such generation on lists."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Lists"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.587302Z",
"iopub.status.busy": "2025-10-26T13:32:08.587144Z",
"iopub.status.idle": "2025-10-26T13:32:08.589155Z",
"shell.execute_reply": "2025-10-26T13:32:08.588886Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"LIST_EBNF_GRAMMAR: Grammar = {\n",
" \"\": [\"\"],\n",
" \"\": [\n",
" (\"[]\", opts(prob=0.05)),\n",
" \"[]\"\n",
" ],\n",
" \"\": [\n",
" (\"\", opts(prob=0.2)),\n",
" \", \"\n",
" ],\n",
" \"\": [\"0\"],\n",
"}\n",
"\n",
"assert is_valid_grammar(LIST_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.590413Z",
"iopub.status.busy": "2025-10-26T13:32:08.590311Z",
"iopub.status.idle": "2025-10-26T13:32:08.592199Z",
"shell.execute_reply": "2025-10-26T13:32:08.591744Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"LIST_GRAMMAR = convert_ebnf_grammar(LIST_EBNF_GRAMMAR)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Our list generator takes a grammar that produces objects; it then instantiates a list grammar with the objects from these grammars."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.594645Z",
"iopub.status.busy": "2025-10-26T13:32:08.594483Z",
"iopub.status.idle": "2025-10-26T13:32:08.596721Z",
"shell.execute_reply": "2025-10-26T13:32:08.596412Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def list_grammar(object_grammar, list_object_symbol=None):\n",
" obj_list_grammar = extend_grammar(LIST_GRAMMAR)\n",
" if list_object_symbol is None:\n",
" # Default: Use the first expansion of as list symbol\n",
" list_object_symbol = object_grammar[START_SYMBOL][0]\n",
"\n",
" obj_list_grammar.update(object_grammar)\n",
" obj_list_grammar[START_SYMBOL] = [\"\"]\n",
" obj_list_grammar[\"\"] = [list_object_symbol]\n",
"\n",
" assert is_valid_grammar(obj_list_grammar)\n",
"\n",
" return obj_list_grammar"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.598150Z",
"iopub.status.busy": "2025-10-26T13:32:08.598047Z",
"iopub.status.idle": "2025-10-26T13:32:08.626596Z",
"shell.execute_reply": "2025-10-26T13:32:08.626337Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['[0, -4, 23, 0, 0, 9, 0, -6067681]',\n",
" '[-1, -1, 0, -7]',\n",
" '[-5, 0]',\n",
" '[1, 0, -628088, -6, -811, 0, 99, 0]',\n",
" '[-35, -10, 0, 67]',\n",
" '[-3, 0, -2, 0, 0]',\n",
" '[0, -267, -78, -733, 0, 0, 0, 0]',\n",
" '[0, -6, 71, -9]',\n",
" '[-72, 76, 0, 2]',\n",
" '[0, 9, 0, 0, -572, 29, 8, 8, 0]']"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_list_fuzzer = ProbabilisticGrammarFuzzer(list_grammar(INT_GRAMMAR))\n",
"[int_list_fuzzer.fuzz() for i in range(10)]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"execution": {
"iopub.execute_input": "2025-10-26T13:32:08.627800Z",
"iopub.status.busy": "2025-10-26T13:32:08.627717Z",
"iopub.status.idle": "2025-10-26T13:32:08.697016Z",
"shell.execute_reply": "2025-10-26T13:32:08.696755Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['[\"gn-A$j>\", \"SPX;\", \"\", \"\", \"\"]',\n",
" '[\"_\", \"Qp\"]',\n",
" '[\"M\", \"5\\\\\"`X744\", \"b+5fyM!\", \"gR`\"]',\n",
" '[\"^h\", \"8$u\", \"\", \"\", \"\"]',\n",
" '[\"6X;\", \"\", \"T1wp%\\'t\"]',\n",
" '[\"-?Kk\", \"@B\", \"}\", \"\", \"\"]',\n",
" '[\"FD