{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
},
"tags": [],
"toc-hr-collapsed": false
},
"source": [
"# Testing Configurations\n",
"\n",
"The behavior of a program is not only governed by its data. The _configuration_ of a program – that is, the settings that govern the execution of a program on its (regular) input data, as set by options or configuration files – just as well influences behavior, and thus can and should be tested. In this chapter, we explore how to systematically _test_ and _cover_ software configurations. By _automatically inferring configuration options_, we can apply these techniques out of the box, with no need for writing a grammar. Finally, we show how to systematically cover _combinations_ of configuration options, quickly detecting unwanted interferences."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:57.938597Z",
"iopub.status.busy": "2024-01-18T17:20:57.938038Z",
"iopub.status.idle": "2024-01-18T17:20:58.011621Z",
"shell.execute_reply": "2024-01-18T17:20:58.011233Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from bookutils import YouTubeVideo\n",
"YouTubeVideo('L0ztoXVru2U')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"**Prerequisites**\n",
"\n",
"* You should have read the [chapter on grammars](Grammars.ipynb).\n",
"* You should have read the [chapter on grammar coverage](GrammarCoverageFuzzer.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.034242Z",
"iopub.status.busy": "2024-01-18T17:20:58.034029Z",
"iopub.status.idle": "2024-01-18T17:20:58.036588Z",
"shell.execute_reply": "2024-01-18T17:20:58.036257Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import bookutils.setup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.038410Z",
"iopub.status.busy": "2024-01-18T17:20:58.038284Z",
"iopub.status.idle": "2024-01-18T17:20:58.040217Z",
"shell.execute_reply": "2024-01-18T17:20:58.039907Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from typing import List, Union, Optional, Callable, Type"
]
},
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"## Synopsis\n",
"\n",
"\n",
"To [use the code provided in this chapter](Importing.ipynb), write\n",
"\n",
"```python\n",
">>> from fuzzingbook.ConfigurationFuzzer import \n",
"```\n",
"\n",
"and then make use of the following features.\n",
"\n",
"\n",
"This chapter provides two classes:\n",
"\n",
"* `OptionRunner` automatically extract command-line options from a Python program;\n",
"* `OptionFuzzer` uses these to automatically test a Python program with a large variety of options.\n",
"\n",
"`OptionRunner` runs a program up to the point where it parses its arguments, and then extracts a grammar that describes its invocations:\n",
"\n",
"```python\n",
">>> autopep8_runner = OptionRunner(\"autopep8\", \"foo.py\")\n",
"```\n",
"The grammar can be extracted via the method `ebnf_grammar()`:\n",
"\n",
"```python\n",
">>> option_ebnf_grammar = autopep8_runner.ebnf_grammar()\n",
">>> option_ebnf_grammar\n",
"{'': ['()*'],\n",
" '': [' -h',\n",
" ' --help',\n",
" ' --version',\n",
" ' -v',\n",
" ' --verbose',\n",
" ' -d',\n",
" ' --diff',\n",
" ' -i',\n",
" ' --in-place',\n",
" ' --global-config ',\n",
" ' --ignore-local-config',\n",
" ' -r',\n",
" ' --recursive',\n",
" ' -j ',\n",
" ' --jobs ',\n",
" ' -p ',\n",
" ' --pep8-passes ',\n",
" ' -a',\n",
" ' --aggressive',\n",
" ' --experimental',\n",
" ' --exclude ',\n",
" ' --list-fixes',\n",
" ' --ignore ',\n",
" ' --select ',\n",
" ' --max-line-length ',\n",
" ' --line-range ',\n",
" ' --range ',\n",
" ' --indent-size ',\n",
" ' --hang-closing',\n",
" ' --exit-code'],\n",
" '': [' foo.py'],\n",
" '': ['+'],\n",
" '': ['0',\n",
" '1',\n",
" '2',\n",
" '3',\n",
" '4',\n",
" '5',\n",
" '6',\n",
" '7',\n",
" '8',\n",
" '9',\n",
" 'a',\n",
" 'b',\n",
" 'c',\n",
" 'd',\n",
" 'e',\n",
" 'f',\n",
" 'g',\n",
" 'h',\n",
" 'i',\n",
" 'j',\n",
" 'k',\n",
" 'l',\n",
" 'm',\n",
" 'n',\n",
" 'o',\n",
" 'p',\n",
" 'q',\n",
" 'r',\n",
" 's',\n",
" 't',\n",
" 'u',\n",
" 'v',\n",
" 'w',\n",
" 'x',\n",
" 'y',\n",
" 'z',\n",
" 'A',\n",
" 'B',\n",
" 'C',\n",
" 'D',\n",
" 'E',\n",
" 'F',\n",
" 'G',\n",
" 'H',\n",
" 'I',\n",
" 'J',\n",
" 'K',\n",
" 'L',\n",
" 'M',\n",
" 'N',\n",
" 'O',\n",
" 'P',\n",
" 'Q',\n",
" 'R',\n",
" 'S',\n",
" 'T',\n",
" 'U',\n",
" 'V',\n",
" 'W',\n",
" 'X',\n",
" 'Y',\n",
" 'Z',\n",
" '!',\n",
" '\"',\n",
" '#',\n",
" '$',\n",
" '%',\n",
" '&',\n",
" \"'\",\n",
" '(',\n",
" ')',\n",
" '*',\n",
" '+',\n",
" ',',\n",
" '-',\n",
" '.',\n",
" '/',\n",
" ':',\n",
" ';',\n",
" '<',\n",
" '=',\n",
" '>',\n",
" '?',\n",
" '@',\n",
" '[',\n",
" '\\\\',\n",
" ']',\n",
" '^',\n",
" '_',\n",
" '`',\n",
" '{',\n",
" '|',\n",
" '}',\n",
" '~'],\n",
" '': [''],\n",
" '': ['(-)?+'],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': [''],\n",
" '': [''],\n",
" '': [''],\n",
" '': ['']}\n",
"```\n",
"The grammar can be immediately used for fuzzing. A `GrammarCoverageFuzzer` will ensure all options are covered:\n",
"\n",
"```python\n",
">>> from Grammars import convert_ebnf_grammar\n",
">>> fuzzer = GrammarCoverageFuzzer(convert_ebnf_grammar(option_ebnf_grammar))\n",
">>> [fuzzer.fuzz() for i in range(3)]\n",
"[' --list-fixes foo.py',\n",
" ' --version -i --range -5 04 --aggressive --diff -j 817 --help -d --hang-closing -p 26 -a --verbose --exclude |g -v --global-config f --recursive --ignore .= -r --experimental -h --line-range -317 9 --pep8-passes 8 --max-line-length 2 --indent-size 0 --in-place foo.py',\n",
" ' --ignore-local-config --jobs 92 --exit-code --select Q\"6 --global-config (h --ignore )` --select M --select t,\\\\Z --global-config u5 --global-config ! --ignore <*p --ignore # --global-config c8N[ --ignore \\'x --global-config 09+ --select Y --ignore @ --global-config K --global-config T --exclude E --select odCAH --global-config DFR --ignore s --select ; --global-config Wyi --global-config q2 --exclude - --ignore r --global-config { --global-config zv$} --global-config 3O --ignore e]>lUSb --exclude V^ --global-config LBj --global-config X --select _ --exclude n4 --global-config %&/: --select a --global-config k --ignore 1 --ignore G~m?7 --exclude PJX --select I --select w -d foo.py']\n",
"```\n",
"The `OptionFuzzer` class summarizes these steps. Its constructor takes an `OptionRunner` to automatically extract the grammar; it does the necessary steps to extract the grammar and fuzz with it.\n",
"\n",
"```python\n",
">>> autopep8_runner = OptionRunner(\"autopep8\", \"foo.py\")\n",
">>> autopep8_fuzzer = OptionFuzzer(autopep8_runner)\n",
">>> [autopep8_fuzzer.fuzz() for i in range(3)]\n",
"[' foo.py',\n",
" ' -a --hang-closing --aggressive --help -h --ignore : --verbose --diff --jobs 68 --line-range 4 -159 --indent-size -0 -d -i --range 27 38 --experimental --global-config gt --ignore-local-config --exit-code --in-place -v foo.py',\n",
" ' --select ?H --pep8-passes 7 --max-line-length -154 -r --exclude P --recursive --list-fixes -j 77 --version -p 71 --global-config )c --select 4!m/ --ignore 1d3 --select ~ --ignore b[^ --global-config qn --exclude >V --exclude 2 --select $_iUzQ].0 --select ; --ignore 5M --ignore 9 --select pN`C --ignore 76(J --select Dh --exclude ov\" -h foo.py']\n",
"```\n",
"The final step in testing would now to invoke the program with these arguments.\n",
"\n",
"Note that `OptionRunner` is experimental: It assumes that the Python program in question uses the `argparse` module; and not all `argparse` features are supported. Still, it does a pretty good job even on nontrivial programs.\n",
"\n",
"The `OptionRunner` constructor accepts an additional `miner` keyword parameter, which takes the class of the argument grammar miner to be used. By default, this is `OptionGrammarMiner` – a helper class that can be used (and extended) to create own option grammar miners.\n",
"\n",
"![](PICS/ConfigurationFuzzer-synopsis-1.svg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Configuration Options\n",
"\n",
"When we talk about the input to a program, we usually think of the _data_ it processes. This is also what we have been fuzzing in the past chapters – be it with [random input](Fuzzer.ipynb), [mutation-based fuzzing](MutationFuzzer.ipynb), or [grammar-based fuzzing](GrammarFuzzer.ipynb). However, programs typically have several input sources, all of which can and should be tested – and included in test generation."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One important source of input is the program's _configuration_ – that is, a set of inputs that typically is set once when beginning to process data and then stays constant while processing data, while the program is running, or even while the program is deployed. Such a configuration is frequently set in _configuration files_ (for instance, as key/value pairs); the most ubiquitous method for command-line tools, though, are _configuration options_ on the command line."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As an example, consider the `grep` utility to find textual patterns in files. The exact mode by which `grep` works is governed by a multitude of options, which can be listed by providing a `--help` option:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.042429Z",
"iopub.status.busy": "2024-01-18T17:20:58.042287Z",
"iopub.status.idle": "2024-01-18T17:20:58.159826Z",
"shell.execute_reply": "2024-01-18T17:20:58.159135Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"usage: grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz] [-A num] [-B num] [-C[num]]\r\n",
"\t[-e pattern] [-f file] [--binary-files=value] [--color=when]\r\n",
"\t[--context[=num]] [--directories=action] [--label] [--line-buffered]\r\n",
"\t[--null] [pattern] [file ...]\r\n"
]
}
],
"source": [
"!grep --help"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"All these options need to be tested for whether they operate correctly. In security testing, any such option may also trigger a yet unknown vulnerability. Hence, such options can become _fuzz targets_ on their own. In this chapter, we analyze how to systematically test such options – and better yet, how to extract possible configurations right out of given program files, such that we do not have to specify anything."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Options in Python\n",
"\n",
"Let us stick to our common programming language here and examine how options are processed in Python. The `argparse` module provides a parser for command-line arguments (and options) with great functionality – and great complexity. You start by defining a parser (`argparse.ArgumentParser()`) to which individual arguments with various features are added, one after another. Additional parameters for each argument can specify the type (`type`) of the argument (say, integers or strings), or the number of arguments (`nargs`)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"By default, arguments are stored under their name in the `args` object coming from `parse_args()` – thus, `args.integers` holds the `integer` arguments added earlier. Special actions (`actions`) allow storing specific values in given variables; the `store_const` action stores the given `const` in the attribute named by `dest`. The following example takes a number of integer arguments (`integers`) as well as an operator (`--sum`, `--min`, or `--max`) to be applied on these integers. The operators all store a function reference in the `accumulate` attribute, which is finally invoked on the integers parsed:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.163403Z",
"iopub.status.busy": "2024-01-18T17:20:58.163094Z",
"iopub.status.idle": "2024-01-18T17:20:58.165668Z",
"shell.execute_reply": "2024-01-18T17:20:58.165238Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import argparse"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.167827Z",
"iopub.status.busy": "2024-01-18T17:20:58.167655Z",
"iopub.status.idle": "2024-01-18T17:20:58.171452Z",
"shell.execute_reply": "2024-01-18T17:20:58.171078Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"def process_numbers(args=[]):\n",
" parser = argparse.ArgumentParser(description='Process some integers.')\n",
" parser.add_argument('integers', metavar='N', type=int, nargs='+',\n",
" help='an integer for the accumulator')\n",
" group = parser.add_mutually_exclusive_group(required=True)\n",
" group.add_argument('--sum', dest='accumulate', action='store_const',\n",
" const=sum,\n",
" help='sum the integers')\n",
" group.add_argument('--min', dest='accumulate', action='store_const',\n",
" const=min,\n",
" help='compute the minimum')\n",
" group.add_argument('--max', dest='accumulate', action='store_const',\n",
" const=max,\n",
" help='compute the maximum')\n",
"\n",
" args = parser.parse_args(args)\n",
" print(args.accumulate(args.integers))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Here's how `process_numbers()` works. We can, for instance, invoke the `--min` option on the given arguments to compute the minimum:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.173310Z",
"iopub.status.busy": "2024-01-18T17:20:58.173193Z",
"iopub.status.idle": "2024-01-18T17:20:58.176215Z",
"shell.execute_reply": "2024-01-18T17:20:58.175854Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100\n"
]
}
],
"source": [
"process_numbers([\"--min\", \"100\", \"200\", \"300\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Or compute the sum of three numbers:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.178725Z",
"iopub.status.busy": "2024-01-18T17:20:58.178554Z",
"iopub.status.idle": "2024-01-18T17:20:58.181060Z",
"shell.execute_reply": "2024-01-18T17:20:58.180743Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"6\n"
]
}
],
"source": [
"process_numbers([\"--sum\", \"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"When defined via `add_mutually_exclusive_group()` (as above), options are mutually exclusive. Consequently, we can have only one operator:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.182839Z",
"iopub.status.busy": "2024-01-18T17:20:58.182708Z",
"iopub.status.idle": "2024-01-18T17:20:58.184518Z",
"shell.execute_reply": "2024-01-18T17:20:58.184209Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import bookutils.setup"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.186214Z",
"iopub.status.busy": "2024-01-18T17:20:58.186082Z",
"iopub.status.idle": "2024-01-18T17:20:58.215202Z",
"shell.execute_reply": "2024-01-18T17:20:58.214874Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from ExpectError import ExpectError"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.216864Z",
"iopub.status.busy": "2024-01-18T17:20:58.216777Z",
"iopub.status.idle": "2024-01-18T17:20:58.221637Z",
"shell.execute_reply": "2024-01-18T17:20:58.221379Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"usage: ipykernel_launcher.py [-h] (--sum | --min | --max) N [N ...]\n",
"ipykernel_launcher.py: error: argument --max: not allowed with argument --sum\n",
"SystemExit: 2 (expected)\n"
]
}
],
"source": [
"with ExpectError(SystemExit, print_traceback=False):\n",
" process_numbers([\"--sum\", \"--max\", \"1\", \"2\", \"3\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## A Grammar for Configurations\n",
"\n",
"How can we test a system with several options? The easiest answer is to write a grammar for it. The grammar `PROCESS_NUMBERS_EBNF_GRAMMAR` reflects the possible combinations of options and arguments:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.223161Z",
"iopub.status.busy": "2024-01-18T17:20:58.223081Z",
"iopub.status.idle": "2024-01-18T17:20:58.628105Z",
"shell.execute_reply": "2024-01-18T17:20:58.626043Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import crange, srange, convert_ebnf_grammar, extend_grammar, is_valid_grammar\n",
"from Grammars import START_SYMBOL, new_symbol, Grammar"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.630846Z",
"iopub.status.busy": "2024-01-18T17:20:58.630648Z",
"iopub.status.idle": "2024-01-18T17:20:58.632833Z",
"shell.execute_reply": "2024-01-18T17:20:58.632577Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"PROCESS_NUMBERS_EBNF_GRAMMAR: Grammar = {\n",
" \"\": [\" \"],\n",
" \"\": [\"--sum\", \"--min\", \"--max\"],\n",
" \"\": [\"\", \" \"],\n",
" \"\": [\"+\"],\n",
" \"\": crange('0', '9')\n",
"}\n",
"\n",
"assert is_valid_grammar(PROCESS_NUMBERS_EBNF_GRAMMAR)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.634432Z",
"iopub.status.busy": "2024-01-18T17:20:58.634314Z",
"iopub.status.idle": "2024-01-18T17:20:58.635997Z",
"shell.execute_reply": "2024-01-18T17:20:58.635727Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"PROCESS_NUMBERS_GRAMMAR = convert_ebnf_grammar(PROCESS_NUMBERS_EBNF_GRAMMAR)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can feed this grammar into our [grammar coverage fuzzer](GrammarCoverageFuzzer.ipynb) and have it cover one option after another:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.637531Z",
"iopub.status.busy": "2024-01-18T17:20:58.637422Z",
"iopub.status.idle": "2024-01-18T17:20:58.921017Z",
"shell.execute_reply": "2024-01-18T17:20:58.920130Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from GrammarCoverageFuzzer import GrammarCoverageFuzzer"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.926653Z",
"iopub.status.busy": "2024-01-18T17:20:58.925811Z",
"iopub.status.idle": "2024-01-18T17:20:58.957812Z",
"shell.execute_reply": "2024-01-18T17:20:58.956952Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--max 9 5 8 210 80 9756431\n",
"--sum 9 4 99 1245 612370\n",
"--min 2 3 0 46 15798 7570926\n"
]
}
],
"source": [
"f = GrammarCoverageFuzzer(PROCESS_NUMBERS_GRAMMAR, min_nonterminals=10)\n",
"for i in range(3):\n",
" print(f.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Of course, we can also invoke `process_numbers()` with these very arguments. To this end, we need to convert the string produced by the grammar back into a list of individual arguments, using `split()`:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.966272Z",
"iopub.status.busy": "2024-01-18T17:20:58.966143Z",
"iopub.status.idle": "2024-01-18T17:20:58.986813Z",
"shell.execute_reply": "2024-01-18T17:20:58.986518Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['--max', '8', '9', '3067', '44', '13852967057']\n",
"13852967057\n",
"['--sum', '9', '8', '63', '9278111', '59206197798']\n",
"59215475989\n",
"['--min', '4', '1', '4864', '33342', '7827970808951']\n",
"1\n"
]
}
],
"source": [
"f = GrammarCoverageFuzzer(PROCESS_NUMBERS_GRAMMAR, min_nonterminals=10)\n",
"for i in range(3):\n",
" args = f.fuzz().split()\n",
" print(args)\n",
" process_numbers(args)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Similarly, we can define grammars for any program to be tested; as well as define grammars for, say, configuration files. Yet, the grammar has to be updated with every change to the program, which creates a maintenance burden. Given that the information required for the grammar is already all encoded in the program, the question arises: _Can't we go and extract configuration options right out of the program in the first place?_"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"## Mining Configuration Options\n",
"\n",
"In this section, we try to extract option and argument information right out of a program, such that we do not have to specify a configuration grammar. The aim is to have a configuration fuzzer that works on the options and arguments of an arbitrary program, as long as it follows specific conventions for processing its arguments. In the case of Python programs, this means using the `argparse` module.\n",
"\n",
"Our idea is as follows: We execute the given program up to the point where the arguments are actually parsed – that is, `argparse.parse_args()` is invoked. Up to this point, we track all calls into the argument parser, notably those calls that define arguments and options (`add_argument()`). From these, we construct the grammar."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### Tracking Arguments\n",
"\n",
"Let us illustrate this approach with a simple experiment: We define a trace function (see [our chapter on coverage](Coverage.ipynb) for details) that is active while `process_numbers` is invoked. If we have a call to a method `add_argument`, we access and print out the local variables (which at this point are the arguments to the method)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.988440Z",
"iopub.status.busy": "2024-01-18T17:20:58.988329Z",
"iopub.status.idle": "2024-01-18T17:20:58.989931Z",
"shell.execute_reply": "2024-01-18T17:20:58.989663Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import sys"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.991289Z",
"iopub.status.busy": "2024-01-18T17:20:58.991205Z",
"iopub.status.idle": "2024-01-18T17:20:58.992750Z",
"shell.execute_reply": "2024-01-18T17:20:58.992484Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import string"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.994218Z",
"iopub.status.busy": "2024-01-18T17:20:58.994117Z",
"iopub.status.idle": "2024-01-18T17:20:58.996182Z",
"shell.execute_reply": "2024-01-18T17:20:58.995921Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def trace_locals(frame, event, arg):\n",
" if event != \"call\":\n",
" return\n",
" method_name = frame.f_code.co_name\n",
" if method_name != \"add_argument\":\n",
" return\n",
" locals = frame.f_locals\n",
" print(method_name, locals)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"What we get is a list of all calls to `add_argument()`, together with the method arguments passed:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:58.997748Z",
"iopub.status.busy": "2024-01-18T17:20:58.997656Z",
"iopub.status.idle": "2024-01-18T17:20:59.000259Z",
"shell.execute_reply": "2024-01-18T17:20:59.000021Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"add_argument {'self': ArgumentParser(prog='ipykernel_launcher.py', usage=None, description='Process some integers.', formatter_class=, conflict_handler='error', add_help=True), 'args': ('-h', '--help'), 'kwargs': {'action': 'help', 'default': '==SUPPRESS==', 'help': 'show this help message and exit'}}\n",
"add_argument {'self': ArgumentParser(prog='ipykernel_launcher.py', usage=None, description='Process some integers.', formatter_class=, conflict_handler='error', add_help=True), 'args': ('integers',), 'kwargs': {'metavar': 'N', 'type': , 'nargs': '+', 'help': 'an integer for the accumulator'}}\n",
"add_argument {'self': , 'args': ('--sum',), 'kwargs': {'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'sum the integers'}}\n",
"add_argument {'self': , 'args': ('--min',), 'kwargs': {'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'compute the minimum'}}\n",
"add_argument {'self': , 'args': ('--max',), 'kwargs': {'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'compute the maximum'}}\n",
"6\n"
]
}
],
"source": [
"sys.settrace(trace_locals)\n",
"process_numbers([\"--sum\", \"1\", \"2\", \"3\"])\n",
"sys.settrace(None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"From the `args` argument, we can access the individual options and arguments to be defined:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.001912Z",
"iopub.status.busy": "2024-01-18T17:20:59.001747Z",
"iopub.status.idle": "2024-01-18T17:20:59.003732Z",
"shell.execute_reply": "2024-01-18T17:20:59.003453Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def trace_options(frame, event, arg):\n",
" if event != \"call\":\n",
" return\n",
" method_name = frame.f_code.co_name\n",
" if method_name != \"add_argument\":\n",
" return\n",
" locals = frame.f_locals\n",
" print(locals['args'])"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.005211Z",
"iopub.status.busy": "2024-01-18T17:20:59.005111Z",
"iopub.status.idle": "2024-01-18T17:20:59.007248Z",
"shell.execute_reply": "2024-01-18T17:20:59.007009Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('-h', '--help')\n",
"('integers',)\n",
"('--sum',)\n",
"('--min',)\n",
"('--max',)\n",
"6\n"
]
}
],
"source": [
"sys.settrace(trace_options)\n",
"process_numbers([\"--sum\", \"1\", \"2\", \"3\"])\n",
"sys.settrace(None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We see that each argument comes as a tuple with one (say, `integers` or `--sum`) or two members (`-h` and `--help`), which denote alternate forms for the same option. Our job will be to go through the arguments of `add_arguments()` and detect not only the names of options and arguments, but also whether they accept additional parameters, as well as the type of the parameters."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### A Grammar Miner for Options and Arguments"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let us now build a class that gathers all this information to create a grammar."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We use the `ParseInterrupt` exception to interrupt program execution after gathering all arguments and options:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.008848Z",
"iopub.status.busy": "2024-01-18T17:20:59.008739Z",
"iopub.status.idle": "2024-01-18T17:20:59.010304Z",
"shell.execute_reply": "2024-01-18T17:20:59.010060Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class ParseInterrupt(Exception):\n",
" pass"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The class `OptionGrammarMiner` takes an executable function for which the grammar of options and arguments is to be mined:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.011825Z",
"iopub.status.busy": "2024-01-18T17:20:59.011684Z",
"iopub.status.idle": "2024-01-18T17:20:59.013895Z",
"shell.execute_reply": "2024-01-18T17:20:59.013621Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner:\n",
" \"\"\"Helper class for extracting option grammars\"\"\"\n",
"\n",
" def __init__(self, function: Callable, log: bool = False):\n",
" \"\"\"Constructor.\n",
" `function` - a function processing arguments using argparse()\n",
" `log` - output diagnostics if True\n",
" \"\"\"\n",
" self.function = function\n",
" self.log = log"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `mine_ebnf_grammar()` is where everything happens. It creates a grammar of the form\n",
"\n",
"```\n",
" ::= * \n",
" ::= \n",
" ::= \n",
"```\n",
"\n",
"in which the options and arguments will be collected. It then sets a trace function (see [our chapter on coverage](Coverage.ipynb) for details) that is active while the previously defined `function` is invoked. Raising `ParseInterrupt` (when `parse_args()` is invoked) ends execution."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.015452Z",
"iopub.status.busy": "2024-01-18T17:20:59.015338Z",
"iopub.status.idle": "2024-01-18T17:20:59.018112Z",
"shell.execute_reply": "2024-01-18T17:20:59.017830Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" OPTION_SYMBOL = \"\"\n",
" ARGUMENTS_SYMBOL = \"\"\n",
"\n",
" def mine_ebnf_grammar(self):\n",
" \"\"\"Extract EBNF option grammar\"\"\"\n",
" self.grammar = {\n",
" START_SYMBOL: [\"(\" + self.OPTION_SYMBOL + \")*\" + self.ARGUMENTS_SYMBOL],\n",
" self.OPTION_SYMBOL: [],\n",
" self.ARGUMENTS_SYMBOL: []\n",
" }\n",
" self.current_group = self.OPTION_SYMBOL\n",
"\n",
" old_trace = sys.gettrace()\n",
" sys.settrace(self.traceit)\n",
" try:\n",
" self.function()\n",
" except ParseInterrupt:\n",
" pass\n",
" sys.settrace(old_trace)\n",
"\n",
" return self.grammar\n",
"\n",
" def mine_grammar(self):\n",
" \"\"\"Extract BNF option grammar\"\"\"\n",
" return convert_ebnf_grammar(self.mine_ebnf_grammar())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The trace function checks for four methods: `add_argument()` is the most important function, resulting in processing arguments; `frame.f_locals` again is the set of local variables, which at this point is mostly the arguments to `add_argument()`. Since mutually exclusive groups also have a method `add_argument()`, we set the flag `in_group` to differentiate."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note that we make no specific efforts to differentiate between multiple parsers or groups; we simply assume that there is one parser, and at any point at most one mutually exclusive group."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.019690Z",
"iopub.status.busy": "2024-01-18T17:20:59.019576Z",
"iopub.status.idle": "2024-01-18T17:20:59.022256Z",
"shell.execute_reply": "2024-01-18T17:20:59.021984Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def traceit(self, frame, event, arg):\n",
" if event != \"call\":\n",
" return\n",
"\n",
" if \"self\" not in frame.f_locals:\n",
" return\n",
"\n",
" self_var = frame.f_locals[\"self\"]\n",
" method_name = frame.f_code.co_name\n",
"\n",
" if method_name == \"add_argument\":\n",
" in_group = repr(type(self_var)).find(\"Group\") >= 0\n",
" self.process_argument(frame.f_locals, in_group)\n",
" elif method_name == \"add_mutually_exclusive_group\":\n",
" self.add_group(frame.f_locals, exclusive=True)\n",
" elif method_name == \"add_argument_group\":\n",
" # self.add_group(frame.f_locals, exclusive=False)\n",
" pass\n",
" elif method_name == \"parse_args\":\n",
" raise ParseInterrupt\n",
"\n",
" return self.traceit"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `process_arguments()` now analyzes the arguments passed and adds them to the grammar:\n",
"\n",
"* If the argument starts with `-`, it gets added as an optional element to the `` list\n",
"* Otherwise, it gets added to the `` list.\n",
"\n",
"The optional `nargs` argument specifies how many arguments can follow. If it is a number, we add the appropriate number of elements to the grammar; if it is an abstract specifier (say, `+` or `*`), we use it directly as EBNF operator.\n",
"\n",
"Given the large number of parameters and optional behavior, this is a somewhat messy function, but it does the job."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.023796Z",
"iopub.status.busy": "2024-01-18T17:20:59.023687Z",
"iopub.status.idle": "2024-01-18T17:20:59.025756Z",
"shell.execute_reply": "2024-01-18T17:20:59.025513Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def process_argument(self, locals, in_group):\n",
" args = locals[\"args\"]\n",
" kwargs = locals[\"kwargs\"]\n",
"\n",
" if self.log:\n",
" print(args)\n",
" print(kwargs)\n",
" print()\n",
"\n",
" for arg in args:\n",
" self.process_arg(arg, in_group, kwargs)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.027402Z",
"iopub.status.busy": "2024-01-18T17:20:59.027288Z",
"iopub.status.idle": "2024-01-18T17:20:59.030406Z",
"shell.execute_reply": "2024-01-18T17:20:59.030108Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def process_arg(self, arg, in_group, kwargs):\n",
" if arg.startswith('-'):\n",
" if not in_group:\n",
" target = self.OPTION_SYMBOL\n",
" else:\n",
" target = self.current_group\n",
" metavar = None\n",
" arg = \" \" + arg\n",
" else:\n",
" target = self.ARGUMENTS_SYMBOL\n",
" metavar = arg\n",
" arg = \"\"\n",
"\n",
" if \"nargs\" in kwargs:\n",
" nargs = kwargs[\"nargs\"]\n",
" else:\n",
" nargs = 1\n",
"\n",
" param = self.add_parameter(kwargs, metavar)\n",
" if param == \"\":\n",
" nargs = 0\n",
"\n",
" if isinstance(nargs, int):\n",
" for i in range(nargs):\n",
" arg += param\n",
" else:\n",
" assert nargs in \"?+*\"\n",
" arg += '(' + param + ')' + nargs\n",
"\n",
" if target == self.OPTION_SYMBOL:\n",
" self.grammar[target].append(arg)\n",
" else:\n",
" self.grammar[target].append(arg)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `add_parameter()` handles possible parameters of options. If the argument has an `action` defined, it takes no parameter. Otherwise, we identify the type of the parameter (as `int` or `str`) and augment the grammar with an appropriate rule."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.031989Z",
"iopub.status.busy": "2024-01-18T17:20:59.031900Z",
"iopub.status.idle": "2024-01-18T17:20:59.033413Z",
"shell.execute_reply": "2024-01-18T17:20:59.033154Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import inspect"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.034810Z",
"iopub.status.busy": "2024-01-18T17:20:59.034693Z",
"iopub.status.idle": "2024-01-18T17:20:59.037222Z",
"shell.execute_reply": "2024-01-18T17:20:59.036969Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def add_parameter(self, kwargs, metavar):\n",
" if \"action\" in kwargs:\n",
" # No parameter\n",
" return \"\"\n",
"\n",
" type_ = \"str\"\n",
" if \"type\" in kwargs:\n",
" given_type = kwargs[\"type\"]\n",
" # int types come as ''\n",
" if inspect.isclass(given_type) and issubclass(given_type, int):\n",
" type_ = \"int\"\n",
"\n",
" if metavar is None:\n",
" if \"metavar\" in kwargs:\n",
" metavar = kwargs[\"metavar\"]\n",
" else:\n",
" metavar = type_\n",
"\n",
" self.add_type_rule(type_)\n",
" if metavar != type_:\n",
" self.add_metavar_rule(metavar, type_)\n",
"\n",
" param = \" <\" + metavar + \">\"\n",
"\n",
" return param"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `add_type_rule()` adds a rule for parameter types to the grammar. If the parameter is identified by a meta-variable (say, `N`), we add a rule for this as well to improve legibility."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.038689Z",
"iopub.status.busy": "2024-01-18T17:20:59.038589Z",
"iopub.status.idle": "2024-01-18T17:20:59.041028Z",
"shell.execute_reply": "2024-01-18T17:20:59.040772Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def add_type_rule(self, type_):\n",
" if type_ == \"int\":\n",
" self.add_int_rule()\n",
" else:\n",
" self.add_str_rule()\n",
"\n",
" def add_int_rule(self):\n",
" self.grammar[\"\"] = [\"(-)?+\"]\n",
" self.grammar[\"\"] = crange('0', '9')\n",
"\n",
" def add_str_rule(self):\n",
" self.grammar[\"\"] = [\"+\"]\n",
" self.grammar[\"\"] = srange(\n",
" string.digits\n",
" + string.ascii_letters\n",
" + string.punctuation)\n",
"\n",
" def add_metavar_rule(self, metavar, type_):\n",
" self.grammar[\"<\" + metavar + \">\"] = [\"<\" + type_ + \">\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The method `add_group()` adds a new mutually exclusive group to the grammar. We define a new symbol (say, ``) for the options added to the group, and use the `required` and `exclusive` flags to define an appropriate expansion operator. The group is then prefixed to the grammar, as in\n",
"\n",
"```\n",
" ::= * \n",
" ::= \n",
"```\n",
"\n",
"and filled with the next calls to `add_argument()` within the group."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.042497Z",
"iopub.status.busy": "2024-01-18T17:20:59.042400Z",
"iopub.status.idle": "2024-01-18T17:20:59.045052Z",
"shell.execute_reply": "2024-01-18T17:20:59.044747Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class OptionGrammarMiner(OptionGrammarMiner):\n",
" def add_group(self, locals, exclusive):\n",
" kwargs = locals[\"kwargs\"]\n",
" if self.log:\n",
" print(kwargs)\n",
"\n",
" required = kwargs.get(\"required\", False)\n",
" group = new_symbol(self.grammar, \"\")\n",
"\n",
" if required and exclusive:\n",
" group_expansion = group\n",
" if required and not exclusive:\n",
" group_expansion = group + \"+\"\n",
" if not required and exclusive:\n",
" group_expansion = group + \"?\"\n",
" if not required and not exclusive:\n",
" group_expansion = group + \"*\"\n",
"\n",
" self.grammar[START_SYMBOL][0] = group_expansion + \\\n",
" self.grammar[START_SYMBOL][0]\n",
" self.grammar[group] = []\n",
" self.current_group = group"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"That's it! With this, we can now extract the grammar from our `process_numbers()` program. Turning on logging again reveals the variables we draw upon."
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.046686Z",
"iopub.status.busy": "2024-01-18T17:20:59.046575Z",
"iopub.status.idle": "2024-01-18T17:20:59.049160Z",
"shell.execute_reply": "2024-01-18T17:20:59.048928Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('-h', '--help')\n",
"{'action': 'help', 'default': '==SUPPRESS==', 'help': 'show this help message and exit'}\n",
"\n",
"('integers',)\n",
"{'metavar': 'N', 'type': , 'nargs': '+', 'help': 'an integer for the accumulator'}\n",
"\n",
"{'required': True}\n",
"('--sum',)\n",
"{'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'sum the integers'}\n",
"\n",
"('--min',)\n",
"{'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'compute the minimum'}\n",
"\n",
"('--max',)\n",
"{'dest': 'accumulate', 'action': 'store_const', 'const': , 'help': 'compute the maximum'}\n",
"\n"
]
}
],
"source": [
"miner = OptionGrammarMiner(process_numbers, log=True)\n",
"process_numbers_grammar = miner.mine_ebnf_grammar()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Here is the extracted grammar:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.050819Z",
"iopub.status.busy": "2024-01-18T17:20:59.050643Z",
"iopub.status.idle": "2024-01-18T17:20:59.053013Z",
"shell.execute_reply": "2024-01-18T17:20:59.052783Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'': ['()*'],\n",
" '': [' -h', ' --help'],\n",
" '': ['( )+'],\n",
" '': ['(-)?+'],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': [''],\n",
" '': [' --sum', ' --min', ' --max']}"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The grammar properly identifies the group found:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.054462Z",
"iopub.status.busy": "2024-01-18T17:20:59.054367Z",
"iopub.status.idle": "2024-01-18T17:20:59.056304Z",
"shell.execute_reply": "2024-01-18T17:20:59.056067Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['()*']"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.057724Z",
"iopub.status.busy": "2024-01-18T17:20:59.057620Z",
"iopub.status.idle": "2024-01-18T17:20:59.059598Z",
"shell.execute_reply": "2024-01-18T17:20:59.059361Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[' --sum', ' --min', ' --max']"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"It also identifies a `--help` option provided not by us, but by the `argparse` module:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.061166Z",
"iopub.status.busy": "2024-01-18T17:20:59.061046Z",
"iopub.status.idle": "2024-01-18T17:20:59.063118Z",
"shell.execute_reply": "2024-01-18T17:20:59.062836Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[' -h', ' --help']"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The grammar also correctly identifies the types of the arguments:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.064609Z",
"iopub.status.busy": "2024-01-18T17:20:59.064503Z",
"iopub.status.idle": "2024-01-18T17:20:59.066479Z",
"shell.execute_reply": "2024-01-18T17:20:59.066243Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['( )+']"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.067945Z",
"iopub.status.busy": "2024-01-18T17:20:59.067845Z",
"iopub.status.idle": "2024-01-18T17:20:59.069792Z",
"shell.execute_reply": "2024-01-18T17:20:59.069519Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['']"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The rules for `int` are set as defined by `add_int_rule()`"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.071349Z",
"iopub.status.busy": "2024-01-18T17:20:59.071240Z",
"iopub.status.idle": "2024-01-18T17:20:59.073205Z",
"shell.execute_reply": "2024-01-18T17:20:59.072960Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['(-)?+']"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"process_numbers_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can take this grammar and convert it to BNF, such that we can fuzz with it right away:"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.074702Z",
"iopub.status.busy": "2024-01-18T17:20:59.074604Z",
"iopub.status.idle": "2024-01-18T17:20:59.076191Z",
"shell.execute_reply": "2024-01-18T17:20:59.075940Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"assert is_valid_grammar(process_numbers_grammar)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.077790Z",
"iopub.status.busy": "2024-01-18T17:20:59.077675Z",
"iopub.status.idle": "2024-01-18T17:20:59.079448Z",
"shell.execute_reply": "2024-01-18T17:20:59.079131Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"grammar = convert_ebnf_grammar(process_numbers_grammar)\n",
"assert is_valid_grammar(grammar)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.081050Z",
"iopub.status.busy": "2024-01-18T17:20:59.080940Z",
"iopub.status.idle": "2024-01-18T17:20:59.135732Z",
"shell.execute_reply": "2024-01-18T17:20:59.135446Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" --sum 9\n",
" --max -h --help --help -16 -0\n",
" --min --help 2745341 8\n",
" --min 1 27\n",
" --sum --help --help -2\n",
" --sum --help 0 3 -77\n",
" --sum -3\n",
" --sum --help 429 8 10 0295 -694 1\n",
" --max -h 91 -1425 99\n",
" --sum -795 -94 8 -44\n"
]
}
],
"source": [
"f = GrammarCoverageFuzzer(grammar)\n",
"for i in range(10):\n",
" print(f.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Each and every invocation adheres to the rules as set forth in the `argparse` calls. By mining options and arguments from existing programs, we can now fuzz these options out of the box – without having to specify a grammar."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## Testing Autopep8"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let us try out the option grammar miner on real-world Python programs. `autopep8` is a tool that automatically converts Python code to the [PEP 8 Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/). (Actually, all Python code in this book runs through `autopep8` during production.) `autopep8` offers a wide range of options, as can be seen by invoking it with `--help`:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.137474Z",
"iopub.status.busy": "2024-01-18T17:20:59.137356Z",
"iopub.status.idle": "2024-01-18T17:20:59.291878Z",
"shell.execute_reply": "2024-01-18T17:20:59.291185Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"usage: autopep8 [-h] [--version] [-v] [-d] [-i] [--global-config filename]\r\n",
" [--ignore-local-config] [-r] [-j n] [-p n] [-a]\r\n",
" [--experimental] [--exclude globs] [--list-fixes]\r\n",
" [--ignore errors] [--select errors] [--max-line-length n]\r\n",
" [--line-range line line] [--hang-closing] [--exit-code]\r\n",
" [files ...]\r\n",
"\r\n",
"Automatically formats Python code to conform to the PEP 8 style guide.\r\n",
"\r\n",
"positional arguments:\r\n",
" files files to format or '-' for standard in\r\n",
"\r\n",
"options:\r\n",
" -h, --help show this help message and exit\r\n",
" --version show program's version number and exit\r\n",
" -v, --verbose print verbose messages; multiple -v result in more\r\n",
" verbose messages\r\n",
" -d, --diff print the diff for the fixed source\r\n",
" -i, --in-place make changes to files in place\r\n",
" --global-config filename\r\n",
" path to a global pep8 config file; if this file does\r\n",
" not exist then this is ignored (default:\r\n",
" /Users/zeller/.config/pep8)\r\n",
" --ignore-local-config\r\n",
" don't look for and apply local config files; if not\r\n",
" passed, defaults are updated with any config files in\r\n",
" the project's root directory\r\n",
" -r, --recursive run recursively over directories; must be used with\r\n",
" --in-place or --diff\r\n",
" -j n, --jobs n number of parallel jobs; match CPU count if value is\r\n",
" less than 1\r\n",
" -p n, --pep8-passes n\r\n",
" maximum number of additional pep8 passes (default:\r\n",
" infinite)\r\n",
" -a, --aggressive enable non-whitespace changes; multiple -a result in\r\n",
" more aggressive changes\r\n",
" --experimental enable experimental fixes\r\n",
" --exclude globs exclude file/directory names that match these comma-\r\n",
" separated globs\r\n",
" --list-fixes list codes for fixes; used by --ignore and --select\r\n",
" --ignore errors do not fix these errors/warnings (default:\r\n",
" E226,E24,W50,W690)\r\n",
" --select errors fix only these errors/warnings (e.g. E4,W)\r\n",
" --max-line-length n set maximum allowed line length (default: 79)\r\n",
" --line-range line line, --range line line\r\n",
" only fix errors found within this inclusive range of\r\n",
" line numbers (e.g. 1 99); line numbers are indexed at\r\n",
" 1\r\n",
" --hang-closing hang-closing option passed to pycodestyle\r\n",
" --exit-code change to behavior of exit code. default behavior of\r\n",
" return value, 0 is no differences, 1 is error exit.\r\n",
" return 2 when add this option. 2 is exists\r\n",
" differences.\r\n"
]
}
],
"source": [
"!autopep8 --help"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Autopep8 Setup\n",
"\n",
"We want to systematically test these options. In order to deploy our configuration grammar miner, we need to find the source code of the executable:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.295039Z",
"iopub.status.busy": "2024-01-18T17:20:59.294796Z",
"iopub.status.idle": "2024-01-18T17:20:59.297468Z",
"shell.execute_reply": "2024-01-18T17:20:59.297015Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.299701Z",
"iopub.status.busy": "2024-01-18T17:20:59.299515Z",
"iopub.status.idle": "2024-01-18T17:20:59.302677Z",
"shell.execute_reply": "2024-01-18T17:20:59.302189Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def find_executable(name):\n",
" for path in os.get_exec_path():\n",
" qualified_name = os.path.join(path, name)\n",
" if os.path.exists(qualified_name):\n",
" return qualified_name\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.304875Z",
"iopub.status.busy": "2024-01-18T17:20:59.304715Z",
"iopub.status.idle": "2024-01-18T17:20:59.308084Z",
"shell.execute_reply": "2024-01-18T17:20:59.307706Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'/Users/zeller/.pyenv/versions/3.10.2/bin/autopep8'"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autopep8_executable = find_executable(\"autopep8\")\n",
"assert autopep8_executable is not None\n",
"autopep8_executable"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Next, we build a function that reads the contents of the file and executes it."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.310063Z",
"iopub.status.busy": "2024-01-18T17:20:59.309917Z",
"iopub.status.idle": "2024-01-18T17:20:59.312348Z",
"shell.execute_reply": "2024-01-18T17:20:59.311982Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def autopep8():\n",
" executable = find_executable(\"autopep8\")\n",
"\n",
" # First line has to contain \"/usr/bin/env python\" or like\n",
" first_line = open(executable).readline()\n",
" assert first_line.find(\"python\") >= 0\n",
"\n",
" contents = open(executable).read()\n",
" exec(contents)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Mining an Autopep8 Grammar\n",
"\n",
"We can use the `autopep8()` function in our grammar miner:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.314530Z",
"iopub.status.busy": "2024-01-18T17:20:59.314368Z",
"iopub.status.idle": "2024-01-18T17:20:59.316276Z",
"shell.execute_reply": "2024-01-18T17:20:59.315896Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_miner = OptionGrammarMiner(autopep8)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"and extract a grammar for it:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.318082Z",
"iopub.status.busy": "2024-01-18T17:20:59.317941Z",
"iopub.status.idle": "2024-01-18T17:20:59.343813Z",
"shell.execute_reply": "2024-01-18T17:20:59.343429Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_ebnf_grammar = autopep8_miner.mine_ebnf_grammar()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This works because here, `autopep8` is not a separate process (and a separate Python interpreter), but we run the `autopep8()` function (and the `autopep8` code) in our current Python interpreter – up to the call to `parse_args()`, where we interrupt execution again. At this point, the `autopep8` code has done nothing but setting up the argument parser – which is what we are interested in."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The grammar options mined reflect precisely the options seen when providing `--help`:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.345825Z",
"iopub.status.busy": "2024-01-18T17:20:59.345680Z",
"iopub.status.idle": "2024-01-18T17:20:59.347713Z",
"shell.execute_reply": "2024-01-18T17:20:59.347388Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[' -h', ' --help', ' --version', ' -v', ' --verbose', ' -d', ' --diff', ' -i', ' --in-place', ' --global-config ', ' --ignore-local-config', ' -r', ' --recursive', ' -j ', ' --jobs ', ' -p ', ' --pep8-passes ', ' -a', ' --aggressive', ' --experimental', ' --exclude ', ' --list-fixes', ' --ignore ', ' --select ', ' --max-line-length ', ' --line-range ', ' --range ', ' --indent-size ', ' --hang-closing', ' --exit-code']\n"
]
}
],
"source": [
"print(autopep8_ebnf_grammar[\"\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Metavariables like `` or `` are placeholders for integers. We assume all metavariables of the same name have the same type:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.349295Z",
"iopub.status.busy": "2024-01-18T17:20:59.349182Z",
"iopub.status.idle": "2024-01-18T17:20:59.351185Z",
"shell.execute_reply": "2024-01-18T17:20:59.350905Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['']"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autopep8_ebnf_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The grammar miner has inferred that the argument to `autopep8` is a list of files:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.352758Z",
"iopub.status.busy": "2024-01-18T17:20:59.352625Z",
"iopub.status.idle": "2024-01-18T17:20:59.354655Z",
"shell.execute_reply": "2024-01-18T17:20:59.354421Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['( )*']"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autopep8_ebnf_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"which in turn all are strings:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.356112Z",
"iopub.status.busy": "2024-01-18T17:20:59.356010Z",
"iopub.status.idle": "2024-01-18T17:20:59.357982Z",
"shell.execute_reply": "2024-01-18T17:20:59.357728Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"['']"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autopep8_ebnf_grammar[\"\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As we are only interested in testing options, not arguments, we fix the arguments to a single mandatory input. (Otherwise, we'd have plenty of random file names generated.)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.359496Z",
"iopub.status.busy": "2024-01-18T17:20:59.359399Z",
"iopub.status.idle": "2024-01-18T17:20:59.361284Z",
"shell.execute_reply": "2024-01-18T17:20:59.360974Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_ebnf_grammar[\"\"] = [\" \"]\n",
"autopep8_ebnf_grammar[\"\"] = [\"foo.py\"]\n",
"assert is_valid_grammar(autopep8_ebnf_grammar)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Creating Autopep8 Options"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let us now use the inferred grammar for fuzzing. Again, we convert the EBNF grammar into a regular BNF grammar:"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.363263Z",
"iopub.status.busy": "2024-01-18T17:20:59.363145Z",
"iopub.status.idle": "2024-01-18T17:20:59.365423Z",
"shell.execute_reply": "2024-01-18T17:20:59.365099Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_grammar = convert_ebnf_grammar(autopep8_ebnf_grammar)\n",
"assert is_valid_grammar(autopep8_grammar)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"And we can use the grammar for fuzzing all options:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.367084Z",
"iopub.status.busy": "2024-01-18T17:20:59.366974Z",
"iopub.status.idle": "2024-01-18T17:20:59.408963Z",
"shell.execute_reply": "2024-01-18T17:20:59.408679Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" -r foo.py\n",
" -h --experimental --hang-closing foo.py\n",
" --list-fixes -v foo.py\n",
" --aggressive -d foo.py\n",
" --indent-size 9 --help foo.py\n",
" --exit-code --recursive foo.py\n",
" --diff --version -i foo.py\n",
" --max-line-length 0 --in-place --verbose foo.py\n",
" --ignore-local-config -a foo.py\n",
" --select x -i --exit-code foo.py\n",
" -j 8 --diff foo.py\n",
" -d -v -d foo.py\n",
" -p 6 -i foo.py\n",
" -v --diff foo.py\n",
" --ignore uA --recursive foo.py\n",
" --jobs 5 -r foo.py\n",
" --range 4 1 foo.py\n",
" --ignore-local-config -i foo.py\n",
" -r --exit-code foo.py\n",
" -v -r foo.py\n"
]
}
],
"source": [
"f = GrammarCoverageFuzzer(autopep8_grammar, max_nonterminals=4)\n",
"for i in range(20):\n",
" print(f.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let us apply these options on the actual program. We need a file `foo.py` that will serve as input: (Note that the following commands will overwrite the file `foo.py`, if it already exists in the current working directory. Be aware of this, if you downloaded the notebooks and are working locally.)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.411075Z",
"iopub.status.busy": "2024-01-18T17:20:59.410899Z",
"iopub.status.idle": "2024-01-18T17:20:59.413122Z",
"shell.execute_reply": "2024-01-18T17:20:59.412815Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def create_foo_py():\n",
" open(\"foo.py\", \"w\").write(\"\"\"\n",
"def twice(x = 2):\n",
" return x + x\n",
"\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.415130Z",
"iopub.status.busy": "2024-01-18T17:20:59.415007Z",
"iopub.status.idle": "2024-01-18T17:20:59.417274Z",
"shell.execute_reply": "2024-01-18T17:20:59.416814Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"create_foo_py()"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.419478Z",
"iopub.status.busy": "2024-01-18T17:20:59.419310Z",
"iopub.status.idle": "2024-01-18T17:20:59.422194Z",
"shell.execute_reply": "2024-01-18T17:20:59.421734Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"def twice(x = 2):\n",
" return x + x\n"
]
}
],
"source": [
"print(open(\"foo.py\").read(), end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We see how `autopep8` fixes the spacing:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.424178Z",
"iopub.status.busy": "2024-01-18T17:20:59.424012Z",
"iopub.status.idle": "2024-01-18T17:20:59.591713Z",
"shell.execute_reply": "2024-01-18T17:20:59.591099Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\r\n",
"def twice(x=2):\r\n",
" return x + x\r\n"
]
}
],
"source": [
"!autopep8 foo.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let us now put things together. We define a `ProgramRunner` that will run the `autopep8` executable with arguments coming from the mined `autopep8` grammar."
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.595102Z",
"iopub.status.busy": "2024-01-18T17:20:59.594853Z",
"iopub.status.idle": "2024-01-18T17:20:59.597903Z",
"shell.execute_reply": "2024-01-18T17:20:59.597346Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Fuzzer import ProgramRunner"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Running `autopep8` with the mined options reveals a surprisingly high number of passing runs. (We see that some options depend on each other or are mutually exclusive, but this is handled by the program logic, not the argument parser, and hence out of our scope.) The `GrammarCoverageFuzzer` ensures that each option is tested at least once. (Digits and letters, too, by the way.)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:20:59.600425Z",
"iopub.status.busy": "2024-01-18T17:20:59.600271Z",
"iopub.status.idle": "2024-01-18T17:21:00.571408Z",
"shell.execute_reply": "2024-01-18T17:21:00.571024Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"$ autopep8 foo.py\n",
"$ autopep8 --diff --max-line-length 4 --exit-code --range 5 8 -p 2 foo.py\n",
"$ autopep8 --ignore z --verbose -r --list-fixes foo.py\n",
"--recursive must be used with --in-place or --diff$ autopep8 --exclude 5 -h -i --aggressive --in-place foo.py\n",
"$ autopep8 --select a --help --experimental foo.py\n",
"$ autopep8 --indent-size -30 --recursive foo.py\n",
"--recursive must be used with --in-place or --diff$ autopep8 --global-config < -j 9 -v -a foo.py\n",
"parallel jobs requires --in-place$ autopep8 --line-range 7 1 --hang-closing -d foo.py\n",
"First value of --range should be less than or equal to the second$ autopep8 --pep8-passes 6 --hang-closing --version --ignore-local-config foo.py\n",
"$ autopep8 --jobs -2 --experimental --version foo.py\n",
"$ autopep8 --ignore Y: --select ! --global-config e foo.py\n",
"$ autopep8 --select 1 -a --recursive --aggressive foo.py\n",
"--recursive must be used with --in-place or --diff$ autopep8 --ignore * --ignore `0 --global-config _ --verbose foo.py\n",
"[file:foo.py]\n",
"---> Applying global fix for E265\n",
"---> 5 issue(s) to fix {'E251': {2}, 'E271': {3}, 'E221': {3}, 'E222': {3}}\n",
"---> 3 issue(s) to fix {'E251': {2}, 'E221': {3}, 'E222': {3}}\n",
"---> 1 issue(s) to fix {'E222': {3}}\n",
"---> 0 issue(s) to fix {}\n",
"$ autopep8 --global-config ,\\ --exclude r -v foo.py\n",
"[file:foo.py]\n",
"---> Applying global fix for E265\n",
"---> 5 issue(s) to fix {'E251': {2}, 'E271': {3}, 'E221': {3}, 'E222': {3}}\n",
"---> 3 issue(s) to fix {'E251': {2}, 'E221': {3}, 'E222': {3}}\n",
"---> 1 issue(s) to fix {'E222': {3}}\n",
"---> 0 issue(s) to fix {}\n",
"$ autopep8 --global-config xd6M --recursive foo.py\n",
"--recursive must be used with --in-place or --diff$ autopep8 --select R --exclude L --version --ignore-local-config foo.py\n",
"$ autopep8 --select \" --verbose -h -d foo.py\n",
"$ autopep8 --diff -i -h foo.py\n",
"$ autopep8 --in-place --select w --version -i foo.py\n",
"$ autopep8 --ignore 49 --exclude lI -i foo.py\n"
]
}
],
"source": [
"f = GrammarCoverageFuzzer(autopep8_grammar, max_nonterminals=5)\n",
"for i in range(20):\n",
" invocation = \"autopep8\" + f.fuzz()\n",
" print(\"$ \" + invocation)\n",
" args = invocation.split()\n",
" autopep8_runner = ProgramRunner(args)\n",
" result, outcome = autopep8_runner.run()\n",
" if result.stderr != \"\":\n",
" print(result.stderr, end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Our `foo.py` file now has been formatted in place a number of times:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.573582Z",
"iopub.status.busy": "2024-01-18T17:21:00.573451Z",
"iopub.status.idle": "2024-01-18T17:21:00.575988Z",
"shell.execute_reply": "2024-01-18T17:21:00.575701Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"def twice(x=2):\n",
" return x + x\n"
]
}
],
"source": [
"print(open(\"foo.py\").read(), end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"We don't need it anymore, so we clean up things:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.577853Z",
"iopub.status.busy": "2024-01-18T17:21:00.577711Z",
"iopub.status.idle": "2024-01-18T17:21:00.579535Z",
"shell.execute_reply": "2024-01-18T17:21:00.579199Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.581269Z",
"iopub.status.busy": "2024-01-18T17:21:00.581155Z",
"iopub.status.idle": "2024-01-18T17:21:00.582980Z",
"shell.execute_reply": "2024-01-18T17:21:00.582725Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"os.remove(\"foo.py\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"## Classes for Fuzzing Configuration Options\n",
"\n",
"Let us now create reusable classes that we can use for testing arbitrary programs. (Okay, make that \"arbitrary programs that are written in Python and use the `argparse` module to process command-line arguments.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The class `OptionRunner` is a subclass of `ProgramRunner` that takes care of automatically determining the grammar, using the same steps we used for `autopep8`, above."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.584651Z",
"iopub.status.busy": "2024-01-18T17:21:00.584541Z",
"iopub.status.idle": "2024-01-18T17:21:00.586183Z",
"shell.execute_reply": "2024-01-18T17:21:00.585910Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import unreachable_nonterminals"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.587689Z",
"iopub.status.busy": "2024-01-18T17:21:00.587580Z",
"iopub.status.idle": "2024-01-18T17:21:00.590558Z",
"shell.execute_reply": "2024-01-18T17:21:00.590279Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionRunner(ProgramRunner):\n",
" \"\"\"Run a program while determining its option grammar\"\"\"\n",
"\n",
" def __init__(self, program: Union[str, List[str]],\n",
" arguments: Optional[str] = None, *,\n",
" log: bool = False,\n",
" miner_class: Optional[Type[OptionGrammarMiner]] = None):\n",
" \"\"\"Constructor.\n",
" `program` - the (Python) program to be executed\n",
" `arguments` - an (optional) string with arguments for `program`\n",
" `log` - if True, enable logging in miner\n",
" `miner_class` - the `OptionGrammarMiner` class to be used\n",
" (default: `OptionGrammarMiner`)\n",
" \"\"\"\n",
" if isinstance(program, str):\n",
" self.base_executable = program\n",
" else:\n",
" self.base_executable = program[0]\n",
"\n",
" if miner_class is None:\n",
" miner_class = OptionGrammarMiner\n",
" self.miner_class = miner_class\n",
" self.log = log\n",
"\n",
" self.find_contents()\n",
" self.find_grammar()\n",
" if arguments is not None:\n",
" self.set_arguments(arguments)\n",
" super().__init__(program)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"First, we find the contents of the Python executable:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.592128Z",
"iopub.status.busy": "2024-01-18T17:21:00.592022Z",
"iopub.status.idle": "2024-01-18T17:21:00.594863Z",
"shell.execute_reply": "2024-01-18T17:21:00.594459Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionRunner(OptionRunner):\n",
" def find_contents(self):\n",
" self._executable = find_executable(self.base_executable)\n",
" if self._executable is None:\n",
" raise IOError(self.base_executable + \": not found\")\n",
"\n",
" first_line = open(self._executable).readline()\n",
" if first_line.find(\"python\") < 0:\n",
" raise IOError(self.base_executable + \": not a Python executable\")\n",
"\n",
" self.contents = open(self._executable).read()\n",
"\n",
" def invoker(self):\n",
" # We are passing the local variables as is, such that we can access `self`\n",
" # We set __name__ to '__main__' to invoke the script as an executable\n",
" exec(self.contents, {'__name__': '__main__'})\n",
"\n",
" def executable(self):\n",
" return self._executable"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Next, we determine the grammar using the `OptionGrammarMiner` class:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.596685Z",
"iopub.status.busy": "2024-01-18T17:21:00.596561Z",
"iopub.status.idle": "2024-01-18T17:21:00.598828Z",
"shell.execute_reply": "2024-01-18T17:21:00.598546Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionRunner(OptionRunner):\n",
" def find_grammar(self):\n",
" miner = self.miner_class(self.invoker, log=self.log)\n",
" self._ebnf_grammar = miner.mine_ebnf_grammar()\n",
"\n",
" def ebnf_grammar(self):\n",
" \"\"\"Return extracted grammar in EBNF form\"\"\"\n",
" return self._ebnf_grammar\n",
"\n",
" def grammar(self):\n",
" \"\"\"Return extracted grammar in BNF form\"\"\"\n",
" return convert_ebnf_grammar(self._ebnf_grammar)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The two service methods `set_arguments()` and `set_invocation()` help us to change the arguments and program, respectively."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.600494Z",
"iopub.status.busy": "2024-01-18T17:21:00.600318Z",
"iopub.status.idle": "2024-01-18T17:21:00.602646Z",
"shell.execute_reply": "2024-01-18T17:21:00.602379Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionRunner(OptionRunner):\n",
" def set_arguments(self, args):\n",
" self._ebnf_grammar[\"\"] = [\" \" + args]\n",
" # Delete rules for previous arguments\n",
" for nonterminal in unreachable_nonterminals(self._ebnf_grammar):\n",
" del self._ebnf_grammar[nonterminal]\n",
"\n",
" def set_invocation(self, program):\n",
" self.program = program"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can instantiate the class on `autopep8` and immediately get the grammar:"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.604623Z",
"iopub.status.busy": "2024-01-18T17:21:00.604522Z",
"iopub.status.idle": "2024-01-18T17:21:00.608888Z",
"shell.execute_reply": "2024-01-18T17:21:00.608625Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_runner = OptionRunner(\"autopep8\", \"foo.py\")"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.610491Z",
"iopub.status.busy": "2024-01-18T17:21:00.610403Z",
"iopub.status.idle": "2024-01-18T17:21:00.612476Z",
"shell.execute_reply": "2024-01-18T17:21:00.612222Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[' -h', ' --help', ' --version', ' -v', ' --verbose', ' -d', ' --diff', ' -i', ' --in-place', ' --global-config ', ' --ignore-local-config', ' -r', ' --recursive', ' -j ', ' --jobs ', ' -p ', ' --pep8-passes ', ' -a', ' --aggressive', ' --experimental', ' --exclude ', ' --list-fixes', ' --ignore ', ' --select ', ' --max-line-length ', ' --line-range ', ' --range ', ' --indent-size ', ' --hang-closing', ' --exit-code']\n"
]
}
],
"source": [
"print(autopep8_runner.ebnf_grammar()[\"\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"An `OptionFuzzer` interacts with the given `OptionRunner` to obtain its grammar, which is then passed to its `GrammarCoverageFuzzer` superclass."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.614196Z",
"iopub.status.busy": "2024-01-18T17:21:00.614070Z",
"iopub.status.idle": "2024-01-18T17:21:00.616264Z",
"shell.execute_reply": "2024-01-18T17:21:00.616050Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionFuzzer(GrammarCoverageFuzzer):\n",
" \"\"\"Fuzz a (Python) program using its arguments\"\"\"\n",
"\n",
" def __init__(self, runner: OptionRunner, *args, **kwargs):\n",
" \"\"\"Constructor. `runner` is an OptionRunner.\"\"\"\n",
" assert issubclass(type(runner), OptionRunner)\n",
" self.runner = runner\n",
" grammar = runner.grammar()\n",
" super().__init__(grammar, *args, **kwargs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"When invoking `run()`, the `OptionFuzzer` creates a new invocation (using `fuzz()` from its grammar) and runs the now given (or previously set) runner with the arguments from the grammar. Note that the runner specified in `run()` can differ from the one set during initialization; this allows for mining options from one program and applying it in another context."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.617920Z",
"iopub.status.busy": "2024-01-18T17:21:00.617809Z",
"iopub.status.idle": "2024-01-18T17:21:00.619987Z",
"shell.execute_reply": "2024-01-18T17:21:00.619737Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"class OptionFuzzer(OptionFuzzer):\n",
" def run(self, runner=None, inp=\"\"):\n",
" if runner is None:\n",
" runner = self.runner\n",
" assert issubclass(type(runner), OptionRunner)\n",
" invocation = runner.executable() + \" \" + self.fuzz()\n",
" runner.set_invocation(invocation.split())\n",
" return runner.run(inp)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### Example: Autopep8\n",
"\n",
"Let us apply our newly defined classes on the `autopep8` runner:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.621512Z",
"iopub.status.busy": "2024-01-18T17:21:00.621399Z",
"iopub.status.idle": "2024-01-18T17:21:00.623563Z",
"shell.execute_reply": "2024-01-18T17:21:00.623331Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_fuzzer = OptionFuzzer(autopep8_runner, max_nonterminals=5)"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.625008Z",
"iopub.status.busy": "2024-01-18T17:21:00.624914Z",
"iopub.status.idle": "2024-01-18T17:21:00.634022Z",
"shell.execute_reply": "2024-01-18T17:21:00.633732Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" foo.py\n",
" --in-place --ignore-local-config --jobs 6 --recursive -i foo.py\n",
" --help -a --indent-size -95 --pep8-passes 3 --exclude = -r foo.py\n"
]
}
],
"source": [
"for i in range(3):\n",
" print(autopep8_fuzzer.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We can now systematically test `autopep8` with these classes:"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.635870Z",
"iopub.status.busy": "2024-01-18T17:21:00.635765Z",
"iopub.status.idle": "2024-01-18T17:21:00.678962Z",
"shell.execute_reply": "2024-01-18T17:21:00.678529Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(CompletedProcess(args=['/Users/zeller/.pyenv/versions/3.10.2/bin/autopep8', '--hang-closing', '--exit-code', '-d', '--version', 'foo.py'], returncode=0, stdout='autopep8 1.6.0 (pycodestyle: 2.5.0)\\n', stderr=''),\n",
" 'PASS')"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"autopep8_fuzzer.run(autopep8_runner)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: MyPy\n",
"\n",
"We can extract options for the `mypy` static type checker for Python:"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.680873Z",
"iopub.status.busy": "2024-01-18T17:21:00.680752Z",
"iopub.status.idle": "2024-01-18T17:21:00.682712Z",
"shell.execute_reply": "2024-01-18T17:21:00.682449Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"assert find_executable(\"mypy\") is not None"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.684217Z",
"iopub.status.busy": "2024-01-18T17:21:00.684111Z",
"iopub.status.idle": "2024-01-18T17:21:00.839583Z",
"shell.execute_reply": "2024-01-18T17:21:00.839287Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[' -h', ' --help', ' -v', ' --verbose', ' -V', ' --version', ' --config-file ', ' --warn-unused-configs', ' --no-warn-unused-configs', ' --no-namespace-packages', ' --namespace-packages', ' --ignore-missing-imports', ' --follow-imports ', ' --python-executable', ' --no-site-packages', ' --no-silence-site-packages', ' --python-version ', ' --platform', ' --always-true', ' --always-false', ' --disallow-any-unimported', ' --disallow-any-expr', ' --disallow-any-decorated', ' --disallow-any-explicit', ' --disallow-any-generics', ' --allow-any-generics', ' --disallow-subclassing-any', ' --allow-subclassing-any', ' --disallow-untyped-calls', ' --allow-untyped-calls', ' --untyped-calls-exclude', ' --disallow-untyped-defs', ' --allow-untyped-defs', ' --disallow-incomplete-defs', ' --allow-incomplete-defs', ' --check-untyped-defs', ' --no-check-untyped-defs', ' --disallow-untyped-decorators', ' --allow-untyped-decorators', ' --implicit-optional', ' --no-implicit-optional', ' --strict-optional', ' --no-strict-optional', ' --force-uppercase-builtins', ' --no-force-uppercase-builtins', ' --force-union-syntax', ' --no-force-union-syntax', ' --warn-redundant-casts', ' --no-warn-redundant-casts', ' --warn-unused-ignores', ' --no-warn-unused-ignores', ' --no-warn-no-return', ' --warn-no-return', ' --warn-return-any', ' --no-warn-return-any', ' --warn-unreachable', ' --no-warn-unreachable', ' --allow-untyped-globals', ' --disallow-untyped-globals', ' --allow-redefinition', ' --disallow-redefinition', ' --no-implicit-reexport', ' --implicit-reexport', ' --strict-equality', ' --no-strict-equality', ' --extra-checks', ' --no-extra-checks', ' --strict', ' --disable-error-code', ' --enable-error-code', ' --show-error-context', ' --hide-error-context', ' --show-column-numbers', ' --hide-column-numbers', ' --show-error-end', ' --hide-error-end', ' --hide-error-codes', ' --show-error-codes', ' --show-error-code-links', ' --hide-error-code-links', ' --pretty', ' --no-pretty', ' --no-color-output', ' --color-output', ' --no-error-summary', ' --error-summary', ' --show-absolute-path', ' --hide-absolute-path', ' --soft-error-limit ', ' -i', ' --incremental', ' --no-incremental', ' --cache-dir', ' --sqlite-cache', ' --no-sqlite-cache', ' --cache-fine-grained', ' --skip-version-check', ' --skip-cache-mtime-checks', ' --pdb', ' --show-traceback', ' --tb', ' --raise-exceptions', ' --custom-typing-module ', ' --new-type-inference', ' --disable-recursive-aliases', ' --enable-recursive-aliases', ' --enable-incomplete-feature', ' --custom-typeshed-dir ', ' --warn-incomplete-stub', ' --no-warn-incomplete-stub', ' --shadow-file', ' --fast-exit', ' --no-fast-exit', ' --allow-empty-bodies', ' --disallow-empty-bodies', ' --export-ref-info', ' --any-exprs-report ', ' --cobertura-xml-report ', ' --html-report ', ' --linecount-report ', ' --linecoverage-report ', ' --lineprecision-report ', ' --txt-report ', ' --xml-report ', ' --xslt-html-report ', ' --xslt-txt-report ', ' --quickstart-file ', ' --junit-xml ', ' --find-occurrences ', ' --scripts-are-modules', ' --install-types', ' --no-install-types', ' --non-interactive', ' --interactive', ' --stats', ' --inferstats', ' --dump-build-stats', ' --timing-stats ', ' --line-checking-stats ', ' --debug-cache', ' --dump-deps', ' --dump-graph', ' --semantic-analysis-only', ' --local-partial-types', ' --logical-deps', ' --bazel', ' --package-root', ' --cache-map( )+', ' --debug-serialize', ' --enable-incomplete-features', ' --disable-bytearray-promotion', ' --disable-memoryview-promotion', ' --strict-concatenate', ' --explicit-package-bases', ' --no-explicit-package-bases', ' --fast-module-lookup', ' --no-fast-module-lookup', ' --exclude', ' -m', ' --module', ' -p', ' --package', ' -c', ' --command']\n"
]
}
],
"source": [
"mypy_runner = OptionRunner(\"mypy\", \"foo.py\")\n",
"print(mypy_runner.ebnf_grammar()[\"\"])"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.841256Z",
"iopub.status.busy": "2024-01-18T17:21:00.841099Z",
"iopub.status.idle": "2024-01-18T17:21:00.946804Z",
"shell.execute_reply": "2024-01-18T17:21:00.946539Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" --follow-imports l foo.py\n",
" --junit-xml p\" --allow-untyped-decorators --install-types --no-warn-incomplete-stub foo.py\n",
" --soft-error-limit 2 --show-error-context --disable-recursive-aliases foo.py\n",
" --no-site-packages --version --warn-unused-configs --allow-redefinition --always-false --show-error-codes --warn-incomplete-stub --strict-optional --bazel --explicit-package-bases foo.py\n",
" --color-output --quickstart-file + --strict --incremental --show-traceback -h --cache-dir --sqlite-cache --pretty foo.py\n",
" --disallow-any-explicit --force-uppercase-builtins --allow-untyped-defs --cache-fine-grained --skip-cache-mtime-checks --no-warn-unreachable foo.py\n",
" --no-silence-site-packages --implicit-optional --xml-report Y --implicit-reexport --shadow-file --logical-deps -c --no-strict-optional --line-checking-stats u3% --local-partial-types --stats --disallow-subclassing-any --no-fast-module-lookup foo.py\n",
" --hide-error-end --disallow-incomplete-defs --package --linecount-report 0 --no-sqlite-cache --pdb --no-incremental --dump-build-stats --no-color-output --exclude --disallow-untyped-globals -V foo.py\n",
" --force-union-syntax --disable-memoryview-promotion --allow-incomplete-defs --semantic-analysis-only --no-pretty --python-executable --hide-absolute-path foo.py\n",
" --fast-module-lookup --disallow-untyped-defs --no-warn-no-return --warn-unused-ignores foo.py\n"
]
}
],
"source": [
"mypy_fuzzer = OptionFuzzer(mypy_runner, max_nonterminals=5)\n",
"for i in range(10):\n",
" print(mypy_fuzzer.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example: Notedown\n",
"\n",
"Here's the configuration options for the `notedown` Notebook to Markdown converter:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.948469Z",
"iopub.status.busy": "2024-01-18T17:21:00.948381Z",
"iopub.status.idle": "2024-01-18T17:21:00.950098Z",
"shell.execute_reply": "2024-01-18T17:21:00.949840Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"assert find_executable(\"notedown\") is not None"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.951521Z",
"iopub.status.busy": "2024-01-18T17:21:00.951442Z",
"iopub.status.idle": "2024-01-18T17:21:00.952930Z",
"shell.execute_reply": "2024-01-18T17:21:00.952687Z"
},
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import warnings"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:00.954282Z",
"iopub.status.busy": "2024-01-18T17:21:00.954201Z",
"iopub.status.idle": "2024-01-18T17:21:02.486328Z",
"shell.execute_reply": "2024-01-18T17:21:02.485956Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"with warnings.catch_warnings():\n",
" # Workaround: `notedown` can issue a `DeprecationWarning`\n",
" warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n",
" notedown_runner = OptionRunner(\"notedown\")"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.488176Z",
"iopub.status.busy": "2024-01-18T17:21:02.487987Z",
"iopub.status.idle": "2024-01-18T17:21:02.489800Z",
"shell.execute_reply": "2024-01-18T17:21:02.489547Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[' -h', ' --help', ' -o( )?', ' --output( )?', ' --from ', ' --to ', ' --run', ' --execute', ' --timeout ', ' --strip', ' --precode( )+', ' --knit( )?', ' --rmagic', ' --nomagic', ' --render', ' --template ', ' --match ', ' --examples', ' --version', ' --debug']\n"
]
}
],
"source": [
"print(notedown_runner.ebnf_grammar()[\"\"])"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.491535Z",
"iopub.status.busy": "2024-01-18T17:21:02.491397Z",
"iopub.status.idle": "2024-01-18T17:21:02.526354Z",
"shell.execute_reply": "2024-01-18T17:21:02.526100Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2\n",
" --strip --version --timeout 47 --knit {\n",
" --help --render Q@\n",
" --template 9v --examples\n",
" --run --execute --rmagic p\n",
" -h --match ' --nomagic --from ,w --debug --execute =\n",
" --version --execute Pa\n",
" --output -o 5C --to g --precode } --knit --debug E\n",
" -h --render d\\\n",
" --output --execute /DG\n"
]
}
],
"source": [
"notedown_fuzzer = OptionFuzzer(notedown_runner, max_nonterminals=5)\n",
"for i in range(10):\n",
" print(notedown_fuzzer.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Combinatorial Testing\n",
"\n",
"Our `CoverageGrammarFuzzer` does a good job in covering each and every option at least once, which is great for systematic testing. However, as we also can see in our examples above, some options require each other, while others interfere with each other. What we should do as good testers is not only to cover every option individually, but also _combinations_ of options."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The Python `itertools` module gives us means to create combinations from lists. We can, for instance, take the `notedown` options and create a list of all pairs."
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.528280Z",
"iopub.status.busy": "2024-01-18T17:21:02.528141Z",
"iopub.status.idle": "2024-01-18T17:21:02.530067Z",
"shell.execute_reply": "2024-01-18T17:21:02.529745Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from itertools import combinations"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.531515Z",
"iopub.status.busy": "2024-01-18T17:21:02.531404Z",
"iopub.status.idle": "2024-01-18T17:21:02.533168Z",
"shell.execute_reply": "2024-01-18T17:21:02.532900Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"option_list = notedown_runner.ebnf_grammar()[\" \"]\n",
"pairs = list(combinations(option_list, 2))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"There's quite a number of pairs:"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.534872Z",
"iopub.status.busy": "2024-01-18T17:21:02.534658Z",
"iopub.status.idle": "2024-01-18T17:21:02.536911Z",
"shell.execute_reply": "2024-01-18T17:21:02.536613Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"190"
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(pairs)"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.538398Z",
"iopub.status.busy": "2024-01-18T17:21:02.538286Z",
"iopub.status.idle": "2024-01-18T17:21:02.540050Z",
"shell.execute_reply": "2024-01-18T17:21:02.539753Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[(' -h', ' --help'), (' -h', ' -o( )?'), (' -h', ' --output( )?'), (' -h', ' --from '), (' -h', ' --to '), (' -h', ' --run'), (' -h', ' --execute'), (' -h', ' --timeout '), (' -h', ' --strip'), (' -h', ' --precode( )+'), (' -h', ' --knit( )?'), (' -h', ' --rmagic'), (' -h', ' --nomagic'), (' -h', ' --render'), (' -h', ' --template '), (' -h', ' --match '), (' -h', ' --examples'), (' -h', ' --version'), (' -h', ' --debug'), (' --help', ' -o( )?')]\n"
]
}
],
"source": [
"print(pairs[:20])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Testing every such pair of options frequently suffices to cover all interferences between options. (Programs rarely have conditions involving three or more configuration settings.) To this end, we _change_ the grammar from having a list of options to having a list of _option pairs_, such that covering these will automatically cover all pairs."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We create a function `pairwise()` that takes a list of options as occurring in our grammar and returns a list of _pairwise options_ – that is, our original options, but concatenated."
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.541677Z",
"iopub.status.busy": "2024-01-18T17:21:02.541564Z",
"iopub.status.idle": "2024-01-18T17:21:02.543448Z",
"shell.execute_reply": "2024-01-18T17:21:02.543105Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"def pairwise(option_list):\n",
" return [option_1 + option_2\n",
" for (option_1, option_2) in combinations(option_list, 2)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here's the first 20 pairs:"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.545161Z",
"iopub.status.busy": "2024-01-18T17:21:02.545031Z",
"iopub.status.idle": "2024-01-18T17:21:02.547006Z",
"shell.execute_reply": "2024-01-18T17:21:02.546757Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[' -h --help', ' -h -o( )?', ' -h --output( )?', ' -h --from ', ' -h --to ', ' -h --run', ' -h --execute', ' -h --timeout ', ' -h --strip', ' -h --precode( )+', ' -h --knit( )?', ' -h --rmagic', ' -h --nomagic', ' -h --render', ' -h --template ', ' -h --match ', ' -h --examples', ' -h --version', ' -h --debug', ' --help -o( )?']\n"
]
}
],
"source": [
"print(pairwise(option_list)[:20])"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The new grammar `pairwise_notedown_grammar` is a copy of the `notedown` grammar, but with the list of options replaced with the above pairwise option list."
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.548456Z",
"iopub.status.busy": "2024-01-18T17:21:02.548340Z",
"iopub.status.idle": "2024-01-18T17:21:02.550977Z",
"shell.execute_reply": "2024-01-18T17:21:02.550658Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"notedown_grammar = notedown_runner.grammar()\n",
"pairwise_notedown_grammar = extend_grammar(notedown_grammar)\n",
"pairwise_notedown_grammar[\"\"] = pairwise(notedown_grammar[\" \"])\n",
"assert is_valid_grammar(pairwise_notedown_grammar)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Using the \"pairwise\" grammar to fuzz now covers one pair after another:"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.552604Z",
"iopub.status.busy": "2024-01-18T17:21:02.552481Z",
"iopub.status.idle": "2024-01-18T17:21:02.554565Z",
"shell.execute_reply": "2024-01-18T17:21:02.554279Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"notedown_pairwise_fuzzer = GrammarCoverageFuzzer(\n",
" pairwise_notedown_grammar, max_nonterminals=4)"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.573887Z",
"iopub.status.busy": "2024-01-18T17:21:02.573732Z",
"iopub.status.idle": "2024-01-18T17:21:02.764001Z",
"shell.execute_reply": "2024-01-18T17:21:02.763698Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2\n",
" -h --nomagic --version --debug\n",
" --output G --strip --examples --version t\n",
" --help --strip --nomagic --examples\n",
" -h --run --help --render\n",
" --render --examples --nomagic --version\n",
" --precode i --match 6 D\n",
" --strip --render --strip --examples (\n",
" --nomagic --debug --nomagic --render B\n",
" --execute --rmagic --run --debug\n"
]
}
],
"source": [
"for i in range(10):\n",
" print(notedown_pairwise_fuzzer.fuzz())"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Can we actually test all combinations of options? Not in practice, as the number of combinations quickly grows as the length increases. It decreases again as the number of options reaches the maximum (with 20 options, there is only 1 combination involving _all_ options), but the absolute numbers are still staggering:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.765705Z",
"iopub.status.busy": "2024-01-18T17:21:02.765587Z",
"iopub.status.idle": "2024-01-18T17:21:02.898971Z",
"shell.execute_reply": "2024-01-18T17:21:02.898613Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 20\n",
"2 190\n",
"3 1140\n",
"4 4845\n",
"5 15504\n",
"6 38760\n",
"7 77520\n",
"8 125970\n",
"9 167960\n",
"10 184756\n",
"11 167960\n",
"12 125970\n",
"13 77520\n",
"14 38760\n",
"15 15504\n",
"16 4845\n",
"17 1140\n",
"18 190\n",
"19 20\n"
]
}
],
"source": [
"for combination_length in range(1, 20):\n",
" tuples = list(combinations(option_list, combination_length))\n",
" print(combination_length, len(tuples))"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Formally, the number of combinations of length $k$ in a set of options of length $n$ is the binomial coefficient\n",
"$$\n",
"{n \\choose k} = \\frac{n!}{k!(n - k)!}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"which for $k = 2$ (all pairs) gives us\n",
"\n",
"$$\n",
"{n \\choose 2} = \\frac{n!}{2(n - 2)!} = \\frac{n (n - 1)}{2}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For `autopep8` with its 30 options..."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.901191Z",
"iopub.status.busy": "2024-01-18T17:21:02.900867Z",
"iopub.status.idle": "2024-01-18T17:21:02.903352Z",
"shell.execute_reply": "2024-01-18T17:21:02.903034Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"30"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(autopep8_runner.ebnf_grammar()[\" \"])"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.905026Z",
"iopub.status.busy": "2024-01-18T17:21:02.904892Z",
"iopub.status.idle": "2024-01-18T17:21:02.906706Z",
"shell.execute_reply": "2024-01-18T17:21:02.906452Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# docassert\n",
"assert len(autopep8_runner.ebnf_grammar()[\" \"]) == 30"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"... we thus need 870 tests to cover all pairs:"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.908364Z",
"iopub.status.busy": "2024-01-18T17:21:02.908262Z",
"iopub.status.idle": "2024-01-18T17:21:02.910379Z",
"shell.execute_reply": "2024-01-18T17:21:02.910116Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"870"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(autopep8_runner.ebnf_grammar()[\" \"]) * \\\n",
" (len(autopep8_runner.ebnf_grammar()[\" \"]) - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"For `mypy` with its 140+ options, though, we already end up with 20,000+ tests to be conducted:"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.912296Z",
"iopub.status.busy": "2024-01-18T17:21:02.912124Z",
"iopub.status.idle": "2024-01-18T17:21:02.914302Z",
"shell.execute_reply": "2024-01-18T17:21:02.914055Z"
},
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"164"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(mypy_runner.ebnf_grammar()[\" \"])"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.915811Z",
"iopub.status.busy": "2024-01-18T17:21:02.915707Z",
"iopub.status.idle": "2024-01-18T17:21:02.917257Z",
"shell.execute_reply": "2024-01-18T17:21:02.917025Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# docassert\n",
"assert len(mypy_runner.ebnf_grammar()[\" \"]) >= 140"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.918627Z",
"iopub.status.busy": "2024-01-18T17:21:02.918545Z",
"iopub.status.idle": "2024-01-18T17:21:02.920700Z",
"shell.execute_reply": "2024-01-18T17:21:02.920442Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"26732"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(mypy_runner.ebnf_grammar()[\" \"]) * \\\n",
" (len(mypy_runner.ebnf_grammar()[\" \"]) - 1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Even if each pair takes a second to run, we'd still be done in three hours of testing, though."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"If your program has more options that you all want to get covered in combinations, it is advisable that you limit the number of configurations further – for instance by limiting combinatorial testing to those combinations that possibly can interact with each other; and covering all other (presumably orthogonal) options individually."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This mechanism of creating configurations by extending grammars can be easily extended to other configuration targets. One may want to explore a greater number of configurations, or expansions in specific contexts. The [exercises](#Exercises), below, have a number of options ready for you."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Synopsis\n",
"\n",
"This chapter provides two classes:\n",
"\n",
"* `OptionRunner` automatically extract command-line options from a Python program;\n",
"* `OptionFuzzer` uses these to automatically test a Python program with a large variety of options."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"`OptionRunner` runs a program up to the point where it parses its arguments, and then extracts a grammar that describes its invocations:"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.922273Z",
"iopub.status.busy": "2024-01-18T17:21:02.922169Z",
"iopub.status.idle": "2024-01-18T17:21:02.926074Z",
"shell.execute_reply": "2024-01-18T17:21:02.925826Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_runner = OptionRunner(\"autopep8\", \"foo.py\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The grammar can be extracted via the method `ebnf_grammar()`:"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.927640Z",
"iopub.status.busy": "2024-01-18T17:21:02.927546Z",
"iopub.status.idle": "2024-01-18T17:21:02.930689Z",
"shell.execute_reply": "2024-01-18T17:21:02.930421Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"{'': ['()*'],\n",
" '': [' -h',\n",
" ' --help',\n",
" ' --version',\n",
" ' -v',\n",
" ' --verbose',\n",
" ' -d',\n",
" ' --diff',\n",
" ' -i',\n",
" ' --in-place',\n",
" ' --global-config ',\n",
" ' --ignore-local-config',\n",
" ' -r',\n",
" ' --recursive',\n",
" ' -j ',\n",
" ' --jobs ',\n",
" ' -p ',\n",
" ' --pep8-passes ',\n",
" ' -a',\n",
" ' --aggressive',\n",
" ' --experimental',\n",
" ' --exclude ',\n",
" ' --list-fixes',\n",
" ' --ignore ',\n",
" ' --select ',\n",
" ' --max-line-length ',\n",
" ' --line-range ',\n",
" ' --range ',\n",
" ' --indent-size ',\n",
" ' --hang-closing',\n",
" ' --exit-code'],\n",
" '': [' foo.py'],\n",
" '': ['+'],\n",
" '': ['0',\n",
" '1',\n",
" '2',\n",
" '3',\n",
" '4',\n",
" '5',\n",
" '6',\n",
" '7',\n",
" '8',\n",
" '9',\n",
" 'a',\n",
" 'b',\n",
" 'c',\n",
" 'd',\n",
" 'e',\n",
" 'f',\n",
" 'g',\n",
" 'h',\n",
" 'i',\n",
" 'j',\n",
" 'k',\n",
" 'l',\n",
" 'm',\n",
" 'n',\n",
" 'o',\n",
" 'p',\n",
" 'q',\n",
" 'r',\n",
" 's',\n",
" 't',\n",
" 'u',\n",
" 'v',\n",
" 'w',\n",
" 'x',\n",
" 'y',\n",
" 'z',\n",
" 'A',\n",
" 'B',\n",
" 'C',\n",
" 'D',\n",
" 'E',\n",
" 'F',\n",
" 'G',\n",
" 'H',\n",
" 'I',\n",
" 'J',\n",
" 'K',\n",
" 'L',\n",
" 'M',\n",
" 'N',\n",
" 'O',\n",
" 'P',\n",
" 'Q',\n",
" 'R',\n",
" 'S',\n",
" 'T',\n",
" 'U',\n",
" 'V',\n",
" 'W',\n",
" 'X',\n",
" 'Y',\n",
" 'Z',\n",
" '!',\n",
" '\"',\n",
" '#',\n",
" '$',\n",
" '%',\n",
" '&',\n",
" \"'\",\n",
" '(',\n",
" ')',\n",
" '*',\n",
" '+',\n",
" ',',\n",
" '-',\n",
" '.',\n",
" '/',\n",
" ':',\n",
" ';',\n",
" '<',\n",
" '=',\n",
" '>',\n",
" '?',\n",
" '@',\n",
" '[',\n",
" '\\\\',\n",
" ']',\n",
" '^',\n",
" '_',\n",
" '`',\n",
" '{',\n",
" '|',\n",
" '}',\n",
" '~'],\n",
" '': [''],\n",
" '': ['(-)?+'],\n",
" '': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],\n",
" '': [''],\n",
" '': [''],\n",
" '': [''],\n",
" '': ['']}"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"option_ebnf_grammar = autopep8_runner.ebnf_grammar()\n",
"option_ebnf_grammar"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The grammar can be immediately used for fuzzing. A `GrammarCoverageFuzzer` will ensure all options are covered:"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.932172Z",
"iopub.status.busy": "2024-01-18T17:21:02.932066Z",
"iopub.status.idle": "2024-01-18T17:21:02.933650Z",
"shell.execute_reply": "2024-01-18T17:21:02.933416Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from Grammars import convert_ebnf_grammar"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:02.935061Z",
"iopub.status.busy": "2024-01-18T17:21:02.934977Z",
"iopub.status.idle": "2024-01-18T17:21:03.708459Z",
"shell.execute_reply": "2024-01-18T17:21:03.708170Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[' --list-fixes foo.py',\n",
" ' --version -i --range -5 04 --aggressive --diff -j 817 --help -d --hang-closing -p 26 -a --verbose --exclude |g -v --global-config f --recursive --ignore .= -r --experimental -h --line-range -317 9 --pep8-passes 8 --max-line-length 2 --indent-size 0 --in-place foo.py',\n",
" ' --ignore-local-config --jobs 92 --exit-code --select Q\"6 --global-config (h --ignore )` --select M --select t,\\\\Z --global-config u5 --global-config ! --ignore <*p --ignore # --global-config c8N[ --ignore \\'x --global-config 09+ --select Y --ignore @ --global-config K --global-config T --exclude E --select odCAH --global-config DFR --ignore s --select ; --global-config Wyi --global-config q2 --exclude - --ignore r --global-config { --global-config zv$} --global-config 3O --ignore e]>lUSb --exclude V^ --global-config LBj --global-config X --select _ --exclude n4 --global-config %&/: --select a --global-config k --ignore 1 --ignore G~m?7 --exclude PJX --select I --select w -d foo.py']"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fuzzer = GrammarCoverageFuzzer(convert_ebnf_grammar(option_ebnf_grammar))\n",
"[fuzzer.fuzz() for i in range(3)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The `OptionFuzzer` class summarizes these steps. Its constructor takes an `OptionRunner` to automatically extract the grammar; it does the necessary steps to extract the grammar and fuzz with it."
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:03.710234Z",
"iopub.status.busy": "2024-01-18T17:21:03.710106Z",
"iopub.status.idle": "2024-01-18T17:21:03.714686Z",
"shell.execute_reply": "2024-01-18T17:21:03.714442Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"autopep8_runner = OptionRunner(\"autopep8\", \"foo.py\")\n",
"autopep8_fuzzer = OptionFuzzer(autopep8_runner)"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:03.716297Z",
"iopub.status.busy": "2024-01-18T17:21:03.716210Z",
"iopub.status.idle": "2024-01-18T17:21:03.944468Z",
"shell.execute_reply": "2024-01-18T17:21:03.944196Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"[' foo.py',\n",
" ' -a --hang-closing --aggressive --help -h --ignore : --verbose --diff --jobs 68 --line-range 4 -159 --indent-size -0 -d -i --range 27 38 --experimental --global-config gt --ignore-local-config --exit-code --in-place -v foo.py',\n",
" ' --select ?H --pep8-passes 7 --max-line-length -154 -r --exclude P --recursive --list-fixes -j 77 --version -p 71 --global-config )c --select 4!m/ --ignore 1d3 --select ~ --ignore b[^ --global-config qn --exclude >V --exclude 2 --select $_iUzQ].0 --select ; --ignore 5M --ignore 9 --select pN`C --ignore 76(J --select Dh --exclude ov\" -h foo.py']"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[autopep8_fuzzer.fuzz() for i in range(3)]"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The final step in testing would now to invoke the program with these arguments."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note that `OptionRunner` is experimental: It assumes that the Python program in question uses the `argparse` module; and not all `argparse` features are supported. Still, it does a pretty good job even on nontrivial programs."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The `OptionRunner` constructor accepts an additional `miner` keyword parameter, which takes the class of the argument grammar miner to be used. By default, this is `OptionGrammarMiner` – a helper class that can be used (and extended) to create own option grammar miners."
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:03.946129Z",
"iopub.status.busy": "2024-01-18T17:21:03.946014Z",
"iopub.status.idle": "2024-01-18T17:21:03.947710Z",
"shell.execute_reply": "2024-01-18T17:21:03.947444Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# ignore\n",
"from ClassDiagram import display_class_hierarchy\n",
"from Fuzzer import Fuzzer, Runner, ProgramRunner\n",
"from Grammars import Expansion\n",
"from GrammarFuzzer import GrammarFuzzer, DerivationTree\n",
"from GrammarCoverageFuzzer import TrackingGrammarCoverageFuzzer"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:03.949197Z",
"iopub.status.busy": "2024-01-18T17:21:03.949095Z",
"iopub.status.idle": "2024-01-18T17:21:04.521675Z",
"shell.execute_reply": "2024-01-18T17:21:04.521233Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
"\n",
"\n",
"OptionRunner \n",
"\n",
" \n",
"OptionRunner \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"ebnf_grammar() \n",
" \n",
" \n",
"\n",
"grammar() \n",
" \n",
" \n",
"\n",
"executable() \n",
" \n",
" \n",
"\n",
"find_contents() \n",
" \n",
" \n",
"\n",
"find_grammar() \n",
" \n",
" \n",
"\n",
"invoker() \n",
" \n",
" \n",
"\n",
"set_arguments() \n",
" \n",
" \n",
"\n",
"set_invocation() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"ProgramRunner \n",
"\n",
" \n",
"ProgramRunner \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"OptionRunner->ProgramRunner \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"Runner \n",
"\n",
" \n",
"Runner \n",
" \n",
"\n",
"\n",
"FAIL \n",
" \n",
" \n",
"\n",
"PASS \n",
" \n",
" \n",
"\n",
"UNRESOLVED \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"run() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"ProgramRunner->Runner \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"OptionFuzzer \n",
"\n",
" \n",
"OptionFuzzer \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"run() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"GrammarCoverageFuzzer \n",
"\n",
" \n",
"GrammarCoverageFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"OptionFuzzer->GrammarCoverageFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"SimpleGrammarCoverageFuzzer \n",
"\n",
" \n",
"SimpleGrammarCoverageFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"GrammarCoverageFuzzer->SimpleGrammarCoverageFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"TrackingGrammarCoverageFuzzer \n",
"\n",
" \n",
"TrackingGrammarCoverageFuzzer \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"SimpleGrammarCoverageFuzzer->TrackingGrammarCoverageFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"GrammarFuzzer \n",
"\n",
" \n",
"GrammarFuzzer \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"fuzz() \n",
" \n",
" \n",
"\n",
"fuzz_tree() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"TrackingGrammarCoverageFuzzer->GrammarFuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"Fuzzer \n",
"\n",
" \n",
"Fuzzer \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"fuzz() \n",
" \n",
" \n",
"\n",
"run() \n",
" \n",
" \n",
"\n",
"runs() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"GrammarFuzzer->Fuzzer \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"OptionGrammarMiner \n",
"\n",
" \n",
"OptionGrammarMiner \n",
" \n",
"\n",
"\n",
"ARGUMENTS_SYMBOL \n",
" \n",
" \n",
"\n",
"OPTION_SYMBOL \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"__init__() \n",
" \n",
" \n",
"\n",
"mine_ebnf_grammar() \n",
" \n",
" \n",
"\n",
"mine_grammar() \n",
" \n",
" \n",
"\n",
"add_group() \n",
" \n",
" \n",
"\n",
"add_int_rule() \n",
" \n",
" \n",
"\n",
"add_metavar_rule() \n",
" \n",
" \n",
"\n",
"add_parameter() \n",
" \n",
" \n",
"\n",
"add_str_rule() \n",
" \n",
" \n",
"\n",
"add_type_rule() \n",
" \n",
" \n",
"\n",
"process_arg() \n",
" \n",
" \n",
"\n",
"process_argument() \n",
" \n",
" \n",
"\n",
"traceit() \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"Legend \n",
"Legend \n",
"• \n",
"public_method() \n",
"• \n",
"private_method() \n",
"• \n",
"overloaded_method() \n",
"Hover over names to see doc \n",
" \n",
" \n",
" \n"
],
"text/plain": [
""
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# ignore\n",
"display_class_hierarchy([OptionRunner, OptionFuzzer, OptionGrammarMiner],\n",
" public_methods=[\n",
" Fuzzer.__init__,\n",
" Fuzzer.fuzz,\n",
" Fuzzer.run,\n",
" Fuzzer.runs,\n",
" GrammarFuzzer.__init__,\n",
" GrammarFuzzer.fuzz,\n",
" GrammarFuzzer.fuzz_tree,\n",
" TrackingGrammarCoverageFuzzer.__init__,\n",
" OptionFuzzer.__init__,\n",
" OptionFuzzer.run,\n",
" Runner.__init__,\n",
" Runner.run,\n",
" ProgramRunner.__init__,\n",
" ProgramRunner.__init__,\n",
" OptionRunner.__init__,\n",
" OptionRunner.ebnf_grammar,\n",
" OptionRunner.grammar,\n",
" OptionGrammarMiner.__init__,\n",
" OptionGrammarMiner.mine_ebnf_grammar,\n",
" OptionGrammarMiner.mine_grammar,\n",
" ],\n",
" types={\n",
" 'DerivationTree': DerivationTree,\n",
" 'Expansion': Expansion,\n",
" 'Grammar': Grammar\n",
" },\n",
" project='fuzzingbook')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Lessons Learned\n",
"\n",
"* Besides regular input data, program _configurations_ make an important testing target.\n",
"* For a given program using a standard library to parse command-line options and arguments, one can automatically extract these and convert them into a grammar.\n",
"* To cover not only single options, but combinations of options, one can expand the grammar to cover all pairs, or come up with even more ambitious targets."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Next Steps\n",
"\n",
"If you liked the idea of mining a grammar from a program, do not miss:\n",
"\n",
"* [how to mine grammars for input data](GrammarMiner.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"Our next steps in the book focus on:\n",
"\n",
"* [how to parse and recombine inputs](Parser.ipynb)\n",
"* [how to assign weights and probabilities to specific productions](ProbabilisticGrammarFuzzer.ipynb)\n",
"* [how to simplify inputs that cause a failure](Reducer.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Background\n",
"\n",
"Although configuration data is just as likely to cause failures as other input data, it has received relatively little attention in test generation – possibly because, unlike \"regular\" input data, configuration data is not so much under control of external parties, and because, again unlike regular data, there is little variance in configurations. Creating models for software configurations and using these models for testing is commonplace, as is the idea of pairwise testing. For an overview, see \\cite{Pezze2008}; for a discussion and comparison of state-of-the-art techniques, see \\cite{Petke2015}.\n",
"\n",
"More specifically, \\cite{Sutton2007} also discuss techniques to systematically cover command-line options. Dai et al. \\cite{Dai2010} apply configuration fuzzing by changing variables associated with configuration files. "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "subslide"
},
"toc-hr-collapsed": false
},
"source": [
"## Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"toc-hr-collapsed": true
},
"source": [
"### Exercise 1: #ifdef Configuration Fuzzing\n",
"\n",
"In C programs, the *C preprocessor* can be used to choose which code parts should be compiled and which ones should not. As an example, in the C code\n",
"\n",
"```C\n",
"#ifdef LONG_FOO\n",
"long foo() { ... }\n",
"#else\n",
"int foo() { ... }\n",
"#endif\n",
"```\n",
"\n",
"the compiler will compile the function `foo()` with return type`long` if the _preprocessor variable_ `LONG_FOO` is defined, and with return type `int` if not. Such preprocessor variables are either set in the source files (using `#define`, as in `#define LONG_FOO`) or on the C compiler command line (using `-D` or `-D=`, as in `-DLONG_FOO`."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Such *conditional compilation* is used to configure C programs towards their environment. System-specific code can contain lots of conditional compilation. As an example, consider this excerpt of `xmlparse.c`, the XML parser that is part of the Python runtime library:\n",
"\n",
"```c\n",
"#if defined(_WIN32) && !defined(LOAD_LIBRARY_SEARCH_SYSTEM32)\n",
"# define LOAD_LIBRARY_SEARCH_SYSTEM32 0x00000800\n",
"#endif\n",
"\n",
"#if !defined(HAVE_GETRANDOM) && !defined(HAVE_SYSCALL_GETRANDOM) \\\n",
" && !defined(HAVE_ARC4RANDOM_BUF) && !defined(HAVE_ARC4RANDOM) \\\n",
" && !defined(XML_DEV_URANDOM) \\\n",
" && !defined(_WIN32) \\\n",
" && !defined(XML_POOR_ENTROPY)\n",
"# error\n",
"#endif\n",
"\n",
"#if !defined(TIOCSWINSZ) || defined(__SCO__) || defined(__UNIXWARE__)\n",
"#define USE_SYSV_ENVVARS\t/* COLUMNS/LINES vs. TERMCAP */\n",
"#endif\n",
"\n",
"#ifdef XML_UNICODE_WCHAR_T\n",
"#define XML_T(x) (const wchar_t)x\n",
"#define XML_L(x) L ## x\n",
"#else\n",
"#define XML_T(x) (const unsigned short)x\n",
"#define XML_L(x) x\n",
"#endif\n",
"\n",
"int fun(int x) { return XML_T(x); }\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A typical configuration for the C preprocessor on the above code could be `cc -c -D_WIN32 -DXML_POOR_ENTROPY -DXML_UNICODE_WCHAR_T xmlparse.c`, defining the given preprocessor variables and selecting the appropriate code fragments."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Since the compiler can only compile one configuration at a time (implying that we can also only _test_ one resulting executable at a time), your task is to find out which of these configurations actually compile. To this end, proceed in three steps."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"solution2": "hidden",
"solution2_first": true
},
"source": [
"#### Part 1: Extract Preprocessor Variables\n",
"\n",
"Write a _function_ `cpp_identifiers()` that, given a set of lines (say, from `open(filename).readlines()`), extracts all preprocessor variables referenced in `#if` or `#ifdef` preprocessor instructions. Apply `ifdef_identifiers()` on the sample C input above, such that\n",
"\n",
"```python\n",
"cpp_identifiers(open(\"xmlparse.c\").readlines()) \n",
"```\n",
"\n",
"returns the set\n",
"\n",
"```python\n",
"{'_WIN32', 'LOAD_LIBRARY_SEARCH_SYSTEM32', 'HAVE_GETRANDOM', 'HAVE_SYSCALL_GETRANDOM', 'HAVE_ARC4RANDOM_BUF', ...}\n",
"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"source": [
"**Solution.** Let us start with creating a sample input file, `xmlparse.c`:"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.523773Z",
"iopub.status.busy": "2024-01-18T17:21:04.523652Z",
"iopub.status.idle": "2024-01-18T17:21:04.525549Z",
"shell.execute_reply": "2024-01-18T17:21:04.525284Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"filename = \"xmlparse.c\""
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.527271Z",
"iopub.status.busy": "2024-01-18T17:21:04.527122Z",
"iopub.status.idle": "2024-01-18T17:21:04.529762Z",
"shell.execute_reply": "2024-01-18T17:21:04.529480Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"open(filename, \"w\").write(\n",
" \"\"\"\n",
"#if defined(_WIN32) && !defined(LOAD_LIBRARY_SEARCH_SYSTEM32)\n",
"# define LOAD_LIBRARY_SEARCH_SYSTEM32 0x00000800\n",
"#endif\n",
"\n",
"#if !defined(HAVE_GETRANDOM) && !defined(HAVE_SYSCALL_GETRANDOM) \\\n",
" && !defined(HAVE_ARC4RANDOM_BUF) && !defined(HAVE_ARC4RANDOM) \\\n",
" && !defined(XML_DEV_URANDOM) \\\n",
" && !defined(_WIN32) \\\n",
" && !defined(XML_POOR_ENTROPY)\n",
"# error\n",
"#endif\n",
"\n",
"#if !defined(TIOCSWINSZ) || defined(__SCO__) || defined(__UNIXWARE__)\n",
"#define USE_SYSV_ENVVARS\t/* COLUMNS/LINES vs. TERMCAP */\n",
"#endif\n",
"\n",
"#ifdef XML_UNICODE_WCHAR_T\n",
"#define XML_T(x) (const wchar_t)x\n",
"#define XML_L(x) L ## x\n",
"#else\n",
"#define XML_T(x) (const unsigned short)x\n",
"#define XML_L(x) x\n",
"#endif\n",
"\n",
"int fun(int x) { return XML_T(x); }\n",
"\"\"\");"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"source": [
"To find C preprocessor `#if` directives and preprocessor variables, we use regular expressions matching them."
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.531367Z",
"iopub.status.busy": "2024-01-18T17:21:04.531163Z",
"iopub.status.idle": "2024-01-18T17:21:04.532902Z",
"shell.execute_reply": "2024-01-18T17:21:04.532625Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"import re"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.534249Z",
"iopub.status.busy": "2024-01-18T17:21:04.534137Z",
"iopub.status.idle": "2024-01-18T17:21:04.536373Z",
"shell.execute_reply": "2024-01-18T17:21:04.536118Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"re_cpp_if_directive = re.compile(r\"\\s*#\\s*(el)?if\")\n",
"re_cpp_identifier = re.compile(r\"[a-zA-Z_$]+\")"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.537693Z",
"iopub.status.busy": "2024-01-18T17:21:04.537588Z",
"iopub.status.idle": "2024-01-18T17:21:04.539597Z",
"shell.execute_reply": "2024-01-18T17:21:04.539350Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"def cpp_identifiers(lines):\n",
" identifiers = set()\n",
" for line in lines:\n",
" if re_cpp_if_directive.match(line):\n",
" identifiers |= set(re_cpp_identifier.findall(line))\n",
"\n",
" # These are preprocessor keywords\n",
" identifiers -= {\"if\", \"ifdef\", \"ifndef\", \"defined\"}\n",
" return identifiers"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.540988Z",
"iopub.status.busy": "2024-01-18T17:21:04.540879Z",
"iopub.status.idle": "2024-01-18T17:21:04.543338Z",
"shell.execute_reply": "2024-01-18T17:21:04.543040Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [
{
"data": {
"text/plain": [
"{'HAVE_ARC',\n",
" 'HAVE_GETRANDOM',\n",
" 'HAVE_SYSCALL_GETRANDOM',\n",
" 'LOAD_LIBRARY_SEARCH_SYSTEM',\n",
" 'RANDOM',\n",
" 'RANDOM_BUF',\n",
" 'TIOCSWINSZ',\n",
" 'XML_DEV_URANDOM',\n",
" 'XML_POOR_ENTROPY',\n",
" 'XML_UNICODE_WCHAR_T',\n",
" '_WIN',\n",
" '__SCO__',\n",
" '__UNIXWARE__'}"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cpp_ids = cpp_identifiers(open(\"xmlparse.c\").readlines())\n",
"cpp_ids"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"solution2": "hidden",
"solution2_first": true
},
"source": [
"#### Part 2: Derive an Option Grammar\n",
"\n",
"With the help of `cpp_identifiers()`, create a grammar which has C compiler invocations with a list of options, where each option takes the form `-D` for a preprocessor variable ``. Using this grammar `cpp_grammar`, a fuzzer \n",
"\n",
"```python\n",
"g = GrammarCoverageFuzzer(cpp_grammar)\n",
"```\n",
"\n",
"would create C compiler invocations such as\n",
"\n",
"```python\n",
"[g.fuzz() for i in range(10)]\n",
"['cc -DHAVE_SYSCALL_GETRANDOM xmlparse.c',\n",
" 'cc -D__SCO__ -DRANDOM_BUF -DXML_UNICODE_WCHAR_T -D__UNIXWARE__ xmlparse.c',\n",
" 'cc -DXML_POOR_ENTROPY xmlparse.c',\n",
" 'cc -DRANDOM xmlparse.c',\n",
" 'cc -D_WIN xmlparse.c',\n",
" 'cc -DHAVE_ARC xmlparse.c', ...]\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"source": [
"**Solution.** This is not very difficult:"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.545079Z",
"iopub.status.busy": "2024-01-18T17:21:04.544954Z",
"iopub.status.idle": "2024-01-18T17:21:04.546686Z",
"shell.execute_reply": "2024-01-18T17:21:04.546445Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"from Grammars import Grammar, is_valid_grammar"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:21:04.548290Z",
"iopub.status.busy": "2024-01-18T17:21:04.548171Z",
"iopub.status.idle": "2024-01-18T17:21:04.550444Z",
"shell.execute_reply": "2024-01-18T17:21:04.550188Z"
},
"slideshow": {
"slide_type": "skip"
},
"solution2": "hidden"
},
"outputs": [],
"source": [
"cpp_grammar: Grammar = {\n",
" \"\": [\"cc -c \" + filename],\n",
" \"\": [\"\", \"