{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Reducing Failure-Inducing Inputs\n",
"\n",
"By construction, fuzzers create inputs that may be hard to read. This causes issues during _debugging_, when a human has to analyze the exact cause of the failure. In this chapter, we present techniques that _automatically reduce and simplify failure-inducing inputs to a minimum_ in order to ease debugging."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.511416Z",
"iopub.status.busy": "2024-01-18T17:18:36.510759Z",
"iopub.status.idle": "2024-01-18T17:18:36.572317Z",
"shell.execute_reply": "2024-01-18T17:18:36.571978Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from bookutils import YouTubeVideo\n",
"YouTubeVideo('JOv1xGVdXAU')"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Prerequisites**\n",
"\n",
"* The simple \"delta debugging\" technique for reduction has no specific prerequisites.\n",
"* As reduction is typically used together with fuzzing, reading the [chapter on basic fuzzing](Fuzzer.ipynb) is a good idea.\n",
"* The later grammar-based techniques require knowledge on [derivation trees](GrammarFuzzer.ipynb) and [parsing](Parser.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Synopsis\n",
"\n",
"\n",
"To [use the code provided in this chapter](Importing.ipynb), write\n",
"\n",
"```python\n",
">>> from fuzzingbook.Reducer import \n",
"```\n",
"\n",
"and then make use of the following features.\n",
"\n",
"\n",
"A _reducer_ takes a failure-inducing input and reduces it to the minimum that still reproduces the failure. This chapter provides `Reducer` classes that implement such reducers.\n",
"\n",
"Here is a simple example: An arithmetic expression causes an error in the Python interpreter:\n",
"\n",
"```python\n",
">>> !python -c 'x = 1 + 2 * 3 / 0'\n",
"Traceback (most recent call last):\r\n",
" File \"\", line 1, in \r\n",
"ZeroDivisionError: division by zero\r\n",
"\n",
"```\n",
"Can we reduce this input to a minimum? To use a `Reducer`, one first has to build a `Runner` whose outcome is `FAIL` if the precise error occurs. We therefore build a `ZeroDivisionRunner` whose `run()` method will specifically return a `FAIL` outcome if a `ZeroDivisionError` occurs.\n",
"\n",
"```python\n",
">>> from Fuzzer import ProgramRunner\n",
">>> import subprocess\n",
">>> class ZeroDivisionRunner(ProgramRunner):\n",
">>> \"\"\"Make outcome 'FAIL' if ZeroDivisionError occurs\"\"\"\n",
">>> \n",
">>> def run(self, inp: str = \"\") -> Tuple[subprocess.CompletedProcess, Outcome]:\n",
">>> process, outcome = super().run(inp)\n",
">>> if process.stderr.find('ZeroDivisionError') >= 0:\n",
">>> outcome = 'FAIL'\n",
">>> return process, outcome\n",
"```\n",
"If we feed this expression into a `ZeroDivisionRunner`, it will produce an outcome of `FAIL` as designed.\n",
"\n",
"```python\n",
">>> python_input = \"x = 1 + 2 * 3 / 0\"\n",
">>> python_runner = ZeroDivisionRunner(\"python\")\n",
">>> process, outcome = python_runner.run(python_input)\n",
">>> outcome\n",
"'FAIL'\n",
"```\n",
"Delta Debugging is a simple and robust reduction algorithm. We can tie a `DeltaDebuggingReducer` to this runner, and have it determine the substring that causes the `python` program to fail:\n",
"\n",
"```python\n",
">>> dd = DeltaDebuggingReducer(python_runner)\n",
">>> dd.reduce(python_input)\n",
"'3/0'\n",
"```\n",
"The input is reduced to the minimum: We get the essence of the division by zero.\n",
"\n",
"![](PICS/Reducer-synopsis-1.svg)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Why Reducing?\n",
"\n",
"At this point, we have seen a number of test generation techniques that all in some form produce inputs in order to trigger failures. If they are successful – that is, the program actually fails – we must find out why the failure occurred and how to fix it."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Here's an example of such a situation. We have a class `MysteryRunner` with a `run()` method that – given its code – can occasionally fail. But under which circumstances does this actually happen? We have deliberately obscured the exact condition in order to make this non-obvious."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.594592Z",
"iopub.status.busy": "2024-01-18T17:18:36.594412Z",
"iopub.status.idle": "2024-01-18T17:18:36.596571Z",
"shell.execute_reply": "2024-01-18T17:18:36.596304Z"
},
"new_sheet": false,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import bookutils.setup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.598074Z",
"iopub.status.busy": "2024-01-18T17:18:36.597967Z",
"iopub.status.idle": "2024-01-18T17:18:36.599516Z",
"shell.execute_reply": "2024-01-18T17:18:36.599268Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from bookutils import quiz"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.600876Z",
"iopub.status.busy": "2024-01-18T17:18:36.600792Z",
"iopub.status.idle": "2024-01-18T17:18:36.602479Z",
"shell.execute_reply": "2024-01-18T17:18:36.602245Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from typing import Tuple, List, Sequence, Any, Optional"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.604056Z",
"iopub.status.busy": "2024-01-18T17:18:36.603963Z",
"iopub.status.idle": "2024-01-18T17:18:36.630635Z",
"shell.execute_reply": "2024-01-18T17:18:36.630361Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"from ExpectError import ExpectError"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.632256Z",
"iopub.status.busy": "2024-01-18T17:18:36.632166Z",
"iopub.status.idle": "2024-01-18T17:18:36.684307Z",
"shell.execute_reply": "2024-01-18T17:18:36.684031Z"
},
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"from Fuzzer import RandomFuzzer, Runner, Outcome"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.685953Z",
"iopub.status.busy": "2024-01-18T17:18:36.685870Z",
"iopub.status.idle": "2024-01-18T17:18:36.687451Z",
"shell.execute_reply": "2024-01-18T17:18:36.687212Z"
},
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import re"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.688953Z",
"iopub.status.busy": "2024-01-18T17:18:36.688867Z",
"iopub.status.idle": "2024-01-18T17:18:36.690955Z",
"shell.execute_reply": "2024-01-18T17:18:36.690719Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class MysteryRunner(Runner):\n",
" def run(self, inp: str) -> Tuple[str, Outcome]:\n",
" x = inp.find(chr(0o17 + 0o31))\n",
" y = inp.find(chr(0o27 + 0o22))\n",
" if x >= 0 and y >= 0 and x < y:\n",
" return (inp, Runner.FAIL)\n",
" else:\n",
" return (inp, Runner.PASS)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let us fuzz the function until we find a failing input."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.692579Z",
"iopub.status.busy": "2024-01-18T17:18:36.692427Z",
"iopub.status.idle": "2024-01-18T17:18:36.694802Z",
"shell.execute_reply": "2024-01-18T17:18:36.694541Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"mystery = MysteryRunner()\n",
"random_fuzzer = RandomFuzzer()\n",
"while True:\n",
" inp = random_fuzzer.fuzz()\n",
" result, outcome = mystery.run(inp)\n",
" if outcome == mystery.FAIL:\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.696222Z",
"iopub.status.busy": "2024-01-18T17:18:36.696133Z",
"iopub.status.idle": "2024-01-18T17:18:36.698279Z",
"shell.execute_reply": "2024-01-18T17:18:36.698032Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"' 7:,>((/$$-/->.;.=;(.%!:50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"failing_input = result\n",
"failing_input"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Something in this input causes `MysteryRunner` to fail. But what is it?"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Manual Input Reduction\n",
"\n",
"One important step in the debugging process is _reduction_ – that is, to identify those circumstances of a failure that are relevant for the failure to occur, and to _omit_ (if possible) those parts that are not. As Kernighan and Pike \\cite{Kernighan1999} put it:\n",
"\n",
"> For every circumstance of the problem, check whether it is relevant for the problem to occur. If it is not, remove it from the problem report or the test case in question."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Specifically for inputs, they suggest a _divide and conquer_ process:\n",
"\n",
"> Proceed by binary search. Throw away half the input and see if the output is still wrong; if not, go back to the previous state and discard the other half of the input.\n",
"\n",
"This is something we can easily try out, using our last generated input:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.699802Z",
"iopub.status.busy": "2024-01-18T17:18:36.699698Z",
"iopub.status.idle": "2024-01-18T17:18:36.701629Z",
"shell.execute_reply": "2024-01-18T17:18:36.701380Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"' 7:,>((/$$-/->.;.=;(.%!:50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"failing_input"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"For instance, we can see whether the error still occurs if we only feed in the first half:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.703020Z",
"iopub.status.busy": "2024-01-18T17:18:36.702922Z",
"iopub.status.idle": "2024-01-18T17:18:36.705076Z",
"shell.execute_reply": "2024-01-18T17:18:36.704836Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(\" 7:,>((/$$-/->.;.=;(.%!:50#7*8=$&&=$9!%6(4=&69':\", 'PASS')"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"half_length = len(failing_input) // 2 # // is integer division\n",
"first_half = failing_input[:half_length]\n",
"mystery.run(first_half)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Nope – the first half alone does not suffice. Maybe the second half?"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.706517Z",
"iopub.status.busy": "2024-01-18T17:18:36.706414Z",
"iopub.status.idle": "2024-01-18T17:18:36.708437Z",
"shell.execute_reply": "2024-01-18T17:18:36.708166Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"('\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#', 'PASS')"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_half = failing_input[half_length:]\n",
"mystery.run(second_half)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This did not go so well either. We may still proceed by cutting away _smaller chunks_ – say, one character after another. If our test is deterministic and easily repeated, it is clear that this process eventually will yield a reduced input. But still, it is a rather inefficient process, especially for long inputs. What we need is a _strategy_ that effectively minimizes a failure-inducing input – a strategy that can be automated."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Delta Debugging"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"One strategy to effectively reduce failure-inducing inputs is _delta debugging_ \\cite{Zeller2002}. Delta Debugging implements the \"binary search\" strategy, as listed above, but with a twist: If neither half fails (also as above), it keeps on cutting away smaller and smaller chunks from the input, until it eliminates individual characters. Thus, after cutting away the first half, we cut away\n",
"the first quarter, the second quarter, and so on."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let us illustrate this on our example, and see what happens if we cut away the first quarter."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.710284Z",
"iopub.status.busy": "2024-01-18T17:18:36.710164Z",
"iopub.status.idle": "2024-01-18T17:18:36.712622Z",
"shell.execute_reply": "2024-01-18T17:18:36.712362Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"('50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#',\n",
" 'FAIL')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"quarter_length = len(failing_input) // 4\n",
"input_without_first_quarter = failing_input[quarter_length:]\n",
"mystery.run(input_without_first_quarter)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Ah! This has failed, and reduced our failing input by 25%. Let's remove another quarter."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.714120Z",
"iopub.status.busy": "2024-01-18T17:18:36.714022Z",
"iopub.status.idle": "2024-01-18T17:18:36.716189Z",
"shell.execute_reply": "2024-01-18T17:18:36.715921Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"('\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#', 'PASS')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_without_first_and_second_quarter = failing_input[quarter_length * 2:]\n",
"mystery.run(input_without_first_and_second_quarter)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This is not too surprising, as we had that one before:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.717639Z",
"iopub.status.busy": "2024-01-18T17:18:36.717530Z",
"iopub.status.idle": "2024-01-18T17:18:36.719399Z",
"shell.execute_reply": "2024-01-18T17:18:36.719148Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"second_half"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.720807Z",
"iopub.status.busy": "2024-01-18T17:18:36.720707Z",
"iopub.status.idle": "2024-01-18T17:18:36.722622Z",
"shell.execute_reply": "2024-01-18T17:18:36.722354Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_without_first_and_second_quarter"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"How about removing the third quarter, then?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.724038Z",
"iopub.status.busy": "2024-01-18T17:18:36.723929Z",
"iopub.status.idle": "2024-01-18T17:18:36.726234Z",
"shell.execute_reply": "2024-01-18T17:18:36.725960Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(\"50#7*8=$&&=$9!%6(4=&69':<7+1<2!4$>92+$1<(3%&5''>#\", 'PASS')"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_without_first_and_third_quarter = failing_input[quarter_length:\n",
" quarter_length * 2] + failing_input[quarter_length * 3:]\n",
"mystery.run(input_without_first_and_third_quarter)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Ok. Let us remove the fourth quarter."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.727837Z",
"iopub.status.busy": "2024-01-18T17:18:36.727724Z",
"iopub.status.idle": "2024-01-18T17:18:36.729942Z",
"shell.execute_reply": "2024-01-18T17:18:36.729668Z"
},
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"('50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+', 'FAIL')"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_without_first_and_fourth_quarter = failing_input[quarter_length:quarter_length * 3]\n",
"mystery.run(input_without_first_and_fourth_quarter)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Yes! This has succeeded. Our input is now 50% smaller."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We have now tried to remove pieces that make up $\\frac{1}{2}$ and $\\frac{1}{4}$ of the original failing string. In the next iteration, we would go and remove even smaller pieces – $\\frac{1}{8}$, $\\frac{1}{16}$ and so on. We continue until we are down to $\\frac{1}{97}$ – that is, individual characters."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"However, this is something we happily let a computer do for us. We first introduce a `Reducer` class as an abstract superclass for all kinds of reducers. The `test()` method runs a single test (with logging, if wanted); the `reduce()` method will eventually reduce an input to the minimum."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.731543Z",
"iopub.status.busy": "2024-01-18T17:18:36.731441Z",
"iopub.status.idle": "2024-01-18T17:18:36.734246Z",
"shell.execute_reply": "2024-01-18T17:18:36.733945Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class Reducer:\n",
" \"\"\"Base class for reducers.\"\"\"\n",
"\n",
" def __init__(self, runner: Runner, log_test: bool = False) -> None:\n",
" \"\"\"Attach reducer to the given `runner`\"\"\"\n",
" self.runner = runner\n",
" self.log_test = log_test\n",
" self.reset()\n",
"\n",
" def reset(self) -> None:\n",
" \"\"\"Reset the test counter to zero. To be extended in subclasses.\"\"\"\n",
" self.tests = 0\n",
"\n",
" def test(self, inp: str) -> Outcome:\n",
" \"\"\"Test with input `inp`. Return outcome.\n",
" To be extended in subclasses.\"\"\"\n",
"\n",
" result, outcome = self.runner.run(inp)\n",
" self.tests += 1\n",
" if self.log_test:\n",
" print(\"Test #%d\" % self.tests, repr(inp), repr(len(inp)), outcome)\n",
" return outcome\n",
"\n",
" def reduce(self, inp: str) -> str:\n",
" \"\"\"Reduce input `inp`. Return reduced input.\n",
" To be defined in subclasses.\"\"\"\n",
"\n",
" self.reset()\n",
" # Default: Don't reduce\n",
" return inp"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The `CachingReducer` variant saves test results, such that we don't have to run the same tests again and again:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.735750Z",
"iopub.status.busy": "2024-01-18T17:18:36.735649Z",
"iopub.status.idle": "2024-01-18T17:18:36.737859Z",
"shell.execute_reply": "2024-01-18T17:18:36.737601Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class CachingReducer(Reducer):\n",
" \"\"\"A reducer that also caches test outcomes\"\"\"\n",
"\n",
" def reset(self):\n",
" super().reset()\n",
" self.cache = {}\n",
"\n",
" def test(self, inp):\n",
" if inp in self.cache:\n",
" return self.cache[inp]\n",
"\n",
" outcome = super().test(inp)\n",
" self.cache[inp] = outcome\n",
" return outcome"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Here comes the _Delta Debugging_ reducer. Delta Debugging implements the strategy sketched above: It first removes larger chunks of size $\\frac{1}{2}$; if this does not fail, then we proceed to chunks of size $\\frac{1}{4}$, then $\\frac{1}{8}$ and so on."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Our implementation uses almost the same Python code as Zeller in \\cite{Zeller2002}; the only difference is that it has been adapted to work on Python 3 and our `Runner` framework. The variable `n` (initially 2) indicates the granularity – in each step, chunks of size $\\frac{1}{n}$ are cut away. If none of the test fails (`some_complement_is_failing` is False), then `n` is doubled – until it reaches the length of the input."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.739545Z",
"iopub.status.busy": "2024-01-18T17:18:36.739428Z",
"iopub.status.idle": "2024-01-18T17:18:36.742399Z",
"shell.execute_reply": "2024-01-18T17:18:36.742102Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"class DeltaDebuggingReducer(CachingReducer):\n",
" \"\"\"Reduce inputs using delta debugging.\"\"\"\n",
"\n",
" def reduce(self, inp: str) -> str:\n",
" \"\"\"Reduce input `inp` using delta debugging. Return reduced input.\"\"\"\n",
"\n",
" self.reset()\n",
" assert self.test(inp) != Runner.PASS\n",
"\n",
" n = 2 # Initial granularity\n",
" while len(inp) >= 2:\n",
" start = 0.0\n",
" subset_length = len(inp) / n\n",
" some_complement_is_failing = False\n",
"\n",
" while start < len(inp):\n",
" complement = inp[:int(start)] + \\\n",
" inp[int(start + subset_length):]\n",
"\n",
" if self.test(complement) == Runner.FAIL:\n",
" inp = complement\n",
" n = max(n - 1, 2)\n",
" some_complement_is_failing = True\n",
" break\n",
"\n",
" start += subset_length\n",
"\n",
" if not some_complement_is_failing:\n",
" if n == len(inp):\n",
" break\n",
" n = min(n * 2, len(inp))\n",
"\n",
" return inp"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"To see how the `DeltaDebuggingReducer` works, let us run it on our failing input. With each step, we see how the remaining input gets smaller and smaller, until only two characters remain:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.744073Z",
"iopub.status.busy": "2024-01-18T17:18:36.743960Z",
"iopub.status.idle": "2024-01-18T17:18:36.747364Z",
"shell.execute_reply": "2024-01-18T17:18:36.747089Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test #1 ' 7:,>((/$$-/->.;.=;(.%!:50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#' 97 FAIL\n",
"Test #2 '\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#' 49 PASS\n",
"Test #3 \" 7:,>((/$$-/->.;.=;(.%!:50#7*8=$&&=$9!%6(4=&69':\" 48 PASS\n",
"Test #4 '50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+<7+1<2!4$>92+$1<(3%&5\\'\\'>#' 73 FAIL\n",
"Test #5 \"50#7*8=$&&=$9!%6(4=&69':<7+1<2!4$>92+$1<(3%&5''>#\" 49 PASS\n",
"Test #6 '50#7*8=$&&=$9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+' 48 FAIL\n",
"Test #7 '\\'<3+0-3.24#7=!&60)2/+\";+' 24 PASS\n",
"Test #8 \"50#7*8=$&&=$9!%6(4=&69':\" 24 PASS\n",
"Test #9 '9!%6(4=&69\\':\\'<3+0-3.24#7=!&60)2/+\";+' 36 FAIL\n",
"Test #10 '9!%6(4=&69\\':=!&60)2/+\";+' 24 FAIL\n",
"Test #11 '=!&60)2/+\";+' 12 PASS\n",
"Test #12 \"9!%6(4=&69':\" 12 PASS\n",
"Test #13 '=&69\\':=!&60)2/+\";+' 18 PASS\n",
"Test #14 '9!%6(4=!&60)2/+\";+' 18 FAIL\n",
"Test #15 '9!%6(42/+\";+' 12 PASS\n",
"Test #16 '9!%6(4=!&60)' 12 FAIL\n",
"Test #17 '=!&60)' 6 PASS\n",
"Test #18 '9!%6(4' 6 PASS\n",
"Test #19 '6(4=!&60)' 9 FAIL\n",
"Test #20 '6(460)' 6 FAIL\n",
"Test #21 '60)' 3 PASS\n",
"Test #22 '6(4' 3 PASS\n",
"Test #23 '(460)' 5 FAIL\n",
"Test #24 '460)' 4 PASS\n",
"Test #25 '(0)' 3 FAIL\n",
"Test #26 '0)' 2 PASS\n",
"Test #27 '(' 1 PASS\n",
"Test #28 '()' 2 FAIL\n",
"Test #29 ')' 1 PASS\n"
]
},
{
"data": {
"text/plain": [
"'()'"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dd_reducer = DeltaDebuggingReducer(mystery, log_test=True)\n",
"dd_reducer.reduce(failing_input)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Now we know why `MysteryRunner` fails – it suffices that the input contains two matching parentheses. Delta Debugging determines this in 29 steps. Its result is _1-minimal_, meaning that every character contained is required to produce the error; removing any (as seen in tests `#27` and `#29`, above) no longer makes the test fail. This property is guaranteed by the delta debugging algorithm, which in its last stage always tries to delete characters one by one."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": true,
"run_control": {
"read_only": false
},
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"A reduced test case such as the one above has many advantages:\n",
"\n",
"* A reduced test case __reduces the _cognitive load_ of the programmer__. The test case is shorter and focused, and thus does not burden the programmer with irrelevant details. A reduced input typically leads to shorter executions and smaller program states, both of which reduce the search space as it comes to understanding the bug. In our case, we have eliminated lots of irrelevant input – only the two characters the reduced input contains are relevant.\n",
"\n",
"* A reduced test case __is easier to communicate__. All one needs here is the summary: `MysteryRunner fails on \"()\"`, which is much better than `MysteryRunner fails on a 4100-character input (attached)`.\n",
"\n",
"* A reduced test case helps in __identifying duplicates__. If similar bugs have been reported already, and all of them have been reduced to the same cause (namely that the input contains matching parentheses), then it becomes obvious that all these bugs are different symptoms of the same underlying cause – and would all be resolved at once with one code fix."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"How effective is delta debugging? In the best case (when the left half or the right half fails), the number of tests is logarithmic proportional to the length $n$ of an input (i.e., $O(\\log_2 n)$); this is the same complexity as binary search. In the worst case, though, delta debugging can require a number of tests proportional to $n^2$ (i.e., $O(n^2)$) – this happens in the case when we are down to character granularity, and we have to repeatedly tried to delete all characters, only to find that deleting the last character results in a failure \\cite{Zeller2002}. (This is a pretty pathological situation, though.)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"In general, delta debugging is a robust algorithm that is easy to implement, easy to deploy, and easy to use – provided that the underlying test case is deterministic and runs quickly enough to warrant a number of experiments. As these are the same prerequisites that make fuzzing effective, delta debugging makes an excellent companion to fuzzing."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"execution": {
"iopub.execute_input": "2024-01-18T17:18:36.748920Z",
"iopub.status.busy": "2024-01-18T17:18:36.748840Z",
"iopub.status.idle": "2024-01-18T17:18:36.752944Z",
"shell.execute_reply": "2024-01-18T17:18:36.752717Z"
},
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" \n",
"
\n",
"
Quiz
\n",
"
\n",
"
What happens if the function under test does not fail?