{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": false }, "source": [ "# Testing Configurations\n", "\n", "The behavior of a program is not only governed by its data. The _configuration_ of a program – that is, the settings that govern the execution of a program on its (regular) input data, as set by options or configuration files – just as well influences behavior, and thus can and should be tested. In this chapter, we explore how to systematically _test_ and _cover_ software configurations. By _automatically inferring configuration options_, we can apply these techniques out of the box, with no need for writing a grammar. Finally, we show how to systematically cover _combinations_ of configuration options, quickly detecting unwanted interferences." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [chapter on grammars](Grammars.ipynb).\n", "* You should have read the [chapter on grammar coverage](GrammarCoverageFuzzer.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Configuration Options\n", "\n", "When we talk about the input to a program, we usually think of the _data_ it processes. This is also what we have been fuzzing in the past chapters – be it with [random input](Fuzzer.ipynb), [mutation-based fuzzing](MutationFuzzer.ipynb), or [grammar-based fuzzing](GrammarFuzzer.ipynb). However, programs typically have several input sources, all of which can and should be tested – and included in test generation." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "One important source of input is the program's _configuration_ – that is, a set of inputs that typically is set once when beginning to process data and then stays constant while processing data, while the program is running, or even while the program is deployed. Such a configuration is frequently set in _configuration files_ (for instance, as key/value pairs); the most ubiquitous method for command-line tools, though, are _configuration options_ on the command line." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As an example, consider the `grep` utility to find textual patterns in files. The exact mode by which `grep` works is governed by a multitude of options, which can be listed by providing a `--help` option:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!grep --help" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "All these options need to be tested for whether they operate correctly. In security testing, any such option may also trigger a yet unknown vulnerability. Hence, such options can become _fuzz targets_ on their own. In this chapter, we analyze how to systematically test such options – and better yet, how to extract possible configurations right out of given program files, such that we do not have to specify anything." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Options in Python\n", "\n", "Let us stick to our common programming language here and examine how options are processed in Python. The `argparse` module provides a parser for command-line arguments (and options) with great functionality – and great complexity. You start by defining a parser (`argparse.ArgumentParser()`) to which individual arguments with various features are added, one after another. Additional parameters for each argument can specify the type (`type`) of the argument (say, integers or strings), or the number of arguments (`nargs`)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "By default, arguments are stored under their name in the `args` object coming from `parse_args()` – thus, `args.integers` holds the `integers` arguments added earlier. Special actions (`actions`) allow to store specific values in given variables; the `store_const` action stores the given `const` in the attribute named by `dest`. The following example takes a number of integer arguments (`integers`) as well as an operator (`--sum`, `--min`, or `--max`) to be applied on these integers. The operators all store a function reference in the `accumulate` attribute, which is finally invoked on the integers parsed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import argparse" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def process_numbers(args=[]):\n", " parser = argparse.ArgumentParser(description='Process some integers.')\n", " parser.add_argument('integers', metavar='N', type=int, nargs='+',\n", " help='an integer for the accumulator')\n", " group = parser.add_mutually_exclusive_group(required=True)\n", " group.add_argument('--sum', dest='accumulate', action='store_const',\n", " const=sum,\n", " help='sum the integers')\n", " group.add_argument('--min', dest='accumulate', action='store_const',\n", " const=min,\n", " help='compute the minimum')\n", " group.add_argument('--max', dest='accumulate', action='store_const',\n", " const=max,\n", " help='compute the maximum')\n", "\n", " args = parser.parse_args(args)\n", " print(args.accumulate(args.integers))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's how `process_numbers()` works. We can, for instance, invoke the `--min` option on the given arguments to compute the minimum:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "process_numbers([\"--min\", \"100\", \"200\", \"300\"])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Or compute the sum of three numbers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "process_numbers([\"--sum\", \"1\", \"2\", \"3\"])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When defined via `add_mutually_exclusive_group()` (as above), options are mutually exclusive. Consequently, we can have only one operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import fuzzingbook_utils" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ExpectError(print_traceback=False):\n", " process_numbers([\"--sum\", \"--max\", \"1\", \"2\", \"3\"])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A Grammar for Configurations\n", "\n", "How can we test a system with several options? The easiest answer is to write a grammar for it. The grammar `PROCESS_NUMBERS_EBNF_GRAMMAR` reflects the possible combinations of options and arguments:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Grammars import crange, srange, convert_ebnf_grammar, extend_grammar, is_valid_grammar\n", "from Grammars import START_SYMBOL, new_symbol" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "PROCESS_NUMBERS_EBNF_GRAMMAR = {\n", " \"\": [\" \"],\n", " \"\": [\"--sum\", \"--min\", \"--max\"],\n", " \"\": [\"\", \" \"],\n", " \"\": [\"+\"],\n", " \"\": crange('0', '9')\n", "}\n", "\n", "assert is_valid_grammar(PROCESS_NUMBERS_EBNF_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "PROCESS_NUMBERS_GRAMMAR = convert_ebnf_grammar(PROCESS_NUMBERS_EBNF_GRAMMAR)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can feed this grammar into our [grammar coverage fuzzer](GrammarCoverageFuzzer.ipynb) and have it cover one option after another:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarCoverageFuzzer import GrammarCoverageFuzzer" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f = GrammarCoverageFuzzer(PROCESS_NUMBERS_GRAMMAR, min_nonterminals=10)\n", "for i in range(3):\n", " print(f.fuzz())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Of course, we can also invoke `process_numbers()` with these very arguments. To this end, we need to convert the string produced by the grammar back into a list of individual arguments, using `split()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f = GrammarCoverageFuzzer(PROCESS_NUMBERS_GRAMMAR, min_nonterminals=10)\n", "for i in range(3):\n", " args = f.fuzz().split()\n", " print(args)\n", " process_numbers(args)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In a similar way, we can define grammars for any program to be tested; as well as define grammars for, say, configuration files. Yet, the grammar has to be updated with every change to the program, which creates a maintenance burden. Given that the information required for the grammar is already all encoded in the program, the question arises: _Can't we go and extract configuration options right out of the program in the first place?_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## Mining Configuration Options\n", "\n", "In this section, we try to extract option and argument information right out of a program, such that we do not have to specify a configuration grammar. The aim is to have a configuration fuzzer that works on the options and arguments of an arbitrary program, as long as it follows specific conventions for processing its arguments. In the case of Python programs, this means using the `argparse` module.\n", "\n", "Our idea is as follows: We execute the given program up to the point where the arguments are actually parsed – that is, `argparse.parse_args()` is invoked. Up to this point, we track all calls into the argument parser, notably those calls that define arguments and options (`add_argument()`). From these, we construct the grammar." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Tracking Arguments\n", "\n", "Let us illustrate this approach with a simple experiment: We define a trace function (see [our chapter on coverage](Coverage.ipynb) for details) that is active while `process_numbers` is invoked. If we have a call to a method `add_argument`, we access and print out the local variables (which at this point are the arguments to the method)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import sys" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def traceit(frame, event, arg):\n", " if event != \"call\":\n", " return\n", " method_name = frame.f_code.co_name\n", " if method_name != \"add_argument\":\n", " return\n", " locals = frame.f_locals\n", " print(method_name, locals)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What we get is a list of all calls to `add_argument()`, together with the method arguments passed:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "sys.settrace(traceit)\n", "process_numbers([\"--sum\", \"1\", \"2\", \"3\"])\n", "sys.settrace(None)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "From the `args` argument, we can access the individual options and arguments to be defined:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def traceit(frame, event, arg):\n", " if event != \"call\":\n", " return\n", " method_name = frame.f_code.co_name\n", " if method_name != \"add_argument\":\n", " return\n", " locals = frame.f_locals\n", " print(locals['args'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "sys.settrace(traceit)\n", "process_numbers([\"--sum\", \"1\", \"2\", \"3\"])\n", "sys.settrace(None)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see that each argument comes as a tuple with one (say, `integers` or `--sum`) or two members (`-h` and `--help`), which denote alternate forms for the same option. Our job will be to go through the arguments of `add_arguments()` and detect not only the names of options and arguments, but also whether they accept additional parameters, as well as the type of the parameters." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### A Grammar Miner for Options and Arguments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us now build a class that gathers all this information to create a grammar." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use the `ParseInterrupt` exception to interrupt program execution after gathering all arguments and options:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ParseInterrupt(Exception):\n", " pass" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The class `OptionGrammarMiner` takes an executable function for which the grammar of options and arguments is to be mined:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class OptionGrammarMiner(object):\n", " def __init__(self, function, log=False):\n", " self.function = function\n", " self.log = log" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The method `mine_ebnf_grammar()` is where everything happens. It creates a grammar of the form\n", "\n", "```\n", " ::=