{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Carving Unit Tests\n", "\n", "So far, we have always generated _system input_, i.e. data that the program as a whole obtains via its input channels. If we are interested in testing only a small set of functions, having to go through the system can be very inefficient. This chapter introduces a technique known as _carving_, which, given a system test, automatically extracts a set of _unit tests_ that replicate the calls seen during the system test. The key idea is to _record_ such calls such that we can _replay_ them later – as a whole or selectively. On top, we also explore how to synthesize API grammars from carved unit tests; this means that we can _synthesize API tests without having to write a grammar at all._" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* Carving makes use of dynamic traces of function calls and variables, as introduced in the [chapter on configuration fuzzing](ConfigurationFuzzer.ipynb).\n", "* Using grammars to test units was introduced in the [chapter on API fuzzing](APIFuzzer.ipynb)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.151172Z", "iopub.status.busy": "2023-01-07T14:21:30.150761Z", "iopub.status.idle": "2023-01-07T14:21:30.184310Z", "shell.execute_reply": "2023-01-07T14:21:30.184524Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.186485Z", "iopub.status.busy": "2023-01-07T14:21:30.186193Z", "iopub.status.idle": "2023-01-07T14:21:30.936838Z", "shell.execute_reply": "2023-01-07T14:21:30.937071Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import APIFuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from fuzzingbook.Carver import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides means to _record and replay function calls_ during a system test. Since individual function calls are much faster than a whole system run, such \"carving\" mechanisms have the potential to run tests much faster.\n", "\n", "### Recording Calls\n", "\n", "The `CallCarver` class records all calls occurring while it is active. It is used in conjunction with a `with` clause:\n", "\n", "```python\n", ">>> with CallCarver() as carver:\n", ">>> y = my_sqrt(2)\n", ">>> y = my_sqrt(4)\n", "```\n", "After execution, `called_functions()` lists the names of functions encountered:\n", "\n", "```python\n", ">>> carver.called_functions()\n", "['my_sqrt', '__exit__']\n", "```\n", "The `arguments()` method lists the arguments recorded for a function. This is a mapping of the function name to a list of lists of arguments; each argument is a pair (parameter name, value).\n", "\n", "```python\n", ">>> carver.arguments('my_sqrt')\n", "[[('x', 2)], [('x', 4)]]\n", "```\n", "Complex arguments are properly serialized, such that they can be easily restored.\n", "\n", "### Synthesizing Calls\n", "\n", "While such recorded arguments already could be turned into arguments and calls, a much nicer alternative is to create a _grammar_ for recorded calls. This allows to synthesize arbitrary _combinations_ of arguments, and also offers a base for further customization of calls.\n", "\n", "The `CallGrammarMiner` class turns a list of carved executions into a grammar.\n", "\n", "```python\n", ">>> my_sqrt_miner = CallGrammarMiner(carver)\n", ">>> my_sqrt_grammar = my_sqrt_miner.mine_call_grammar()\n", ">>> my_sqrt_grammar\n", "{'': [''],\n", " '': [''],\n", " '': ['2', '4'],\n", " '': ['my_sqrt()']}\n", "```\n", "This grammar can be used to synthesize calls.\n", "\n", "```python\n", ">>> fuzzer = GrammarCoverageFuzzer(my_sqrt_grammar)\n", ">>> fuzzer.fuzz()\n", "'my_sqrt(4)'\n", "```\n", "These calls can be executed in isolation, effectively extracting unit tests from system tests:\n", "\n", "```python\n", ">>> eval(fuzzer.fuzz())\n", "1.414213562373095\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## System Tests vs Unit Tests\n", "\n", "Remember the URL grammar introduced for [grammar fuzzing](Grammars.ipynb)? With such a grammar, we can happily test a Web browser again and again, checking how it reacts to arbitrary page requests.\n", "\n", "Let us define a very simple \"web browser\" that goes and downloads the content given by the URL." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.939867Z", "iopub.status.busy": "2023-01-07T14:21:30.939463Z", "iopub.status.idle": "2023-01-07T14:21:30.940704Z", "shell.execute_reply": "2023-01-07T14:21:30.940918Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import urllib.parse" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.942963Z", "iopub.status.busy": "2023-01-07T14:21:30.942614Z", "iopub.status.idle": "2023-01-07T14:21:30.943920Z", "shell.execute_reply": "2023-01-07T14:21:30.944123Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def webbrowser(url):\n", " \"\"\"Download the http/https resource given by the URL\"\"\"\n", " import requests # Only import if needed\n", "\n", " r = requests.get(url)\n", " return r.text" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us apply this on [fuzzingbook.org](https://www.fuzzingbook.org/) and measure the time, using the [Timer class](Timer.ipynb):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.945893Z", "iopub.status.busy": "2023-01-07T14:21:30.945569Z", "iopub.status.idle": "2023-01-07T14:21:30.946766Z", "shell.execute_reply": "2023-01-07T14:21:30.947036Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Timer import Timer" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:30.949028Z", "iopub.status.busy": "2023-01-07T14:21:30.948726Z", "iopub.status.idle": "2023-01-07T14:21:31.461376Z", "shell.execute_reply": "2023-01-07T14:21:31.461588Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded 470494 bytes in 0.49 seconds\n" ] } ], "source": [ "with Timer() as webbrowser_timer:\n", " fuzzingbook_contents = webbrowser(\n", " \"http://www.fuzzingbook.org/html/Fuzzer.html\")\n", "\n", "print(\"Downloaded %d bytes in %.2f seconds\" %\n", " (len(fuzzingbook_contents), webbrowser_timer.elapsed_time()))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-01-07T14:21:31.465040Z", "iopub.status.busy": "2023-01-07T14:21:31.464723Z", "iopub.status.idle": "2023-01-07T14:21:31.466433Z", "shell.execute_reply": "2023-01-07T14:21:31.466641Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'\\n\\n