{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Carving Unit Tests\n", "\n", "So far, we have always generated _system input_, i.e. data that the program as a whole obtains via its input channels. If we are interested in testing only a small set of functions, having to go through the system can be very inefficient. This chapter introduces a technique known as _carving_, which, given a system test, automatically extracts a set of _unit tests_ that replicate the calls seen during the system test. The key idea is to _record_ such calls such that we can _replay_ them later – as a whole or selectively. On top, we also explore how to synthesize API grammars from carved unit tests; this means that we can _synthesize API tests without having to write a grammar at all._" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* Carving makes use of dynamic traces of function calls and variables, as introduced in the [chapter on configuration fuzzing](ConfigurationFuzzer.ipynb).\n", "* Using grammars to test units was introduced in the [chapter on API fuzzing](APIFuzzer.ipynb)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:14.708722Z", "iopub.status.busy": "2024-01-18T17:21:14.708528Z", "iopub.status.idle": "2024-01-18T17:21:14.761806Z", "shell.execute_reply": "2024-01-18T17:21:14.761360Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:14.764915Z", "iopub.status.busy": "2024-01-18T17:21:14.764579Z", "iopub.status.idle": "2024-01-18T17:21:15.620041Z", "shell.execute_reply": "2024-01-18T17:21:15.619627Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import APIFuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from fuzzingbook.Carver import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides means to _record and replay function calls_ during a system test. Since individual function calls are much faster than a whole system run, such \"carving\" mechanisms have the potential to run tests much faster.\n", "\n", "### Recording Calls\n", "\n", "The `CallCarver` class records all calls occurring while it is active. It is used in conjunction with a `with` clause:\n", "\n", "```python\n", ">>> with CallCarver() as carver:\n", ">>> y = my_sqrt(2)\n", ">>> y = my_sqrt(4)\n", "```\n", "After execution, `called_functions()` lists the names of functions encountered:\n", "\n", "```python\n", ">>> carver.called_functions()\n", "['my_sqrt', '__exit__']\n", "```\n", "The `arguments()` method lists the arguments recorded for a function. This is a mapping of the function name to a list of lists of arguments; each argument is a pair (parameter name, value).\n", "\n", "```python\n", ">>> carver.arguments('my_sqrt')\n", "[[('x', 2)], [('x', 4)]]\n", "```\n", "Complex arguments are properly serialized, such that they can be easily restored.\n", "\n", "### Synthesizing Calls\n", "\n", "While such recorded arguments already could be turned into arguments and calls, a much nicer alternative is to create a _grammar_ for recorded calls. This allows synthesizing arbitrary _combinations_ of arguments, and also offers a base for further customization of calls.\n", "\n", "The `CallGrammarMiner` class turns a list of carved executions into a grammar.\n", "\n", "```python\n", ">>> my_sqrt_miner = CallGrammarMiner(carver)\n", ">>> my_sqrt_grammar = my_sqrt_miner.mine_call_grammar()\n", ">>> my_sqrt_grammar\n", "{'': [''],\n", " '': [''],\n", " '': ['4', '2'],\n", " '': ['my_sqrt()']}\n", "```\n", "This grammar can be used to synthesize calls.\n", "\n", "```python\n", ">>> fuzzer = GrammarCoverageFuzzer(my_sqrt_grammar)\n", ">>> fuzzer.fuzz()\n", "'my_sqrt(2)'\n", "```\n", "These calls can be executed in isolation, effectively extracting unit tests from system tests:\n", "\n", "```python\n", ">>> eval(fuzzer.fuzz())\n", "2.0\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## System Tests vs Unit Tests\n", "\n", "Remember the URL grammar introduced for [grammar fuzzing](Grammars.ipynb)? With such a grammar, we can happily test a Web browser again and again, checking how it reacts to arbitrary page requests.\n", "\n", "Let us define a very simple \"web browser\" that goes and downloads the content given by the URL." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:15.622036Z", "iopub.status.busy": "2024-01-18T17:21:15.621916Z", "iopub.status.idle": "2024-01-18T17:21:15.623514Z", "shell.execute_reply": "2024-01-18T17:21:15.623222Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import urllib.parse" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:15.625080Z", "iopub.status.busy": "2024-01-18T17:21:15.624973Z", "iopub.status.idle": "2024-01-18T17:21:15.626778Z", "shell.execute_reply": "2024-01-18T17:21:15.626518Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def webbrowser(url):\n", " \"\"\"Download the http/https resource given by the URL\"\"\"\n", " import requests # Only import if needed\n", "\n", " r = requests.get(url)\n", " return r.text" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us apply this on [fuzzingbook.org](https://www.fuzzingbook.org/) and measure the time, using the [Timer class](Timer.ipynb):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:15.628609Z", "iopub.status.busy": "2024-01-18T17:21:15.628472Z", "iopub.status.idle": "2024-01-18T17:21:15.630432Z", "shell.execute_reply": "2024-01-18T17:21:15.630042Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Timer import Timer" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:15.632119Z", "iopub.status.busy": "2024-01-18T17:21:15.632001Z", "iopub.status.idle": "2024-01-18T17:21:16.033650Z", "shell.execute_reply": "2024-01-18T17:21:16.033214Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded 474685 bytes in 0.40 seconds\n" ] } ], "source": [ "with Timer() as webbrowser_timer:\n", " fuzzingbook_contents = webbrowser(\n", " \"http://www.fuzzingbook.org/html/Fuzzer.html\")\n", "\n", "print(\"Downloaded %d bytes in %.2f seconds\" %\n", " (len(fuzzingbook_contents), webbrowser_timer.elapsed_time()))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:21:16.076013Z", "iopub.status.busy": "2024-01-18T17:21:16.075776Z", "iopub.status.idle": "2024-01-18T17:21:16.081187Z", "shell.execute_reply": "2024-01-18T17:21:16.080785Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'\\n\\n