{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Concolic Fuzzing\n", "\n", "In the [chapter on information flow](InformationFlow.ipynb), we have seen how one can use dynamic taints to produce more intelligent test cases than simply looking for program crashes. We have also seen how one can use the taints to update the grammar, and hence focus more on the dangerous methods. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "While taints are helpful, uninterpreted strings are only one of the attack vectors. Can we say anything more about the properties of variables at any point in the execution? For example, can we say for sure that a function will always receive buffers of the correct length?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " _Concolic execution_ offers a solution here. The idea of _concolic execution_ over a function is as follows: We start with a sample input for the function, and execute the function under trace. At each point where the execution passes through a conditional, we _save the conditional encountered_ in the form of _relations between symbolic variables._ Here, a _symbolic variable_ can be thought of as a placeholder for the real variable, much like the `x` in solving for `x` in algebra. The symbolic variables can be used to specify relations without actually solving them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With concolic execution, one can collect the constraints that an execution path encounters, and use them to answer questions about the program behavior at any point along the execution path. We can further use concolic execution to enhance fuzzing." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In this chapter, we explore in depth how to execute a Python function concolically, and how concolic execution can be used to enhance fuzzing." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:16.245391Z", "iopub.status.busy": "2024-01-18T17:20:16.244821Z", "iopub.status.idle": "2024-01-18T17:20:16.305903Z", "shell.execute_reply": "2024-01-18T17:20:16.305515Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo('KDcMjWX5ulU')" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [chapter on coverage](Coverage.ipynb).\n", "* You should have read the [chapter on information flow](InformationFlow.ipynb).\n", "* A familiarity with the basic idea of [SMT solvers](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories) would be useful." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:16.329205Z", "iopub.status.busy": "2024-01-18T17:20:16.328954Z", "iopub.status.idle": "2024-01-18T17:20:16.331502Z", "shell.execute_reply": "2024-01-18T17:20:16.331182Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:16.333118Z", "iopub.status.busy": "2024-01-18T17:20:16.333011Z", "iopub.status.idle": "2024-01-18T17:20:16.334743Z", "shell.execute_reply": "2024-01-18T17:20:16.334468Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from typing import List, Callable, Dict, Tuple" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from fuzzingbook.ConcolicFuzzer import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter defines two main classes: `SimpleConcolicFuzzer` and `ConcolicGrammarFuzzer`. The `SimpleConcolicFuzzer` first uses a sample input to collect predicates encountered. The fuzzer then negates random predicates to generate new input constraints. These, when solved, produce inputs that explore paths that are close to the original path.\n", "\n", "### ConcolicTracer\n", "\n", "At the heart of both fuzzers lies the concept of a _concolic tracer_, capturing symbolic variables and path conditions as a program gets executed.\n", "\n", "`ConcolicTracer` is used in a `with` block; the syntax `tracer[function]` executes `function` within the `tracer` while capturing conditions. 
Here is an example for the `cgi_decode()` function:\n", "\n", "```python\n", ">>> with ConcolicTracer() as _:\n", ">>> _[cgi_decode]('a%20d')\n", "```\n", "Once executed, we can retrieve the symbolic variables in the `decls` attribute. This is a mapping of symbolic variables to types.\n", "\n", "```python\n", ">>> _.decls\n", "{'cgi_decode_s_str_1': 'String'}\n", "```\n", "The extracted path conditions can be found in the `path` attribute:\n", "\n", "```python\n", ">>> _.path\n", "[0 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"%\"),\n", " 1 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 1, 1) == \"+\"),\n", " str.substr(cgi_decode_s_str_1, 1, 1) == \"%\",\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"0\"),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"1\"),\n", " str.substr(cgi_decode_s_str_1, 2, 1) == \"2\",\n", " str.substr(cgi_decode_s_str_1, 3, 1) == \"0\",\n", " 4 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"%\"),\n", " Not(5 < Length(cgi_decode_s_str_1))]\n", "```\n", "The `context` attribute holds a pair of `decls` and `path` attributes; this is useful for passing it into the `ConcolicTracer` constructor.\n", "\n", "```python\n", ">>> assert _.context == (_.decls, _.path)\n", "```\n", "We can solve these constraints to obtain a value for the function parameters that follow the same path as the original (traced) invocation:\n", "\n", "```python\n", ">>> _.zeval()\n", "('sat', {'s': ('A%20B', 'String')})\n", "```\n", "The `zeval()` function also allows passing _alternate_ or _negated_ constraints. 
See the chapter for examples.\n", "\n", "![](PICS/ConcolicFuzzer-synopsis-1.svg)\n", "\n", "### SimpleConcolicFuzzer\n", "\n", "The constraints obtained from `ConcolicTracer` are added to the concolic fuzzer as follows:\n", "\n", "```python\n", ">>> scf = SimpleConcolicFuzzer()\n", ">>> scf.add_trace(_, 'a%20d')\n", "```\n", "The concolic fuzzer then uses the constraints added to guide its fuzzing as follows:\n", "\n", "```python\n", ">>> scf = SimpleConcolicFuzzer()\n", ">>> for i in range(20):\n", ">>> v = scf.fuzz()\n", ">>> if v is None:\n", ">>> break\n", ">>> print(repr(v))\n", ">>> with ExpectError(print_traceback=False):\n", ">>> with ConcolicTracer() as _:\n", ">>> _[cgi_decode](v)\n", ">>> scf.add_trace(_, v)\n", "' '\n", "'+'\n", "'%'\n", "'+A'\n", "'AB'\n", "'++'\n", "'++A'\n", "'+++'\n", "'A'\n", "'+A'\n", "'+++A'\n", "\n", "IndexError: string index out of range (expected)\n", "\n", "'+AB'\n", "'++'\n", "'%'\n", "'++AB'\n", "'++A+'\n", "'+A'\n", "'++'\n", "'+'\n", "'+%'\n", "\n", "IndexError: string index out of range (expected)\n", "IndexError: string index out of range (expected)\n", "\n", "```\n", "We see how the additional inputs generated explore additional paths.\n", "\n", "![](PICS/ConcolicFuzzer-synopsis-2.svg)\n", "\n", "### ConcolicGrammarFuzzer\n", "\n", "The `SimpleConcolicFuzzer` simply explores all paths near the original path traversed by the sample input. It uses a simple mechanism to explore the paths that are near the paths that it knows about, and other than code paths, knows nothing about the input.\n", "\n", "The `ConcolicGrammarFuzzer` on the other hand, knows about the input grammar, and can collect feedback from the subject under fuzzing. It can lift some constraints encountered to the grammar, enabling deeper fuzzing. 
It is used as follows:\n", "\n", "```python\n", ">>> from InformationFlow import INVENTORY_GRAMMAR, SQLException\n", ">>> cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)\n", ">>> cgf.prune_tokens(prune_tokens)\n", ">>> for i in range(10):\n", ">>>     query = cgf.fuzz()\n", ">>>     print(query)\n", ">>>     with ConcolicTracer() as _:\n", ">>>         with ExpectError(print_traceback=False):\n", ">>>             try:\n", ">>>                 res = _[db_select](query)\n", ">>>                 print(repr(res))\n", ">>>             except SQLException as e:\n", ">>>                 print(e)\n", ">>>         cgf.update_grammar(_)\n", ">>>         print()\n", "select 245 from :2 where r(_)-N+e>n\n", "Table (':2') was not found\n", "\n", "delete from months where Q/x/j/q(p)/H*h-B==cz\n", "Invalid WHERE ('Q/x/j/q(p)/H*h-B==cz')\n", "\n", "insert into vehicles (:b) values (22.72)\n", "Column (':b') was not found\n", "\n", "select i*q!=(4) from vehicles where L*S/l/u/b+b==W\n", "\n", "delete from vehicles where W/V!=A(f)+tL+S))==((:+lL+S))==((:+l\n", 
"```\n", "\n", "[Figure: control-flow graph of `factorial()` with covered arcs highlighted. Node labels: 1: enter: factorial(n); 2: if: n < 0; 3: return None; 5: if: n == 0; 6: return 1; 8: if: n == 1; 9: return 1; 11: v = 1; 12: while: n != 0; 13: v = v * n; 14: n = n - 1; 16: return v; 1: exit: factorial(n)]\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "to_graph(gen_cfg(inspect.getsource(factorial)), arcs=cov.arcs())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that the path `[1, 2, 5, 8, 11, 12, 13, 14]` is covered (green) but sub-paths such as `[2, 3]`, `[5, 6]` and `[8, 9]` are unexplored (red). What we need is the ability to generate inputs such that the `True` branch is taken at `2`. How do we do that?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": false }, "source": [ "## Concolic Execution\n", "\n", "One way to cover additional branches is to look at the execution path being taken, and collect the _conditional constraints_ that the path encounters. Then we can try to produce inputs that lead us to taking the non-traversed path." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, let us step through the function." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.249737Z", "iopub.status.busy": "2024-01-18T17:20:17.249609Z", "iopub.status.idle": "2024-01-18T17:20:17.252327Z", "shell.execute_reply": "2024-01-18T17:20:17.252005Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "lines = [i[1] for i in cov._trace if i[0] == 'factorial']\n", "src = {i + 1: s for i, s in enumerate(\n", " inspect.getsource(factorial).split('\\n'))}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* The line (1) is simply the entry point of the function. We know that the input is `n`, which is an integer." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.254055Z", "iopub.status.busy": "2024-01-18T17:20:17.253931Z", "iopub.status.idle": "2024-01-18T17:20:17.256099Z", "shell.execute_reply": "2024-01-18T17:20:17.255827Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'def factorial(n):'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src[1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* The line (2) is a predicate `n < 0`. Since the next line taken is line (5), we know that at this point in the execution path, the predicate was `false`." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.257554Z", "iopub.status.busy": "2024-01-18T17:20:17.257433Z", "iopub.status.idle": "2024-01-18T17:20:17.259639Z", "shell.execute_reply": "2024-01-18T17:20:17.259369Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(' if n < 0:', ' return None', '', ' if n == 0:')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src[2], src[3], src[4], src[5]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We notice that this is one of the predicates where the `true` branch was not taken. How do we generate a value that takes the `true` branch here? One way is to use symbolic variables to represent the input, encode the constraint, and use an *SMT Solver* to solve the negation of the constraint." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As we mentioned in the introduction to the chapter, a symbolic variable can be thought of as a placeholder for the real variable, much like the `x` in solving for `x` in algebra. These variables can be used to encode constraints placed on the variables in the program. We identify which constraints the variable is supposed to obey, and finally produce a value that obeys all constraints imposed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Solving Constraints" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To solve these constraints, one can use a _Satisfiability Modulo Theories_ (SMT) solver. An SMT solver is built on top of a _satisfiability_ (SAT) solver. A SAT solver checks whether Boolean formulas in propositional logic (e.g. 
`(a | b ) & (~a | ~b)`) can be satisfied using any assignments for the variables (e.g `a = true, b = false`). An SMT solver extends these SAT solvers to specific background theories -- for example, _theory of integers_, or _theory of strings_. That is, given a string constraint expressed as a formula with string variables (e.g. `h + t == 'hello,world'`), an SMT solver that understands _theory of strings_ can be used to check if that constraint can be satisfied, and if satisfiable, provide an instantiation of concrete values for the variables used in the formula (e.g `h = 'hello,', t = 'world'`)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use the SMT solver Z3 in this chapter." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.261433Z", "iopub.status.busy": "2024-01-18T17:20:17.261246Z", "iopub.status.idle": "2024-01-18T17:20:17.274279Z", "shell.execute_reply": "2024-01-18T17:20:17.273971Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import z3 # type: ignore" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.275872Z", "iopub.status.busy": "2024-01-18T17:20:17.275764Z", "iopub.status.idle": "2024-01-18T17:20:17.277856Z", "shell.execute_reply": "2024-01-18T17:20:17.277548Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "z3_ver = z3.get_version()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.279469Z", "iopub.status.busy": "2024-01-18T17:20:17.279358Z", "iopub.status.idle": "2024-01-18T17:20:17.281287Z", "shell.execute_reply": "2024-01-18T17:20:17.281040Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4, 11, 2, 0)\n" ] } ], "source": [ "print(z3_ver)" ] }, 
{ "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.282663Z", "iopub.status.busy": "2024-01-18T17:20:17.282556Z", "iopub.status.idle": "2024-01-18T17:20:17.284248Z", "shell.execute_reply": "2024-01-18T17:20:17.283948Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert z3_ver >= (4, 8, 13, 0), \\\n", " f\"Please install z3-solver 4.8.13.0 or later - you have {z3_ver}\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us set up Z3 first. To ensure that the string constraints we use in this chapter are successfully evaluated, we need to specify the `z3str3` solver. Further, we set the timeout for Z3 computations to 30 seconds." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.285666Z", "iopub.status.busy": "2024-01-18T17:20:17.285565Z", "iopub.status.idle": "2024-01-18T17:20:17.287106Z", "shell.execute_reply": "2024-01-18T17:20:17.286869Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# z3.set_option('smt.string_solver', 'z3str3')\n", "z3.set_option('timeout', 30 * 1000) # milliseconds" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To encode constraints, we need symbolic variables. Here, we make `zn` a placeholder for the Z3 symbolic integer variable `n`." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.288569Z", "iopub.status.busy": "2024-01-18T17:20:17.288452Z", "iopub.status.idle": "2024-01-18T17:20:17.292146Z", "shell.execute_reply": "2024-01-18T17:20:17.291836Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "zn = z3.Int('n')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Remember the constraint `(n < 0)` from line 2 in `factorial()`? We can now encode the constraint as follows. " ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.294025Z", "iopub.status.busy": "2024-01-18T17:20:17.293919Z", "iopub.status.idle": "2024-01-18T17:20:17.297496Z", "shell.execute_reply": "2024-01-18T17:20:17.297240Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "n < 0" ], "text/plain": [ "n < 0" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zn < 0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We previously traced `factorial(5)`. We saw that with input `5`, the execution took the `else` branch on the predicate `n < 0`. We can express this observation as follows." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.298982Z", "iopub.status.busy": "2024-01-18T17:20:17.298877Z", "iopub.status.idle": "2024-01-18T17:20:17.302023Z", "shell.execute_reply": "2024-01-18T17:20:17.301750Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "¬(n < 0)" ], "text/plain": [ "Not(n < 0)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z3.Not(zn < 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us now solve constraints. The `z3.solve()` method checks if the constraints are satisfiable; if they are, it also provides values for variables such that the constraints are satisfied. For example, we can ask Z3 for an input that will take the `else` branch as follows:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.303499Z", "iopub.status.busy": "2024-01-18T17:20:17.303390Z", "iopub.status.idle": "2024-01-18T17:20:17.311702Z", "shell.execute_reply": "2024-01-18T17:20:17.311358Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[n = 0]\n" ] } ], "source": [ "z3.solve(z3.Not(zn < 0))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is *a solution* (albeit a trivial one). SMT solvers can be used to solve much harder problems. For example, here is how one can solve a quadratic equation." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.313231Z", "iopub.status.busy": "2024-01-18T17:20:17.313127Z", "iopub.status.idle": "2024-01-18T17:20:17.322438Z", "shell.execute_reply": "2024-01-18T17:20:17.322157Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[x = 5]\n" ] } ], "source": [ "x = z3.Real('x')\n", "eqn = (2 * x**2 - 11 * x + 5 == 0)\n", "z3.solve(eqn)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, this is _one solution_. We can ask z3 to give us another solution as follows." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.324013Z", "iopub.status.busy": "2024-01-18T17:20:17.323854Z", "iopub.status.idle": "2024-01-18T17:20:17.331944Z", "shell.execute_reply": "2024-01-18T17:20:17.331661Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[x = 1/2]\n" ] } ], "source": [ "z3.solve(x != 5, eqn)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, both `x = 5` and `x = 1/2` are solutions to the quadratic equation $2x^2 -11x + 5 = 0$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Similarly, we can ask *Z3* for an input that satisfies the constraint encoded in line 2 of `factorial()` so that we take the `if` branch." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.333538Z", "iopub.status.busy": "2024-01-18T17:20:17.333428Z", "iopub.status.idle": "2024-01-18T17:20:17.341002Z", "shell.execute_reply": "2024-01-18T17:20:17.340753Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[n = -1]\n" ] } ], "source": [ "z3.solve(zn < 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "That is, if one uses `-1` as an input to `factorial()`, it is guaranteed to take the `if` branch in line 2 during execution." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try using that with our coverage. Here, the `-1` is the solution from above." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.342579Z", "iopub.status.busy": "2024-01-18T17:20:17.342471Z", "iopub.status.idle": "2024-01-18T17:20:17.361678Z", "shell.execute_reply": "2024-01-18T17:20:17.361230Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with cov as cov:\n", "    factorial(-1)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.363432Z", "iopub.status.busy": "2024-01-18T17:20:17.363305Z", "iopub.status.idle": "2024-01-18T17:20:17.733054Z", "shell.execute_reply": "2024-01-18T17:20:17.732671Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Figure: control-flow graph of `factorial()` with covered arcs highlighted. Node labels: 1: enter: factorial(n); 2: if: n < 0; 3: return None; 5: if: n == 0; 6: return 1; 8: if: n == 1; 9: return 1; 11: v = 1; 12: while: n != 0; 13: v = v * n; 14: n = n - 1; 16: return v; 1: exit: factorial(n)]\n" 
], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "to_graph(gen_cfg(inspect.getsource(factorial)), arcs=cov.arcs())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Ok, so we have managed to cover a little more of the graph. Let us continue with our original input of `factorial(5)`:\n", "* In line (5) we encounter a new predicate `n == 0`, for which we again took the false branch." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.735129Z", "iopub.status.busy": "2024-01-18T17:20:17.734969Z", "iopub.status.idle": "2024-01-18T17:20:17.737980Z", "shell.execute_reply": "2024-01-18T17:20:17.737432Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "' if n == 0:'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src[5]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The predicates required to follow the path up to this point are as follows." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.739767Z", "iopub.status.busy": "2024-01-18T17:20:17.739656Z", "iopub.status.idle": "2024-01-18T17:20:17.742011Z", "shell.execute_reply": "2024-01-18T17:20:17.741745Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "predicates = [z3.Not(zn < 0), z3.Not(zn == 0)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* If we continue to line (8), we encounter another predicate, `n == 1`, for which we again took the `false` branch." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.743413Z", "iopub.status.busy": "2024-01-18T17:20:17.743314Z", "iopub.status.idle": "2024-01-18T17:20:17.745503Z", "shell.execute_reply": "2024-01-18T17:20:17.745175Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "' if n == 1:'" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src[8]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The predicates encountered so far are as follows." ] }, { "cell_type": "code", 
"execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.746996Z", "iopub.status.busy": "2024-01-18T17:20:17.746907Z", "iopub.status.idle": "2024-01-18T17:20:17.749047Z", "shell.execute_reply": "2024-01-18T17:20:17.748787Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "predicates = [z3.Not(zn < 0), z3.Not(zn == 0), z3.Not(zn == 1)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To take the `true` branch at line (8) and reach the `return 1` in line (9), we have to obey all the predicates up to that point, but invert the last predicate." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.750493Z", "iopub.status.busy": "2024-01-18T17:20:17.750409Z", "iopub.status.idle": "2024-01-18T17:20:17.759071Z", "shell.execute_reply": "2024-01-18T17:20:17.758790Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[n = 1]\n" ] } ], "source": [ "last = len(predicates) - 1\n", "z3.solve(predicates[0:-1] + [z3.Not(predicates[-1])])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What we are doing here is tracing the execution corresponding to a particular input `factorial(5)`, using concrete values, and along with it, keeping *symbolic shadow variables* that enable us to capture the constraints. As we mentioned in the introduction, this particular method of execution where one tracks concrete execution using symbolic variables is called *Concolic Execution*.\n", "\n", "How do we automate this process? One method is to use an infrastructure similar to that of the chapter on [information flow](InformationFlow.ipynb), and use Python inheritance to create _symbolic proxy objects_ that can track the concrete execution." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [], "toc-nb-collapsed": true }, "source": [ "## A Concolic Tracer\n", "\n", "Let us now define a class to _collect_ symbolic variables and path conditions during an execution. The idea is to have a `ConcolicTracer` class that is invoked in a `with` block. To execute a function while tracing its path conditions, we need to _transform_ its arguments, which we do by invoking functions through a `[]` item access.\n", "\n", "This is a typical usage of a `ConcolicTracer`:\n", "\n", "```python\n", "with ConcolicTracer() as _:\n", "    _[function](args, ...)\n", "```\n", "\n", "After execution, we can access the symbolic variables in the `decls` attribute:\n", "\n", "```python\n", "_.decls\n", "```\n", "\n", "whereas the `path` attribute lists the path conditions encountered:\n", "\n", "```python\n", "_.path\n", "```\n", "\n", "The `context` attribute contains a pair of declarations and path conditions:\n", "\n", "```python\n", "_.context\n", "```\n", "\n", "If you are reading this for the first time, skip the implementation and head right to the examples." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Excursion: Implementing ConcolicTracer" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us now implement `ConcolicTracer`.\n", " Its constructor accepts a single `context` argument, which contains the declarations of the symbolic variables and the path conditions seen so far. We only need this in case of nested contexts." 
] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.760623Z", "iopub.status.busy": "2024-01-18T17:20:17.760518Z", "iopub.status.idle": "2024-01-18T17:20:17.762574Z", "shell.execute_reply": "2024-01-18T17:20:17.762329Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConcolicTracer:\n", "    \"\"\"Trace function execution, tracking variables and path conditions\"\"\"\n", "\n", "    def __init__(self, context=None):\n", "        \"\"\"Constructor.\"\"\"\n", "        self.context = context if context is not None else ({}, [])\n", "        self.decls, self.path = self.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We add the `__enter__()` and `__exit__()` methods for the `with` block." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.764101Z", "iopub.status.busy": "2024-01-18T17:20:17.763989Z", "iopub.status.idle": "2024-01-18T17:20:17.765766Z", "shell.execute_reply": "2024-01-18T17:20:17.765517Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", "    def __enter__(self):\n", "        return self\n", "\n", "    def __exit__(self, exc_type, exc_value, tb):\n", "        return" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use introspection to determine the arguments to the function, which is hooked into the `__getitem__()` method."
] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.767111Z", "iopub.status.busy": "2024-01-18T17:20:17.767012Z", "iopub.status.idle": "2024-01-18T17:20:17.768916Z", "shell.execute_reply": "2024-01-18T17:20:17.768668Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", "    def __getitem__(self, fn):\n", "        self.fn = fn\n", "        self.fn_args = {i: None for i in inspect.signature(fn).parameters}\n", "        return self" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, the function itself is invoked using the `__call__()` method." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.770386Z", "iopub.status.busy": "2024-01-18T17:20:17.770266Z", "iopub.status.idle": "2024-01-18T17:20:17.771990Z", "shell.execute_reply": "2024-01-18T17:20:17.771726Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", "    def __call__(self, *args):\n", "        self.result = self.fn(*self.concolic(args))\n", "        return self.result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For now, we define `concolic()` as an identity function that returns its arguments unchanged. It will be modified to produce symbolic variables later."
] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.773495Z", "iopub.status.busy": "2024-01-18T17:20:17.773400Z", "iopub.status.idle": "2024-01-18T17:20:17.775114Z", "shell.execute_reply": "2024-01-18T17:20:17.774850Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", " def concolic(self, args):\n", " return args" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We now have things in place for _tracing_ functions:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.776688Z", "iopub.status.busy": "2024-01-18T17:20:17.776550Z", "iopub.status.idle": "2024-01-18T17:20:17.778244Z", "shell.execute_reply": "2024-01-18T17:20:17.778003Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[factorial](1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And for retrieving results (but not actually _computing_ them):" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.779748Z", "iopub.status.busy": "2024-01-18T17:20:17.779611Z", "iopub.status.idle": "2024-01-18T17:20:17.781619Z", "shell.execute_reply": "2024-01-18T17:20:17.781336Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{}" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.decls" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.782991Z", "iopub.status.busy": "2024-01-18T17:20:17.782894Z", "iopub.status.idle": "2024-01-18T17:20:17.784926Z", "shell.execute_reply": "2024-01-18T17:20:17.784672Z" }, "slideshow": { 
"slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Both `decls` and `path` attributes will be set by concolic proxy objects, which we define next." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [], "toc-hr-collapsed": true }, "source": [ "#### Concolic Proxy Objects\n", "\n", "We now define the concolic proxy objects that can be used for concolic tracing. First, we define the `zproxy_create()` function that, given a proxy class, creates an instance of that class together with the corresponding symbolic variable, and registers the symbolic variable in the context information `context`." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.786408Z", "iopub.status.busy": "2024-01-18T17:20:17.786295Z", "iopub.status.idle": "2024-01-18T17:20:17.788096Z", "shell.execute_reply": "2024-01-18T17:20:17.787837Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def zproxy_create(cls, z_type, z3var, context, z_name, v=None):\n", "    z_value = cls(context, z3var(z_name), v)\n", "    context[0][z_name] = z_type  # add to decls\n", "    return z_value" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": false }, "source": [ "#### A Proxy Class for Booleans\n", "\n", "First, we define the `zbool` class which is used to track the predicates encountered. It is a wrapper class that contains both symbolic (`z`) and concrete (`v`) values. The concrete value is used to determine which path to take, and the symbolic value is used to collect the predicates encountered.\n", "\n", "The initialization is done in two parts. 
The first one is using `zproxy_create()` to correctly initialize and register the shadow symbolic variable corresponding to the passed argument. This is used exclusively when the symbolic variable needs to be initialized first. In all other cases, the constructor is called with the preexisting symbolic value." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.789489Z", "iopub.status.busy": "2024-01-18T17:20:17.789384Z", "iopub.status.idle": "2024-01-18T17:20:17.791367Z", "shell.execute_reply": "2024-01-18T17:20:17.791129Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zbool:\n", " @classmethod\n", " def create(cls, context, z_name, v):\n", " return zproxy_create(cls, 'Bool', z3.Bool, context, z_name, v)\n", "\n", " def __init__(self, context, z, v=None):\n", " self.context = context\n", " self.z = z\n", " self.v = v\n", " self.decl, self.path = self.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is how it is used. 
We create a symbolic variable `my_bool_arg` with a value of `True` in the current context of the concolic tracer:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.792751Z", "iopub.status.busy": "2024-01-18T17:20:17.792631Z", "iopub.status.idle": "2024-01-18T17:20:17.794443Z", "shell.execute_reply": "2024-01-18T17:20:17.794217Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " val = zbool.create(_.context, 'my_bool_arg', True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now access the symbolic name in the `z` attribute:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.795866Z", "iopub.status.busy": "2024-01-18T17:20:17.795753Z", "iopub.status.idle": "2024-01-18T17:20:17.797944Z", "shell.execute_reply": "2024-01-18T17:20:17.797699Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "my_bool_arg" ], "text/plain": [ "my_bool_arg" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.z" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The value is in the `v` attribute:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.799454Z", "iopub.status.busy": "2024-01-18T17:20:17.799346Z", "iopub.status.idle": "2024-01-18T17:20:17.801322Z", "shell.execute_reply": "2024-01-18T17:20:17.801068Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.v" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that 
the context of the enclosing `ConcolicTracer()` is automatically updated (via `zproxy_create()`) to hold the variable declarations and types:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.802688Z", "iopub.status.busy": "2024-01-18T17:20:17.802589Z", "iopub.status.idle": "2024-01-18T17:20:17.804670Z", "shell.execute_reply": "2024-01-18T17:20:17.804424Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'my_bool_arg': 'Bool'}, [])" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The context can also be reached through the `context` attribute; both point to the same data structure." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.806158Z", "iopub.status.busy": "2024-01-18T17:20:17.806052Z", "iopub.status.idle": "2024-01-18T17:20:17.808207Z", "shell.execute_reply": "2024-01-18T17:20:17.807923Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'my_bool_arg': 'Bool'}, [])" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Negation of Encoded formula\n", "\n", "The `zbool` class allows negation of its concrete and symbolic values." 
] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.809705Z", "iopub.status.busy": "2024-01-18T17:20:17.809589Z", "iopub.status.idle": "2024-01-18T17:20:17.811352Z", "shell.execute_reply": "2024-01-18T17:20:17.811101Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zbool(zbool):\n", " def __not__(self):\n", " return zbool(self.context, z3.Not(self.z), not self.v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is how it can be used." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.812848Z", "iopub.status.busy": "2024-01-18T17:20:17.812712Z", "iopub.status.idle": "2024-01-18T17:20:17.814525Z", "shell.execute_reply": "2024-01-18T17:20:17.814287Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " val = zbool.create(_.context, 'my_bool_arg', True).__not__()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.815831Z", "iopub.status.busy": "2024-01-18T17:20:17.815733Z", "iopub.status.idle": "2024-01-18T17:20:17.818235Z", "shell.execute_reply": "2024-01-18T17:20:17.817941Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "¬my_bool_arg" ], "text/plain": [ "Not(my_bool_arg)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.z" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.819621Z", "iopub.status.busy": "2024-01-18T17:20:17.819512Z", "iopub.status.idle": "2024-01-18T17:20:17.821517Z", "shell.execute_reply": "2024-01-18T17:20:17.821291Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, 
"execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.v" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.822864Z", "iopub.status.busy": "2024-01-18T17:20:17.822765Z", "iopub.status.idle": "2024-01-18T17:20:17.824776Z", "shell.execute_reply": "2024-01-18T17:20:17.824515Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "({'my_bool_arg': 'Bool'}, [])" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Registering Predicates on Conditionals\n", "\n", "The `zbool` class is being used to track Boolean conditions that arise during program execution. It tracks such conditions by registering the corresponding symbolic expressions in the context as soon as it is evaluated. On evaluation, the `__bool__()` method is called; so we can hook into this one:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.826191Z", "iopub.status.busy": "2024-01-18T17:20:17.826089Z", "iopub.status.idle": "2024-01-18T17:20:17.827853Z", "shell.execute_reply": "2024-01-18T17:20:17.827624Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zbool(zbool):\n", " def __bool__(self):\n", " r, pred = (True, self.z) if self.v else (False, z3.Not(self.z))\n", " self.path.append(pred)\n", " return r" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `zbool` class can be used to keep track of Boolean values and conditions encountered during the execution. 
For example, we can encode the conditions encountered by Line 6 in `factorial()` as follows:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "First, we define the concrete value (`ca`), and its shadow symbolic variable (`za`)." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.829239Z", "iopub.status.busy": "2024-01-18T17:20:17.829145Z", "iopub.status.idle": "2024-01-18T17:20:17.830820Z", "shell.execute_reply": "2024-01-18T17:20:17.830574Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "ca = 5\n", "za = z3.Int('a')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Then, we wrap it in `zbool`, and use it in a conditional, forcing the conditional to be registered in the context." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.832146Z", "iopub.status.busy": "2024-01-18T17:20:17.832069Z", "iopub.status.idle": "2024-01-18T17:20:17.833957Z", "shell.execute_reply": "2024-01-18T17:20:17.833719Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "success\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " if zbool(_.context, za == z3.IntVal(5), ca == 5):\n", " print('success')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can retrieve the registered conditional as follows." 
] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.835386Z", "iopub.status.busy": "2024-01-18T17:20:17.835289Z", "iopub.status.idle": "2024-01-18T17:20:17.837607Z", "shell.execute_reply": "2024-01-18T17:20:17.837354Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[5 == a]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": false }, "source": [ "#### A Proxy Class for Integers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Next, we define a symbolic wrapper `zint` for `int`.\n", "This class keeps track of the `int` variables used and the predicates encountered in `context`. It also keeps the concrete value so that it can be used to determine the path to take. As `zint` extends the built-in `int` class, whose value is fixed at object creation, we have to define a `__new__()` method to open it for extension." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.839103Z", "iopub.status.busy": "2024-01-18T17:20:17.839003Z", "iopub.status.idle": "2024-01-18T17:20:17.840744Z", "shell.execute_reply": "2024-01-18T17:20:17.840516Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(int):\n", "    def __new__(cls, context, zn, v, *args, **kw):\n", "        return int.__new__(cls, v, *args, **kw)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As in the case of `zbool`, the initialization takes place in two parts. The first part uses `create()` when a new symbolic argument is being registered; the second is the usual initialization in `__init__()`."
] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.842167Z", "iopub.status.busy": "2024-01-18T17:20:17.842073Z", "iopub.status.idle": "2024-01-18T17:20:17.844039Z", "shell.execute_reply": "2024-01-18T17:20:17.843814Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zint(zint):\n", "    @classmethod\n", "    def create(cls, context, zn, v=None):\n", "        return zproxy_create(cls, 'Int', z3.Int, context, zn, v)\n", "\n", "    def __init__(self, context, z, v=None):\n", "        self.z, self.v = z, v\n", "        self.context = context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `int` value of a `zint` object is its concrete value." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.845463Z", "iopub.status.busy": "2024-01-18T17:20:17.845355Z", "iopub.status.idle": "2024-01-18T17:20:17.847102Z", "shell.execute_reply": "2024-01-18T17:20:17.846880Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(zint):\n", "    def __int__(self):\n", "        return self.v\n", "\n", "    def __pos__(self):\n", "        return self.v" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "These proxies are used as follows."
] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.848603Z", "iopub.status.busy": "2024-01-18T17:20:17.848505Z", "iopub.status.idle": "2024-01-18T17:20:17.850201Z", "shell.execute_reply": "2024-01-18T17:20:17.849949Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " val = zint.create(_.context, 'int_arg', 0)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.851649Z", "iopub.status.busy": "2024-01-18T17:20:17.851547Z", "iopub.status.idle": "2024-01-18T17:20:17.853799Z", "shell.execute_reply": "2024-01-18T17:20:17.853544Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "int_arg" ], "text/plain": [ "int_arg" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.z" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.855222Z", "iopub.status.busy": "2024-01-18T17:20:17.855114Z", "iopub.status.idle": "2024-01-18T17:20:17.857073Z", "shell.execute_reply": "2024-01-18T17:20:17.856785Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val.v" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.858448Z", "iopub.status.busy": "2024-01-18T17:20:17.858347Z", "iopub.status.idle": "2024-01-18T17:20:17.860288Z", "shell.execute_reply": "2024-01-18T17:20:17.860035Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'int_arg': 'Int'}, [])" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `zint` class is often used to do arithmetic with, or compare to other `int`s. These `int`s can be either a variable or a constant value. We define a helper method `_zv()` that checks what kind of `int` a given value is, and produces the correct symbolic equivalent." ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.861714Z", "iopub.status.busy": "2024-01-18T17:20:17.861599Z", "iopub.status.idle": "2024-01-18T17:20:17.863427Z", "shell.execute_reply": "2024-01-18T17:20:17.863179Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zint(zint):\n", " def _zv(self, o):\n", " return (o.z, o.v) if isinstance(o, zint) else (z3.IntVal(o), o)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It can be used as follows" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.864870Z", "iopub.status.busy": "2024-01-18T17:20:17.864770Z", "iopub.status.idle": "2024-01-18T17:20:17.866407Z", "shell.execute_reply": "2024-01-18T17:20:17.866174Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " val = zint.create(_.context, 'int_arg', 0)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.867710Z", "iopub.status.busy": "2024-01-18T17:20:17.867623Z", "iopub.status.idle": "2024-01-18T17:20:17.869711Z", "shell.execute_reply": "2024-01-18T17:20:17.869458Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(0, 0)" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val._zv(0)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": 
"2024-01-18T17:20:17.871051Z", "iopub.status.busy": "2024-01-18T17:20:17.870967Z", "iopub.status.idle": "2024-01-18T17:20:17.873210Z", "shell.execute_reply": "2024-01-18T17:20:17.872993Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(int_arg, 0)" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "val._zv(val)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Equality between Integers\n", "\n", "Two integers can be compared for equality using `__ne__()` and `__eq__()`." ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.874643Z", "iopub.status.busy": "2024-01-18T17:20:17.874553Z", "iopub.status.idle": "2024-01-18T17:20:17.876773Z", "shell.execute_reply": "2024-01-18T17:20:17.876525Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(zint):\n", "    def __ne__(self, other):\n", "        z, v = self._zv(other)\n", "        return zbool(self.context, self.z != z, self.v != v)\n", "\n", "    def __eq__(self, other):\n", "        z, v = self._zv(other)\n", "        return zbool(self.context, self.z == z, self.v == v)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We also define `__req__()` using `__eq__()` for the case where the plain `int` being compared is on the left-hand side."
] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.878063Z", "iopub.status.busy": "2024-01-18T17:20:17.877984Z", "iopub.status.idle": "2024-01-18T17:20:17.879745Z", "shell.execute_reply": "2024-01-18T17:20:17.879480Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zint(zint):\n", " def __req__(self, other):\n", " return self.__eq__(other)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It can be used as follows." ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.881237Z", "iopub.status.busy": "2024-01-18T17:20:17.881149Z", "iopub.status.idle": "2024-01-18T17:20:17.884641Z", "shell.execute_reply": "2024-01-18T17:20:17.884419Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int_a == int_b int_a != int_b 0 != int_b\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " ia = zint.create(_.context, 'int_a', 0)\n", " ib = zint.create(_.context, 'int_b', 0)\n", " v1 = ia == ib\n", " v2 = ia != ib\n", " v3 = 0 != ib\n", " print(v1.z, v2.z, v3.z)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Comparisons between Integers\n", "\n", "Integers can also be compared for ordering, and the methods for this are defined below." 
] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.886043Z", "iopub.status.busy": "2024-01-18T17:20:17.885945Z", "iopub.status.idle": "2024-01-18T17:20:17.887945Z", "shell.execute_reply": "2024-01-18T17:20:17.887711Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(zint):\n", " def __lt__(self, other):\n", " z, v = self._zv(other)\n", " return zbool(self.context, self.z < z, self.v < v)\n", "\n", " def __gt__(self, other):\n", " z, v = self._zv(other)\n", " return zbool(self.context, self.z > z, self.v > v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use the comparisons and equality operators to provide the other missing operators." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.889321Z", "iopub.status.busy": "2024-01-18T17:20:17.889221Z", "iopub.status.idle": "2024-01-18T17:20:17.891367Z", "shell.execute_reply": "2024-01-18T17:20:17.891128Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zint(zint):\n", " def __le__(self, other):\n", " z, v = self._zv(other)\n", " return zbool(self.context, z3.Or(self.z < z, self.z == z),\n", " self.v < v or self.v == v)\n", "\n", " def __ge__(self, other):\n", " z, v = self._zv(other)\n", " return zbool(self.context, z3.Or(self.z > z, self.z == z),\n", " self.v > v or self.v == v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "These functions can be used as follows." 
] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.892817Z", "iopub.status.busy": "2024-01-18T17:20:17.892705Z", "iopub.status.idle": "2024-01-18T17:20:17.897383Z", "shell.execute_reply": "2024-01-18T17:20:17.897153Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int_a > int_b int_a < int_b\n", "Or(int_a > int_b, int_a == int_b) Or(int_a < int_b, int_a == int_b)\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " ia = zint.create(_.context, 'int_a', 0)\n", " ib = zint.create(_.context, 'int_b', 1)\n", " v1 = ia > ib\n", " v2 = ia < ib\n", " print(v1.z, v2.z)\n", " v3 = ia >= ib\n", " v4 = ia <= ib\n", " print(v3.z, v4.z)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Binary Operators for Integers\n", "\n", "We implement relevant arithmetic operators for integers as described in the [Python documentation](https://docs.python.org/3/reference/datamodel.html#object.__add__). (The commented out operators are not directly available for `z3.ArithRef`. They need to be implemented separately if needed. 
See the exercises for how it can be done.)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.898785Z", "iopub.status.busy": "2024-01-18T17:20:17.898703Z", "iopub.status.idle": "2024-01-18T17:20:17.900628Z", "shell.execute_reply": "2024-01-18T17:20:17.900389Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "INT_BINARY_OPS = [\n", " '__add__',\n", " '__sub__',\n", " '__mul__',\n", " '__truediv__',\n", " # '__div__',\n", " '__mod__',\n", " # '__divmod__',\n", " '__pow__',\n", " # '__lshift__',\n", " # '__rshift__',\n", " # '__and__',\n", " # '__xor__',\n", " # '__or__',\n", " '__radd__',\n", " '__rsub__',\n", " '__rmul__',\n", " '__rtruediv__',\n", " # '__rdiv__',\n", " '__rmod__',\n", " # '__rdivmod__',\n", " '__rpow__',\n", " # '__rlshift__',\n", " # '__rrshift__',\n", " # '__rand__',\n", " # '__rxor__',\n", " # '__ror__',\n", "]" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.902054Z", "iopub.status.busy": "2024-01-18T17:20:17.901973Z", "iopub.status.idle": "2024-01-18T17:20:17.904278Z", "shell.execute_reply": "2024-01-18T17:20:17.904013Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def make_int_binary_wrapper(fname, fun, zfun): # type: ignore\n", " def proxy(self, other):\n", " z, v = self._zv(other)\n", " z_ = zfun(self.z, z)\n", " v_ = fun(self.v, v)\n", " if isinstance(v_, float):\n", " # we do not implement float results yet.\n", " assert round(v_) == v_\n", " v_ = round(v_)\n", " return zint(self.context, z_, v_)\n", "\n", " return proxy" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.905649Z", "iopub.status.busy": "2024-01-18T17:20:17.905559Z", "iopub.status.idle": "2024-01-18T17:20:17.907234Z", "shell.execute_reply": "2024-01-18T17:20:17.906994Z" }, "slideshow": { 
"slide_type": "fragment" } }, "outputs": [], "source": [ "INITIALIZER_LIST: List[Callable] = []" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.908523Z", "iopub.status.busy": "2024-01-18T17:20:17.908444Z", "iopub.status.idle": "2024-01-18T17:20:17.910166Z", "shell.execute_reply": "2024-01-18T17:20:17.909906Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def initialize():\n", " for fn in INITIALIZER_LIST:\n", " fn()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.911443Z", "iopub.status.busy": "2024-01-18T17:20:17.911367Z", "iopub.status.idle": "2024-01-18T17:20:17.913225Z", "shell.execute_reply": "2024-01-18T17:20:17.912954Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def init_concolic_1():\n", " for fname in INT_BINARY_OPS:\n", " fun = getattr(int, fname)\n", " zfun = getattr(z3.ArithRef, fname)\n", " setattr(zint, fname, make_int_binary_wrapper(fname, fun, zfun))" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.914533Z", "iopub.status.busy": "2024-01-18T17:20:17.914448Z", "iopub.status.idle": "2024-01-18T17:20:17.915972Z", "shell.execute_reply": "2024-01-18T17:20:17.915702Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INITIALIZER_LIST.append(init_concolic_1)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.917524Z", "iopub.status.busy": "2024-01-18T17:20:17.917400Z", "iopub.status.idle": "2024-01-18T17:20:17.919123Z", "shell.execute_reply": "2024-01-18T17:20:17.918853Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "init_concolic_1()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": 
"2024-01-18T17:20:17.920712Z", "iopub.status.busy": "2024-01-18T17:20:17.920488Z", "iopub.status.idle": "2024-01-18T17:20:17.926496Z", "shell.execute_reply": "2024-01-18T17:20:17.926250Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "int_a + int_b\n", "int_a + 10\n", "11 + int_b\n", "int_a - int_b\n", "int_a*int_b\n", "int_a/int_b\n", "int_a**int_b\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " ia = zint.create(_.context, 'int_a', 0)\n", " ib = zint.create(_.context, 'int_b', 1)\n", " print((ia + ib).z)\n", " print((ia + 10).z)\n", " print((11 + ib).z)\n", " print((ia - ib).z)\n", " print((ia * ib).z)\n", " print((ia / ib).z)\n", " print((ia ** ib).z)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Integer Unary Operators\n", "\n", "We also implement the relevant unary operators as below." ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.927927Z", "iopub.status.busy": "2024-01-18T17:20:17.927816Z", "iopub.status.idle": "2024-01-18T17:20:17.929406Z", "shell.execute_reply": "2024-01-18T17:20:17.929146Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INT_UNARY_OPS = [\n", " '__neg__',\n", " '__pos__',\n", " # '__abs__',\n", " # '__invert__',\n", " # '__round__',\n", " # '__ceil__',\n", " # '__floor__',\n", " # '__trunc__',\n", "]" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.930847Z", "iopub.status.busy": "2024-01-18T17:20:17.930755Z", "iopub.status.idle": "2024-01-18T17:20:17.932615Z", "shell.execute_reply": "2024-01-18T17:20:17.932373Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def make_int_unary_wrapper(fname, fun, zfun):\n", " def proxy(self):\n", " return zint(self.context, zfun(self.z), fun(self.v))\n", 
"\n", " return proxy" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.933967Z", "iopub.status.busy": "2024-01-18T17:20:17.933864Z", "iopub.status.idle": "2024-01-18T17:20:17.935731Z", "shell.execute_reply": "2024-01-18T17:20:17.935474Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def init_concolic_2():\n", " for fname in INT_UNARY_OPS:\n", " fun = getattr(int, fname)\n", " zfun = getattr(z3.ArithRef, fname)\n", " setattr(zint, fname, make_int_unary_wrapper(fname, fun, zfun))" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.937181Z", "iopub.status.busy": "2024-01-18T17:20:17.937082Z", "iopub.status.idle": "2024-01-18T17:20:17.938602Z", "shell.execute_reply": "2024-01-18T17:20:17.938369Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INITIALIZER_LIST.append(init_concolic_2)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.939924Z", "iopub.status.busy": "2024-01-18T17:20:17.939841Z", "iopub.status.idle": "2024-01-18T17:20:17.941434Z", "shell.execute_reply": "2024-01-18T17:20:17.941211Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "init_concolic_2()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can use the unary operators we defined above as follows:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.942795Z", "iopub.status.busy": "2024-01-18T17:20:17.942715Z", "iopub.status.idle": "2024-01-18T17:20:17.945249Z", "shell.execute_reply": "2024-01-18T17:20:17.944930Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-int_a\n", "int_a\n" ] } ], "source": [ 
"with ConcolicTracer() as _:\n", " ia = zint.create(_.context, 'int_a', 0)\n", " print((-ia).z)\n", " print((+ia).z)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Using an Integer in a Boolean Context\n", "\n", "An integer may be converted to a boolean context in conditionals or as part of boolean predicates such as `or`, `and` and `not`. In these cases, the `__bool__()` method gets called. Unfortunately, this method requires a primitive boolean value. Hence, we force the current integer formula to a boolean predicate and register it in the current context." ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.946753Z", "iopub.status.busy": "2024-01-18T17:20:17.946663Z", "iopub.status.idle": "2024-01-18T17:20:17.948407Z", "shell.execute_reply": "2024-01-18T17:20:17.948147Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(zint):\n", " def __bool__(self):\n", " # return zbool(self.context, self.z, self.v) <-- not allowed\n", " # force registering boolean condition\n", " if self != 0:\n", " return True\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "It is used as follows" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.949825Z", "iopub.status.busy": "2024-01-18T17:20:17.949742Z", "iopub.status.idle": "2024-01-18T17:20:17.951926Z", "shell.execute_reply": "2024-01-18T17:20:17.951674Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " za = zint.create(_.context, 'int_a', 1)\n", " zb = zint.create(_.context, 'int_b', 0)\n", " if za and zb:\n", " print(1)" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.953314Z", 
"iopub.status.busy": "2024-01-18T17:20:17.953213Z", "iopub.status.idle": "2024-01-18T17:20:17.955811Z", "shell.execute_reply": "2024-01-18T17:20:17.955577Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'int_a': 'Int', 'int_b': 'Int'}, [0 != int_a, Not(0 != int_b)])" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": true }, "source": [ "#### Remaining Methods of the ConcolicTracer\n", "\n", "We now complete some methods of the `ConcolicTracer`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Translating to the SMT Expression Format\n", "\n", "Given that we are using an SMT Solver z3, it is often useful to retrieve the corresponding SMT expression for a symbolic expression. This can be used as an argument to `z3` or other SMT solvers.\n", "\n", "The format of the SMT expression ([SMT-LIB](http://smtlib.github.io/jSMTLIB/SMTLIBTutorial.pdf)) is as follows:\n", "\n", "* Variables declarations in [S-EXP](https://en.wikipedia.org/wiki/S-expression) format.\n", " E.g. The following declares a symbolic integer variable `x`\n", "```\n", "(declare-const x Int)\n", "```\n", " This declares a `bit vector` `b` of length `8`\n", "```\n", "(declare-const b (_ BitVec 8))\n", "```\n", " This declares a symbolic real variable `r`\n", "```\n", "(declare-const x Real)\n", "```\n", " This declares a symbolic string variable `s`\n", "```\n", "(declare-const s String)\n", "```\n", "\n", "The declared variables can be used in logical formulas that are encoded in *S-EXP* format. For example, here is a logical formula.\n", "\n", "```\n", "(assert\n", " (and\n", " (= a b)\n", " (= a c)\n", " (! 
b c)))\n", "```\n", "Here is another example, using string variables.\n", "\n", "```\n", "(or (< 0 (str.indexof (str.substr my_str1 7 19) \" where \" 0))\n", " (= (str.indexof (str.substr my_str1 7 19) \" where \" 0) 0))\n", "```\n" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.957313Z", "iopub.status.busy": "2024-01-18T17:20:17.957203Z", "iopub.status.idle": "2024-01-18T17:20:17.959948Z", "shell.execute_reply": "2024-01-18T17:20:17.959711Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", " def smt_expr(self, show_decl=False, simplify=False, path=[]):\n", " r = []\n", " if show_decl:\n", " for decl in self.decls:\n", " v = self.decls[decl]\n", " v = '(_ BitVec 8)' if v == 'BitVec' else v\n", " r.append(\"(declare-const %s %s)\" % (decl, v))\n", " path = path if path else self.path\n", " if path:\n", " path = z3.And(path)\n", " if show_decl:\n", " if simplify:\n", " return '\\n'.join([\n", " *r,\n", " \"(assert %s)\" % z3.simplify(path).sexpr()\n", " ])\n", " else:\n", " return '\\n'.join(\n", " [*r, \"(assert %s)\" % path.sexpr()])\n", " else:\n", " return z3.simplify(path).sexpr()\n", " else:\n", " return ''" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To see how to use `smt_expr()`, let us consider an example. The `triangle()` function is used to determine if the given sides to a triangle result in an `equilateral` triangle, an `isosceles` triangle, or a `scalene` triangle. It is implemented as follows." 
] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.961384Z", "iopub.status.busy": "2024-01-18T17:20:17.961286Z", "iopub.status.idle": "2024-01-18T17:20:17.963183Z", "shell.execute_reply": "2024-01-18T17:20:17.962925Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def triangle(a, b, c):\n", " if a == b:\n", " if b == c:\n", " return 'equilateral'\n", " else:\n", " return 'isosceles'\n", " else:\n", " if b == c:\n", " return 'isosceles'\n", " else:\n", " if a == c:\n", " return 'isosceles'\n", " else:\n", " return 'scalene'" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.964605Z", "iopub.status.busy": "2024-01-18T17:20:17.964492Z", "iopub.status.idle": "2024-01-18T17:20:17.966507Z", "shell.execute_reply": "2024-01-18T17:20:17.966247Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'isosceles'" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "triangle(1, 2, 1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make `triangle()` run under `ConcolicTracer`, we first define the (symbolic) arguments. The triangle being defined has sides `1, 1, 1`. i.e. it is an `equilateral` triangle." 
] }, { "cell_type": "code", "execution_count": 95, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.967897Z", "iopub.status.busy": "2024-01-18T17:20:17.967799Z", "iopub.status.idle": "2024-01-18T17:20:17.970667Z", "shell.execute_reply": "2024-01-18T17:20:17.970429Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "({'int_a': 'Int', 'int_b': 'Int', 'int_c': 'Int'}, [int_a == int_b, int_b == int_c])\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " za = zint.create(_.context, 'int_a', 1)\n", " zb = zint.create(_.context, 'int_b', 1)\n", " zc = zint.create(_.context, 'int_c', 1)\n", " triangle(za, zb, zc)\n", "print(_.context)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now call `smt_expr()` to retrieve the SMT expression as below." ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.972099Z", "iopub.status.busy": "2024-01-18T17:20:17.971983Z", "iopub.status.idle": "2024-01-18T17:20:17.973955Z", "shell.execute_reply": "2024-01-18T17:20:17.973684Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(declare-const int_a Int)\n", "(declare-const int_b Int)\n", "(declare-const int_c Int)\n", "(assert (and (= int_a int_b) (= int_b int_c)))\n" ] } ], "source": [ "print(_.smt_expr(show_decl=True))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The collected predicates can also be solved directly using the Python z3 API." 
] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.975363Z", "iopub.status.busy": "2024-01-18T17:20:17.975262Z", "iopub.status.idle": "2024-01-18T17:20:17.982958Z", "shell.execute_reply": "2024-01-18T17:20:17.982696Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[int_c = 0, int_a = 0, int_b = 0]\n" ] } ], "source": [ "z3.solve(_.path)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Generating Fresh Names\n", "While using the proxy classes, we often will have to generate new symbolic variables, with names that have not been used before. For this, we define `fresh_name()` that always generates unique integers for names." ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.984430Z", "iopub.status.busy": "2024-01-18T17:20:17.984348Z", "iopub.status.idle": "2024-01-18T17:20:17.985987Z", "shell.execute_reply": "2024-01-18T17:20:17.985743Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "COUNTER = 0" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.987381Z", "iopub.status.busy": "2024-01-18T17:20:17.987305Z", "iopub.status.idle": "2024-01-18T17:20:17.989206Z", "shell.execute_reply": "2024-01-18T17:20:17.988756Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fresh_name():\n", " global COUNTER\n", " COUNTER += 1\n", " return COUNTER" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It can be used as follows:" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.990718Z", "iopub.status.busy": "2024-01-18T17:20:17.990635Z", 
"iopub.status.idle": "2024-01-18T17:20:17.992748Z", "shell.execute_reply": "2024-01-18T17:20:17.992420Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fresh_name()" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.994312Z", "iopub.status.busy": "2024-01-18T17:20:17.994196Z", "iopub.status.idle": "2024-01-18T17:20:17.996049Z", "shell.execute_reply": "2024-01-18T17:20:17.995730Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def reset_counter():\n", " global COUNTER\n", " COUNTER = 0" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:17.998347Z", "iopub.status.busy": "2024-01-18T17:20:17.998198Z", "iopub.status.idle": "2024-01-18T17:20:18.000315Z", "shell.execute_reply": "2024-01-18T17:20:18.000018Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", " def __enter__(self):\n", " reset_counter()\n", " return self\n", "\n", " def __exit__(self, exc_type, exc_value, tb):\n", " return" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " ##### Translating Arguments to Concolic Proxies\n", " \n", "We had previously defined `concolic()` as a transparent function. We now provide the full implementation of this function. It inspects a given function's parameters, and infers the parameter types from the concrete arguments passed in. It then uses this information to instantiate the correct proxy classes for each argument." 
] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.001951Z", "iopub.status.busy": "2024-01-18T17:20:18.001844Z", "iopub.status.idle": "2024-01-18T17:20:18.004063Z", "shell.execute_reply": "2024-01-18T17:20:18.003823Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", " def concolic(self, args):\n", " my_args = []\n", " for name, arg in zip(self.fn_args, args):\n", " t = type(arg).__name__\n", " zwrap = globals()['z' + t]\n", " vname = \"%s_%s_%s_%s\" % (self.fn.__name__, name, t, fresh_name())\n", " my_args.append(zwrap.create(self.context, vname, arg))\n", " self.fn_args[name] = vname\n", " return my_args" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is how it gets used:" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.005491Z", "iopub.status.busy": "2024-01-18T17:20:18.005383Z", "iopub.status.idle": "2024-01-18T17:20:18.008381Z", "shell.execute_reply": "2024-01-18T17:20:18.008152Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[factorial](5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With the new `concolic()` method, the arguments to the factorial are correctly associated with symbolic variables, which allows us to retrieve the predicates encountered." 
] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.009945Z", "iopub.status.busy": "2024-01-18T17:20:18.009832Z", "iopub.status.idle": "2024-01-18T17:20:18.020240Z", "shell.execute_reply": "2024-01-18T17:20:18.019935Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'factorial_n_int_1': 'Int'},\n", " [Not(0 > factorial_n_int_1),\n", " Not(0 == factorial_n_int_1),\n", " Not(1 == factorial_n_int_1),\n", " 0 != factorial_n_int_1,\n", " 0 != factorial_n_int_1 - 1,\n", " 0 != factorial_n_int_1 - 1 - 1,\n", " 0 != factorial_n_int_1 - 1 - 1 - 1,\n", " 0 != factorial_n_int_1 - 1 - 1 - 1 - 1,\n", " Not(0 != factorial_n_int_1 - 1 - 1 - 1 - 1 - 1)])" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As before, we can also print out the SMT expression which can be passed directly to command line SMT solvers." 
] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.022064Z", "iopub.status.busy": "2024-01-18T17:20:18.021920Z", "iopub.status.idle": "2024-01-18T17:20:18.025126Z", "shell.execute_reply": "2024-01-18T17:20:18.024797Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(declare-const factorial_n_int_1 Int)\n", "(assert (let ((a!1 (distinct 0 (- (- (- factorial_n_int_1 1) 1) 1)))\n", " (a!2 (- (- (- (- factorial_n_int_1 1) 1) 1) 1)))\n", " (and (not (> 0 factorial_n_int_1))\n", " (not (= 0 factorial_n_int_1))\n", " (not (= 1 factorial_n_int_1))\n", " (distinct 0 factorial_n_int_1)\n", " (distinct 0 (- factorial_n_int_1 1))\n", " (distinct 0 (- (- factorial_n_int_1 1) 1))\n", " a!1\n", " (distinct 0 a!2)\n", " (not (distinct 0 (- a!2 1))))))\n" ] } ], "source": [ "print(_.smt_expr(show_decl=True))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We next define methods to evaluate the SMT expression both in Python and from command line." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Evaluating the Concolic Expressions\n", "\n", "We define `zeval()` to solve the predicates in a context, and return results. It has two modes. The `python` mode uses `z3` Python API to solve and return the results. If the `python` mode is false, it writes the SMT expression to a file, and invokes the command line `z3` for a solution." 
] }, { "cell_type": "code", "execution_count": 107, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.026749Z", "iopub.status.busy": "2024-01-18T17:20:18.026654Z", "iopub.status.idle": "2024-01-18T17:20:18.030589Z", "shell.execute_reply": "2024-01-18T17:20:18.030134Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicTracer(ConcolicTracer):\n", "    def zeval(self, predicates=None, *, python=False, log=False):\n", "        \"\"\"Evaluate `predicates` in current context.\n", "        - If `python` is set, use the z3 Python API; otherwise use z3 standalone.\n", "        - If `log` is set, show input to z3.\n", "        Return a pair (`result`, `solution`) where\n", "        - `result` is `'sat'` (satisfiable); then `solution` maps argument names\n", "          to solver values (`(value, type)` pairs when using z3 standalone); or\n", "        - `result` is not `'sat'`, indicating that no solution was found;\n", "          then `solution` is `None`\n", "        \"\"\"\n", "        if predicates is None:\n", "            path = self.path\n", "        else:\n", "            path = list(self.path)\n", "            for i in sorted(predicates):\n", "                if len(path) > i:\n", "                    path[i] = predicates[i]\n", "                else:\n", "                    path.append(predicates[i])\n", "        if log:\n", "            print('Predicates in path:')\n", "            for i, p in enumerate(path):\n", "                print(i, p)\n", "            print()\n", "\n", "        r, sol = (zeval_py if python else zeval_smt)(path, self, log)\n", "        if r == 'sat':\n", "            return r, {k: sol.get(self.fn_args[k], None) for k in self.fn_args}\n", "        else:\n", "            return r, None" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Using the Python API\n", "\n", "Given a set of predicates that the function encountered, and the tracer under which the function was executed, the `zeval_py()` function first declares the relevant symbolic variables, and then uses `z3.Solver()` to provide a set of inputs that would trace the same path through the function."
] }, { "cell_type": "code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.033059Z", "iopub.status.busy": "2024-01-18T17:20:18.032838Z", "iopub.status.idle": "2024-01-18T17:20:18.035797Z", "shell.execute_reply": "2024-01-18T17:20:18.035519Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def zeval_py(path, cc, log):\n", "    for decl in cc.decls:\n", "        if cc.decls[decl] == 'BitVec':\n", "            v = \"z3.%s('%s', 8)\" % (cc.decls[decl], decl)\n", "        else:\n", "            v = \"z3.%s('%s')\" % (cc.decls[decl], decl)\n", "        exec(v)\n", "    s = z3.Solver()\n", "    s.add(z3.And(path))\n", "    result = s.check()  # solve once and reuse the result\n", "    if result == z3.unsat:\n", "        return 'No Solutions', {}\n", "    elif result == z3.unknown:\n", "        return 'Gave up', None\n", "    assert result == z3.sat\n", "    m = s.model()\n", "    return 'sat', {d.name(): m[d] for d in m.decls()}" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "It can be used as follows:" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.037327Z", "iopub.status.busy": "2024-01-18T17:20:18.037204Z", "iopub.status.idle": "2024-01-18T17:20:18.040532Z", "shell.execute_reply": "2024-01-18T17:20:18.040258Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    _[factorial](5)" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.042342Z", "iopub.status.busy": "2024-01-18T17:20:18.042252Z", "iopub.status.idle": "2024-01-18T17:20:18.056465Z", "shell.execute_reply": "2024-01-18T17:20:18.056151Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'n': 5})" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval(python=True)" ] }, { "cell_type": "markdown", "metadata": {
"slideshow": { "slide_type": "fragment" } }, "source": [ "That is, the assignment `n == 5` satisfies all the collected constraints." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Using the Command Line\n", "\n", "The `zeval_smt()` function writes the SMT expression to a temporary file and invokes the `z3` command-line solver on it. The solver's output is again an S-expression (`sexpr`). Hence, we first define `parse_sexp()` to parse it and return the values." ] },
{ "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.058161Z", "iopub.status.busy": "2024-01-18T17:20:18.058052Z", "iopub.status.idle": "2024-01-18T17:20:18.059673Z", "shell.execute_reply": "2024-01-18T17:20:18.059415Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import re" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.061476Z", "iopub.status.busy": "2024-01-18T17:20:18.061349Z", "iopub.status.idle": "2024-01-18T17:20:18.063215Z", "shell.execute_reply": "2024-01-18T17:20:18.062870Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import subprocess" ] },
{ "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.065017Z", "iopub.status.busy": "2024-01-18T17:20:18.064895Z", "iopub.status.idle": "2024-01-18T17:20:18.066672Z", "shell.execute_reply": "2024-01-18T17:20:18.066343Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "SEXPR_TOKEN = r'''(?mx)\n", "    \\s*(?:\n", "        (?P<bra>\\()|\n", "        (?P<ket>\\))|\n", "        (?P<token>[^\"()\\s]+)|\n", "        (?P<string>\"[^\"]*\")\n", "       )'''" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.068205Z", "iopub.status.busy": "2024-01-18T17:20:18.068091Z",
"iopub.status.idle": "2024-01-18T17:20:18.070701Z", "shell.execute_reply": "2024-01-18T17:20:18.070460Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def parse_sexp(sexp):\n", " stack, res = [], []\n", " for elements in re.finditer(SEXPR_TOKEN, sexp):\n", " kind, value = [(t, v) for t, v in elements.groupdict().items() if v][0]\n", " if kind == 'bra':\n", " stack.append(res)\n", " res = []\n", " elif kind == 'ket':\n", " last, res = res, stack.pop(-1)\n", " res.append(last)\n", " elif kind == 'token':\n", " res.append(value)\n", " elif kind == 'string':\n", " res.append(value[1:-1])\n", " else:\n", " assert False\n", " return res" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `parse_sexp()` function can be used as follows" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.072285Z", "iopub.status.busy": "2024-01-18T17:20:18.072182Z", "iopub.status.idle": "2024-01-18T17:20:18.074461Z", "shell.execute_reply": "2024-01-18T17:20:18.074201Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['abcd', ['hello', '123', ['world', 'hello world']]]" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_sexp('abcd (hello 123 (world \"hello world\"))')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We now define `zeval_smt()` which uses the `z3` command line directly, and uses `parse_sexp()` to parse and return the solutions to function arguments if any." 
] }, { "cell_type": "code", "execution_count": 116, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.075963Z", "iopub.status.busy": "2024-01-18T17:20:18.075863Z", "iopub.status.idle": "2024-01-18T17:20:18.077428Z", "shell.execute_reply": "2024-01-18T17:20:18.077169Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import tempfile\n", "import os" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.079076Z", "iopub.status.busy": "2024-01-18T17:20:18.078948Z", "iopub.status.idle": "2024-01-18T17:20:18.080712Z", "shell.execute_reply": "2024-01-18T17:20:18.080412Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Z3_BINARY = 'z3' # Z3 binary to invoke" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.082188Z", "iopub.status.busy": "2024-01-18T17:20:18.082086Z", "iopub.status.idle": "2024-01-18T17:20:18.083623Z", "shell.execute_reply": "2024-01-18T17:20:18.083379Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "Z3_OPTIONS = '-t:6000' # Z3 options - a soft timeout of 6000 milliseconds" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.085082Z", "iopub.status.busy": "2024-01-18T17:20:18.084985Z", "iopub.status.idle": "2024-01-18T17:20:18.088538Z", "shell.execute_reply": "2024-01-18T17:20:18.088307Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def zeval_smt(path, cc, log):\n", " s = cc.smt_expr(True, True, path)\n", "\n", " with tempfile.NamedTemporaryFile(mode='w', suffix='.smt',\n", " delete=False) as f:\n", " f.write(s + \"\\n\")\n", " f.write(\"(check-sat)\\n\")\n", " f.write(\"(get-model)\\n\")\n", "\n", " if log:\n", " print(open(f.name).read())\n", "\n", " cmd = f\"{Z3_BINARY} {Z3_OPTIONS} {f.name}\"\n", " if 
log:\n", " print(cmd)\n", "\n", " output = subprocess.getoutput(cmd)\n", "\n", " os.remove(f.name)\n", "\n", " if log:\n", " print(output)\n", "\n", " o = parse_sexp(output)\n", " if not o:\n", " return 'Gave up', None\n", "\n", " kind = o[0]\n", " if kind == 'unknown':\n", " return 'Gave up', None\n", " elif kind == 'timeout':\n", " return 'Timeout', None\n", " elif kind == 'unsat':\n", " return 'No Solutions', {}\n", "\n", " assert kind == 'sat', kind\n", " if o[1][0] == 'model': # up to 4.8.8.0\n", " return 'sat', {i[1]: (i[-1], i[-2]) for i in o[1][1:]}\n", " else:\n", " return 'sat', {i[1]: (i[-1], i[-2]) for i in o[1][0:]}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can now use `zeval()` as follows." ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.090019Z", "iopub.status.busy": "2024-01-18T17:20:18.089924Z", "iopub.status.idle": "2024-01-18T17:20:18.093112Z", "shell.execute_reply": "2024-01-18T17:20:18.092755Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[factorial](5)" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.094823Z", "iopub.status.busy": "2024-01-18T17:20:18.094719Z", "iopub.status.idle": "2024-01-18T17:20:18.157543Z", "shell.execute_reply": "2024-01-18T17:20:18.157176Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicates in path:\n", "0 Not(0 > factorial_n_int_1)\n", "1 Not(0 == factorial_n_int_1)\n", "2 Not(1 == factorial_n_int_1)\n", "3 0 != factorial_n_int_1\n", "4 0 != factorial_n_int_1 - 1\n", "5 0 != factorial_n_int_1 - 1 - 1\n", "6 0 != factorial_n_int_1 - 1 - 1 - 1\n", "7 0 != factorial_n_int_1 - 1 - 1 - 1 - 1\n", "8 Not(0 != factorial_n_int_1 - 1 - 1 - 1 - 1 - 1)\n", "\n", 
"(declare-const factorial_n_int_1 Int)\n", "(assert (and (<= 0 factorial_n_int_1)\n", " (not (= 0 factorial_n_int_1))\n", " (not (= 1 factorial_n_int_1))\n", " (not (= 2 factorial_n_int_1))\n", " (not (= 3 factorial_n_int_1))\n", " (not (= 4 factorial_n_int_1))\n", " (= 5 factorial_n_int_1)))\n", "(check-sat)\n", "(get-model)\n", "\n", "z3 -t:6000 /var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmpsy3ai9oq.smt\n", "sat\n", "(\n", " (define-fun factorial_n_int_1 () Int\n", " 5)\n", ")\n" ] }, { "data": { "text/plain": [ "('sat', {'n': ('5', 'Int')})" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval(log=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Indeed, we get similar results (`n == 5`) from using the command line as from using the Python API." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": true }, "source": [ "#### A Proxy Class for Strings\n", "\n", "Here, we define the proxy string class `zstr`. First we define our initialization routines. Since `str` is a primitive type, we define `new` to extend it." ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.159225Z", "iopub.status.busy": "2024-01-18T17:20:18.159115Z", "iopub.status.idle": "2024-01-18T17:20:18.161096Z", "shell.execute_reply": "2024-01-18T17:20:18.160833Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(str):\n", " def __new__(cls, context, zn, v):\n", " return str.__new__(cls, v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As before, initialization proceeds with `create()` and the constructor." 
] }, { "cell_type": "code", "execution_count": 123, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.162685Z", "iopub.status.busy": "2024-01-18T17:20:18.162560Z", "iopub.status.idle": "2024-01-18T17:20:18.164899Z", "shell.execute_reply": "2024-01-18T17:20:18.164622Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    @classmethod\n", "    def create(cls, context, zn, v=None):\n", "        return zproxy_create(cls, 'String', z3.String, context, zn, v)\n", "\n", "    def __init__(self, context, z, v=None):\n", "        self.context, self.z, self.v = context, z, v\n", "        self._len = zint(context, z3.Length(z), len(v))\n", "        #self.context[1].append(z3.Length(z) == z3.IntVal(len(v)))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We also define a `_zv()` helper for methods that accept another string. It returns the symbolic and the concrete value of its argument, whether that argument is a `zstr` or a plain string." ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.166388Z", "iopub.status.busy": "2024-01-18T17:20:18.166281Z", "iopub.status.idle": "2024-01-18T17:20:18.168123Z", "shell.execute_reply": "2024-01-18T17:20:18.167891Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def _zv(self, o):\n", "        return (o.z, o.v) if isinstance(o, zstr) else (z3.StringVal(o), o)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Hack to use the ASCII value of a character." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Note:** This is a temporary solution. This block should go away as soon as the change discussed in [this Z3 issue](https://github.com/Z3Prover/z3/issues/5764)\n", "is released, which will allow us to use the Python API directly."
] }, { "cell_type": "code", "execution_count": 125, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.169727Z", "iopub.status.busy": "2024-01-18T17:20:18.169625Z", "iopub.status.idle": "2024-01-18T17:20:18.173903Z", "shell.execute_reply": "2024-01-18T17:20:18.173642Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from typing import Union, Optional, Dict, Generator, Set\n", "\n", "def visit_z3_expr(\n", " e: Union[z3.ExprRef, z3.QuantifierRef],\n", " seen: Optional[Dict[Union[z3.ExprRef, z3.QuantifierRef], bool]] = None) -> \\\n", " Generator[Union[z3.ExprRef, z3.QuantifierRef], None, None]:\n", " if seen is None:\n", " seen = {}\n", " elif e in seen:\n", " return\n", "\n", " seen[e] = True\n", " yield e\n", "\n", " if z3.is_app(e):\n", " for ch in e.children():\n", " for e in visit_z3_expr(ch, seen):\n", " yield e\n", " return\n", "\n", " if z3.is_quantifier(e):\n", " for e in visit_z3_expr(e.body(), seen):\n", " yield e\n", " return\n", "\n", "\n", "def is_z3_var(e: z3.ExprRef) -> bool:\n", " return z3.is_const(e) and e.decl().kind() == z3.Z3_OP_UNINTERPRETED\n", "\n", "\n", "def get_all_vars(e: z3.ExprRef) -> Set[z3.ExprRef]:\n", " return {sub for sub in visit_z3_expr(e) if is_z3_var(sub)}\n", "\n", "\n", "def z3_ord(str_expr: z3.SeqRef) -> z3.ArithRef:\n", " return z3.parse_smt2_string(\n", " f\"(assert (= 42 (str.to_code {str_expr.sexpr()})))\",\n", " decls={str(c): c for c in get_all_vars(str_expr)}\n", " )[0].children()[1]\n", "\n", "\n", "def z3_chr(int_expr: z3.ArithRef) -> z3.SeqRef:\n", " return z3.parse_smt2_string(\n", " f\"(assert (= \\\"4\\\" (str.from_code {int_expr.sexpr()})))\",\n", " decls={str(c): c for c in get_all_vars(int_expr)}\n", " )[0].children()[1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Retrieving Ordinal Value\n", "We define `zord` that given a symbolic one character long string, obtains the `ord()` for that." 
] }, { "cell_type": "code", "execution_count": 126, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.175327Z", "iopub.status.busy": "2024-01-18T17:20:18.175243Z", "iopub.status.idle": "2024-01-18T17:20:18.176947Z", "shell.execute_reply": "2024-01-18T17:20:18.176688Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def zord(context, c):\n", " return z3_ord(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use it as follows" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.178662Z", "iopub.status.busy": "2024-01-18T17:20:18.178558Z", "iopub.status.idle": "2024-01-18T17:20:18.180622Z", "shell.execute_reply": "2024-01-18T17:20:18.180318Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "zc = z3.String('arg_%d' % fresh_name())" ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.182350Z", "iopub.status.busy": "2024-01-18T17:20:18.182255Z", "iopub.status.idle": "2024-01-18T17:20:18.184967Z", "shell.execute_reply": "2024-01-18T17:20:18.184676Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " zi = zord(_.context, zc)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "No new variables are defined." 
] }, { "cell_type": "code", "execution_count": 129, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.186831Z", "iopub.status.busy": "2024-01-18T17:20:18.186701Z", "iopub.status.idle": "2024-01-18T17:20:18.188872Z", "shell.execute_reply": "2024-01-18T17:20:18.188576Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({}, [])" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is the smtlib representation." ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.190518Z", "iopub.status.busy": "2024-01-18T17:20:18.190398Z", "iopub.status.idle": "2024-01-18T17:20:18.192649Z", "shell.execute_reply": "2024-01-18T17:20:18.192393Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'(str.to_code arg_2)'" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zi.sexpr()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can specify what the result of `ord()` should be, and call `z3.solve()` to provide us with a solution that will provide the required result." 
] }, { "cell_type": "code", "execution_count": 131, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.194228Z", "iopub.status.busy": "2024-01-18T17:20:18.194122Z", "iopub.status.idle": "2024-01-18T17:20:18.196616Z", "shell.execute_reply": "2024-01-18T17:20:18.196227Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'(= (str.to_code arg_2) 65)'" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(zi == 65).sexpr()" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.198440Z", "iopub.status.busy": "2024-01-18T17:20:18.198301Z", "iopub.status.idle": "2024-01-18T17:20:18.209922Z", "shell.execute_reply": "2024-01-18T17:20:18.209663Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[arg_2 = \"A\"]\n" ] } ], "source": [ "z3.solve([zi == 65])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Translating an Ordinal Value to ASCII\n", "Similarly, we can convert the ASCII value back to a single-character string using `zchr()`." ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.211579Z", "iopub.status.busy": "2024-01-18T17:20:18.211448Z", "iopub.status.idle": "2024-01-18T17:20:18.213201Z", "shell.execute_reply": "2024-01-18T17:20:18.212935Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def zchr(context, i):\n", "    return z3_chr(i)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To use it, we first define a symbolic integer variable."
] }, { "cell_type": "code", "execution_count": 134, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.215049Z", "iopub.status.busy": "2024-01-18T17:20:18.214942Z", "iopub.status.idle": "2024-01-18T17:20:18.216540Z", "shell.execute_reply": "2024-01-18T17:20:18.216279Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "i = z3.Int('ival_%d' % fresh_name())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now retrieve the `chr()` representation as below." ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.217992Z", "iopub.status.busy": "2024-01-18T17:20:18.217911Z", "iopub.status.idle": "2024-01-18T17:20:18.220229Z", "shell.execute_reply": "2024-01-18T17:20:18.219812Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " zc = zchr(_.context, i)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "No new variables are defined." 
] }, { "cell_type": "code", "execution_count": 136, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.221777Z", "iopub.status.busy": "2024-01-18T17:20:18.221542Z", "iopub.status.idle": "2024-01-18T17:20:18.223574Z", "shell.execute_reply": "2024-01-18T17:20:18.223337Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({}, [])" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.225113Z", "iopub.status.busy": "2024-01-18T17:20:18.225014Z", "iopub.status.idle": "2024-01-18T17:20:18.227546Z", "shell.execute_reply": "2024-01-18T17:20:18.227286Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'(= (str.from_code ival_1) \"a\")'" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(zc== z3.StringVal('a')).sexpr()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As before, we can specify what the end result of calling `chr()` should be to get the original argument." 
] }, { "cell_type": "code", "execution_count": 138, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.229371Z", "iopub.status.busy": "2024-01-18T17:20:18.229245Z", "iopub.status.idle": "2024-01-18T17:20:18.238579Z", "shell.execute_reply": "2024-01-18T17:20:18.238321Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ival_1 = 97]\n" ] } ], "source": [ "z3.solve([zc == z3.StringVal('a')])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Equality between Strings\n", "\n", "Equality for `zstr` is defined similarly to that of `zint`." ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.240119Z", "iopub.status.busy": "2024-01-18T17:20:18.240006Z", "iopub.status.idle": "2024-01-18T17:20:18.241997Z", "shell.execute_reply": "2024-01-18T17:20:18.241772Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def __eq__(self, other):\n", "        z, v = self._zv(other)\n", "        return zbool(self.context, self.z == z, self.v == v)\n", "\n", "    def __req__(self, other):\n", "        return self.__eq__(other)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `zstr` class is used as follows."
] }, { "cell_type": "code", "execution_count": 140, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.243543Z", "iopub.status.busy": "2024-01-18T17:20:18.243424Z", "iopub.status.idle": "2024-01-18T17:20:18.245178Z", "shell.execute_reply": "2024-01-18T17:20:18.244850Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr1(s):\n", " if s == 'h':\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.246737Z", "iopub.status.busy": "2024-01-18T17:20:18.246621Z", "iopub.status.idle": "2024-01-18T17:20:18.248654Z", "shell.execute_reply": "2024-01-18T17:20:18.248395Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr1]('h')" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.250230Z", "iopub.status.busy": "2024-01-18T17:20:18.250112Z", "iopub.status.idle": "2024-01-18T17:20:18.271305Z", "shell.execute_reply": "2024-01-18T17:20:18.270942Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('h', 'String')})" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It works even if we have more than one character." 
] }, { "cell_type": "code", "execution_count": 143, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.273020Z", "iopub.status.busy": "2024-01-18T17:20:18.272906Z", "iopub.status.idle": "2024-01-18T17:20:18.274752Z", "shell.execute_reply": "2024-01-18T17:20:18.274497Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr1(s): # type: ignore\n", " if s == 'hello world':\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.276115Z", "iopub.status.busy": "2024-01-18T17:20:18.276016Z", "iopub.status.idle": "2024-01-18T17:20:18.278114Z", "shell.execute_reply": "2024-01-18T17:20:18.277826Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr1]('hello world')" ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.279669Z", "iopub.status.busy": "2024-01-18T17:20:18.279551Z", "iopub.status.idle": "2024-01-18T17:20:18.282552Z", "shell.execute_reply": "2024-01-18T17:20:18.282277Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'tstr1_s_str_1': 'String'}, [tstr1_s_str_1 == \"hello world\"])" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.284070Z", "iopub.status.busy": "2024-01-18T17:20:18.283964Z", "iopub.status.idle": "2024-01-18T17:20:18.302273Z", "shell.execute_reply": "2024-01-18T17:20:18.301954Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('hello world', 'String')})" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Length of Strings\n", "\n", "Unfortunately, in Python, we can't override `len()` to return a new datatype. Hence, we work around that." ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.304362Z", "iopub.status.busy": "2024-01-18T17:20:18.304256Z", "iopub.status.idle": "2024-01-18T17:20:18.306207Z", "shell.execute_reply": "2024-01-18T17:20:18.305945Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def __len__(self):\n", " raise NotImplemented()" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.307726Z", "iopub.status.busy": "2024-01-18T17:20:18.307623Z", "iopub.status.idle": "2024-01-18T17:20:18.309235Z", "shell.execute_reply": "2024-01-18T17:20:18.308980Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def length(self):\n", " return self._len" ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.310782Z", "iopub.status.busy": "2024-01-18T17:20:18.310664Z", "iopub.status.idle": "2024-01-18T17:20:18.313013Z", "shell.execute_reply": "2024-01-18T17:20:18.312750Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " za = zstr.create(_.context, 'str_a', \"s\")\n", " if za.length() > 0:\n", " print(1)" ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.314735Z", "iopub.status.busy": "2024-01-18T17:20:18.314610Z", "iopub.status.idle": "2024-01-18T17:20:18.317333Z", "shell.execute_reply": "2024-01-18T17:20:18.317095Z" }, "slideshow": { "slide_type": 
"fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'str_a': 'String'}, [0 < Length(str_a)])" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.318768Z", "iopub.status.busy": "2024-01-18T17:20:18.318671Z", "iopub.status.idle": "2024-01-18T17:20:18.320292Z", "shell.execute_reply": "2024-01-18T17:20:18.320065Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr2(s):\n", " if s.length() > 1:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.321806Z", "iopub.status.busy": "2024-01-18T17:20:18.321706Z", "iopub.status.idle": "2024-01-18T17:20:18.323559Z", "shell.execute_reply": "2024-01-18T17:20:18.323333Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr2]('hello world')" ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.325381Z", "iopub.status.busy": "2024-01-18T17:20:18.325291Z", "iopub.status.idle": "2024-01-18T17:20:18.327650Z", "shell.execute_reply": "2024-01-18T17:20:18.327397Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "({'tstr2_s_str_1': 'String'}, [1 < Length(tstr2_s_str_1)])" ] }, "execution_count": 153, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.329385Z", "iopub.status.busy": "2024-01-18T17:20:18.329182Z", "iopub.status.idle": "2024-01-18T17:20:18.361847Z", "shell.execute_reply": "2024-01-18T17:20:18.361416Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "Predicates in path:\n", "0 1 < Length(tstr2_s_str_1)\n", "\n", "(declare-const tstr2_s_str_1 String)\n", "(assert (not (<= (str.len tstr2_s_str_1) 1)))\n", "(check-sat)\n", "(get-model)\n", "\n", "z3 -t:6000 /var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmpo8sz1k_w.smt\n", "sat\n", "(\n", " (define-fun tstr2_s_str_1 () String\n", " \"AB\")\n", ")\n" ] }, { "data": { "text/plain": [ "('sat', {'s': ('AB', 'String')})" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval(log=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Concatenation of Strings\n", "What if we need to concatenate two strings? We need additional helpers to accomplish that." ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.363787Z", "iopub.status.busy": "2024-01-18T17:20:18.363663Z", "iopub.status.idle": "2024-01-18T17:20:18.366021Z", "shell.execute_reply": "2024-01-18T17:20:18.365745Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def __add__(self, other):\n", " z, v = self._zv(other)\n", " return zstr(self.context, self.z + z, self.v + v)\n", "\n", " def __radd__(self, other):\n", " return self.__add__(other)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is how it can be used. 
First, we create the wrapped arguments." ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.367569Z", "iopub.status.busy": "2024-01-18T17:20:18.367462Z", "iopub.status.idle": "2024-01-18T17:20:18.370067Z", "shell.execute_reply": "2024-01-18T17:20:18.369807Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hello world\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", "    v1, v2 = [zstr.create(_.context, 'arg_%d' % fresh_name(), s)\n", "              for s in ['hello', 'world']]\n", "    if (v1 + ' ' + v2) == 'hello world':\n", "        print('hello world')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The concatenation of symbolic variables is preserved in `context`." ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.371559Z", "iopub.status.busy": "2024-01-18T17:20:18.371455Z", "iopub.status.idle": "2024-01-18T17:20:18.374037Z", "shell.execute_reply": "2024-01-18T17:20:18.373797Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'arg_1': 'String', 'arg_2': 'String'},\n", " [Concat(Concat(arg_1, \" \"), arg_2) == \"hello world\"])" ] }, "execution_count": 157, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Producing Substrings\n", "Similarly, accessing substrings also requires extra help."
] }, { "cell_type": "code", "execution_count": 158, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.375514Z", "iopub.status.busy": "2024-01-18T17:20:18.375410Z", "iopub.status.idle": "2024-01-18T17:20:18.378174Z", "shell.execute_reply": "2024-01-18T17:20:18.377846Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def __getitem__(self, idx):\n", "        if isinstance(idx, slice):\n", "            start, stop, step = idx.indices(len(self.v))\n", "            assert step == 1  # for now\n", "            assert stop >= start  # for now\n", "            rz = z3.SubString(self.z, start, stop - start)\n", "            rv = self.v[idx]\n", "        elif isinstance(idx, int):\n", "            rz = z3.SubString(self.z, idx, 1)\n", "            rv = self.v[idx]\n", "        else:\n", "            assert False  # for now\n", "        return zstr(self.context, rz, rv)\n", "\n", "    def __iter__(self):\n", "        return zstr_iterator(self.context, self)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### An Iterator Class for Strings\n", "\n", "We define the iterator as follows."
] }, { "cell_type": "code", "execution_count": 159, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.379909Z", "iopub.status.busy": "2024-01-18T17:20:18.379796Z", "iopub.status.idle": "2024-01-18T17:20:18.382207Z", "shell.execute_reply": "2024-01-18T17:20:18.381941Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr_iterator():\n", "    def __init__(self, context, zstr):\n", "        self.context = context\n", "        self._zstr = zstr\n", "        self._str_idx = 0\n", "        self._str_max = zstr._len  # a zint proxy, not a plain int\n", "\n", "    def __next__(self):\n", "        if self._str_idx == self._str_max:  # compared via zint.__eq__()\n", "            raise StopIteration\n", "        c = self._zstr[self._str_idx]\n", "        self._str_idx += 1\n", "        return c\n", "\n", "    def __len__(self):\n", "        return self._str_max" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is how it can be used." ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.383720Z", "iopub.status.busy": "2024-01-18T17:20:18.383617Z", "iopub.status.idle": "2024-01-18T17:20:18.385362Z", "shell.execute_reply": "2024-01-18T17:20:18.385139Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr3(s):\n", "    if s[0] == 'h' and s[1] == 'e' and s[3] == 'l':\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.386753Z", "iopub.status.busy": "2024-01-18T17:20:18.386652Z", "iopub.status.idle": "2024-01-18T17:20:18.388990Z", "shell.execute_reply": "2024-01-18T17:20:18.388734Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr3]('hello')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, the context shows predicates 
encountered." ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.390479Z", "iopub.status.busy": "2024-01-18T17:20:18.390380Z", "iopub.status.idle": "2024-01-18T17:20:18.393899Z", "shell.execute_reply": "2024-01-18T17:20:18.393614Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'tstr3_s_str_1': 'String'},\n", " [str.substr(tstr3_s_str_1, 0, 1) == \"h\",\n", " str.substr(tstr3_s_str_1, 1, 1) == \"e\",\n", " str.substr(tstr3_s_str_1, 3, 1) == \"l\"])" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The function `zeval()` returns a solution for the predicate. Note that the value returned is not exactly the argument that we passed in. This is a consequence of the predicates we have. That is, we have no constraints on what the character value on `s[2]` should be." ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.395624Z", "iopub.status.busy": "2024-01-18T17:20:18.395527Z", "iopub.status.idle": "2024-01-18T17:20:18.416975Z", "shell.execute_reply": "2024-01-18T17:20:18.416636Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('heAl', 'String')})" ] }, "execution_count": 163, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Translating to Upper and Lower Equivalents\n", "\n", "A major complication is supporting `upper()` and `lower()` methods. We use the previously defined `zchr()` and `zord()` functions to accomplish this." 
] }, { "cell_type": "code", "execution_count": 164, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.418800Z", "iopub.status.busy": "2024-01-18T17:20:18.418576Z", "iopub.status.idle": "2024-01-18T17:20:18.421918Z", "shell.execute_reply": "2024-01-18T17:20:18.421642Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def upper(self):\n", "        empty = ''\n", "        ne = 'empty_%d' % fresh_name()\n", "        result = zstr.create(self.context, ne, empty)\n", "        self.context[1].append(z3.StringVal(empty) == result.z)\n", "        cdiff = (ord('a') - ord('A'))\n", "        for i in self:\n", "            oz = zord(self.context, i.z)\n", "            uz = zchr(self.context, oz - cdiff)\n", "            rz = z3.And([oz >= ord('a'), oz <= ord('z')])\n", "            ov = ord(i.v)\n", "            uv = chr(ov - cdiff)\n", "            rv = ov >= ord('a') and ov <= ord('z')\n", "            if zbool(self.context, rz, rv):\n", "                i = zstr(self.context, uz, uv)\n", "            else:\n", "                i = zstr(self.context, i.z, i.v)\n", "            result += i\n", "        return result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `lower()` function is similar to `upper()`, except that the character ranges are switched. Since lowercase letters have higher ordinal values than uppercase letters, we add the difference to the ordinal to convert a character to lowercase."
] }, { "cell_type": "code", "execution_count": 165, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.423648Z", "iopub.status.busy": "2024-01-18T17:20:18.423560Z", "iopub.status.idle": "2024-01-18T17:20:18.426570Z", "shell.execute_reply": "2024-01-18T17:20:18.426311Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def lower(self):\n", " empty = ''\n", " ne = 'empty_%d' % fresh_name()\n", " result = zstr.create(self.context, ne, empty)\n", " self.context[1].append(z3.StringVal(empty) == result.z)\n", " cdiff = (ord('a') - ord('A'))\n", " for i in self:\n", " oz = zord(self.context, i.z)\n", " uz = zchr(self.context, oz + cdiff)\n", " rz = z3.And([oz >= ord('A'), oz <= ord('Z')])\n", " ov = ord(i.v)\n", " uv = chr(ov + cdiff)\n", " rv = ov >= ord('A') and ov <= ord('Z')\n", " if zbool(self.context, rz, rv):\n", " i = zstr(self.context, uz, uv)\n", " else:\n", " i = zstr(self.context, i.z, i.v)\n", " result += i\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is how `upper()` is used." 
] }, { "cell_type": "code", "execution_count": 166, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.428227Z", "iopub.status.busy": "2024-01-18T17:20:18.428128Z", "iopub.status.idle": "2024-01-18T17:20:18.430212Z", "shell.execute_reply": "2024-01-18T17:20:18.429888Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr4(s):\n", "    if s.upper() == 'H':\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.431929Z", "iopub.status.busy": "2024-01-18T17:20:18.431815Z", "iopub.status.idle": "2024-01-18T17:20:18.435596Z", "shell.execute_reply": "2024-01-18T17:20:18.435328Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr4]('h')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we use `zeval()` to solve the collected constraints, and verify that our constraints are correct."
] }, { "cell_type": "code", "execution_count": 168, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.437182Z", "iopub.status.busy": "2024-01-18T17:20:18.437084Z", "iopub.status.idle": "2024-01-18T17:20:18.461253Z", "shell.execute_reply": "2024-01-18T17:20:18.460912Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('h', 'String')})" ] }, "execution_count": 168, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is a larger example using `lower()`:" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.463358Z", "iopub.status.busy": "2024-01-18T17:20:18.463202Z", "iopub.status.idle": "2024-01-18T17:20:18.465371Z", "shell.execute_reply": "2024-01-18T17:20:18.465055Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr5(s):\n", "    if s.lower() == 'hello world':\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.467686Z", "iopub.status.busy": "2024-01-18T17:20:18.467566Z", "iopub.status.idle": "2024-01-18T17:20:18.479804Z", "shell.execute_reply": "2024-01-18T17:20:18.479512Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr5]('Hello World')" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.481577Z", "iopub.status.busy": "2024-01-18T17:20:18.481487Z", "iopub.status.idle": "2024-01-18T17:20:18.535311Z", "shell.execute_reply": "2024-01-18T17:20:18.534859Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('Hello World', 'String')})" ] },
"execution_count": 171, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, we obtain the right input value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Checking for String Prefixes\n", "\n", "We define `startswith()`." ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.537292Z", "iopub.status.busy": "2024-01-18T17:20:18.537171Z", "iopub.status.idle": "2024-01-18T17:20:18.539903Z", "shell.execute_reply": "2024-01-18T17:20:18.539657Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def startswith(self, other, beg=0, end=None):\n", "        assert end is None  # for now\n", "        assert isinstance(beg, int)  # for now\n", "        zb = z3.IntVal(beg)\n", "\n", "        others = other if isinstance(other, tuple) else (other, )\n", "\n", "        last = False\n", "        for o in others:\n", "            z, v = self._zv(o)\n", "            r = z3.IndexOf(self.z, z, zb)\n", "            # The concrete check starts at `beg`, like the symbolic one\n", "            last = zbool(self.context, r == zb, self.v.startswith(v, beg))\n", "            if last:\n", "                return last\n", "        return last" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An example."
] }, { "cell_type": "code", "execution_count": 173, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.541379Z", "iopub.status.busy": "2024-01-18T17:20:18.541267Z", "iopub.status.idle": "2024-01-18T17:20:18.543050Z", "shell.execute_reply": "2024-01-18T17:20:18.542729Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr6(s):\n", "    if s.startswith('hello'):\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.544732Z", "iopub.status.busy": "2024-01-18T17:20:18.544620Z", "iopub.status.idle": "2024-01-18T17:20:18.546932Z", "shell.execute_reply": "2024-01-18T17:20:18.546694Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr6]('hello world')" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.548402Z", "iopub.status.busy": "2024-01-18T17:20:18.548302Z", "iopub.status.idle": "2024-01-18T17:20:18.569271Z", "shell.execute_reply": "2024-01-18T17:20:18.568939Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('helloAhello', 'String')})" ] }, "execution_count": 175, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.570870Z", "iopub.status.busy": "2024-01-18T17:20:18.570754Z", "iopub.status.idle": "2024-01-18T17:20:18.573024Z", "shell.execute_reply": "2024-01-18T17:20:18.572794Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr6]('my world')" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.574465Z",
"iopub.status.busy": "2024-01-18T17:20:18.574367Z", "iopub.status.idle": "2024-01-18T17:20:18.595507Z", "shell.execute_reply": "2024-01-18T17:20:18.595122Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('', 'String')})" ] }, "execution_count": 177, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As before, the predicates only record the outcome of `startswith()`. In the first run, any string starting with `hello` satisfies them; in the second run, any string _not_ starting with `hello` does, including the empty string. Our solutions reflect that." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Finding Substrings\n", "\n", "We also define `find()`." ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.597309Z", "iopub.status.busy": "2024-01-18T17:20:18.597195Z", "iopub.status.idle": "2024-01-18T17:20:18.599600Z", "shell.execute_reply": "2024-01-18T17:20:18.599363Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", "    def find(self, other, beg=0, end=None):\n", "        assert end is None  # for now\n", "        assert isinstance(beg, int)  # for now\n", "        zb = z3.IntVal(beg)\n", "        z, v = self._zv(other)\n", "        zi = z3.IndexOf(self.z, z, zb)\n", "        vi = self.v.find(v, beg, end)\n", "        return zint(self.context, zi, vi)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "An example."
] }, { "cell_type": "code", "execution_count": 179, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.601029Z", "iopub.status.busy": "2024-01-18T17:20:18.600931Z", "iopub.status.idle": "2024-01-18T17:20:18.602581Z", "shell.execute_reply": "2024-01-18T17:20:18.602336Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr7(s):\n", "    if s.find('world') != -1:\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.604095Z", "iopub.status.busy": "2024-01-18T17:20:18.603997Z", "iopub.status.idle": "2024-01-18T17:20:18.606084Z", "shell.execute_reply": "2024-01-18T17:20:18.605846Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr7]('hello world')" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.607498Z", "iopub.status.busy": "2024-01-18T17:20:18.607400Z", "iopub.status.idle": "2024-01-18T17:20:18.629633Z", "shell.execute_reply": "2024-01-18T17:20:18.629255Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('worldAworld', 'String')})" ] }, "execution_count": 181, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As before, the predicates only require that `find()` returns a value other than -1; that is, that `world` occurs somewhere in `s`. Hence, any string containing `world` is a valid solution." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Remove Space from Ends" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We next implement `strip()`."
] }, { "cell_type": "code", "execution_count": 182, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.631571Z", "iopub.status.busy": "2024-01-18T17:20:18.631457Z", "iopub.status.idle": "2024-01-18T17:20:18.633172Z", "shell.execute_reply": "2024-01-18T17:20:18.632914Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.634726Z", "iopub.status.busy": "2024-01-18T17:20:18.634618Z", "iopub.status.idle": "2024-01-18T17:20:18.637620Z", "shell.execute_reply": "2024-01-18T17:20:18.637402Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def rstrip(self, chars=None):\n", " if chars is None:\n", " chars = string.whitespace\n", " if self._len == 0:\n", " return self\n", " else:\n", " last_idx = self._len - 1\n", " cz = z3.SubString(self.z, last_idx.z, 1)\n", " cv = self.v[-1]\n", " zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])\n", " vcheck_space = any(cv == char for char in chars)\n", " if zbool(self.context, zcheck_space, vcheck_space):\n", " return zstr(self.context, z3.SubString(self.z, 0, last_idx.z),\n", " self.v[0:-1]).rstrip(chars)\n", " else:\n", " return self" ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.639125Z", "iopub.status.busy": "2024-01-18T17:20:18.639021Z", "iopub.status.idle": "2024-01-18T17:20:18.640726Z", "shell.execute_reply": "2024-01-18T17:20:18.640452Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr8(s):\n", " if s.rstrip(' ') == 'a b':\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.642165Z", "iopub.status.busy": "2024-01-18T17:20:18.642053Z", 
"iopub.status.idle": "2024-01-18T17:20:18.645640Z", "shell.execute_reply": "2024-01-18T17:20:18.645362Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr8]('a b ')\n", " print(r)" ] }, { "cell_type": "code", "execution_count": 186, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.647209Z", "iopub.status.busy": "2024-01-18T17:20:18.647117Z", "iopub.status.idle": "2024-01-18T17:20:18.672810Z", "shell.execute_reply": "2024-01-18T17:20:18.672495Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('a b ', 'String')})" ] }, "execution_count": 186, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "code", "execution_count": 187, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.674724Z", "iopub.status.busy": "2024-01-18T17:20:18.674605Z", "iopub.status.idle": "2024-01-18T17:20:18.677673Z", "shell.execute_reply": "2024-01-18T17:20:18.677377Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def lstrip(self, chars=None):\n", " if chars is None:\n", " chars = string.whitespace\n", " if self._len == 0:\n", " return self\n", " else:\n", " first_idx = 0\n", " cz = z3.SubString(self.z, 0, 1)\n", " cv = self.v[0]\n", " zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])\n", " vcheck_space = any(cv == char for char in chars)\n", " if zbool(self.context, zcheck_space, vcheck_space):\n", " return zstr(self.context, z3.SubString(\n", " self.z, 1, self._len.z), self.v[1:]).lstrip(chars)\n", " else:\n", " return self" ] }, { "cell_type": "code", "execution_count": 188, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.679278Z", "iopub.status.busy": "2024-01-18T17:20:18.679187Z", "iopub.status.idle": 
"2024-01-18T17:20:18.680948Z", "shell.execute_reply": "2024-01-18T17:20:18.680660Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr9(s):\n", " if s.lstrip(' ') == 'a b':\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.682657Z", "iopub.status.busy": "2024-01-18T17:20:18.682541Z", "iopub.status.idle": "2024-01-18T17:20:18.685806Z", "shell.execute_reply": "2024-01-18T17:20:18.685548Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr9](' a b')\n", " print(r)" ] }, { "cell_type": "code", "execution_count": 190, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.687259Z", "iopub.status.busy": "2024-01-18T17:20:18.687183Z", "iopub.status.idle": "2024-01-18T17:20:18.710151Z", "shell.execute_reply": "2024-01-18T17:20:18.709779Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': (' a b', 'String')})" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "code", "execution_count": 191, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.712027Z", "iopub.status.busy": "2024-01-18T17:20:18.711903Z", "iopub.status.idle": "2024-01-18T17:20:18.713953Z", "shell.execute_reply": "2024-01-18T17:20:18.713700Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def strip(self, chars=None):\n", " return self.lstrip(chars).rstrip(chars)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Example usage." 
] }, { "cell_type": "code", "execution_count": 192, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.715486Z", "iopub.status.busy": "2024-01-18T17:20:18.715385Z", "iopub.status.idle": "2024-01-18T17:20:18.717005Z", "shell.execute_reply": "2024-01-18T17:20:18.716773Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def tstr10(s):\n", "    if s.strip() == 'a b':\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "code", "execution_count": 193, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.718418Z", "iopub.status.busy": "2024-01-18T17:20:18.718339Z", "iopub.status.idle": "2024-01-18T17:20:18.724149Z", "shell.execute_reply": "2024-01-18T17:20:18.723914Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", "    r = _[tstr10](' a b ')\n", "    print(r)" ] }, { "cell_type": "code", "execution_count": 194, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.725542Z", "iopub.status.busy": "2024-01-18T17:20:18.725459Z", "iopub.status.idle": "2024-01-18T17:20:18.753587Z", "shell.execute_reply": "2024-01-18T17:20:18.753207Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('\\\\u{c}\\\\u{a}\\\\u{9}\\\\u{a}a b\\\\u{d}\\\\u{d}', 'String')})" ] }, "execution_count": 194, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`strip()` has generated the right constraints: the solution is `a b` surrounded by whitespace characters (here, form feed, newline, tab, and carriage return)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Splitting Strings\n", "\n", "We implement string `split()` as follows."
] }, { "cell_type": "code", "execution_count": 195, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.755450Z", "iopub.status.busy": "2024-01-18T17:20:18.755327Z", "iopub.status.idle": "2024-01-18T17:20:18.758696Z", "shell.execute_reply": "2024-01-18T17:20:18.758443Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zstr(zstr):\n", " def split(self, sep=None, maxsplit=-1):\n", " assert sep is not None # default space based split is complicated\n", " assert maxsplit == -1 # for now.\n", " zsep = z3.StringVal(sep)\n", " zl = z3.Length(zsep)\n", " # zi would be the length of prefix\n", " zi = z3.IndexOf(self.z, zsep, z3.IntVal(0))\n", " # Z3Bug: There is a bug in the `z3.IndexOf` method which returns\n", " # `z3.SeqRef` instead of `z3.ArithRef`. So we need to fix it.\n", " zi = z3.ArithRef(zi.ast, zi.ctx)\n", "\n", " vi = self.v.find(sep)\n", " if zbool(self.context, zi >= z3.IntVal(0), vi >= 0):\n", " zprefix = z3.SubString(self.z, z3.IntVal(0), zi)\n", " zmid = z3.SubString(self.z, zi, zl)\n", " zsuffix = z3.SubString(self.z, zi + zl,\n", " z3.Length(self.z))\n", " return [zstr(self.context, zprefix, self.v[0:vi])] + zstr(\n", " self.context, zsuffix, self.v[vi + len(sep):]).split(\n", " sep, maxsplit)\n", " else:\n", " return [self]" ] }, { "cell_type": "code", "execution_count": 196, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.760233Z", "iopub.status.busy": "2024-01-18T17:20:18.760146Z", "iopub.status.idle": "2024-01-18T17:20:18.762036Z", "shell.execute_reply": "2024-01-18T17:20:18.761741Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def tstr11(s):\n", " if s.split(',') == ['a', 'b', 'c']:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": 197, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.763453Z", "iopub.status.busy": "2024-01-18T17:20:18.763370Z", "iopub.status.idle": 
"2024-01-18T17:20:18.766398Z", "shell.execute_reply": "2024-01-18T17:20:18.766166Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " r = _[tstr11]('a,b,c')\n", " print(r)" ] }, { "cell_type": "code", "execution_count": 198, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.767821Z", "iopub.status.busy": "2024-01-18T17:20:18.767739Z", "iopub.status.idle": "2024-01-18T17:20:18.799430Z", "shell.execute_reply": "2024-01-18T17:20:18.799093Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('a,b,c', 'String')})" ] }, "execution_count": 198, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Trip Wire\n", "\n", "For easier debugging, we abort any calls to methods in `str` that are not overridden by `zstr`." 
] }, { "cell_type": "code", "execution_count": 199, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.801176Z", "iopub.status.busy": "2024-01-18T17:20:18.801060Z", "iopub.status.idle": "2024-01-18T17:20:18.803005Z", "shell.execute_reply": "2024-01-18T17:20:18.802723Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def make_str_abort_wrapper(fun):\n", " def proxy(*args, **kwargs):\n", " raise Exception('%s Not implemented in `zstr`' % fun.__name__)\n", " return proxy" ] }, { "cell_type": "code", "execution_count": 200, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.804513Z", "iopub.status.busy": "2024-01-18T17:20:18.804411Z", "iopub.status.idle": "2024-01-18T17:20:18.807103Z", "shell.execute_reply": "2024-01-18T17:20:18.806859Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def init_concolic_3():\n", " strmembers = inspect.getmembers(zstr, callable)\n", " zstrmembers = {m[0] for m in strmembers if len(\n", " m) == 2 and 'zstr' in m[1].__qualname__}\n", " for name, fn in inspect.getmembers(str, callable):\n", " # Omitted 'splitlines' as this is needed for formatting output in\n", " # IPython/Jupyter\n", " if name not in zstrmembers and name not in [\n", " 'splitlines',\n", " '__class__',\n", " '__contains__',\n", " '__delattr__',\n", " '__dir__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__getnewargs__',\n", " '__gt__',\n", " '__hash__',\n", " '__le__',\n", " '__len__',\n", " '__lt__',\n", " '__mod__',\n", " '__mul__',\n", " '__ne__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__']:\n", " setattr(zstr, name, make_str_abort_wrapper(fn))" ] }, { "cell_type": "code", "execution_count": 201, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.808644Z", "iopub.status.busy": "2024-01-18T17:20:18.808539Z", 
"iopub.status.idle": "2024-01-18T17:20:18.810031Z", "shell.execute_reply": "2024-01-18T17:20:18.809766Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "INITIALIZER_LIST.append(init_concolic_3)" ] }, { "cell_type": "code", "execution_count": 202, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.811627Z", "iopub.status.busy": "2024-01-18T17:20:18.811534Z", "iopub.status.idle": "2024-01-18T17:20:18.813343Z", "shell.execute_reply": "2024-01-18T17:20:18.813098Z" }, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [], "source": [ "init_concolic_3()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Triangle\n", "\n", "We previously showed how to run `triangle()` under `ConcolicTracer`." ] }, { "cell_type": "code", "execution_count": 203, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.814826Z", "iopub.status.busy": "2024-01-18T17:20:18.814727Z", "iopub.status.idle": "2024-01-18T17:20:18.817360Z", "shell.execute_reply": "2024-01-18T17:20:18.817122Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scalene\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", " print(_[triangle](1, 2, 3))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The symbolic variables are as follows:" ] }, { "cell_type": "code", "execution_count": 204, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.818851Z", "iopub.status.busy": "2024-01-18T17:20:18.818759Z", "iopub.status.idle": "2024-01-18T17:20:18.820743Z", "shell.execute_reply": "2024-01-18T17:20:18.820503Z" }, "slideshow": { "slide_type": "fragment" }, "tags": [] }, "outputs": [ { "data": { "text/plain": [ 
"{'triangle_a_int_1': 'Int',\n", " 'triangle_b_int_2': 'Int',\n", " 'triangle_c_int_3': 'Int'}" ] }, "execution_count": 204, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.decls" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The predicates are as follows:" ] }, { "cell_type": "code", "execution_count": 205, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.822182Z", "iopub.status.busy": "2024-01-18T17:20:18.822076Z", "iopub.status.idle": "2024-01-18T17:20:18.825011Z", "shell.execute_reply": "2024-01-18T17:20:18.824768Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Not(triangle_a_int_1 == triangle_b_int_2),\n", " Not(triangle_b_int_2 == triangle_c_int_3),\n", " Not(triangle_a_int_1 == triangle_c_int_3)]" ] }, "execution_count": 205, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using `zeval()`, we solve these path conditions and obtain a solution. We find that Z3 gives us three distinct integer values:" ] }, { "cell_type": "code", "execution_count": 206, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.826616Z", "iopub.status.busy": "2024-01-18T17:20:18.826509Z", "iopub.status.idle": "2024-01-18T17:20:18.857931Z", "shell.execute_reply": "2024-01-18T17:20:18.857568Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat',\n", " {'a': ('0', 'Int'), 'b': (['-', '2'], 'Int'), 'c': (['-', '1'], 'Int')})" ] }, "execution_count": 206, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "(Note that some values may be negative. 
Indeed, `triangle()` works with negative length values, too, even if real triangles only have positive lengths.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we invoke `triangle()` with these very values, we take the _exact same path_ as the original input:" ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.859838Z", "iopub.status.busy": "2024-01-18T17:20:18.859711Z", "iopub.status.idle": "2024-01-18T17:20:18.862367Z", "shell.execute_reply": "2024-01-18T17:20:18.861953Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'scalene'" ] }, "execution_count": 207, "metadata": {}, "output_type": "execute_result" } ], "source": [ "triangle(0, -2, -1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can have Z3 _negate_ individual conditions, and thus take different paths.\n", "First, we retrieve the symbolic variables."
] }, { "cell_type": "code", "execution_count": 208, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.864369Z", "iopub.status.busy": "2024-01-18T17:20:18.864233Z", "iopub.status.idle": "2024-01-18T17:20:18.866486Z", "shell.execute_reply": "2024-01-18T17:20:18.866184Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "za, zb, zc = [z3.Int(s) for s in _.decls.keys()]" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.868473Z", "iopub.status.busy": "2024-01-18T17:20:18.868324Z", "iopub.status.idle": "2024-01-18T17:20:18.871024Z", "shell.execute_reply": "2024-01-18T17:20:18.870745Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(triangle_a_int_1, triangle_b_int_2, triangle_c_int_3)" ] }, "execution_count": 209, "metadata": {}, "output_type": "execute_result" } ], "source": [ "za, zb, zc" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Then, we pass a negated predicate to `zeval()`. The key (here: `1`) determines which predicate the new predicate will replace." 
] }, { "cell_type": "code", "execution_count": 210, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.872754Z", "iopub.status.busy": "2024-01-18T17:20:18.872591Z", "iopub.status.idle": "2024-01-18T17:20:18.898633Z", "shell.execute_reply": "2024-01-18T17:20:18.898124Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'a': ('1', 'Int'), 'b': ('0', 'Int'), 'c': ('0', 'Int')})" ] }, "execution_count": 210, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval({1: zb == zc})" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.900728Z", "iopub.status.busy": "2024-01-18T17:20:18.900581Z", "iopub.status.idle": "2024-01-18T17:20:18.903080Z", "shell.execute_reply": "2024-01-18T17:20:18.902767Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'isosceles'" ] }, "execution_count": 211, "metadata": {}, "output_type": "execute_result" } ], "source": [ "triangle(1, 0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Invoking `triangle()` with the solution for the updated predicate returns `isosceles`, as expected. By negating further conditions, we can systematically explore all branches in `triangle()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Decoding CGI Strings" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us apply `ConcolicTracer` to our example program `cgi_decode()` from the [chapter on coverage](Coverage.ipynb). Note that we need to rewrite its code slightly, as the hash lookups in `hex_values` cannot be used for transferring constraints yet."
] }, { "cell_type": "code", "execution_count": 212, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.904860Z", "iopub.status.busy": "2024-01-18T17:20:18.904731Z", "iopub.status.idle": "2024-01-18T17:20:18.908907Z", "shell.execute_reply": "2024-01-18T17:20:18.908629Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def cgi_decode(s):\n", " \"\"\"Decode the CGI-encoded string `s`:\n", " * replace \"+\" by \" \"\n", " * replace \"%xx\" by the character with hex number xx.\n", " Return the decoded string. Raise `ValueError` for invalid inputs.\"\"\"\n", "\n", " # Mapping of hex digits to their integer values\n", " hex_values = {\n", " '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,\n", " '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,\n", " 'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,\n", " 'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15,\n", " }\n", "\n", " t = ''\n", " i = 0\n", " while i < s.length():\n", " c = s[i]\n", " if c == '+':\n", " t += ' '\n", " elif c == '%':\n", " digit_high, digit_low = s[i + 1], s[i + 2]\n", " i = i + 2\n", " found = 0\n", " v = 0\n", " for key in hex_values:\n", " if key == digit_high:\n", " found = found + 1\n", " v = hex_values[key] * 16\n", " break\n", " for key in hex_values:\n", " if key == digit_low:\n", " found = found + 1\n", " v = v + hex_values[key]\n", " break\n", " if found == 2:\n", " if v >= 128:\n", " # z3.StringVal(urllib.parse.unquote('%80')) <-- bug in z3\n", " raise ValueError(\"Invalid encoding\")\n", " t = t + chr(v)\n", " else:\n", " raise ValueError(\"Invalid encoding\")\n", " else:\n", " t = t + c\n", " i = i + 1\n", " return t" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.910594Z", "iopub.status.busy": "2024-01-18T17:20:18.910472Z", "iopub.status.idle": "2024-01-18T17:20:18.913062Z", "shell.execute_reply": "2024-01-18T17:20:18.912782Z" }, "slideshow": { "slide_type": "subslide" } }, 
"outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[cgi_decode]('')" ] }, { "cell_type": "code", "execution_count": 214, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.914529Z", "iopub.status.busy": "2024-01-18T17:20:18.914443Z", "iopub.status.idle": "2024-01-18T17:20:18.917122Z", "shell.execute_reply": "2024-01-18T17:20:18.916829Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'cgi_decode_s_str_1': 'String'}, [Not(0 < Length(cgi_decode_s_str_1))])" ] }, "execution_count": 214, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.918825Z", "iopub.status.busy": "2024-01-18T17:20:18.918703Z", "iopub.status.idle": "2024-01-18T17:20:18.921976Z", "shell.execute_reply": "2024-01-18T17:20:18.921676Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[cgi_decode]('a%20d')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Once executed, we can retrieve the symbolic variables in the `decls` attribute. This is a mapping of symbolic variables to types." 
] }, { "cell_type": "code", "execution_count": 216, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.923658Z", "iopub.status.busy": "2024-01-18T17:20:18.923563Z", "iopub.status.idle": "2024-01-18T17:20:18.926021Z", "shell.execute_reply": "2024-01-18T17:20:18.925688Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'cgi_decode_s_str_1': 'String'}" ] }, "execution_count": 216, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.decls" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The extracted path conditions can be found in the `path` attribute:" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.927701Z", "iopub.status.busy": "2024-01-18T17:20:18.927597Z", "iopub.status.idle": "2024-01-18T17:20:18.934409Z", "shell.execute_reply": "2024-01-18T17:20:18.934085Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"%\"),\n", " 1 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 1, 1) == \"+\"),\n", " str.substr(cgi_decode_s_str_1, 1, 1) == \"%\",\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"0\"),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"1\"),\n", " str.substr(cgi_decode_s_str_1, 2, 1) == \"2\",\n", " str.substr(cgi_decode_s_str_1, 3, 1) == \"0\",\n", " 4 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"%\"),\n", " Not(5 < Length(cgi_decode_s_str_1))]" ] }, "execution_count": 217, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ 
"The `context` attribute holds a pair of `decls` and `path` attributes; this is useful for passing it into the `ConcolicTracer` constructor." ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.936074Z", "iopub.status.busy": "2024-01-18T17:20:18.935946Z", "iopub.status.idle": "2024-01-18T17:20:18.937622Z", "shell.execute_reply": "2024-01-18T17:20:18.937312Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert _.context == (_.decls, _.path)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can solve these constraints to obtain a value for the function parameters that follow the same path as the original (traced) invocation:" ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.939366Z", "iopub.status.busy": "2024-01-18T17:20:18.939249Z", "iopub.status.idle": "2024-01-18T17:20:18.962640Z", "shell.execute_reply": "2024-01-18T17:20:18.962268Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('A%20B', 'String')})" ] }, "execution_count": 219, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "_Negating_ some of these constraints will yield different paths taken, and thus greater code coverage. This is what our concolic fuzzers (see later) do. 
Let us negate one of these constraints, namely that the first character should _not_ be a `+` character. This constraint sits at index 1 of the path; the constraint at index 0 only requires the string to be non-empty:" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.964508Z", "iopub.status.busy": "2024-01-18T17:20:18.964377Z", "iopub.status.idle": "2024-01-18T17:20:18.973087Z", "shell.execute_reply": "2024-01-18T17:20:18.972382Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "0 < Length(cgi_decode_s_str_1)" ], "text/plain": [ "0 < Length(cgi_decode_s_str_1)" ] }, "execution_count": 220, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path[0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To express the negated constraint, we have to construct it from z3 primitives:" ] }, { "cell_type": "code", "execution_count": 221, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.978006Z", "iopub.status.busy": "2024-01-18T17:20:18.977727Z", "iopub.status.idle": "2024-01-18T17:20:18.981546Z", "shell.execute_reply": "2024-01-18T17:20:18.981214Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "zs = z3.String('cgi_decode_s_str_1')" ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.983319Z", "iopub.status.busy": "2024-01-18T17:20:18.983199Z", "iopub.status.idle": "2024-01-18T17:20:18.989287Z", "shell.execute_reply": "2024-01-18T17:20:18.988678Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "str.substr(cgi_decode_s_str_1, 0, 1) = \"a\"" ], "text/plain": [ "str.substr(cgi_decode_s_str_1, 0, 1) == \"a\"" ] }, "execution_count": 222, "metadata": {}, "output_type": "execute_result" } ], "source": [ "z3.SubString(zs, 0, 1) == z3.StringVal('a')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": 
[ "Invoking `zeval()` with the path condition to be changed obtains a new input that satisfies the negated predicate:" ] }, { "cell_type": "code", "execution_count": 223, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:18.992064Z", "iopub.status.busy": "2024-01-18T17:20:18.991924Z", "iopub.status.idle": "2024-01-18T17:20:19.016869Z", "shell.execute_reply": "2024-01-18T17:20:19.016399Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "(result, new_vars) = _.zeval({1: z3.SubString(zs, 0, 1) == z3.StringVal('+')})\n" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.019137Z", "iopub.status.busy": "2024-01-18T17:20:19.018985Z", "iopub.status.idle": "2024-01-18T17:20:19.021794Z", "shell.execute_reply": "2024-01-18T17:20:19.021356Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'s': ('+%20A', 'String')}" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_vars" ] }, { "cell_type": "code", "execution_count": 225, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.023969Z", "iopub.status.busy": "2024-01-18T17:20:19.023768Z", "iopub.status.idle": "2024-01-18T17:20:19.026186Z", "shell.execute_reply": "2024-01-18T17:20:19.025760Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "(new_s, new_s_type) = new_vars['s']" ] }, { "cell_type": "code", "execution_count": 226, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.028348Z", "iopub.status.busy": "2024-01-18T17:20:19.028023Z", "iopub.status.idle": "2024-01-18T17:20:19.030670Z", "shell.execute_reply": "2024-01-18T17:20:19.030399Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'+%20A'" ] }, "execution_count": 226, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_s" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can validate that `new_s` indeed takes the new path by re-running the tracer with `new_s` as input:" ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.032404Z", "iopub.status.busy": "2024-01-18T17:20:19.032279Z", "iopub.status.idle": "2024-01-18T17:20:19.035350Z", "shell.execute_reply": "2024-01-18T17:20:19.035089Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[cgi_decode](new_s)" ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.036926Z", "iopub.status.busy": "2024-01-18T17:20:19.036824Z", "iopub.status.idle": "2024-01-18T17:20:19.043063Z", "shell.execute_reply": "2024-01-18T17:20:19.042751Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 < Length(cgi_decode_s_str_1),\n", " str.substr(cgi_decode_s_str_1, 0, 1) == \"+\",\n", " 1 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 1, 1) == \"+\"),\n", " str.substr(cgi_decode_s_str_1, 1, 1) == \"%\",\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"0\"),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"1\"),\n", " str.substr(cgi_decode_s_str_1, 2, 1) == \"2\",\n", " str.substr(cgi_decode_s_str_1, 3, 1) == \"0\",\n", " 4 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"%\"),\n", " Not(5 < Length(cgi_decode_s_str_1))]" ] }, "execution_count": 228, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By negating further conditions, we can explore more and more code." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Round\n", "\n", "Here is a function that rounds its argument up to the nearest multiple of ten:" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.044882Z", "iopub.status.busy": "2024-01-18T17:20:19.044738Z", "iopub.status.idle": "2024-01-18T17:20:19.046516Z", "shell.execute_reply": "2024-01-18T17:20:19.046196Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def round10(r):\n", " while r % 10 != 0:\n", " r += 1\n", " return r" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As before, we execute the function under the `ConcolicTracer` context." ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.048388Z", "iopub.status.busy": "2024-01-18T17:20:19.048244Z", "iopub.status.idle": "2024-01-18T17:20:19.051786Z", "shell.execute_reply": "2024-01-18T17:20:19.051501Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " r = _[round10](1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We verify that we were able to capture all the predicates:" ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.053681Z", "iopub.status.busy": "2024-01-18T17:20:19.053576Z", "iopub.status.idle": "2024-01-18T17:20:19.153352Z", "shell.execute_reply": "2024-01-18T17:20:19.153008Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "({'round10_r_int_1': 'Int'},\n", " [0 != round10_r_int_1%10,\n", " 0 != (round10_r_int_1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 
+ 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 + 1 + 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 + 1 + 1 + 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%10,\n", " 0 != (round10_r_int_1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%10,\n", " Not(0 !=\n", " (round10_r_int_1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%10)])" ] }, "execution_count": 231, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use `zeval()` to obtain more inputs that take the same path." ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.155177Z", "iopub.status.busy": "2024-01-18T17:20:19.155052Z", "iopub.status.idle": "2024-01-18T17:20:19.175351Z", "shell.execute_reply": "2024-01-18T17:20:19.174919Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'r': (['-', '9'], 'Int')})" ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Absolute Maximum\n", "\n", "Do our concolic proxies work across functions? Say we have a function `abs_value()` as below." 
] }, { "cell_type": "code", "execution_count": 233, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.177674Z", "iopub.status.busy": "2024-01-18T17:20:19.177523Z", "iopub.status.idle": "2024-01-18T17:20:19.179669Z", "shell.execute_reply": "2024-01-18T17:20:19.179389Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def abs_value(a):\n", " if a > 0:\n", " return a\n", " else:\n", " return -a" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It is called by another function, `abs_max()`:" ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.181317Z", "iopub.status.busy": "2024-01-18T17:20:19.181199Z", "iopub.status.idle": "2024-01-18T17:20:19.183127Z", "shell.execute_reply": "2024-01-18T17:20:19.182893Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def abs_max(a, b):\n", " a1 = abs_value(a)\n", " b1 = abs_value(b)\n", " if a1 > b1:\n", " c = a1\n", " else:\n", " c = b1\n", " return c" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We run `abs_max()` under the `ConcolicTracer` context:" ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.184695Z", "iopub.status.busy": "2024-01-18T17:20:19.184583Z", "iopub.status.idle": "2024-01-18T17:20:19.186701Z", "shell.execute_reply": "2024-01-18T17:20:19.186450Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[abs_max](2, 1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As expected, the collected predicates span both functions:" 
] }, { "cell_type": "code", "execution_count": 236, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.188450Z", "iopub.status.busy": "2024-01-18T17:20:19.188325Z", "iopub.status.idle": "2024-01-18T17:20:19.191400Z", "shell.execute_reply": "2024-01-18T17:20:19.191146Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'abs_max_a_int_1': 'Int', 'abs_max_b_int_2': 'Int'},\n", " [0 < abs_max_a_int_1, 0 < abs_max_b_int_2, abs_max_a_int_1 > abs_max_b_int_2])" ] }, "execution_count": 236, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.192937Z", "iopub.status.busy": "2024-01-18T17:20:19.192824Z", "iopub.status.idle": "2024-01-18T17:20:19.212064Z", "shell.execute_reply": "2024-01-18T17:20:19.211635Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'a': ('2', 'Int'), 'b': ('1', 'Int')})" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Solving the predicates works as expected." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We now use negative numbers as arguments, so that a different branch is taken in `abs_value()`:" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.213856Z", "iopub.status.busy": "2024-01-18T17:20:19.213745Z", "iopub.status.idle": "2024-01-18T17:20:19.216243Z", "shell.execute_reply": "2024-01-18T17:20:19.216000Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[abs_max](-2, -1)" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.217658Z", "iopub.status.busy": "2024-01-18T17:20:19.217558Z", "iopub.status.idle": "2024-01-18T17:20:19.221026Z", "shell.execute_reply": "2024-01-18T17:20:19.220755Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'abs_max_a_int_1': 'Int', 'abs_max_b_int_2': 'Int'},\n", " [Not(0 < abs_max_a_int_1),\n", " Not(0 < abs_max_b_int_2),\n", " -abs_max_a_int_1 > -abs_max_b_int_2])" ] }, "execution_count": 239, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.222479Z", "iopub.status.busy": "2024-01-18T17:20:19.222400Z", "iopub.status.idle": "2024-01-18T17:20:19.241246Z", "shell.execute_reply": "2024-01-18T17:20:19.240908Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'a': (['-', '1'], 'Int'), 'b': ('0', 'Int')})" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The solution reflects our predicates: since `abs_value()` tests `a > 0`, the value `b = 0` also takes the `else` branch." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Binomial Coefficient\n", "\n", "For a larger example that uses different kinds of variables, say we want to compute the binomial coefficient by the following formulas\n", "\n", "$$ \n", "^nP_k=\\frac{n!}{(n-k)!}\n", "$$\n", "\n", "$$\n", "\\binom nk=\\,^nC_k=\\frac{^nP_k}{k!}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "we define the functions as follows." ] }, { "cell_type": "code", "execution_count": 241, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.243147Z", "iopub.status.busy": "2024-01-18T17:20:19.242929Z", "iopub.status.idle": "2024-01-18T17:20:19.245209Z", "shell.execute_reply": "2024-01-18T17:20:19.244873Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def factorial(n): # type: ignore\n", " v = 1\n", " while n != 0:\n", " v *= n\n", " n -= 1\n", "\n", " return v" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.246869Z", "iopub.status.busy": "2024-01-18T17:20:19.246759Z", "iopub.status.idle": "2024-01-18T17:20:19.248538Z", "shell.execute_reply": "2024-01-18T17:20:19.248270Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def permutation(n, k):\n", " return factorial(n) / factorial(n - k)" ] }, { "cell_type": "code", "execution_count": 243, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.250058Z", "iopub.status.busy": "2024-01-18T17:20:19.249940Z", "iopub.status.idle": "2024-01-18T17:20:19.251614Z", "shell.execute_reply": "2024-01-18T17:20:19.251260Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def combination(n, k):\n", " return permutation(n, k) / factorial(k)" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "execution": { 
"iopub.execute_input": "2024-01-18T17:20:19.253195Z", "iopub.status.busy": "2024-01-18T17:20:19.253093Z", "iopub.status.idle": "2024-01-18T17:20:19.254988Z", "shell.execute_reply": "2024-01-18T17:20:19.254701Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def binomial(n, k):\n", " if n < 0 or k < 0 or n < k:\n", " raise Exception('Invalid values')\n", " return combination(n, k)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As before, we run the function under `ConcolicTracer`." ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.256663Z", "iopub.status.busy": "2024-01-18T17:20:19.256522Z", "iopub.status.idle": "2024-01-18T17:20:19.260663Z", "shell.execute_reply": "2024-01-18T17:20:19.260411Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " v = _[binomial](4, 2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Then call `zeval()` to evaluate." ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.262356Z", "iopub.status.busy": "2024-01-18T17:20:19.262259Z", "iopub.status.idle": "2024-01-18T17:20:19.283249Z", "shell.execute_reply": "2024-01-18T17:20:19.282876Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'n': ('4', 'Int'), 'k': ('2', 'Int')})" ] }, "execution_count": 246, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "raw", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "The values returned are same as the input values as expected." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example: Database" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "toc-hr-collapsed": true }, "source": [ "For a larger example using the concolic string class `zstr`, we use the `DB` class from the [chapter on information flow](InformationFlow.ipynb)." ] }, { "cell_type": "code", "execution_count": 247, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.285149Z", "iopub.status.busy": "2024-01-18T17:20:19.285020Z", "iopub.status.idle": "2024-01-18T17:20:19.287140Z", "shell.execute_reply": "2024-01-18T17:20:19.286874Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Note: The following example may not work with your Z3 version;\n", "see https://github.com/Z3Prover/z3/issues/5763 for details.\n", "Consider `pip install z3-solver==4.8.7.0` as a workaround.\n" ] } ], "source": [ "if __name__ == '__main__':\n", " if z3.get_version() > (4, 8, 7, 0):\n", " print(\"\"\"Note: The following example may not work with your Z3 version;\n", "see https://github.com/Z3Prover/z3/issues/5763 for details.\n", "Consider `pip install z3-solver==4.8.7.0` as a workaround.\"\"\")" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.288694Z", "iopub.status.busy": "2024-01-18T17:20:19.288577Z", "iopub.status.idle": "2024-01-18T17:20:19.498608Z", "shell.execute_reply": "2024-01-18T17:20:19.498317Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from InformationFlow import DB, SQLException, sample_db, update_inventory" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We first populate our database." 
] }, { "cell_type": "code", "execution_count": 249, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.500426Z", "iopub.status.busy": "2024-01-18T17:20:19.500333Z", "iopub.status.idle": "2024-01-18T17:20:19.833941Z", "shell.execute_reply": "2024-01-18T17:20:19.832766Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarMiner import VEHICLES # minor dependency" ] }, { "cell_type": "code", "execution_count": 250, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.842644Z", "iopub.status.busy": "2024-01-18T17:20:19.842033Z", "iopub.status.idle": "2024-01-18T17:20:19.851204Z", "shell.execute_reply": "2024-01-18T17:20:19.850042Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "db = sample_db()\n", "for V in VEHICLES:\n", " update_inventory(db, V)" ] }, { "cell_type": "code", "execution_count": 251, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.856185Z", "iopub.status.busy": "2024-01-18T17:20:19.855522Z", "iopub.status.idle": "2024-01-18T17:20:19.866261Z", "shell.execute_reply": "2024-01-18T17:20:19.864596Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'inventory': ({'year': int, 'kind': str, 'company': str, 'model': str},\n", " [{'year': 1997, 'kind': 'van', 'company': 'Ford', 'model': 'E350'},\n", " {'year': 2000, 'kind': 'car', 'company': 'Mercury', 'model': 'Cougar'},\n", " {'year': 1999, 'kind': 'car', 'company': 'Chevy', 'model': 'Venture'}])}" ] }, "execution_count": 251, "metadata": {}, "output_type": "execute_result" } ], "source": [ "db.db" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We are now ready to fuzz our `DB` class. Hash functions are difficult to handle directly (because they rely on internal C functions). Hence we modify `table()` slightly." 
] }, { "cell_type": "code", "execution_count": 252, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.872592Z", "iopub.status.busy": "2024-01-18T17:20:19.871842Z", "iopub.status.idle": "2024-01-18T17:20:19.876075Z", "shell.execute_reply": "2024-01-18T17:20:19.875370Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicDB(DB):\n", " def table(self, t_name):\n", " for k, v in self.db:\n", " if t_name == k:\n", " return v\n", " raise SQLException('Table (%s) was not found' % repr(t_name))\n", "\n", " def column(self, decl, c_name):\n", " for k in decl:\n", " if c_name == k:\n", " return decl[k]\n", " raise SQLException('Column (%s) was not found' % repr(c_name))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make it easy, we define a single function `db_select()` that directly invokes `db.sql()`." ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.877952Z", "iopub.status.busy": "2024-01-18T17:20:19.877839Z", "iopub.status.idle": "2024-01-18T17:20:19.880003Z", "shell.execute_reply": "2024-01-18T17:20:19.879742Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def db_select(s):\n", " my_db = ConcolicDB()\n", " my_db.db = [(k, v) for (k, v) in db.db.items()]\n", " r = my_db.sql(s)\n", " return r" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We now want to run SQL statements under our `ConcolicTracer`, and collect predicates obtained." 
] }, { "cell_type": "code", "execution_count": 254, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.881573Z", "iopub.status.busy": "2024-01-18T17:20:19.881484Z", "iopub.status.idle": "2024-01-18T17:20:19.884214Z", "shell.execute_reply": "2024-01-18T17:20:19.883934Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[db_select]('select kind from inventory')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The predicates encountered during the execution are as follows:" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.885687Z", "iopub.status.busy": "2024-01-18T17:20:19.885604Z", "iopub.status.idle": "2024-01-18T17:20:19.890003Z", "shell.execute_reply": "2024-01-18T17:20:19.889773Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 == IndexOf(db_select_s_str_1, \"select \", 0),\n", " 0 == IndexOf(db_select_s_str_1, \"select \", 0),\n", " Not(0 >\n", " IndexOf(str.substr(db_select_s_str_1, 7, 19),\n", " \" from \",\n", " 0)),\n", " Not(Or(0 <\n", " IndexOf(str.substr(db_select_s_str_1, 7, 19),\n", " \" where \",\n", " 0),\n", " 0 ==\n", " IndexOf(str.substr(db_select_s_str_1, 7, 19),\n", " \" where \",\n", " 0))),\n", " str.substr(str.substr(db_select_s_str_1, 7, 19), 10, 9) ==\n", " \"inventory\"]" ] }, "execution_count": 255, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can use `zeval()` as before to solve the constraints." 
] }, { "cell_type": "code", "execution_count": 256, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:19.891519Z", "iopub.status.busy": "2024-01-18T17:20:19.891439Z", "iopub.status.idle": "2024-01-18T17:20:25.975012Z", "shell.execute_reply": "2024-01-18T17:20:25.974687Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('Gave up', None)" ] }, "execution_count": 256, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "tags": [] }, "source": [ "## Fuzzing with Constraints\n", "\n", "The `SimpleConcolicFuzzer` class starts with a sample input generated by some other fuzzer. It then runs the function being tested under `ConcolicTracer`, and collects the path predicates. It then negates random predicates within the path and solves the resulting constraints with Z3 to produce a new input that is guaranteed to take a different path than the original." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As with `ConcolicTracer`, above, please first look at the examples before digging into the implementation." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Implementing SimpleConcolicFuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, we import the `Fuzzer` interface, and write an example program, `hang_if_no_space()`:" ] }, { "cell_type": "code", "execution_count": 257, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.977049Z", "iopub.status.busy": "2024-01-18T17:20:25.976909Z", "iopub.status.idle": "2024-01-18T17:20:25.978916Z", "shell.execute_reply": "2024-01-18T17:20:25.978630Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from Fuzzer import Fuzzer" ] }, { "cell_type": "code", "execution_count": 258, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.980437Z", "iopub.status.busy": "2024-01-18T17:20:25.980323Z", "iopub.status.idle": "2024-01-18T17:20:25.982182Z", "shell.execute_reply": "2024-01-18T17:20:25.981932Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def hang_if_no_space(s):\n", " i = 0\n", " while True:\n", " if i < s.length():\n", " if s[i] == ' ':\n", " break\n", " i += 1" ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.983771Z", "iopub.status.busy": "2024-01-18T17:20:25.983647Z", "iopub.status.idle": "2024-01-18T17:20:25.985345Z", "shell.execute_reply": "2024-01-18T17:20:25.985085Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectTimeout, ExpectError" ] }, { "cell_type": "code", "execution_count": 260, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.987059Z", "iopub.status.busy": "2024-01-18T17:20:25.986800Z", "iopub.status.idle": "2024-01-18T17:20:25.988818Z", "shell.execute_reply": "2024-01-18T17:20:25.988473Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": 
[], "source": [ "import random" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Representing Decisions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make the fuzzer work, we need a way to represent the decisions made during a trace. We keep these in a _binary tree_ where each node represents a decision made, and each leaf represents a complete path. A node in the binary tree is represented by the `TraceNode` class.\n", "\n", "When a new node is added, it represents a decision taken by the parent on some predicate. This predicate is supplied as `smt_val`, which must be `True` for this child to be reached. Since the predicate is actually present in the parent node, we also carry a member `smt` which will be updated by the first child to be added." ] }, { "cell_type": "code", "execution_count": 261, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.990560Z", "iopub.status.busy": "2024-01-18T17:20:25.990444Z", "iopub.status.idle": "2024-01-18T17:20:25.993277Z", "shell.execute_reply": "2024-01-18T17:20:25.993032Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TraceNode:\n", "    def __init__(self, smt_val, parent, info):\n", "        # This is the smt that led to this node\n", "        self._smt_val = z3.simplify(smt_val) if smt_val is not None else None\n", "\n", "        # This is the predicate that this node might perform at a future point\n", "        self.smt = None\n", "        self.info = info\n", "        self.parent = parent\n", "        self.children = {}\n", "        self.path = None\n", "        self.tree = None\n", "        self._pattern = None\n", "        self.log = True\n", "\n", "    def no(self): return self.children.get(self.tree.no_bit)\n", "\n", "    def yes(self): return self.children.get(self.tree.yes_bit)\n", "\n", "    def get_children(self): return (self.no(), self.yes())\n", "\n", "    def __str__(self):\n", "        return 'TraceNode[%s]' % ','.join(self.children.keys())"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We add a `PlausibleChild` class to track the leaf nodes." ] }, { "cell_type": "code", "execution_count": 262, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.994939Z", "iopub.status.busy": "2024-01-18T17:20:25.994798Z", "iopub.status.idle": "2024-01-18T17:20:25.996935Z", "shell.execute_reply": "2024-01-18T17:20:25.996675Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class PlausibleChild:\n", "    def __init__(self, parent, cond, tree):\n", "        self.parent = parent\n", "        self.cond = cond\n", "        self.tree = tree\n", "        self._smt_val = None\n", "\n", "    def __repr__(self):\n", "        return 'PlausibleChild[%s]' % (self.parent.pattern() + ':' + self.cond)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When a leaf node is used to generate a new path, we expect its sibling `TraceNode` to have been explored already. Hence, we make use of the sibling's value for the context `cc` (when the parent does not carry one), and derive `smt_val` from the parent's predicate."
] }, { "cell_type": "code", "execution_count": 263, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:25.998472Z", "iopub.status.busy": "2024-01-18T17:20:25.998365Z", "iopub.status.idle": "2024-01-18T17:20:26.001128Z", "shell.execute_reply": "2024-01-18T17:20:26.000875Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class PlausibleChild(PlausibleChild):\n", "    def smt_val(self):\n", "        if self._smt_val is not None:\n", "            return self._smt_val\n", "        # If the parent has another child, that child has already updated the parent's smt.\n", "        # Hence, we can use the opposite of that child's smt value as our value.\n", "        assert self.parent.smt is not None\n", "        if self.cond == self.tree.no_bit:\n", "            self._smt_val = z3.Not(self.parent.smt)\n", "        else:\n", "            self._smt_val = self.parent.smt\n", "        return self._smt_val\n", "\n", "    def cc(self):\n", "        if self.parent.info.get('cc') is not None:\n", "            return self.parent.info['cc']\n", "        # if there is a plausible child node, it means that there can\n", "        # be at most one child.\n", "        siblings = list(self.parent.children.values())\n", "        assert len(siblings) == 1\n", "        # We expect the other child to have cc\n", "        return siblings[0].info['cc']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `PlausibleChild` instance is used to generate new paths to explore using `path_expression()`."
] }, { "cell_type": "code", "execution_count": 264, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.002583Z", "iopub.status.busy": "2024-01-18T17:20:26.002483Z", "iopub.status.idle": "2024-01-18T17:20:26.004464Z", "shell.execute_reply": "2024-01-18T17:20:26.004211Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class PlausibleChild(PlausibleChild):\n", "    def path_expression(self):\n", "        path_to_root = self.parent.get_path_to_root()\n", "        assert path_to_root[0]._smt_val is None\n", "        return [i._smt_val for i in path_to_root[1:]] + [self.smt_val()]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `TraceTree` class helps us keep track of the binary tree. In the beginning, the root is a sentinel `TraceNode` instance, and simply has two plausible children as leaves. As soon as the first trace is added, one of the plausible children will become a true child." ] }, { "cell_type": "code", "execution_count": 265, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.005909Z", "iopub.status.busy": "2024-01-18T17:20:26.005808Z", "iopub.status.idle": "2024-01-18T17:20:26.008000Z", "shell.execute_reply": "2024-01-18T17:20:26.007776Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TraceTree:\n", "    def __init__(self):\n", "        self.root = TraceNode(smt_val=None, parent=None, info={'num': 0})\n", "        self.root.tree = self\n", "        self.leaves = {}\n", "        self.no_bit, self.yes_bit = '0', '1'\n", "\n", "        pprefix = ':'\n", "        for bit in [self.no_bit, self.yes_bit]:\n", "            self.leaves[pprefix + bit] = PlausibleChild(self.root, bit, self)\n", "        self.completed_paths = {}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `add_trace()` method of the `TraceTree` provides a way for new traces to be added. It is kept separate from the initialization as we might want to add more than one trace from the same function." ] }, { "cell_type": "code", "execution_count": 266, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.009804Z", "iopub.status.busy": "2024-01-18T17:20:26.009494Z", "iopub.status.idle": "2024-01-18T17:20:26.012149Z", "shell.execute_reply": "2024-01-18T17:20:26.011839Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TraceTree(TraceTree):\n", "    def add_trace(self, tracer, string):\n", "        last = self.root\n", "        i = 0\n", "        for i, elt in enumerate(tracer.path):\n", "            last = last.add_child(elt=elt, i=i + 1, cc=tracer, string=string)\n", "        last.add_child(elt=z3.BoolVal(True), i=i + 1, cc=tracer, string=string)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make `add_trace()` work, we need a little more infrastructure, which we define below.\n", "\n", "The `bit()` method translates a predicate to a bit that corresponds to the decision taken at that predicate. If the `if` branch is taken, the result is `1`, while the `else` branch is indicated by `0`. The pattern is the sequence of bits for the decisions required to reach a node from the root."
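, "\n", "\n", "The bookkeeping behind these patterns can be sketched in plain Python. The helpers below are hypothetical and only mirror the key format `prefix:bit` used for the leaves of the tree; each entry of `decisions` is assumed to say whether the (simplified) predicate held without negation at that step.\n", "\n", "```python\n", "def pattern_of(decisions):\n", "    '''Encode a list of decisions as a bit string, 1 for yes and 0 for no.'''\n", "    return ''.join('1' if taken else '0' for taken in decisions)\n", "\n", "\n", "def unexplored_leaf_keys(pattern):\n", "    '''For a single trace, list the keys of the branches it did not take.'''\n", "    keys = []\n", "    for i, bit in enumerate(pattern):\n", "        other = '0' if bit == '1' else '1'\n", "        # The key is the pattern of the parent, a colon, and the other bit.\n", "        keys.append(pattern[:i] + ':' + other)\n", "    return keys\n", "\n", "\n", "pattern = pattern_of([False, False, False, False, False, True])\n", "print(pattern)\n", "print(unexplored_leaf_keys(pattern))\n", "```\n", "\n", "For a trace with six decisions, this yields one unexplored sibling per decision, each identified by the decisions leading up to it plus the bit that was not taken."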
] }, { "cell_type": "code", "execution_count": 267, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.013688Z", "iopub.status.busy": "2024-01-18T17:20:26.013586Z", "iopub.status.idle": "2024-01-18T17:20:26.016053Z", "shell.execute_reply": "2024-01-18T17:20:26.015808Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TraceNode(TraceNode):\n", "    def bit(self):\n", "        if self._smt_val is None:\n", "            return None\n", "        return self.tree.no_bit if self._smt_val.decl(\n", "        ).name() == 'not' else self.tree.yes_bit\n", "\n", "    def pattern(self):\n", "        if self._pattern is not None:\n", "            return self._pattern\n", "        path = self.get_path_to_root()\n", "        assert path[0]._smt_val is None\n", "        assert path[0].parent is None\n", "\n", "        self._pattern = ''.join([p.bit() for p in path[1:]])\n", "        return self._pattern" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Each node knows how to add a new child, and get the path to root, which is cached.\n", "\n", "When we add a child to a node, it means that there was a decision in the current node, and the child is the result of the decision. Hence, to get the decision being made, we simplify the `smt` expression, and check if it starts with `not`. If it does not start with a `not`, we interpret that as the current decision in the node. If it starts with `not`, then we interpret that `not(smt)` was the expression being evaluated in the current node.\n", "\n", "We know the first decision made only after going through the program at least once. As soon as the program is traversed, we update the parent with the decision that resulted in the current child."
] }, { "cell_type": "code", "execution_count": 268, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.017599Z", "iopub.status.busy": "2024-01-18T17:20:26.017491Z", "iopub.status.idle": "2024-01-18T17:20:26.022896Z", "shell.execute_reply": "2024-01-18T17:20:26.022635Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TraceNode(TraceNode):\n", " def add_child(self, elt, i, cc, string):\n", " if elt == z3.BoolVal(True):\n", " # No more exploration here. Simply unregister the leaves of *this*\n", " # node and possibly register them in completed nodes, and exit\n", " for bit in [self.tree.no_bit, self.tree.yes_bit]:\n", " child_leaf = self.pattern() + ':' + bit\n", " if child_leaf in self.tree.leaves:\n", " del self.tree.leaves[child_leaf]\n", " self.tree.completed_paths[self.pattern()] = self\n", " return None\n", "\n", " child_node = TraceNode(smt_val=elt,\n", " parent=self,\n", " info={'num': i, 'cc': cc, 'string': string})\n", " child_node.tree = self.tree\n", "\n", " # bit represents the path that child took from this node.\n", " bit = child_node.bit()\n", "\n", " # first we update our smt decision\n", " if bit == self.tree.yes_bit: # yes, which means the smt can be used as is\n", " if self.smt is not None:\n", " assert self.smt == child_node._smt_val\n", " else:\n", " self.smt = child_node._smt_val\n", " # no, which means we have to negate it to get the decision.\n", " elif bit == self.tree.no_bit:\n", " smt_ = z3.simplify(z3.Not(child_node._smt_val))\n", " if self.smt is not None:\n", " assert smt_ == self.smt\n", " else:\n", " self.smt = smt_\n", " else:\n", " assert False\n", "\n", " if bit in self.children:\n", " # if self.log:\n", " #print(elt, child_node.bit(), i, string)\n", " #print(i,'overwriting', bit,'=>',self.children[bit],'with',child_node)\n", " child_node = self.children[bit]\n", " #self.children[bit] = child_node\n", " #child_node.children = old.children\n", " else:\n", " self.children[bit] = 
child_node\n", "\n", " # At this point, we have to unregister any leaves that correspond to this child from tree,\n", " # and add the plausible children of this child as leaves to be explored. Note that\n", " # if it is the end (z3.True), we do not have any more children.\n", " child_leaf = self.pattern() + ':' + bit\n", " if child_leaf in self.tree.leaves:\n", " del self.tree.leaves[child_leaf]\n", "\n", " pprefix = child_node.pattern() + ':'\n", "\n", " # Plausible children.\n", " for bit in [self.tree.no_bit, self.tree.yes_bit]:\n", " self.tree.leaves[pprefix +\n", " bit] = PlausibleChild(child_node, bit, self.tree)\n", " return child_node" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The path to root from any node is computed once and cached." ] }, { "cell_type": "code", "execution_count": 269, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.024381Z", "iopub.status.busy": "2024-01-18T17:20:26.024299Z", "iopub.status.idle": "2024-01-18T17:20:26.026326Z", "shell.execute_reply": "2024-01-18T17:20:26.026073Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TraceNode(TraceNode):\n", " def get_path_to_root(self):\n", " if self.path is not None:\n", " return self.path\n", " parent_path = []\n", " if self.parent is not None:\n", " parent_path = self.parent.get_path_to_root()\n", " self.path = parent_path + [self]\n", " return self.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### The SimpleConcolicFuzzer class" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `SimpleConcolicFuzzer` is defined with the `Fuzzer` interface. 
" ] }, { "cell_type": "code", "execution_count": 270, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.027893Z", "iopub.status.busy": "2024-01-18T17:20:26.027806Z", "iopub.status.idle": "2024-01-18T17:20:26.029777Z", "shell.execute_reply": "2024-01-18T17:20:26.029530Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class SimpleConcolicFuzzer(Fuzzer):\n", " def __init__(self):\n", " self.ct = TraceTree()\n", " self.max_tries = 1000\n", " self.last = None\n", " self.last_idx = None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `add_trace()` method we defined earlier is used as follows. First, we use a random string to generate the concolic trace." ] }, { "cell_type": "code", "execution_count": 271, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.031220Z", "iopub.status.busy": "2024-01-18T17:20:26.031139Z", "iopub.status.idle": "2024-01-18T17:20:26.033864Z", "shell.execute_reply": "2024-01-18T17:20:26.033624Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ExpectTimeout(2):\n", " with ConcolicTracer() as _:\n", " _[hang_if_no_space]('ab d')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Next, we initialize and add this trace to the fuzzer." 
] }, { "cell_type": "code", "execution_count": 272, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.035331Z", "iopub.status.busy": "2024-01-18T17:20:26.035252Z", "iopub.status.idle": "2024-01-18T17:20:26.039332Z", "shell.execute_reply": "2024-01-18T17:20:26.039087Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 < Length(hang_if_no_space_s_str_1),\n", " Not(str.substr(hang_if_no_space_s_str_1, 0, 1) == \" \"),\n", " 1 < Length(hang_if_no_space_s_str_1),\n", " Not(str.substr(hang_if_no_space_s_str_1, 1, 1) == \" \"),\n", " 2 < Length(hang_if_no_space_s_str_1),\n", " str.substr(hang_if_no_space_s_str_1, 2, 1) == \" \"]" ] }, "execution_count": 272, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "code", "execution_count": 273, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.040747Z", "iopub.status.busy": "2024-01-18T17:20:26.040661Z", "iopub.status.idle": "2024-01-18T17:20:26.043777Z", "shell.execute_reply": "2024-01-18T17:20:26.043501Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "scf = SimpleConcolicFuzzer()\n", "scf.ct.add_trace(_, 'ab d')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The path we added above can be obtained from the `TraceTree` as below." 
] }, { "cell_type": "code", "execution_count": 274, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.045505Z", "iopub.status.busy": "2024-01-18T17:20:26.045414Z", "iopub.status.idle": "2024-01-18T17:20:26.049559Z", "shell.execute_reply": "2024-01-18T17:20:26.049319Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[None,\n", " Not(Length(hang_if_no_space_s_str_1) <= 0),\n", " Not(str.substr(hang_if_no_space_s_str_1, 0, 1) == \" \"),\n", " Not(Length(hang_if_no_space_s_str_1) <= 1),\n", " Not(str.substr(hang_if_no_space_s_str_1, 1, 1) == \" \"),\n", " Not(Length(hang_if_no_space_s_str_1) <= 2),\n", " str.substr(hang_if_no_space_s_str_1, 2, 1) == \" \"]" ] }, "execution_count": 274, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[i._smt_val for i in scf.ct.root.get_children(\n", ")[0].get_children(\n", ")[0].get_children(\n", ")[0].get_children(\n", ")[0].get_children(\n", ")[0].get_children(\n", ")[1].get_path_to_root()]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Below are the registered leaves that we can explore at this moment." 
] }, { "cell_type": "code", "execution_count": 275, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.050996Z", "iopub.status.busy": "2024-01-18T17:20:26.050915Z", "iopub.status.idle": "2024-01-18T17:20:26.052875Z", "shell.execute_reply": "2024-01-18T17:20:26.052651Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ":1 \t PlausibleChild[:1]\n", "0:1 \t PlausibleChild[0:1]\n", "00:1 \t PlausibleChild[00:1]\n", "000:1 \t PlausibleChild[000:1]\n", "0000:1 \t PlausibleChild[0000:1]\n", "00000:0 \t PlausibleChild[00000:0]\n" ] } ], "source": [ "for key in scf.ct.leaves:\n", " print(key, '\\t', scf.ct.leaves[key])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Next, we need a way to visualize the constructed tree." ] }, { "cell_type": "code", "execution_count": 276, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.054366Z", "iopub.status.busy": "2024-01-18T17:20:26.054293Z", "iopub.status.idle": "2024-01-18T17:20:26.055824Z", "shell.execute_reply": "2024-01-18T17:20:26.055612Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import display_tree" ] }, { "cell_type": "code", "execution_count": 277, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.057230Z", "iopub.status.busy": "2024-01-18T17:20:26.057154Z", "iopub.status.idle": "2024-01-18T17:20:26.058771Z", "shell.execute_reply": "2024-01-18T17:20:26.058520Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "TREE_NODES = {}" ] }, { "cell_type": "code", "execution_count": 278, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.060163Z", "iopub.status.busy": "2024-01-18T17:20:26.060085Z", "iopub.status.idle": "2024-01-18T17:20:26.062402Z", "shell.execute_reply": "2024-01-18T17:20:26.062176Z" }, "slideshow": { "slide_type": "subslide" } }, 
"outputs": [], "source": [ "def my_extract_node(tnode, id):\n", " key, node, parent = tnode\n", " if node is None:\n", " # return '? (%s:%s)' % (parent.pattern(), key) , [], ''\n", " return '?', [], ''\n", " if node.smt is None:\n", " return '* %s' % node.info.get('string', ''), [], ''\n", "\n", " no, yes = node.get_children()\n", " num = str(node.info.get('num'))\n", " children = [('0', no, node), ('1', yes, node)]\n", " TREE_NODES[id] = 0\n", " return \"(%s) %s\" % (num, str(node.smt)), children, ''" ] }, { "cell_type": "code", "execution_count": 279, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.063752Z", "iopub.status.busy": "2024-01-18T17:20:26.063674Z", "iopub.status.idle": "2024-01-18T17:20:26.065635Z", "shell.execute_reply": "2024-01-18T17:20:26.065406Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def my_edge_attr(dot, start_node, stop_node):\n", " # the edges are always drawn '0:NO' first.\n", " if TREE_NODES[start_node] == 0:\n", " color, label = 'red', '0'\n", " TREE_NODES[start_node] = 1\n", " else:\n", " color, label = 'blue', '1'\n", " TREE_NODES[start_node] = 2\n", " dot.edge(repr(start_node), repr(stop_node), color=color, label=label)" ] }, { "cell_type": "code", "execution_count": 280, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.067013Z", "iopub.status.busy": "2024-01-18T17:20:26.066937Z", "iopub.status.idle": "2024-01-18T17:20:26.068699Z", "shell.execute_reply": "2024-01-18T17:20:26.068461Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def display_trace_tree(root):\n", " TREE_NODES.clear()\n", " return display_tree(\n", " ('', root, None), extract_node=my_extract_node, edge_attr=my_edge_attr)" ] }, { "cell_type": "code", "execution_count": 281, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.070108Z", "iopub.status.busy": "2024-01-18T17:20:26.070030Z", "iopub.status.idle": "2024-01-18T17:20:26.466080Z", 
"shell.execute_reply": "2024-01-18T17:20:26.465690Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: trace tree for 'ab d'. Decision nodes (0) to (5) alternate between Length(hang_if_no_space_s_str_1) <= k and str.substr(hang_if_no_space_s_str_1, k, 1) == \" \" for k = 0, 1, 2. Edges are labeled 0 (red) or 1 (blue). The explored path follows the edges 0, 0, 0, 0, 0, 1 to the leaf '* ab d'; every other branch ends in an unexplored '?' leaf.]" ] }, "execution_count": 281, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_trace_tree(scf.ct.root)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For example, the pattern `00000:0` corresponds to the following predicates."
] }, { "cell_type": "code", "execution_count": 282, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.467806Z", "iopub.status.busy": "2024-01-18T17:20:26.467681Z", "iopub.status.idle": "2024-01-18T17:20:26.470180Z", "shell.execute_reply": "2024-01-18T17:20:26.469936Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "PlausibleChild[00000:0]" ] }, "execution_count": 282, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scf.ct.leaves['00000:0']" ] }, { "cell_type": "code", "execution_count": 283, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.471532Z", "iopub.status.busy": "2024-01-18T17:20:26.471434Z", "iopub.status.idle": "2024-01-18T17:20:26.475708Z", "shell.execute_reply": "2024-01-18T17:20:26.475486Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Not(Length(hang_if_no_space_s_str_1) <= 0),\n", " Not(str.substr(hang_if_no_space_s_str_1, 0, 1) == \" \"),\n", " Not(Length(hang_if_no_space_s_str_1) <= 1),\n", " Not(str.substr(hang_if_no_space_s_str_1, 1, 1) == \" \"),\n", " Not(Length(hang_if_no_space_s_str_1) <= 2),\n", " Not(str.substr(hang_if_no_space_s_str_1, 2, 1) == \" \")]" ] }, "execution_count": 283, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scf.ct.leaves['00000:0'].path_expression()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Similarly the pattern `:1` corresponds to the following predicates." 
] }, { "cell_type": "code", "execution_count": 284, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.477337Z", "iopub.status.busy": "2024-01-18T17:20:26.477248Z", "iopub.status.idle": "2024-01-18T17:20:26.479668Z", "shell.execute_reply": "2024-01-18T17:20:26.479408Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "PlausibleChild[:1]" ] }, "execution_count": 284, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scf.ct.leaves[':1']" ] }, { "cell_type": "code", "execution_count": 285, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.481087Z", "iopub.status.busy": "2024-01-18T17:20:26.480987Z", "iopub.status.idle": "2024-01-18T17:20:26.483605Z", "shell.execute_reply": "2024-01-18T17:20:26.483377Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Length(hang_if_no_space_s_str_1) <= 0]" ] }, "execution_count": 285, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scf.ct.leaves[':1'].path_expression()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now produce the next input by looking for a leaf that has not been explored yet. The idea is to collect all such leaves, and choose one at random."
] }, { "cell_type": "code", "execution_count": 286, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.485242Z", "iopub.status.busy": "2024-01-18T17:20:26.485134Z", "iopub.status.idle": "2024-01-18T17:20:26.487275Z", "shell.execute_reply": "2024-01-18T17:20:26.487041Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class SimpleConcolicFuzzer(SimpleConcolicFuzzer):\n", " def add_trace(self, trace, s):\n", " self.ct.add_trace(trace, s)\n", "\n", " def next_choice(self):\n", " #lst = sorted(list(self.ct.leaves.keys()), key=len)\n", " c = random.choice(list(self.ct.leaves.keys()))\n", " #c = lst[0]\n", " return self.ct.leaves[c]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use the `next_choice()` as follows." ] }, { "cell_type": "code", "execution_count": 287, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.488831Z", "iopub.status.busy": "2024-01-18T17:20:26.488719Z", "iopub.status.idle": "2024-01-18T17:20:26.492147Z", "shell.execute_reply": "2024-01-18T17:20:26.491855Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "scf = SimpleConcolicFuzzer()\n", "scf.add_trace(_, 'ab d')\n", "node = scf.next_choice()" ] }, { "cell_type": "code", "execution_count": 288, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.493951Z", "iopub.status.busy": "2024-01-18T17:20:26.493841Z", "iopub.status.idle": "2024-01-18T17:20:26.496503Z", "shell.execute_reply": "2024-01-18T17:20:26.496202Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "PlausibleChild[0000:1]" ] }, "execution_count": 288, "metadata": {}, "output_type": "execute_result" } ], "source": [ "node" ] }, { "cell_type": "code", "execution_count": 289, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.498098Z", "iopub.status.busy": "2024-01-18T17:20:26.498006Z", 
"iopub.status.idle": "2024-01-18T17:20:26.502281Z", "shell.execute_reply": "2024-01-18T17:20:26.502009Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Not(Length(hang_if_no_space_s_str_1) <= 0),\n", " Not(str.substr(hang_if_no_space_s_str_1, 0, 1) == \" \"),\n", " Not(Length(hang_if_no_space_s_str_1) <= 1),\n", " Not(str.substr(hang_if_no_space_s_str_1, 1, 1) == \" \"),\n", " Length(hang_if_no_space_s_str_1) <= 2]" ] }, "execution_count": 289, "metadata": {}, "output_type": "execute_result" } ], "source": [ "node.path_expression()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We get the next choice for exploration, and expand the path expression, and return it together with a context using `get_newpath()`" ] }, { "cell_type": "code", "execution_count": 290, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.503760Z", "iopub.status.busy": "2024-01-18T17:20:26.503673Z", "iopub.status.idle": "2024-01-18T17:20:26.505682Z", "shell.execute_reply": "2024-01-18T17:20:26.505404Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class SimpleConcolicFuzzer(SimpleConcolicFuzzer):\n", " def get_newpath(self):\n", " node = self.next_choice()\n", " path = node.path_expression()\n", " return path, node.cc()" ] }, { "cell_type": "code", "execution_count": 291, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.507142Z", "iopub.status.busy": "2024-01-18T17:20:26.507052Z", "iopub.status.idle": "2024-01-18T17:20:26.511087Z", "shell.execute_reply": "2024-01-18T17:20:26.510827Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[Length(hang_if_no_space_s_str_1) <= 0]" ] }, "execution_count": 291, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scf = SimpleConcolicFuzzer()\n", "scf.add_trace(_, 'abcd')\n", "path, cc = scf.get_newpath()\n", "path" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### The fuzzing method\n", "\n", "The `fuzz()` method simply generates new lists of predicates, and solves them to produce new inputs." ] }, { "cell_type": "code", "execution_count": 292, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.512808Z", "iopub.status.busy": "2024-01-18T17:20:26.512683Z", "iopub.status.idle": "2024-01-18T17:20:26.515756Z", "shell.execute_reply": "2024-01-18T17:20:26.515471Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class SimpleConcolicFuzzer(SimpleConcolicFuzzer):\n", " def fuzz(self):\n", " if self.ct.root.children == {}:\n", " # a random value to generate comparisons. This would be\n", " # the initial value around which we explore with concolic\n", " # fuzzing.\n", " # str_len = random.randint(1,100)\n", " # return ' '*str_len\n", " return ' '\n", " for i in range(self.max_tries):\n", " path, last = self.get_newpath()\n", " s, v = zeval_smt(path, last, log=False)\n", " if s != 'sat':\n", " # raise Exception(\"Unexpected UNSAT\")\n", " continue\n", "\n", " val = list(v.values())[0]\n", " elt, typ = val\n", "\n", " # make sure that we do not retry the tried paths\n", " # The tracer we add here is incomplete. 
This gets updated when\n", " # the add_trace is called from the concolic fuzzer context.\n", " # self.add_trace(ConcolicTracer((last.decls, path)), elt)\n", " if typ == 'Int':\n", " if len(elt) == 2 and elt[0] == '-': # negative numbers are [-, x]\n", " return -1*int(elt[1])\n", " return int(elt)\n", " elif typ == 'String':\n", " return elt\n", " return elt\n", " return None" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To illustrate `SimpleConcolicFuzzer`, let us apply it to our example program `cgi_decode()` from the `Coverage` chapter. Note that we cannot use that version directly, as the hash lookups in `hex_values` cannot yet be used for transferring constraints." ] }, { "cell_type": "code", "execution_count": 293, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.517302Z", "iopub.status.busy": "2024-01-18T17:20:26.517216Z", "iopub.status.idle": "2024-01-18T17:20:26.520254Z", "shell.execute_reply": "2024-01-18T17:20:26.519992Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    _[cgi_decode]('a+c')" ] }, { "cell_type": "code", "execution_count": 294, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.521786Z", "iopub.status.busy": "2024-01-18T17:20:26.521693Z", "iopub.status.idle": "2024-01-18T17:20:26.526571Z", "shell.execute_reply": "2024-01-18T17:20:26.526305Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"%\"),\n", " 1 < Length(cgi_decode_s_str_1),\n", " str.substr(cgi_decode_s_str_1, 1, 1) == \"+\",\n", " 2 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"%\"),\n", " Not(3 < Length(cgi_decode_s_str_1))]" ] }, "execution_count": 294, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "code", "execution_count": 295, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.528287Z", "iopub.status.busy": "2024-01-18T17:20:26.528138Z", "iopub.status.idle": "2024-01-18T17:20:26.532022Z", "shell.execute_reply": "2024-01-18T17:20:26.531726Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "scf = SimpleConcolicFuzzer()\n", "scf.add_trace(_, 'a+c')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The _trace tree_ shows the path conditions encountered so far. Any blue edge towards a \"?\" implies that there is a path not yet taken." ] }, { "cell_type": "code", "execution_count": 296, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.533589Z", "iopub.status.busy": "2024-01-18T17:20:26.533494Z", "iopub.status.idle": "2024-01-18T17:20:26.916174Z", "shell.execute_reply": "2024-01-18T17:20:26.915811Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: trace tree for the input 'a+c'. Inner nodes (0) to (8) hold the path conditions over cgi_decode_s_str_1 (length checks and comparisons of each character against '+' and '%'); leaves marked '?' are paths not yet taken; the leaf '* a+c' is the traced input.]" ] }, "execution_count": 296, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_trace_tree(scf.ct.root)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So, we fuzz to obtain a new input that follows one of the paths not yet taken."
] }, { "cell_type": "code", "execution_count": 297, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.918022Z", "iopub.status.busy": "2024-01-18T17:20:26.917896Z", "iopub.status.idle": "2024-01-18T17:20:26.940149Z", "shell.execute_reply": "2024-01-18T17:20:26.939780Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A+\n" ] } ], "source": [ "v = scf.fuzz()\n", "print(v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can now obtain the new trace as before." ] }, { "cell_type": "code", "execution_count": 298, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.941917Z", "iopub.status.busy": "2024-01-18T17:20:26.941788Z", "iopub.status.idle": "2024-01-18T17:20:26.945221Z", "shell.execute_reply": "2024-01-18T17:20:26.944812Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ExpectError():\n", "    with ConcolicTracer() as _:\n", "        _[cgi_decode](v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The new trace is added to our fuzzer using `add_trace()`." ] }, { "cell_type": "code", "execution_count": 299, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.947079Z", "iopub.status.busy": "2024-01-18T17:20:26.946985Z", "iopub.status.idle": "2024-01-18T17:20:26.950476Z", "shell.execute_reply": "2024-01-18T17:20:26.950227Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "scf.add_trace(_, v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The updated binary tree is as follows. Note the difference between the child nodes of the `Root` node."
] }, { "cell_type": "code", "execution_count": 300, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:26.952010Z", "iopub.status.busy": "2024-01-18T17:20:26.951923Z", "iopub.status.idle": "2024-01-18T17:20:27.360544Z", "shell.execute_reply": "2024-01-18T17:20:27.360046Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: updated trace tree. It has the same structure as before, except that the leaf below node (5) Length(cgi_decode_s_str_1) <= 2 now holds the input '* A+' instead of '?'.]" ] }, "execution_count": 300, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_trace_tree(scf.ct.root)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A complete fuzzer run is as follows:" ] }, { "cell_type": "code", "execution_count": 301, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.362976Z", "iopub.status.busy": "2024-01-18T17:20:27.362683Z", "iopub.status.idle": "2024-01-18T17:20:27.549951Z", "shell.execute_reply": "2024-01-18T17:20:27.549595Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "' '\n", "''\n", "'+'\n", "'%'\n", "'+A'\n", "'++'\n", "'AB'\n", "'++A'\n", "'A%'\n", "'+AB'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "IndexError: string index out of range (expected)\n", "IndexError: string index out of range (expected)\n" ] } ], "source": [ "scf = SimpleConcolicFuzzer()\n", "for i in range(10):\n", "    v = scf.fuzz()\n", "    print(repr(v))\n", "    if v is None:\n", "        continue\n", "    with ConcolicTracer() as _:\n", "        with ExpectError(print_traceback=False):\n", "            # z3.StringVal(urllib.parse.unquote('%80')) <-- bug in z3\n", "            _[cgi_decode](v)\n", "    scf.add_trace(_, v)" ] }, { "cell_type": "code", "execution_count": 302, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.551714Z", "iopub.status.busy": "2024-01-18T17:20:27.551602Z", "iopub.status.idle": "2024-01-18T17:20:27.956698Z", "shell.execute_reply": "2024-01-18T17:20:27.956206Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: trace tree after the complete fuzzer run. Its leaves now hold all ten generated inputs (' ', '', '+', '%', '+A', '++', 'AB', '++A', 'A%', '+AB'); the remaining leaves marked '?' are paths not yet taken.]" ] }, "execution_count": 302, "metadata": {}, "output_type": "execute_result" } ], "source": [ "display_trace_tree(scf.ct.root)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Note.** Our concolic tracer is limited in that it does not track changes in the string length. This leads it to treat every string with the same prefix as the same string." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `SimpleConcolicFuzzer` is reasonably efficient at exploring paths near the path followed by a given sample input. However, it is not very intelligent when it comes to choosing which paths to follow. Next, we look at another fuzzer that lifts the obtained predicates to the grammar and thereby achieves better fuzzing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Concolic Grammar Fuzzing\n", "\n", "The concolic framework can be used directly in grammar-based fuzzing. We implement a class `ConcolicGrammarFuzzer` which does this." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "tags": [] }, "source": [ "### Excursion: Implementing ConcolicGrammarFuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, we extend our `GrammarFuzzer` with a helper method `tree_to_string()` that converts the derivation tree of a fuzz output back into a string. 
We also define `prune_tree()` and `coalesce()` methods to reduce the depth of subtrees. `prune_tree()` accepts a list of token types; any node belonging to one of these token types is converted from a tree to a leaf node by calling `tree_to_string()`." ] }, { "cell_type": "code", "execution_count": 303, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.958948Z", "iopub.status.busy": "2024-01-18T17:20:27.958809Z", "iopub.status.idle": "2024-01-18T17:20:27.961159Z", "shell.execute_reply": "2024-01-18T17:20:27.960805Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from InformationFlow import INVENTORY_GRAMMAR, SQLException" ] }, { "cell_type": "code", "execution_count": 304, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.962824Z", "iopub.status.busy": "2024-01-18T17:20:27.962700Z", "iopub.status.idle": "2024-01-18T17:20:27.964674Z", "shell.execute_reply": "2024-01-18T17:20:27.964253Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import GrammarFuzzer" ] }, { "cell_type": "code", "execution_count": 305, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.966472Z", "iopub.status.busy": "2024-01-18T17:20:27.966343Z", "iopub.status.idle": "2024-01-18T17:20:27.970415Z", "shell.execute_reply": "2024-01-18T17:20:27.970102Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ConcolicGrammarFuzzer(GrammarFuzzer):\n", "    def tree_to_string(self, tree):\n", "        symbol, children, *_ = tree\n", "        e = ''\n", "        if children:\n", "            return e.join([self.tree_to_string(c) for c in children])\n", "        else:\n", "            return e if symbol in self.grammar else symbol\n", "\n", "    def prune_tree(self, tree, tokens):\n", "        name, children = tree\n", "        children = self.coalesce(children)\n", "        if name in tokens:\n", "            return (name, [(self.tree_to_string(tree), [])])\n", "        else:\n", "            return (name, [self.prune_tree(c, tokens) for c in children])\n", "\n", "    def coalesce(self, children):\n", "        last = ''\n", "        new_lst = []\n", "        for cn, cc in children:\n", "            if cn not in self.grammar:\n", "                last += cn\n", "            else:\n", "                if last:\n", "                    new_lst.append((last, []))\n", "                    last = ''\n", "                new_lst.append((cn, cc))\n", "        if last:\n", "            new_lst.append((last, []))\n", "        return new_lst" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can now use the fuzzer to produce inputs for our DB." ] }, { "cell_type": "code", "execution_count": 306, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.971926Z", "iopub.status.busy": "2024-01-18T17:20:27.971829Z", "iopub.status.idle": "2024-01-18T17:20:27.982253Z", "shell.execute_reply": "2024-01-18T17:20:27.981925Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "tgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)\n", "while True:\n", "    qtree = tgf.fuzz_tree()\n", "    query = str(tgf.tree_to_string(qtree))\n", "    if query.startswith('select'):\n", "        break" ] }, { "cell_type": "code", "execution_count": 307, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.983982Z", "iopub.status.busy": "2024-01-18T17:20:27.983852Z", "iopub.status.idle": "2024-01-18T17:20:27.985830Z", "shell.execute_reply": "2024-01-18T17:20:27.985507Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 308, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.987610Z", "iopub.status.busy": "2024-01-18T17:20:27.987471Z", "iopub.status.idle": "2024-01-18T17:20:27.992462Z", "shell.execute_reply": "2024-01-18T17:20:27.992118Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'select t4(I,N)!=b(k)/O!=(K4(:/Z)) from I7'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most 
recent call last):\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_77984/2536269233.py\", line 4, in <module>\n", "    res = _[db_select](str(query))\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_77984/2687284210.py\", line 3, in __call__\n", "    self.result = self.fn(*self.concolic(args))\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_77984/1994573112.py\", line 4, in db_select\n", "    r = my_db.sql(s)\n", "  File \"/Users/zeller/Projects/fuzzingbook/notebooks/InformationFlow.ipynb\", line 65, in sql\n", "    return method(query[len(key):])\n", "  File \"/Users/zeller/Projects/fuzzingbook/notebooks/InformationFlow.ipynb\", line 84, in do_select\n", "    _, table = self.table(t_name)\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_77984/2474817571.py\", line 6, in table\n", "    raise SQLException('Table (%s) was not found' % repr(t_name))\n", "InformationFlow.SQLException: Table ('I7') was not found (expected)\n" ] } ], "source": [ "with ExpectError():\n", "    print(repr(query))\n", "    with ConcolicTracer() as _:\n", "        res = _[db_select](str(query))\n", "    print(repr(res))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Our fuzzer returns with an exception. It is unable to find the specified table. Let us examine the predicates it encountered."
] }, { "cell_type": "code", "execution_count": 309, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:27.994167Z", "iopub.status.busy": "2024-01-18T17:20:27.994037Z", "iopub.status.idle": "2024-01-18T17:20:27.999554Z", "shell.execute_reply": "2024-01-18T17:20:27.999290Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 0 == IndexOf(db_select_s_str_1, \"select \", 0)\n", "1 0 == IndexOf(db_select_s_str_1, \"select \", 0)\n", "2 Not(0 >\n", " IndexOf(str.substr(db_select_s_str_1, 7, 34),\n", " \" from \",\n", " 0))\n", "3 Not(Or(0 <\n", " IndexOf(str.substr(db_select_s_str_1, 7, 34),\n", " \" where \",\n", " 0),\n", " 0 ==\n", " IndexOf(str.substr(db_select_s_str_1, 7, 34),\n", " \" where \",\n", " 0)))\n", "4 Not(str.substr(str.substr(db_select_s_str_1, 7, 34), 32, 2) ==\n", " \"inventory\")\n" ] } ], "source": [ "for i, p in enumerate(_.path):\n", "    print(i, p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Note that by using the `ConcolicTracer`, we can obtain constraints that are not present in the grammar. In particular, see how we are able to obtain the condition that the table needs to be `inventory` (Predicate 4) for the fuzzing to succeed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "How do we lift these to the grammar? And in particular, how do we do it automatically? One option we have is to simply switch the last predicate obtained. In our case, the last predicate is (4). Can we simply invert the predicate and solve it again?"
] }, { "cell_type": "code", "execution_count": 310, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.001517Z", "iopub.status.busy": "2024-01-18T17:20:28.001303Z", "iopub.status.idle": "2024-01-18T17:20:28.003648Z", "shell.execute_reply": "2024-01-18T17:20:28.003293Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "new_path = _.path[0:-1] + [z3.Not(_.path[-1])]" ] }, { "cell_type": "code", "execution_count": 311, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.005179Z", "iopub.status.busy": "2024-01-18T17:20:28.005070Z", "iopub.status.idle": "2024-01-18T17:20:28.006866Z", "shell.execute_reply": "2024-01-18T17:20:28.006582Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "new_ = ConcolicTracer((_.decls, new_path))\n", "new_.fn = _.fn\n", "new_.fn_args = _.fn_args" ] }, { "cell_type": "code", "execution_count": 312, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.008336Z", "iopub.status.busy": "2024-01-18T17:20:28.008228Z", "iopub.status.idle": "2024-01-18T17:20:28.036762Z", "shell.execute_reply": "2024-01-18T17:20:28.036325Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('No Solutions', None)" ] }, "execution_count": 312, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Indeed, this will not work, as the two strings being compared have different lengths: the substring is 2 characters long, whereas `inventory` has 9 characters."
] }, { "cell_type": "code", "execution_count": 313, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.038985Z", "iopub.status.busy": "2024-01-18T17:20:28.038845Z", "iopub.status.idle": "2024-01-18T17:20:28.062205Z", "shell.execute_reply": "2024-01-18T17:20:28.061855Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Not(str.substr(str.substr(db_select_s_str_1, 7, 34), 32, 2) ==\n", " \"inventory\")\n", "no solution\n" ] } ], "source": [ "print(_.path[-1])\n", "z3.solve(z3.Not(_.path[-1]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A better idea is to investigate what _string_ comparisons are being made, and associate them with the corresponding nodes in the grammar. Let us examine our derivation tree (pruned to avoid recursive structures, and to focus on important parts)." ] }, { "cell_type": "code", "execution_count": 314, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.064021Z", "iopub.status.busy": "2024-01-18T17:20:28.063878Z", "iopub.status.idle": "2024-01-18T17:20:28.065926Z", "shell.execute_reply": "2024-01-18T17:20:28.065528Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import display_tree" ] }, { "cell_type": "code", "execution_count": 315, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.067933Z", "iopub.status.busy": "2024-01-18T17:20:28.067770Z", "iopub.status.idle": "2024-01-18T17:20:28.468487Z", "shell.execute_reply": "2024-01-18T17:20:28.468058Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[Figure: pruned derivation tree. <start> -> <query>; <query> -> 'select ', <exprs>, ' from ', <table>; <exprs> -> 't4(I,N)!=b(k)/O!=(K4(:/Z))'; <table> -> 'I7'.]" ] }, "execution_count": 315, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prune_tokens = [\n", "    '<value>', '<table>', '<column>', '<literals>', '<exprs>', '<bexpr>'\n", "]\n", "dt = tgf.prune_tree(qtree, prune_tokens)\n", "display_tree(dt)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Can we identify which part of the input was supplied by which part of the grammar? We define `span()` that can recover this information from the derivation tree. For a given node, let us assume that the start point is known. Then, for processing the children, we proceed as follows: We process the children one at a time from left to right, and compute the length of each child. The starting point of the current child is our own starting point plus the total length of the children before it. The end point of each node is simply the end point of its last child (or, for a leaf, its starting point plus the length of its string)."
] }, { "cell_type": "code", "execution_count": 316, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.470356Z", "iopub.status.busy": "2024-01-18T17:20:28.470232Z", "iopub.status.idle": "2024-01-18T17:20:28.472237Z", "shell.execute_reply": "2024-01-18T17:20:28.471973Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import START_SYMBOL" ] }, { "cell_type": "code", "execution_count": 317, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.473829Z", "iopub.status.busy": "2024-01-18T17:20:28.473714Z", "iopub.status.idle": "2024-01-18T17:20:28.476432Z", "shell.execute_reply": "2024-01-18T17:20:28.476139Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def span(node, g, node_start=0):\n", "    hm = {}\n", "    k, cs = node\n", "    end_i = node_start\n", "    new_cs = []\n", "    for c in cs:\n", "        chm, (ck, child_start, child_end, gcs) = span(c, g, end_i)\n", "        new_cs.append((ck, child_start, child_end, gcs))\n", "        end_i = child_end\n", "        hm.update(chm)\n", "    node_end = end_i if cs else node_start + len(k)\n", "    if k in g and k != START_SYMBOL:\n", "        hm[k] = (node_start, node_end - node_start)\n", "    return hm, (k, node_start, node_end, new_cs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use it as follows:" ] }, { "cell_type": "code", "execution_count": 318, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.478150Z", "iopub.status.busy": "2024-01-18T17:20:28.478023Z", "iopub.status.idle": "2024-01-18T17:20:28.480075Z", "shell.execute_reply": "2024-01-18T17:20:28.479755Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "span_hm, _n = span(dt, INVENTORY_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 319, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.481563Z", "iopub.status.busy": "2024-01-18T17:20:28.481446Z", "iopub.status.idle": "2024-01-18T17:20:28.483889Z", "shell.execute_reply": "2024-01-18T17:20:28.483566Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'<exprs>': (7, 26), '<table>': (39, 2), '<query>': (0, 41)}" ] }, "execution_count": 319, "metadata": {}, "output_type": "execute_result" } ], "source": [ "span_hm" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can check if we got the right values as follows." ] }, { "cell_type": "code", "execution_count": 320, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.485561Z", "iopub.status.busy": "2024-01-18T17:20:28.485439Z", "iopub.status.idle": "2024-01-18T17:20:28.487537Z", "shell.execute_reply": "2024-01-18T17:20:28.487260Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "query: select t4(I,N)!=b(k)/O!=(K4(:/Z)) from I7\n", "<exprs> t4(I,N)!=b(k)/O!=(K4(:/Z))\n", "<table> I7\n", "<query> select t4(I,N)!=b(k)/O!=(K4(:/Z)) from I7\n" ] } ], "source": [ "print(\"query:\", query)\n", "for k in span_hm:\n", "    start, l = span_hm[k]\n", "    print(k, query[start:start + l])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Next, we need to obtain all the comparisons made in each predicate. For that, we define two helper functions. The first is `unwrap_substrings()`, which resolves nested calls to `z3.SubString` and returns the start and the length of the given z3 string expression relative to the original string." ] }, { "cell_type": "code", "execution_count": 321, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.489135Z", "iopub.status.busy": "2024-01-18T17:20:28.489017Z", "iopub.status.idle": "2024-01-18T17:20:28.491269Z", "shell.execute_reply": "2024-01-18T17:20:28.491007Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def unwrap_substrings(s):\n", "    assert s.decl().name() == 'str.substr'\n", "    cs, frm, l = s.children()\n", "    fl = frm.as_long()\n", "    ll = l.as_long()\n", "    if cs.decl().name() == 'str.substr':\n", "        newfrm, _l = unwrap_substrings(cs)\n", "        return (fl + newfrm, ll)\n", "    else:\n", "        return (fl, ll)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The second is `traverse_z3()`, which traverses a given z3 expression and collects all direct string comparisons to a substring of the original argument."
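] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The positions collected this way refer to the original argument, since the start offsets of nested substrings simply add up (this is what `unwrap_substrings()` computes). As a minimal plain-Python sketch of this arithmetic, consider the last predicate, `str.substr(str.substr(s, 7, 34), 32, 2)`, applied to our query. The variable names are illustrative assumptions, and Python slices stand in for `str.substr`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Illustrative sketch: Python slices stand in for z3's str.substr(s, start, length).\n", "query_text = 'select t4(I,N)!=b(k)/O!=(K4(:/Z)) from I7'\n", "\n", "outer_start, outer_len = 7, 34   # str.substr(s, 7, 34)\n", "inner_start, inner_len = 32, 2   # str.substr(..., 32, 2)\n", "\n", "outer = query_text[outer_start:outer_start + outer_len]\n", "nested = outer[inner_start:inner_start + inner_len]\n", "\n", "# The nested substring starts at the sum of both offsets in the original string.\n", "flat_start = outer_start + inner_start\n", "flat = query_text[flat_start:flat_start + inner_len]\n", "\n", "print((flat_start, inner_len), repr(nested), repr(flat))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this in mind, `traverse_z3()` is defined as follows."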
] }, { "cell_type": "code", "execution_count": 322, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.492906Z", "iopub.status.busy": "2024-01-18T17:20:28.492740Z", "iopub.status.idle": "2024-01-18T17:20:28.497059Z", "shell.execute_reply": "2024-01-18T17:20:28.496782Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def traverse_z3(p, hm):\n", "    def z3_as_string(v):\n", "        return v.as_string()\n", "\n", "    n = p.decl().name()\n", "    if n == 'not':\n", "        return traverse_z3(p.children()[0], hm)\n", "    elif n == '=':\n", "        i, j = p.children()\n", "        if isinstance(i, (int, z3.IntNumRef)):\n", "            return traverse_z3(j, hm)\n", "        elif isinstance(j, (int, z3.IntNumRef)):\n", "            return traverse_z3(i, hm)\n", "        else:\n", "            if i.is_string() and j.is_string():\n", "                if i.is_string_value():\n", "                    cs, frm, l = j.children()\n", "                    if (isinstance(frm, z3.IntNumRef)\n", "                            and isinstance(l, z3.IntNumRef)):\n", "                        hm[z3_as_string(i)] = unwrap_substrings(j)\n", "                elif j.is_string_value():\n", "                    cs, frm, l = i.children()\n", "                    if (isinstance(frm, z3.IntNumRef)\n", "                            and isinstance(l, z3.IntNumRef)):\n", "                        hm[z3_as_string(j)] = unwrap_substrings(i)\n", "            else:\n", "                assert False  # for now\n", "    elif n == '<' or n == '>':\n", "        i, j = p.children()\n", "        if isinstance(i, (int, z3.IntNumRef)):\n", "            return traverse_z3(j, hm)\n", "        elif isinstance(j, (int, z3.IntNumRef)):\n", "            return traverse_z3(i, hm)\n", "        else:\n", "            assert False\n", "    return p" ] }, { "cell_type": "code", "execution_count": 323, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.498487Z", "iopub.status.busy": "2024-01-18T17:20:28.498390Z", "iopub.status.idle": "2024-01-18T17:20:28.502228Z", "shell.execute_reply": "2024-01-18T17:20:28.501936Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'inventory': (39, 2)}" ] }, "execution_count": 323, "metadata": {}, "output_type": "execute_result" } ], "source": [ "comparisons: Dict[str, Tuple] = {}\n", "for p in _.path:\n", "    traverse_z3(p, comparisons)\n", "comparisons" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "All that we need now is to match the substrings in `comparisons` against the spans of the grammar nodes. For that, we define `find_alternatives()`, which records, for each grammar key, the strings that the corresponding part of the input was compared to." ] }, { "cell_type": "code", "execution_count": 324, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.504025Z", "iopub.status.busy": "2024-01-18T17:20:28.503891Z", "iopub.status.idle": "2024-01-18T17:20:28.506496Z", "shell.execute_reply": "2024-01-18T17:20:28.506226Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def find_alternatives(spans, cmp):\n", "    alts = {}\n", "    for key in spans:\n", "        start, l = spans[key]\n", "        rset = set(range(start, start + l))\n", "        for ckey in cmp:\n", "            cstart, cl = cmp[ckey]\n", "            cset = set(range(cstart, cstart + cl))\n", "            # if rset.issubset(cset): <- ignoring subsets for now.\n", "            if rset == cset:\n", "                if key not in alts:\n", "                    alts[key] = set()\n", "                alts[key].add(ckey)\n", "    return alts" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We use it as follows." ] }, { "cell_type": "code", "execution_count": 325, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.508085Z", "iopub.status.busy": "2024-01-18T17:20:28.507947Z", "iopub.status.idle": "2024-01-18T17:20:28.510130Z", "shell.execute_reply": "2024-01-18T17:20:28.509878Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'<table>': {'inventory'}}" ] }, "execution_count": 325, "metadata": {}, "output_type": "execute_result" } ], "source": [ "alternatives = find_alternatives(span_hm, comparisons)\n", "alternatives" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So, we have our alternatives for each key in the grammar. We can now update our grammar as follows." ] }, { "cell_type": "code", "execution_count": 326, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.511676Z", "iopub.status.busy": "2024-01-18T17:20:28.511571Z", "iopub.status.idle": "2024-01-18T17:20:28.513299Z", "shell.execute_reply": "2024-01-18T17:20:28.513016Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INVENTORY_GRAMMAR_NEW = dict(INVENTORY_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 327, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.514735Z", "iopub.status.busy": "2024-01-18T17:20:28.514617Z", "iopub.status.idle": "2024-01-18T17:20:28.516348Z", "shell.execute_reply": "2024-01-18T17:20:28.516085Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "for k in alternatives:\n", "    INVENTORY_GRAMMAR_NEW[k] = INVENTORY_GRAMMAR_NEW[k] + list(alternatives[k])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We made a choice here. We could have completely overwritten the definition of `
<table>`. Instead, we added our new alternatives to the existing definition. This way, our fuzzer will also attempt other values for `<table>
` once in a while." ] }, { "cell_type": "code", "execution_count": 328, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.517911Z", "iopub.status.busy": "2024-01-18T17:20:28.517796Z", "iopub.status.idle": "2024-01-18T17:20:28.519972Z", "shell.execute_reply": "2024-01-18T17:20:28.519656Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['', 'inventory']" ] }, "execution_count": 328, "metadata": {}, "output_type": "execute_result" } ], "source": [ "INVENTORY_GRAMMAR_NEW['
']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try fuzzing with our new grammar." ] }, { "cell_type": "code", "execution_count": 329, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.521485Z", "iopub.status.busy": "2024-01-18T17:20:28.521372Z", "iopub.status.idle": "2024-01-18T17:20:28.523336Z", "shell.execute_reply": "2024-01-18T17:20:28.523066Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR_NEW)" ] }, { "cell_type": "code", "execution_count": 330, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.524716Z", "iopub.status.busy": "2024-01-18T17:20:28.524604Z", "iopub.status.idle": "2024-01-18T17:20:28.618419Z", "shell.execute_reply": "2024-01-18T17:20:28.618069Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "insert into inventory (i9Oam41gsP2,h97,q8J:.70J) values ('.q')\n", "Column ('i9Oam41gsP2') was not found\n", "\n", "select C from wy where R/s/y>_-X-.+C/u==(((---6.5)))\n", "Table ('wy') was not found\n", "\n", "update T set I=gj5 where (-8.6/O*.-W)==s-OA\n", "Table ('nb0') was not found\n", "\n", "insert into inventory (P,wmE,U,F) values (50,'/',--6.2)\n", "Column ('P') was not found\n", "\n", "delete from GTV3_ where :-M!=t>n+R/x+r*a/t-r-V\n", "Table ('GTV3_') was not found\n", "\n" ] } ], "source": [ "for i in range(10):\n", " qtree = cgf.fuzz_tree()\n", " query = cgf.tree_to_string(qtree)\n", " print(query)\n", " with ExpectError(print_traceback=False):\n", " try:\n", " with ConcolicTracer() as _:\n", " res = _[db_select](query)\n", " print(repr(res))\n", " except SQLException as e:\n", " print(e)\n", " print()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "That is, we were able to reach the dangerous method `my_eval()`.\n", "In effect, what we 
have done is to lift parts of predicates to the grammar. The new grammar can generate inputs that reach deeper into the program than before. Note that we have only handled the equality predicate. One can also lift the '<' and '>' comparison operators to the grammar if required.\n", "\n", "Compare the output of our fuzzer to the original `GrammarFuzzer` below." ] }, { "cell_type": "code", "execution_count": 331, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.620138Z", "iopub.status.busy": "2024-01-18T17:20:28.620042Z", "iopub.status.idle": "2024-01-18T17:20:28.687580Z", "shell.execute_reply": "2024-01-18T17:20:28.687277Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "insert into UCu4 (E,xM:lOq6,u38p,W54G3b0) values (':',1.835)\n", "Table ('UCu4') was not found\n", "\n", "insert into B81 (Np) values ('h')\n", "Table ('B81') was not found\n", "\n", "delete from w where Xn((T))>a(8.8,g)/h+t-P-j+L\n", "Table ('w') was not found\n", "\n", "update q75 set L=z4 where ((QUy+N))==A/P-L*ao(R)/I\n", "Table ('q75') was not found\n", "\n", "update q3 set x=F where l(N)-P-S+t==e\n", "Table ('q3') was not found\n", "\n", "update Dy06rr set h=F where (z!=Q)==(((aa==O(mA,j(g)*:,s,B)-5-eD(c,F!=n)==eO41Xy\n", "Table ('Z') was not found\n", "\n", "select 1.8 from U3X8p\n", "Table ('U3X8p') was not found\n", "\n", "update N set w=X9,A=w,M=Z where ((b!=c))==U/Nw-H/e,s/s from vehicles\n", "Invalid WHERE ('(c,M-./b*y>w-H/e,s/s)')\n", "\n", "delete from WG where t9(z)!=d4(P,r)*K/Q/M\n", "Table ('WG') was not found\n", "\n", "delete from months where Oz(w)!=4.9\n", "Invalid WHERE ('Oz(w)!=4.9')\n", "\n", "select 9.33 from months\n", "[9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33, 9.33]\n", "\n", "select X/G==u==y,b==p>P,e-M-r from WL\n", "Table ('WL') was not found\n", "\n", "select b0J8n3 from months\n", "Invalid WHERE ('(b0J8n3)')\n", "\n", "delete from vehicles where 
D-o-s(:,S)*x>8==0!=y==U\n", "Invalid WHERE ('D-o-s(:,S)*x>8==0!=y==U')\n", "\n" ] } ], "source": [ "cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)\n", "cgf.prune_tokens(prune_tokens)\n", "for i in range(10):\n", "    query = cgf.fuzz()\n", "    print(query)\n", "    with ConcolicTracer() as _:\n", "        with ExpectError(print_traceback=False):\n", "            try:\n", "                res = _[db_select](query)\n", "                print(repr(res))\n", "            except SQLException as e:\n", "                print(e)\n", "        cgf.update_grammar(_)\n", "        print()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As can be seen, the fuzzer starts with no knowledge of the tables `vehicles`, `months` and `years`, but identifies them from the concolic execution, and lifts them to the grammar. This allows us to improve the effectiveness of fuzzing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Limitations\n", "\n", "As with dynamic taint analysis, implicit control flow can obscure the predicates encountered during concolic execution. However, this limitation could be overcome to some extent by wrapping any constants in the source with their respective proxy objects. Similarly, calls to internal C functions can cause the symbolic information to be discarded, and only partial information may be obtained." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chapter defines two main classes: `SimpleConcolicFuzzer` and `ConcolicGrammarFuzzer`. The `SimpleConcolicFuzzer` first uses a sample input to collect predicates encountered. The fuzzer then negates random predicates to generate new input constraints. These, when solved, produce inputs that explore paths that are close to the original path."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### ConcolicTracer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At the heart of both fuzzers lies the concept of a _concolic tracer_, capturing symbolic variables and path conditions as a program gets executed.\n", "\n", "`ConcolicTracer` is used in a `with` block; the syntax `tracer[function]` executes `function` within the `tracer` while capturing conditions. Here is an example for the `cgi_decode()` function:" ] }, { "cell_type": "code", "execution_count": 337, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.807460Z", "iopub.status.busy": "2024-01-18T17:20:28.807346Z", "iopub.status.idle": "2024-01-18T17:20:28.810855Z", "shell.execute_reply": "2024-01-18T17:20:28.810541Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " _[cgi_decode]('a%20d')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Once executed, we can retrieve the symbolic variables in the `decls` attribute. This is a mapping of symbolic variables to types." 
] }, { "cell_type": "code", "execution_count": 338, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.812519Z", "iopub.status.busy": "2024-01-18T17:20:28.812423Z", "iopub.status.idle": "2024-01-18T17:20:28.814632Z", "shell.execute_reply": "2024-01-18T17:20:28.814345Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'cgi_decode_s_str_1': 'String'}" ] }, "execution_count": 338, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.decls" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The extracted path conditions can be found in the `path` attribute:" ] }, { "cell_type": "code", "execution_count": 339, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.816202Z", "iopub.status.busy": "2024-01-18T17:20:28.816109Z", "iopub.status.idle": "2024-01-18T17:20:28.822262Z", "shell.execute_reply": "2024-01-18T17:20:28.821976Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[0 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 0, 1) == \"%\"),\n", " 1 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 1, 1) == \"+\"),\n", " str.substr(cgi_decode_s_str_1, 1, 1) == \"%\",\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"0\"),\n", " Not(str.substr(cgi_decode_s_str_1, 2, 1) == \"1\"),\n", " str.substr(cgi_decode_s_str_1, 2, 1) == \"2\",\n", " str.substr(cgi_decode_s_str_1, 3, 1) == \"0\",\n", " 4 < Length(cgi_decode_s_str_1),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"+\"),\n", " Not(str.substr(cgi_decode_s_str_1, 4, 1) == \"%\"),\n", " Not(5 < Length(cgi_decode_s_str_1))]" ] }, "execution_count": 339, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ 
"The `context` attribute holds a pair of `decls` and `path` attributes; this is useful for passing it into the `ConcolicTracer` constructor." ] }, { "cell_type": "code", "execution_count": 340, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.823954Z", "iopub.status.busy": "2024-01-18T17:20:28.823830Z", "iopub.status.idle": "2024-01-18T17:20:28.825533Z", "shell.execute_reply": "2024-01-18T17:20:28.825278Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert _.context == (_.decls, _.path)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can solve these constraints to obtain a value for the function parameters that follow the same path as the original (traced) invocation:" ] }, { "cell_type": "code", "execution_count": 341, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.827376Z", "iopub.status.busy": "2024-01-18T17:20:28.827261Z", "iopub.status.idle": "2024-01-18T17:20:28.852421Z", "shell.execute_reply": "2024-01-18T17:20:28.851980Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'s': ('A%20B', 'String')})" ] }, "execution_count": 341, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `zeval()` function also allows passing _alternate_ or _negated_ constraints. See the chapter for examples." 
] }, { "cell_type": "code", "execution_count": 342, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:28.854418Z", "iopub.status.busy": "2024-01-18T17:20:28.854292Z", "iopub.status.idle": "2024-01-18T17:20:29.265798Z", "shell.execute_reply": "2024-01-18T17:20:29.265423Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Class diagram: ConcolicTracer with methods __call__(), __init__(), zeval(), __enter__(), __exit__(), __getitem__(), concolic(), smt_expr()]" ], "text/plain": [ "" ] }, "execution_count": 342, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "from ClassDiagram import display_class_hierarchy\n", "display_class_hierarchy(ConcolicTracer)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### SimpleConcolicFuzzer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The constraints obtained from `ConcolicTracer` are added to the concolic fuzzer as follows:" ] }, { "cell_type": "code", "execution_count": 343, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:29.267547Z", "iopub.status.busy": "2024-01-18T17:20:29.267431Z", "iopub.status.idle": "2024-01-18T17:20:29.272500Z", "shell.execute_reply": "2024-01-18T17:20:29.272256Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "scf = SimpleConcolicFuzzer()\n", "scf.add_trace(_, 'a%20d')" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The concolic fuzzer then uses the constraints added to guide its fuzzing as follows:" ] }, { "cell_type": "code", "execution_count": 344, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:29.274110Z", "iopub.status.busy": "2024-01-18T17:20:29.274016Z", "iopub.status.idle": "2024-01-18T17:20:29.661298Z", "shell.execute_reply": "2024-01-18T17:20:29.660746Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "' '\n", "'+'\n", "'%'\n", "'+A'\n", "'AB'\n", "'++'\n", "'++A'\n", "'+++'\n", "'A'\n", "'+A'\n", "'+++A'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "IndexError: string index out of range (expected)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "'+AB'\n", "'++'\n", "'%'\n", "'++AB'\n", "'++A+'\n", "'+A'\n", "'++'\n", "'+'\n", "'+%'\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "IndexError: string index out of range (expected)\n", "IndexError: string index out of range (expected)\n" ] } ], "source": [ "scf = SimpleConcolicFuzzer()\n", "for i in range(20):\n", " v = scf.fuzz()\n", " if v is None:\n", " break\n", " print(repr(v))\n", " with ExpectError(print_traceback=False):\n", " with ConcolicTracer() as _:\n", " _[cgi_decode](v)\n", " scf.add_trace(_, v)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see how the additional inputs generated explore additional paths." 
] }, { "cell_type": "code", "execution_count": 345, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:29.663263Z", "iopub.status.busy": "2024-01-18T17:20:29.663135Z", "iopub.status.idle": "2024-01-18T17:20:30.082924Z", "shell.execute_reply": "2024-01-18T17:20:30.082519Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "[Class diagram: SimpleConcolicFuzzer with methods __init__(), fuzz(), add_trace(), get_newpath(), next_choice(); it inherits from Fuzzer with methods __init__(), fuzz(), run(), runs()]" ], "text/plain": [ "" ] }, "execution_count": 345, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy(SimpleConcolicFuzzer)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### ConcolicGrammarFuzzer\n", "\n", "The `SimpleConcolicFuzzer` explores all paths near the original path traversed by the sample input. It uses a simple mechanism to explore paths close to those it already knows about; other than code paths, it knows nothing about the input.\n", "\n", "The `ConcolicGrammarFuzzer`, on the other hand, knows about the input grammar and can collect feedback from the subject under fuzzing. 
It can lift some constraints encountered to the grammar, enabling deeper fuzzing. It is used as follows:" ] }, { "cell_type": "code", "execution_count": 346, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.084887Z", "iopub.status.busy": "2024-01-18T17:20:30.084731Z", "iopub.status.idle": "2024-01-18T17:20:30.087032Z", "shell.execute_reply": "2024-01-18T17:20:30.086751Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from InformationFlow import INVENTORY_GRAMMAR, SQLException" ] }, { "cell_type": "code", "execution_count": 347, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.088660Z", "iopub.status.busy": "2024-01-18T17:20:30.088547Z", "iopub.status.idle": "2024-01-18T17:20:30.189360Z", "shell.execute_reply": "2024-01-18T17:20:30.189076Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "select 245 from :2 where r(_)-N+e>n\n", "Table (':2') was not found\n", "\n", "delete from months where Q/x/j/q(p)/H*h-B==cz\n", "Invalid WHERE ('Q/x/j/q(p)/H*h-B==cz')\n", "\n", "insert into vehicles (:b) values (22.72)\n", "Column (':b') was not found\n", "\n", "select i*q!=(4) from vehicles where L*S/l/u/b+b==W\n", "\n", "delete from vehicles where W/V!=A(f)+tL+S))==((:+lL+S))==((:+l\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "ConcolicGrammarFuzzer\n", "\n", "\n", "ConcolicGrammarFuzzer\n", "\n", "\n", "\n", "fuzz()\n", "\n", "\n", "\n", "coalesce()\n", "\n", "\n", "\n", "prune_tokens()\n", "\n", "\n", "\n", "prune_tree()\n", "\n", "\n", "\n", "tree_to_string()\n", "\n", "\n", "\n", "update_grammar()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "GrammarFuzzer\n", "\n", "\n", "GrammarFuzzer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "check_grammar()\n", "\n", "\n", "\n", "choose_node_expansion()\n", "\n", "\n", "\n", "choose_tree_expansion()\n", "\n", "\n", "\n", 
"expand_node_randomly()\n", "\n", "\n", "\n", "expand_tree()\n", "\n", "\n", "\n", "expand_tree_once()\n", "\n", "\n", "\n", "expand_tree_with_strategy()\n", "\n", "\n", "\n", "fuzz()\n", "\n", "\n", "\n", "fuzz_tree()\n", "\n", "\n", "\n", "log_tree()\n", "\n", "\n", "\n", "process_chosen_children()\n", "\n", "\n", "\n", "supported_opts()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "ConcolicGrammarFuzzer->GrammarFuzzer\n", "\n", "\n", "\n", "\n", "\n", "Fuzzer\n", "\n", "\n", "Fuzzer\n", "\n", "\n", "\n", "__init__()\n", "\n", "\n", "\n", "fuzz()\n", "\n", "\n", "\n", "run()\n", "\n", "\n", "\n", "runs()\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "GrammarFuzzer->Fuzzer\n", "\n", "\n", "\n", "\n", "\n", "Legend\n", "Legend\n", "• \n", "public_method()\n", "• \n", "private_method()\n", "• \n", "overloaded_method()\n", "Hover over names to see doc\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 348, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# ignore\n", "display_class_hierarchy(ConcolicGrammarFuzzer)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* Concolic execution can often provide more information than taint analysis with respect to the program behavior. However, this comes at a much larger runtime cost. Hence, unlike taint analysis, real-time analysis is often not possible.\n", "\n", "* Similar to taint analysis, concolic execution also suffers from limitations such as indirect control flow and internal function calls.\n", "\n", "* Predicates from concolic execution can be used in conjunction with fuzzing to provide an even more robust indication of incorrect behavior than taints, and can be used to create grammars that are better at producing valid inputs." 
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "A costlier but stronger alternative to concolic fuzzing is [symbolic fuzzing](SymbolicFuzzer.ipynb). Similarly, [search-based fuzzing](SearchBasedFuzzer.ipynb) can often provide a cheaper exploration strategy than relying on SMT solvers to provide inputs slightly different from the current path." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "The technique of concolic execution was originally used to inform and expand the scope of _symbolic execution_ \\cite{king1976symbolic}, a static analysis technique for program analysis. Larson et al. \\cite{Larson2003} were the first to use the concolic execution technique.\n", "\n", "The idea of using proxy objects for collecting constraints was pioneered by Cadar et al. \\cite{cadar2005execution}. The concolic execution technique for Python programs used in this chapter was pioneered by PeerCheck \\cite{PeerCheck} and Python Error Finder \\cite{Barsotti2018}." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1: Implement a Concolic Float Proxy Class\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "While implementing the `zint` binary operators, we asserted that the results were `int`. However, that need not be the case. For example, division can result in `float`. Hence, we need proxy objects for `float`. Can you implement a similar proxy object for `float` and fix the `zint` binary operator definition?"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__Solution.__ The solution is as follows." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As in the case of `zint`, we first open up `zfloat` for extension." ] }, { "cell_type": "code", "execution_count": 349, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.592550Z", "iopub.status.busy": "2024-01-18T17:20:30.592422Z", "iopub.status.idle": "2024-01-18T17:20:30.594441Z", "shell.execute_reply": "2024-01-18T17:20:30.594194Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zfloat(float):\n", " def __new__(cls, context, zn, v, *args, **kw):\n", " return float.__new__(cls, v, *args, **kw)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We then implement the initialization methods." ] }, { "cell_type": "code", "execution_count": 350, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.595966Z", "iopub.status.busy": "2024-01-18T17:20:30.595853Z", "iopub.status.idle": "2024-01-18T17:20:30.598157Z", "shell.execute_reply": "2024-01-18T17:20:30.597889Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zfloat(zfloat):\n", " @classmethod\n", " def create(cls, context, zn, v=None):\n", " return zproxy_create(cls, 'Real', z3.Real, context, zn, v)\n", "\n", " def __init__(self, context, z, v=None):\n", " self.z, self.v = z, v\n", " self.context = context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The helper for when one of the arguments in a binary operation is not `float`." 
] }, { "cell_type": "code", "execution_count": 351, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.599582Z", "iopub.status.busy": "2024-01-18T17:20:30.599485Z", "iopub.status.idle": "2024-01-18T17:20:30.601186Z", "shell.execute_reply": "2024-01-18T17:20:30.600953Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zfloat(zfloat):\n", "    def _zv(self, o):\n", "        return (o.z, o.v) if isinstance(o, zfloat) else (z3.RealVal(o), o)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We coerce `float` into a `bool` value for use in conditionals." ] }, { "cell_type": "code", "execution_count": 352, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.602716Z", "iopub.status.busy": "2024-01-18T17:20:30.602612Z", "iopub.status.idle": "2024-01-18T17:20:30.604288Z", "shell.execute_reply": "2024-01-18T17:20:30.604060Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class zfloat(zfloat):\n", "    def __bool__(self):\n", "        # force registering boolean condition\n", "        if self != 0.0:\n", "            return True\n", "        return False" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We define the common proxy method for comparison methods." ] }, { "cell_type": "code", "execution_count": 353, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.605888Z", "iopub.status.busy": "2024-01-18T17:20:30.605785Z", "iopub.status.idle": "2024-01-18T17:20:30.607744Z", "shell.execute_reply": "2024-01-18T17:20:30.607504Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def make_float_bool_wrapper(fname, fun, zfun):\n", "    def proxy(self, other):\n", "        z, v = self._zv(other)\n", "        z_ = zfun(self.z, z)\n", "        v_ = fun(self.v, v)\n", "        return zbool(self.context, z_, v_)\n", "\n", "    return proxy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": 
"subslide" } }, "source": [ "We apply the comparison methods on the defined `zfloat` class." ] }, { "cell_type": "code", "execution_count": 354, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.609227Z", "iopub.status.busy": "2024-01-18T17:20:30.609114Z", "iopub.status.idle": "2024-01-18T17:20:30.610844Z", "shell.execute_reply": "2024-01-18T17:20:30.610553Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "FLOAT_BOOL_OPS = [\n", " '__eq__',\n", " # '__req__',\n", " '__ne__',\n", " # '__rne__',\n", " '__gt__',\n", " '__lt__',\n", " '__le__',\n", " '__ge__',\n", "]" ] }, { "cell_type": "code", "execution_count": 355, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.612184Z", "iopub.status.busy": "2024-01-18T17:20:30.612104Z", "iopub.status.idle": "2024-01-18T17:20:30.613915Z", "shell.execute_reply": "2024-01-18T17:20:30.613668Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "for fname in FLOAT_BOOL_OPS:\n", " fun = getattr(float, fname)\n", " zfun = getattr(z3.ArithRef, fname)\n", " setattr(zfloat, fname, make_float_bool_wrapper(fname, fun, zfun))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Similarly, we define the common proxy method for binary operators." 
] }, { "cell_type": "code", "execution_count": 356, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.615444Z", "iopub.status.busy": "2024-01-18T17:20:30.615332Z", "iopub.status.idle": "2024-01-18T17:20:30.617260Z", "shell.execute_reply": "2024-01-18T17:20:30.617032Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def make_float_binary_wrapper(fname, fun, zfun):\n", "    def proxy(self, other):\n", "        z, v = self._zv(other)\n", "        z_ = zfun(self.z, z)\n", "        v_ = fun(self.v, v)\n", "        return zfloat(self.context, z_, v_)\n", "\n", "    return proxy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We then apply them to `zfloat`." ] }, { "cell_type": "code", "execution_count": 357, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.618620Z", "iopub.status.busy": "2024-01-18T17:20:30.618522Z", "iopub.status.idle": "2024-01-18T17:20:30.620284Z", "shell.execute_reply": "2024-01-18T17:20:30.620059Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "FLOAT_BINARY_OPS = [\n", "    '__add__',\n", "    '__sub__',\n", "    '__mul__',\n", "    '__truediv__',\n", "    # '__div__',\n", "    '__mod__',\n", "    # '__divmod__',\n", "    '__pow__',\n", "    # '__lshift__',\n", "    # '__rshift__',\n", "    # '__and__',\n", "    # '__xor__',\n", "    # '__or__',\n", "    '__radd__',\n", "    '__rsub__',\n", "    '__rmul__',\n", "    '__rtruediv__',\n", "    # '__rdiv__',\n", "    '__rmod__',\n", "    # '__rdivmod__',\n", "    '__rpow__',\n", "    # '__rlshift__',\n", "    # '__rrshift__',\n", "    # '__rand__',\n", "    # '__rxor__',\n", "    # '__ror__',\n", "]" ] }, { "cell_type": "code", "execution_count": 358, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.621535Z", "iopub.status.busy": "2024-01-18T17:20:30.621460Z", "iopub.status.idle": "2024-01-18T17:20:30.623423Z", "shell.execute_reply": "2024-01-18T17:20:30.623178Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ 
"for fname in FLOAT_BINARY_OPS:\n", " fun = getattr(float, fname)\n", " zfun = getattr(z3.ArithRef, fname)\n", " setattr(zfloat, fname, make_float_binary_wrapper(fname, fun, zfun))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "These are used as follows." ] }, { "cell_type": "code", "execution_count": 359, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.624881Z", "iopub.status.busy": "2024-01-18T17:20:30.624771Z", "iopub.status.idle": "2024-01-18T17:20:30.627255Z", "shell.execute_reply": "2024-01-18T17:20:30.626950Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", " za = zfloat.create(_.context, 'float_a', 1.0)\n", " zb = zfloat.create(_.context, 'float_b', 0.0)\n", " if za * zb:\n", " print(1)" ] }, { "cell_type": "code", "execution_count": 360, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.628697Z", "iopub.status.busy": "2024-01-18T17:20:30.628598Z", "iopub.status.idle": "2024-01-18T17:20:30.631299Z", "shell.execute_reply": "2024-01-18T17:20:30.631047Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "({'float_a': 'Real', 'float_b': 'Real'}, [Not(float_a*float_b != 0)])" ] }, "execution_count": 360, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, we fix the `zint` binary wrapper to correctly create `zfloat` when needed." 
] }, { "cell_type": "code", "execution_count": 361, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.632734Z", "iopub.status.busy": "2024-01-18T17:20:30.632633Z", "iopub.status.idle": "2024-01-18T17:20:30.634918Z", "shell.execute_reply": "2024-01-18T17:20:30.634686Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def make_int_binary_wrapper(fname, fun, zfun):  # type: ignore\n", "    def proxy(self, other):\n", "        z, v = self._zv(other)\n", "        z_ = zfun(self.z, z)\n", "        v_ = fun(self.v, v)\n", "        if isinstance(v_, float):\n", "            return zfloat(self.context, z_, v_)\n", "        elif isinstance(v_, int):\n", "            return zint(self.context, z_, v_)\n", "        else:\n", "            assert False\n", "\n", "    return proxy" ] }, { "cell_type": "code", "execution_count": 362, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.636291Z", "iopub.status.busy": "2024-01-18T17:20:30.636194Z", "iopub.status.idle": "2024-01-18T17:20:30.637861Z", "shell.execute_reply": "2024-01-18T17:20:30.637613Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "for fname in INT_BINARY_OPS:\n", "    fun = getattr(int, fname)\n", "    zfun = getattr(z3.ArithRef, fname)\n", "    setattr(zint, fname, make_int_binary_wrapper(fname, fun, zfun))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We check that it works as expected." 
] }, { "cell_type": "code", "execution_count": 363, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.639333Z", "iopub.status.busy": "2024-01-18T17:20:30.639222Z", "iopub.status.idle": "2024-01-18T17:20:30.642660Z", "shell.execute_reply": "2024-01-18T17:20:30.642415Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with ConcolicTracer() as _:\n", "    v = _[binomial](4, 2)" ] }, { "cell_type": "code", "execution_count": 364, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.643970Z", "iopub.status.busy": "2024-01-18T17:20:30.643894Z", "iopub.status.idle": "2024-01-18T17:20:30.664048Z", "shell.execute_reply": "2024-01-18T17:20:30.663716Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "('sat', {'n': ('4', 'Int'), 'k': ('2', 'Int')})" ] }, "execution_count": 364, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 2: Bit Manipulation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Similar to floats, implementing bit manipulation functions such as `xor` involves converting each `int` to its bit-vector equivalent, performing the operation on the bit vectors, and converting the result back to the original type. Can you implement the bit manipulation operations for `zint`?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__Solution.__ The solution is as follows." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We first define the proxy method as before." 
] }, { "cell_type": "code", "execution_count": 365, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.665881Z", "iopub.status.busy": "2024-01-18T17:20:30.665761Z", "iopub.status.idle": "2024-01-18T17:20:30.668157Z", "shell.execute_reply": "2024-01-18T17:20:30.667904Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def make_int_bit_wrapper(fname, fun, zfun):\n", "    def proxy(self, other):\n", "        z, v = self._zv(other)\n", "        z_ = z3.BV2Int(\n", "            zfun(\n", "                z3.Int2BV(\n", "                    self.z, num_bits=64), z3.Int2BV(\n", "                    z, num_bits=64)))\n", "        v_ = fun(self.v, v)\n", "        return zint(self.context, z_, v_)\n", "\n", "    return proxy" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "It is then applied to the `zint` class." ] }, { "cell_type": "code", "execution_count": 366, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.669508Z", "iopub.status.busy": "2024-01-18T17:20:30.669404Z", "iopub.status.idle": "2024-01-18T17:20:30.671094Z", "shell.execute_reply": "2024-01-18T17:20:30.670855Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "BIT_OPS = [\n", "    '__lshift__',\n", "    '__rshift__',\n", "    '__and__',\n", "    '__xor__',\n", "    '__or__',\n", "    '__rlshift__',\n", "    '__rrshift__',\n", "    '__rand__',\n", "    '__rxor__',\n", "    '__ror__',\n", "]" ] }, { "cell_type": "code", "execution_count": 367, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.672389Z", "iopub.status.busy": "2024-01-18T17:20:30.672311Z", "iopub.status.idle": "2024-01-18T17:20:30.674175Z", "shell.execute_reply": "2024-01-18T17:20:30.673934Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def init_concolic_4():\n", "    for fname in BIT_OPS:\n", "        fun = getattr(int, fname)\n", "        zfun = getattr(z3.BitVecRef, fname)\n", "        setattr(zint, fname, make_int_bit_wrapper(fname, fun, zfun))" ] }, { "cell_type": "code", 
"execution_count": 368, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.675624Z", "iopub.status.busy": "2024-01-18T17:20:30.675384Z", "iopub.status.idle": "2024-01-18T17:20:30.677105Z", "shell.execute_reply": "2024-01-18T17:20:30.676895Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INITIALIZER_LIST.append(init_concolic_4)" ] }, { "cell_type": "code", "execution_count": 369, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.678412Z", "iopub.status.busy": "2024-01-18T17:20:30.678335Z", "iopub.status.idle": "2024-01-18T17:20:30.679865Z", "shell.execute_reply": "2024-01-18T17:20:30.679548Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "init_concolic_4()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Invert is the only unary bit manipulation method." ] }, { "cell_type": "code", "execution_count": 370, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.681266Z", "iopub.status.busy": "2024-01-18T17:20:30.681191Z", "iopub.status.idle": "2024-01-18T17:20:30.682997Z", "shell.execute_reply": "2024-01-18T17:20:30.682753Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class zint(zint):\n", " def __invert__(self):\n", " return zint(self.context, z3.BV2Int(\n", " ~z3.Int2BV(self.z, num_bits=64)), ~self.v)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `my_fn()` computes `xor` and returns `True` if the `xor` results in a non-zero value." 
] }, { "cell_type": "code", "execution_count": 371, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.684339Z", "iopub.status.busy": "2024-01-18T17:20:30.684258Z", "iopub.status.idle": "2024-01-18T17:20:30.686015Z", "shell.execute_reply": "2024-01-18T17:20:30.685766Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def my_fn(a, b):\n", "    o_ = (a | b)\n", "    a_ = (a & b)\n", "    if o_ & ~a_:\n", "        return True\n", "    else:\n", "        return False" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We run it under `ConcolicTracer`." ] }, { "cell_type": "code", "execution_count": 372, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.687343Z", "iopub.status.busy": "2024-01-18T17:20:30.687265Z", "iopub.status.idle": "2024-01-18T17:20:30.689690Z", "shell.execute_reply": "2024-01-18T17:20:30.689421Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "with ConcolicTracer() as _:\n", "    print(_[my_fn](2, 1))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We log the computed SMT expression to verify that everything went well." 
] }, { "cell_type": "code", "execution_count": 373, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:30.691192Z", "iopub.status.busy": "2024-01-18T17:20:30.691087Z", "iopub.status.idle": "2024-01-18T17:20:31.112562Z", "shell.execute_reply": "2024-01-18T17:20:31.112169Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicates in path:\n", "0 0 !=\n", "BV2Int(int2bv(BV2Int(int2bv(my_fn_a_int_1) |\n", " int2bv(my_fn_b_int_2))) &\n", " int2bv(BV2Int(~int2bv(BV2Int(int2bv(my_fn_a_int_1) &\n", " int2bv(my_fn_b_int_2))))))\n", "\n", "(declare-const my_fn_a_int_1 Int)\n", "(declare-const my_fn_b_int_2 Int)\n", "(assert (let ((a!1 (bvnot (bvor (bvnot ((_ int2bv 64) my_fn_a_int_1))\n", " (bvnot ((_ int2bv 64) my_fn_b_int_2))))))\n", "(let ((a!2 (bvor (bvnot (bvor ((_ int2bv 64) my_fn_a_int_1)\n", " ((_ int2bv 64) my_fn_b_int_2)))\n", " a!1)))\n", " (not (= 0 (bv2int (bvnot a!2)))))))\n", "(check-sat)\n", "(get-model)\n", "\n", "z3 -t:6000 /var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmpk6hn2b0y.smt\n", "sat\n", "(\n", " (define-fun my_fn_a_int_1 () Int\n", " (- 1))\n", " (define-fun my_fn_b_int_2 () Int\n", " (- 9223372036854775809))\n", ")\n" ] }, { "data": { "text/plain": [ "('sat', {'a': (['-', '1'], 'Int'), 'b': (['-', '9223372036854775809'], 'Int')})" ] }, "execution_count": 373, "metadata": {}, "output_type": "execute_result" } ], "source": [ "_.zeval(log=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We can confirm from the formulas generated that the bit manipulation functions worked correctly. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 3: String Translation Functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We have seen how to define `upper()` and `lower()`. 
Can you define the `capitalize()`, `title()`, and `swapcase()` methods?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "__Solution.__ Solution not yet available." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }