{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Mining Function Specifications\n", "\n", "When testing a program, one not only needs to cover its various behaviors; one also needs to _check_ whether the result is as expected. In this chapter, we introduce a technique that allows us to _mine_ function specifications from a set of given executions, resulting in abstract and formal _descriptions_ of what the function expects and what it delivers. \n", "\n", "These so-called _dynamic invariants_ produce pre- and postconditions over function arguments and variables from a set of executions. They are useful in a variety of contexts:\n", "\n", "* Dynamic invariants provide important information for [symbolic fuzzing](SymbolicFuzzer.ipynb), such as types and ranges of function arguments.\n", "* Dynamic invariants provide pre- and postconditions for formal program proofs and verification.\n", "* Dynamic invariants provide numerous assertions that can check whether function behavior has changed.\n", "* Checks provided by dynamic invariants can be very useful as _oracles_ for checking the effects of generated tests.\n", "\n", "By construction, dynamic invariants depend on the executions they are derived from. However, when paired with comprehensive test generators, they quickly become very precise, as we show in this chapter." 
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should be familiar with tracing program executions, as in the [chapter on coverage](Coverage.ipynb).\n", "* Later in this section, we access the internal _abstract syntax tree_ representations of Python programs and transform them, as in the [chapter on information flow](InformationFlow.ipynb)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.522229Z", "iopub.status.busy": "2024-01-18T17:20:46.521998Z", "iopub.status.idle": "2024-01-18T17:20:46.553471Z", "shell.execute_reply": "2024-01-18T17:20:46.553077Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.555660Z", "iopub.status.busy": "2024-01-18T17:20:46.555498Z", "iopub.status.idle": "2024-01-18T17:20:46.895461Z", "shell.execute_reply": "2024-01-18T17:20:46.894144Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import Coverage\n", "import Intro_Testing" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from fuzzingbook.DynamicInvariants import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter provides two classes that automatically extract specifications from a function and a set of inputs:\n", "\n", "* `TypeAnnotator` for _types_, and\n", "* `InvariantAnnotator` for _pre-_ and _postconditions_.\n", "\n", "Both work by _observing_ a function and its invocations within a `with` clause. 
Here is an example for the type annotator:\n", "\n", "```python\n", ">>> def sum(a, b):\n", ">>>     return a + b\n", ">>> with TypeAnnotator() as type_annotator:\n", ">>>     sum(1, 2)\n", ">>>     sum(-4, -5)\n", ">>>     sum(0, 0)\n", "```\n", "The `typed_functions()` method will return a representation of `sum()` annotated with types observed during execution.\n", "\n", "```python\n", ">>> print(type_annotator.typed_functions())\n", "def sum(a: int, b: int) -> int:\n", "    return a + b\n", "\n", "```\n", "The invariant annotator works similarly:\n", "\n", "```python\n", ">>> with InvariantAnnotator() as inv_annotator:\n", ">>>     sum(1, 2)\n", ">>>     sum(-4, -5)\n", ">>>     sum(0, 0)\n", "```\n", "The `functions_with_invariants()` method will return a representation of `sum()` annotated with inferred pre- and postconditions that all hold for the observed values.\n", "\n", "```python\n", ">>> print(inv_annotator.functions_with_invariants())\n", "@precondition(lambda b, a: isinstance(a, int))\n", "@precondition(lambda b, a: isinstance(b, int))\n", "@postcondition(lambda return_value, b, a: a == return_value - b)\n", "@postcondition(lambda return_value, b, a: b == return_value - a)\n", "@postcondition(lambda return_value, b, a: isinstance(return_value, int))\n", "@postcondition(lambda return_value, b, a: return_value == a + b)\n", "@postcondition(lambda return_value, b, a: return_value == b + a)\n", "def sum(a, b):\n", "    return a + b\n", "\n", "\n", "```\n", "Such type specifications and invariants can be helpful as _oracles_ (to detect deviations from a given set of runs) as well as for all kinds of _symbolic code analyses_. 
The chapter gives details on how to customize the properties checked for.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Specifications and Assertions\n", "\n", "When implementing a function or program, one usually works against a _specification_ – a set of documented requirements to be satisfied by the code. Such specifications can come in natural language. A formal specification, however, allows the computer to check whether the specification is satisfied.\n", "\n", "In the [introduction to testing](Intro_Testing.ipynb), we have seen how _preconditions_ and _postconditions_ can describe what a function does. Consider the following (simple) square root function:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.902692Z", "iopub.status.busy": "2024-01-18T17:20:46.902384Z", "iopub.status.idle": "2024-01-18T17:20:46.907165Z", "shell.execute_reply": "2024-01-18T17:20:46.905033Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def any_sqrt(x):\n", " assert x >= 0 # Precondition\n", "\n", " ...\n", "\n", " assert result * result == x # Postcondition\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The assertion `assert p` checks the condition `p`; if it does not hold, execution is aborted. Here, the actual body is not yet written; we use the assertions as a specification of what `any_sqrt()` _expects_, and what it _delivers_.\n", "\n", "The topmost assertion is the _precondition_, stating the requirements on the function arguments. The assertion at the end is the _postcondition_, stating the properties of the function result (including its relationship with the original arguments). 
Using these pre- and postconditions as a specification, we can now go and implement a square root function that satisfies them. Once implemented, we can have the assertions check at runtime whether `any_sqrt()` works as expected; a [symbolic](SymbolicFuzzer.ipynb) or [concolic](ConcolicFuzzer.ipynb) test generator will even specifically try to find inputs where the assertions do _not_ hold. (An assertion can be seen as a conditional branch towards aborting the execution, and any technique that tries to cover all code branches will also try to invalidate as many assertions as possible.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, not every piece of code is developed with explicit specifications in the first place, let alone with formal pre- and postconditions. (Just take a look at the chapters in this book.) This is a pity: As Ken Thompson famously said, \"Without specifications, there are no bugs – only surprises\". It is also a problem for testing, since, of course, testing needs some specification to test against. This raises the interesting question: Can we somehow _retrofit_ existing code with \"specifications\" that properly describe its behavior, allowing developers to simply _check_ them rather than having to write them from scratch? This is what we do in this chapter." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Why Generic Error Checking is Not Enough\n", "\n", "Before we go into _mining_ specifications, let us first discuss why it could be useful to _have_ them. 
As a motivating example, consider the full implementation of a square root function from the [introduction to testing](Intro_Testing.ipynb):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.914307Z", "iopub.status.busy": "2024-01-18T17:20:46.913730Z", "iopub.status.idle": "2024-01-18T17:20:46.922108Z", "shell.execute_reply": "2024-01-18T17:20:46.921276Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:20:46.923949Z", "iopub.status.busy": "2024-01-18T17:20:46.923848Z", "iopub.status.idle": "2024-01-18T17:20:46.926213Z", "shell.execute_reply": "2024-01-18T17:20:46.925860Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt(x):\n", " \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", " approx = None\n", " guess = x / 2\n", " while approx != guess:\n", " approx = guess\n", " guess = (approx + x / approx) / 2\n", " return approx" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`my_sqrt()` does not come with any functionality that would check types or values. 
Hence, it is easy for callers to make mistakes when calling `my_sqrt()`:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.927905Z", "iopub.status.busy": "2024-01-18T17:20:46.927808Z", "iopub.status.idle": "2024-01-18T17:20:46.929798Z", "shell.execute_reply": "2024-01-18T17:20:46.929453Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError, ExpectTimeout" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.931505Z", "iopub.status.busy": "2024-01-18T17:20:46.931380Z", "iopub.status.idle": "2024-01-18T17:20:46.933647Z", "shell.execute_reply": "2024-01-18T17:20:46.933302Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/829521914.py\", line 2, in \n", " my_sqrt(\"foo\")\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2661069967.py\", line 4, in my_sqrt\n", " guess = x / 2\n", "TypeError: unsupported operand type(s) for /: 'str' and 'int' (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_sqrt(\"foo\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.955018Z", "iopub.status.busy": "2024-01-18T17:20:46.954866Z", "iopub.status.idle": "2024-01-18T17:20:46.956847Z", "shell.execute_reply": "2024-01-18T17:20:46.956575Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/1975547953.py\", line 2, in \n", " x = my_sqrt(0.0)\n", " File 
\"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2661069967.py\", line 7, in my_sqrt\n", " guess = (approx + x / approx) / 2\n", "ZeroDivisionError: float division by zero (expected)\n" ] } ], "source": [ "with ExpectError():\n", " x = my_sqrt(0.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At least, the Python system catches these errors at runtime. The following call, however, simply lets the function enter an infinite loop:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:46.958431Z", "iopub.status.busy": "2024-01-18T17:20:46.958318Z", "iopub.status.idle": "2024-01-18T17:20:47.963480Z", "shell.execute_reply": "2024-01-18T17:20:47.963163Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/1349814288.py\", line 2, in \n", " x = my_sqrt(-1.0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2661069967.py\", line 5, in my_sqrt\n", " while approx != guess:\n", " File \"/Users/zeller/Projects/fuzzingbook/notebooks/Timeout.ipynb\", line 43, in timeout_handler\n", " raise TimeoutError()\n", "TimeoutError (expected)\n" ] } ], "source": [ "with ExpectTimeout(1):\n", " x = my_sqrt(-1.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our goal is to avoid such errors by _annotating_ functions with information that prevents errors like the above ones. The idea is to provide a _specification_ of expected properties – a specification that can then be checked at runtime or statically." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": false }, "source": [ "## Specifying and Checking Data Types\n", "\n", "For our Python code, one of the most important \"specifications\" we need is *types*. Python being a \"dynamically\" typed language means that all data types are determined at run time; the code itself does not explicitly state whether a variable is an integer, a string, an array, a dictionary – or whatever." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For the _writer_ of Python code, omitting explicit type declarations may save time (and allows for some fun hacks). It is unclear whether a lack of types helps humans in _reading_ and _understanding_ code. For a _computer_ trying to analyze code, the lack of explicit types is detrimental. If, say, a constraint solver sees `if x:` and cannot know whether `x` is supposed to be a number or a string, this introduces an _ambiguity_. Such ambiguities may multiply over the entire analysis, resulting in a combinatorial explosion – or in the analysis yielding an overly inaccurate result." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Python 3.6 and later allows data types as _annotations_ to function arguments (actually, to all variables) and return values. 
We can, for instance, state that `my_sqrt()` is a function that accepts a floating-point value and returns one:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:20:47.965487Z", "iopub.status.busy": "2024-01-18T17:20:47.965318Z", "iopub.status.idle": "2024-01-18T17:20:47.967244Z", "shell.execute_reply": "2024-01-18T17:20:47.966957Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt_with_type_annotations(x: float) -> float:\n", "    \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", "    return my_sqrt(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By default, such annotations are ignored by the Python interpreter. Therefore, one can still call `my_sqrt_with_type_annotations()` with a string as an argument and get the exact same error as above. However, one can make use of special _typechecking_ modules that check types – _dynamically_ at runtime or _statically_ by analyzing the code without having to execute it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Excursion: Runtime Type Checking\n", "\n", "(Commented out as `enforce` is not supported by Python 3.9)\n", "\n", "The Python `enforce` package provides a function decorator that automatically inserts type-checking code that is executed at runtime. 
Here is how to use it:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.968991Z", "iopub.status.busy": "2024-01-18T17:20:47.968850Z", "iopub.status.idle": "2024-01-18T17:20:47.970446Z", "shell.execute_reply": "2024-01-18T17:20:47.970183Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# import enforce" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.971986Z", "iopub.status.busy": "2024-01-18T17:20:47.971865Z", "iopub.status.idle": "2024-01-18T17:20:47.973441Z", "shell.execute_reply": "2024-01-18T17:20:47.973124Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# @enforce.runtime_validation\n", "# def my_sqrt_with_checked_type_annotations(x: float) -> float:\n", "# \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", "# return my_sqrt(x)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now, invoking `my_sqrt_with_checked_type_annotations()` raises an exception when invoked with a type different from the one declared:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.975085Z", "iopub.status.busy": "2024-01-18T17:20:47.974975Z", "iopub.status.idle": "2024-01-18T17:20:47.976451Z", "shell.execute_reply": "2024-01-18T17:20:47.976206Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# with ExpectError():\n", "# my_sqrt_with_checked_type_annotations(True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that this error is not caught by the \"untyped\" variant, where passing a boolean value happily returns $\\sqrt{1}$ as result. 
" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.978239Z", "iopub.status.busy": "2024-01-18T17:20:47.978111Z", "iopub.status.idle": "2024-01-18T17:20:47.979716Z", "shell.execute_reply": "2024-01-18T17:20:47.979450Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# my_sqrt(True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Python (and other languages), the boolean values `True` and `False` can be implicitly converted to the integers 1 and 0; however, it is hard to think of a call to `sqrt()` where this would not be an error." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Static Type Checking\n", "\n", "Type annotations can also be checked _statically_ – that is, without even running the code. Let us create a simple Python file consisting of the above `my_sqrt_typed()` definition and a bad invocation." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.981266Z", "iopub.status.busy": "2024-01-18T17:20:47.981168Z", "iopub.status.idle": "2024-01-18T17:20:47.982732Z", "shell.execute_reply": "2024-01-18T17:20:47.982479Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect\n", "import tempfile" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.984289Z", "iopub.status.busy": "2024-01-18T17:20:47.984180Z", "iopub.status.idle": "2024-01-18T17:20:47.987975Z", "shell.execute_reply": "2024-01-18T17:20:47.987623Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp66r4vpr8.py'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f = tempfile.NamedTemporaryFile(mode='w', suffix='.py')\n", "f.name" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.989705Z", "iopub.status.busy": "2024-01-18T17:20:47.989605Z", "iopub.status.idle": "2024-01-18T17:20:47.992231Z", "shell.execute_reply": "2024-01-18T17:20:47.991976Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "f.write(inspect.getsource(my_sqrt))\n", "f.write('\\n')\n", "f.write(inspect.getsource(my_sqrt_with_type_annotations))\n", "f.write('\\n')\n", "f.write(\"print(my_sqrt_with_type_annotations('123'))\\n\")\n", "f.flush()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "These are the contents of our newly created Python file:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.993882Z", "iopub.status.busy": "2024-01-18T17:20:47.993792Z", "iopub.status.idle": "2024-01-18T17:20:47.995679Z", 
"shell.execute_reply": "2024-01-18T17:20:47.995356Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import print_file" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:47.997387Z", "iopub.status.busy": "2024-01-18T17:20:47.997276Z", "iopub.status.idle": "2024-01-18T17:20:48.080152Z", "shell.execute_reply": "2024-01-18T17:20:48.079841Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt_with_type_annotations\u001b[39;49;00m(x: \u001b[36mfloat\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m my_sqrt(x)\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[36mprint\u001b[39;49;00m(my_sqrt_with_type_annotations(\u001b[33m'\u001b[39;49;00m\u001b[33m123\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_file(f.name)" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "[Mypy](http://mypy-lang.org) is a type checker for Python programs. As it checks types statically, types induce no overhead at runtime; plus, a static check can be faster than a lengthy series of tests with runtime type checking enabled. Let us see what `mypy` produces on the above file:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.081932Z", "iopub.status.busy": "2024-01-18T17:20:48.081803Z", "iopub.status.idle": "2024-01-18T17:20:48.083411Z", "shell.execute_reply": "2024-01-18T17:20:48.083188Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import subprocess" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.084844Z", "iopub.status.busy": "2024-01-18T17:20:48.084765Z", "iopub.status.idle": "2024-01-18T17:20:48.312242Z", "shell.execute_reply": "2024-01-18T17:20:48.311677Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "result = subprocess.run([\"mypy\", \"--strict\", f.name], universal_newlines=True, stdout=subprocess.PIPE)\n", "del f # Delete temporary file" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.314282Z", "iopub.status.busy": "2024-01-18T17:20:48.314158Z", "iopub.status.idle": "2024-01-18T17:20:48.316065Z", "shell.execute_reply": "2024-01-18T17:20:48.315821Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp66r4vpr8.py:1: error: Function is missing a type annotation [no-untyped-def]\n", "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp66r4vpr8.py:12: error: Returning Any from function declared to return \"float\" [no-any-return]\n", 
"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp66r4vpr8.py:12: error: Call to untyped function \"my_sqrt\" in typed context [no-untyped-call]\n", "/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/tmp66r4vpr8.py:14: error: Argument 1 to \"my_sqrt_with_type_annotations\" has incompatible type \"str\"; expected \"float\" [arg-type]\n", "Found 4 errors in 1 file (checked 1 source file)\n", "\n" ] } ], "source": [ "print(result.stdout)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see that `mypy` complains about untyped function definitions such as `my_sqrt()`; most important, however, it finds that the call to `my_sqrt_with_type_annotations()` in the last line has the wrong type." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With `mypy`, we can achieve the same type safety with Python as in statically typed languages – provided that we as programmers also produce the necessary type annotations. Is there a simple way to obtain these?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": false }, "source": [ "## Mining Type Specifications\n", "\n", "Our first task will be to mine type annotations (as part of the code) from _values_ we observe at run time. These type annotations would be _mined_ from actual function executions, _learning_ from (normal) runs what the expected argument and return types should be. 
By observing a series of calls such as these, we could infer that both `x` and the return value are of type `float`:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.317671Z", "iopub.status.busy": "2024-01-18T17:20:48.317566Z", "iopub.status.idle": "2024-01-18T17:20:48.319779Z", "shell.execute_reply": "2024-01-18T17:20:48.319520Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "5.0" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = my_sqrt(25.0)\n", "y" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.321518Z", "iopub.status.busy": "2024-01-18T17:20:48.321304Z", "iopub.status.idle": "2024-01-18T17:20:48.323640Z", "shell.execute_reply": "2024-01-18T17:20:48.323360Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1.414213562373095" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = my_sqrt(2.0)\n", "y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How can we mine types from executions? The answer is simple: \n", "\n", "1. We _observe_ a function during execution\n", "2. We track the _types_ of its arguments\n", "3. We include these types as _annotations_ into the code.\n", "\n", "To do so, we can make use of Python's tracing facility we already observed in the [chapter on coverage](Coverage.ipynb). With every call to a function, we retrieve the arguments, their values, and their types." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Tracking Calls\n", "\n", "To observe argument types at runtime, we define a _tracer function_ that tracks the execution of `my_sqrt()`, checking its arguments and return values. 
The `Tracker` class is set to trace functions in a `with` block as follows:\n", "\n", "```python\n", "with Tracker() as tracker:\n", "    function_to_be_tracked(...)\n", "info = tracker.collected_information()\n", "```\n", "\n", "As in the [chapter on coverage](Coverage.ipynb), we use the `sys.settrace()` function to trace individual functions during execution. We turn on tracking when the `with` block starts; at this point, the `__enter__()` method is called. When execution of the `with` block ends, `__exit__()` is called. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.325380Z", "iopub.status.busy": "2024-01-18T17:20:48.325260Z", "iopub.status.idle": "2024-01-18T17:20:48.326864Z", "shell.execute_reply": "2024-01-18T17:20:48.326614Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import sys" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.328562Z", "iopub.status.busy": "2024-01-18T17:20:48.328413Z", "iopub.status.idle": "2024-01-18T17:20:48.331030Z", "shell.execute_reply": "2024-01-18T17:20:48.330781Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Tracker:\n", "    def __init__(self, log=False):\n", "        self._log = log\n", "        self.reset()\n", "\n", "    def reset(self):\n", "        self._calls = {}\n", "        self._stack = []\n", "\n", "    def traceit(self, frame, event, arg):\n", "        \"\"\"Placeholder to be overridden in subclasses\"\"\"\n", "        pass\n", "\n", "    # Start of `with` block\n", "    def __enter__(self):\n", "        self.original_trace_function = sys.gettrace()\n", "        sys.settrace(self.traceit)\n", "        return self\n", "\n", "    # End of `with` block\n", "    def __exit__(self, exc_type, exc_value, tb):\n", "        sys.settrace(self.original_trace_function)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `traceit()` method does nothing yet; this is 
done in specialized subclasses. The `CallTracker` class implements a `traceit()` function that checks for function calls and returns:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.332610Z", "iopub.status.busy": "2024-01-18T17:20:48.332491Z", "iopub.status.idle": "2024-01-18T17:20:48.334691Z", "shell.execute_reply": "2024-01-18T17:20:48.334377Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class CallTracker(Tracker):\n", " def traceit(self, frame, event, arg):\n", " \"\"\"Tracking function: Record all calls and all args\"\"\"\n", " if event == \"call\":\n", " self.trace_call(frame, event, arg)\n", " elif event == \"return\":\n", " self.trace_return(frame, event, arg)\n", " \n", " return self.traceit" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`trace_call()` is called when a function is called; it retrieves the function name and current arguments, and saves them on a stack." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.336400Z", "iopub.status.busy": "2024-01-18T17:20:48.336279Z", "iopub.status.idle": "2024-01-18T17:20:48.338483Z", "shell.execute_reply": "2024-01-18T17:20:48.338202Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class CallTracker(CallTracker):\n", "    def trace_call(self, frame, event, arg):\n", "        \"\"\"Save current function name and args on the stack\"\"\"\n", "        code = frame.f_code\n", "        function_name = code.co_name\n", "        arguments = get_arguments(frame)\n", "        self._stack.append((function_name, arguments))\n", "\n", "        if self._log:\n", "            print(simple_call_string(function_name, arguments))" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.340280Z", "iopub.status.busy": "2024-01-18T17:20:48.340139Z", "iopub.status.idle": "2024-01-18T17:20:48.342308Z", "shell.execute_reply": "2024-01-18T17:20:48.342043Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def get_arguments(frame):\n", "    \"\"\"Return call arguments in the given frame\"\"\"\n", "    # When called, all arguments are local variables\n", "    local_variables = dict(frame.f_locals)  # explicit copy\n", "    arguments = [(var, frame.f_locals[var]) for var in local_variables]\n", "    arguments.reverse()  # Want same order as call\n", "    return arguments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When the function returns, `trace_return()` is called. We now also have the return value. We log the whole call with arguments and return value (if desired) and save it in our list of calls."
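, "\n",
"\n",
"To see the two events in isolation, here is a minimal sketch that is independent of the classes in this chapter. The names `tracer`, `events`, and `double` are hypothetical and serve only as illustration; the sketch assumes that no other trace function is active.\n",
"\n",
"```python\n",
"import sys\n",
"\n",
"events = []  # (event, function name, return value or None)\n",
"\n",
"def tracer(frame, event, arg):\n",
"    # For 'return' events, `arg` holds the value being returned\n",
"    if event == 'call':\n",
"        events.append(('call', frame.f_code.co_name, None))\n",
"    elif event == 'return':\n",
"        events.append(('return', frame.f_code.co_name, arg))\n",
"    return tracer  # keep tracing within the called function\n",
"\n",
"def double(n):  # hypothetical function under observation\n",
"    return 2 * n\n",
"\n",
"original = sys.gettrace()\n",
"sys.settrace(tracer)\n",
"try:\n",
"    double(21)\n",
"finally:\n",
"    sys.settrace(original)  # always restore the previous tracer\n",
"```\n",
"\n",
"Afterwards, `events` should contain a `'call'` entry for `double` and a `'return'` entry carrying the value 42. `trace_return()` applies the same idea within `CallTracker`:"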
] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.344207Z", "iopub.status.busy": "2024-01-18T17:20:48.344063Z", "iopub.status.idle": "2024-01-18T17:20:48.346730Z", "shell.execute_reply": "2024-01-18T17:20:48.346384Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class CallTracker(CallTracker):\n", "    def trace_return(self, frame, event, arg):\n", "        \"\"\"Get return value and store complete call with arguments and return value\"\"\"\n", "        code = frame.f_code\n", "        function_name = code.co_name\n", "        return_value = arg\n", "        # TODO: Could call get_arguments() here to also retrieve _final_ values of argument variables\n", "\n", "        called_function_name, called_arguments = self._stack.pop()\n", "        assert function_name == called_function_name\n", "\n", "        if self._log:\n", "            print(simple_call_string(function_name, called_arguments), \"returns\", return_value)\n", "\n", "        self.add_call(function_name, called_arguments, return_value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`simple_call_string()` is a helper for logging that formats calls in a user-friendly manner." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.348520Z", "iopub.status.busy": "2024-01-18T17:20:48.348396Z", "iopub.status.idle": "2024-01-18T17:20:48.350608Z", "shell.execute_reply": "2024-01-18T17:20:48.350326Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def simple_call_string(function_name, argument_list, return_value=None):\n", "    \"\"\"Return function_name(arg[0], arg[1], ...)
as a string\"\"\"\n", "    call = function_name + \"(\" + \\\n", "        \", \".join([var + \"=\" + repr(value)\n", "                   for (var, value) in argument_list]) + \")\"\n", "\n", "    if return_value is not None:\n", "        call += \" = \" + repr(return_value)\n", "\n", "    return call" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`add_call()` saves the calls in a list; each function name has its own list." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.352155Z", "iopub.status.busy": "2024-01-18T17:20:48.352043Z", "iopub.status.idle": "2024-01-18T17:20:48.353917Z", "shell.execute_reply": "2024-01-18T17:20:48.353687Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class CallTracker(CallTracker):\n", "    def add_call(self, function_name, arguments, return_value=None):\n", "        \"\"\"Add given call to list of calls\"\"\"\n", "        if function_name not in self._calls:\n", "            self._calls[function_name] = []\n", "        self._calls[function_name].append((arguments, return_value))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using `calls()`, we can retrieve the list of calls, either for a given function, or for all functions."
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.355508Z", "iopub.status.busy": "2024-01-18T17:20:48.355396Z", "iopub.status.idle": "2024-01-18T17:20:48.357299Z", "shell.execute_reply": "2024-01-18T17:20:48.357008Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class CallTracker(CallTracker):\n", "    def calls(self, function_name=None):\n", "        \"\"\"Return list of calls for function_name,\n", "        or a mapping function_name -> calls for all functions tracked\"\"\"\n", "        if function_name is None:\n", "            return self._calls\n", "\n", "        return self._calls[function_name]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us now put this to use. We turn on logging to track the individual calls and their return values:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.358812Z", "iopub.status.busy": "2024-01-18T17:20:48.358712Z", "iopub.status.idle": "2024-01-18T17:20:48.360584Z", "shell.execute_reply": "2024-01-18T17:20:48.360310Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_sqrt(x=25)\n", "my_sqrt(x=25) returns 5.0\n", "my_sqrt(x=2.0)\n", "my_sqrt(x=2.0) returns 1.414213562373095\n", "__exit__(tb=None, exc_value=None, exc_type=None, self=<__main__.CallTracker object at 0x11e231690>)\n" ] } ], "source": [ "with CallTracker(log=True) as tracker:\n", "    y = my_sqrt(25)\n", "    y = my_sqrt(2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After execution, we can retrieve the individual calls:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.381226Z", "iopub.status.busy": "2024-01-18T17:20:48.381076Z", "iopub.status.idle":
"2024-01-18T17:20:48.383458Z", "shell.execute_reply": "2024-01-18T17:20:48.383190Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[([('x', 25)], 5.0), ([('x', 2.0)], 1.414213562373095)]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "calls = tracker.calls('my_sqrt')\n", "calls" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Each call is a pair (`argument_list`, `return_value`), where `argument_list` is a list of pairs (`parameter_name`, `value`)." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.385221Z", "iopub.status.busy": "2024-01-18T17:20:48.385055Z", "iopub.status.idle": "2024-01-18T17:20:48.387259Z", "shell.execute_reply": "2024-01-18T17:20:48.387028Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'my_sqrt(x=25) = 5.0'" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_sqrt_argument_list, my_sqrt_return_value = calls[0]\n", "simple_call_string('my_sqrt', my_sqrt_argument_list, my_sqrt_return_value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If the function does not return a value, `return_value` is `None`."
] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.388816Z", "iopub.status.busy": "2024-01-18T17:20:48.388681Z", "iopub.status.idle": "2024-01-18T17:20:48.390331Z", "shell.execute_reply": "2024-01-18T17:20:48.390092Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def hello(name):\n", "    print(\"Hello,\", name)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.391715Z", "iopub.status.busy": "2024-01-18T17:20:48.391634Z", "iopub.status.idle": "2024-01-18T17:20:48.393528Z", "shell.execute_reply": "2024-01-18T17:20:48.393301Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, world\n" ] } ], "source": [ "with CallTracker() as tracker:\n", "    hello(\"world\")" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.395113Z", "iopub.status.busy": "2024-01-18T17:20:48.394997Z", "iopub.status.idle": "2024-01-18T17:20:48.397158Z", "shell.execute_reply": "2024-01-18T17:20:48.396892Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[([('name', 'world')], None)]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hello_calls = tracker.calls('hello')\n", "hello_calls" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.398623Z", "iopub.status.busy": "2024-01-18T17:20:48.398517Z", "iopub.status.idle": "2024-01-18T17:20:48.400473Z", "shell.execute_reply": "2024-01-18T17:20:48.400208Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "\"hello(name='world')\"" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hello_argument_list,
hello_return_value = hello_calls[0]\n", "simple_call_string('hello', hello_argument_list, hello_return_value)" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Getting Types\n", "\n", "Despite what you may have read or heard, Python actually _is_ a typed language. It is just that it is _dynamically typed_ – types are used and checked only at runtime (rather than declared in the code, where they can be _statically checked_ at compile time). We can thus retrieve types of all values within Python:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.402115Z", "iopub.status.busy": "2024-01-18T17:20:48.402009Z", "iopub.status.idle": "2024-01-18T17:20:48.403906Z", "shell.execute_reply": "2024-01-18T17:20:48.403673Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(4)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.405313Z", "iopub.status.busy": "2024-01-18T17:20:48.405216Z", "iopub.status.idle": "2024-01-18T17:20:48.407138Z", "shell.execute_reply": "2024-01-18T17:20:48.406879Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(2.0)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.408619Z", "iopub.status.busy": "2024-01-18T17:20:48.408522Z", "iopub.status.idle": "2024-01-18T17:20:48.410457Z", "shell.execute_reply": "2024-01-18T17:20:48.410192Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ 
"list" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type([4])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can retrieve the type of the first argument to `my_sqrt()`:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.412398Z", "iopub.status.busy": "2024-01-18T17:20:48.412252Z", "iopub.status.idle": "2024-01-18T17:20:48.414364Z", "shell.execute_reply": "2024-01-18T17:20:48.414120Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "('x', int)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parameter, value = my_sqrt_argument_list[0]\n", "parameter, type(value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "as well as the type of the return value:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.415944Z", "iopub.status.busy": "2024-01-18T17:20:48.415831Z", "iopub.status.idle": "2024-01-18T17:20:48.417848Z", "shell.execute_reply": "2024-01-18T17:20:48.417594Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(my_sqrt_return_value)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, we see that (so far), `my_sqrt()` is a function taking (among others) integers and floats and returning floats. 
We could declare `my_sqrt()` as:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.419321Z", "iopub.status.busy": "2024-01-18T17:20:48.419218Z", "iopub.status.idle": "2024-01-18T17:20:48.420803Z", "shell.execute_reply": "2024-01-18T17:20:48.420579Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt_annotated(x: float) -> float:\n", "    return my_sqrt(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is a representation we could pass to a static type checker, allowing it to check whether calls to `my_sqrt()` actually pass a number. A dynamic type checker could run such checks at runtime. And of course, any [symbolic interpretation](SymbolicFuzzer.ipynb) will greatly profit from the additional annotations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By default, Python does not do anything with such annotations. However, tools can access annotations from functions and other objects:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.422273Z", "iopub.status.busy": "2024-01-18T17:20:48.422184Z", "iopub.status.idle": "2024-01-18T17:20:48.424487Z", "shell.execute_reply": "2024-01-18T17:20:48.424207Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'x': float, 'return': float}" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_sqrt_annotated.__annotations__" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is how run-time checkers access the annotations they check against."
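, "\n",
"\n",
"As a rough illustration of such a run-time check, the following sketch reads `__annotations__` and compares each argument and the return value against it. The decorator `check_annotations` and the function `half` are hypothetical names, not part of this chapter's code; only annotations that are plain classes are checked.\n",
"\n",
"```python\n",
"import functools\n",
"import inspect\n",
"\n",
"def check_annotations(func):\n",
"    # Hypothetical decorator: check calls of `func` against its annotations\n",
"    annotations = func.__annotations__\n",
"    signature = inspect.signature(func)\n",
"\n",
"    @functools.wraps(func)\n",
"    def wrapper(*args, **kwargs):\n",
"        # Map the actual arguments to parameter names\n",
"        bound = signature.bind(*args, **kwargs)\n",
"        for name, value in bound.arguments.items():\n",
"            expected = annotations.get(name)\n",
"            if isinstance(expected, type) and not isinstance(value, expected):\n",
"                raise TypeError(f'{name}={value!r} is not of type {expected.__name__}')\n",
"\n",
"        result = func(*args, **kwargs)\n",
"        expected = annotations.get('return')\n",
"        if isinstance(expected, type) and not isinstance(result, expected):\n",
"            raise TypeError(f'return value {result!r} is not of type {expected.__name__}')\n",
"        return result\n",
"\n",
"    return wrapper\n",
"\n",
"@check_annotations\n",
"def half(x: float) -> float:\n",
"    return x / 2\n",
"```\n",
"\n",
"With this sketch, `half(3.0)` passes the check, whereas `half(3)` raises a `TypeError`, since an `int` is not an instance of `float`. Static type checkers following PEP 484, in contrast, accept an `int` where a `float` is annotated."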
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Accessing Function Structure\n", "\n", "Our plan is to annotate functions automatically, based on the types we have seen. To do so, we need a few modules that allow us to convert a function into a tree representation (called an _abstract syntax tree_, or AST) and back; we have already seen these in the chapters on [concolic](ConcolicFuzzer.ipynb) and [symbolic](SymbolicFuzzer.ipynb) testing." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.425997Z", "iopub.status.busy": "2024-01-18T17:20:48.425891Z", "iopub.status.idle": "2024-01-18T17:20:48.427607Z", "shell.execute_reply": "2024-01-18T17:20:48.427328Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import ast\n", "import inspect" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can get the source of a Python function using `inspect.getsource()`.
(Note that this does not work for functions defined in other notebooks.)" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.429188Z", "iopub.status.busy": "2024-01-18T17:20:48.429071Z", "iopub.status.idle": "2024-01-18T17:20:48.431470Z", "shell.execute_reply": "2024-01-18T17:20:48.431224Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'def my_sqrt(x):\\n    \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\\n    approx = None\\n    guess = x / 2\\n    while approx != guess:\\n        approx = guess\\n        guess = (approx + x / approx) / 2\\n    return approx\\n'" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_sqrt_source = inspect.getsource(my_sqrt)\n", "my_sqrt_source" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To view such source code in a visually pleasing form, our function `print_content(s, suffix)` formats and highlights the string `s` as if it were a file with ending `suffix`.
We can thus view (and highlight) the source as if it were a Python file:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.432890Z", "iopub.status.busy": "2024-01-18T17:20:48.432789Z", "iopub.status.idle": "2024-01-18T17:20:48.434485Z", "shell.execute_reply": "2024-01-18T17:20:48.434133Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import print_content" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.436189Z", "iopub.status.busy": "2024-01-18T17:20:48.436079Z", "iopub.status.idle": "2024-01-18T17:20:48.466999Z", "shell.execute_reply": "2024-01-18T17:20:48.466704Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(my_sqrt_source, '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Parsing this gives us an abstract syntax tree (AST) – a representation of the program in tree form." 
] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.468585Z", "iopub.status.busy": "2024-01-18T17:20:48.468499Z", "iopub.status.idle": "2024-01-18T17:20:48.470227Z", "shell.execute_reply": "2024-01-18T17:20:48.470014Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "my_sqrt_ast = ast.parse(my_sqrt_source)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What does this AST look like? The helper functions `ast.dump()` (textual output) and `showast.show_ast()` (graphical output with [showast](https://github.com/hchasestevens/show_ast)) allow us to inspect the structure of the tree. We see that the function starts as a `FunctionDef` with name and arguments, followed by a body, which is a list of statements of type `Expr` (the docstring), type `Assign` (assignments), `While` (while loop with its own body), and finally `Return`." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.471611Z", "iopub.status.busy": "2024-01-18T17:20:48.471531Z", "iopub.status.idle": "2024-01-18T17:20:48.473408Z", "shell.execute_reply": "2024-01-18T17:20:48.473160Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(\n", " body=[\n", " FunctionDef(\n", " name='my_sqrt',\n", " args=arguments(\n", " posonlyargs=[],\n", " args=[\n", " arg(arg='x')],\n", " kwonlyargs=[],\n", " kw_defaults=[],\n", " defaults=[]),\n", " body=[\n", " Expr(\n", " value=Constant(value='Computes the square root of x, using the Newton-Raphson method')),\n", " Assign(\n", " targets=[\n", " Name(id='approx', ctx=Store())],\n", " value=Constant(value=None)),\n", " Assign(\n", " targets=[\n", " Name(id='guess', ctx=Store())],\n", " value=BinOp(\n", " left=Name(id='x', ctx=Load()),\n", " op=Div(),\n", " 
right=Constant(value=2))),\n", " While(\n", " test=Compare(\n", " left=Name(id='approx', ctx=Load()),\n", " ops=[\n", " NotEq()],\n", " comparators=[\n", " Name(id='guess', ctx=Load())]),\n", " body=[\n", " Assign(\n", " targets=[\n", " Name(id='approx', ctx=Store())],\n", " value=Name(id='guess', ctx=Load())),\n", " Assign(\n", " targets=[\n", " Name(id='guess', ctx=Store())],\n", " value=BinOp(\n", " left=BinOp(\n", " left=Name(id='approx', ctx=Load()),\n", " op=Add(),\n", " right=BinOp(\n", " left=Name(id='x', ctx=Load()),\n", " op=Div(),\n", " right=Name(id='approx', ctx=Load()))),\n", " op=Div(),\n", " right=Constant(value=2)))],\n", " orelse=[]),\n", " Return(\n", " value=Name(id='approx', ctx=Load()))],\n", " decorator_list=[])],\n", " type_ignores=[])\n" ] } ], "source": [ "print(ast.dump(my_sqrt_ast, indent=4))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Too much text for you? This graphical representation may make things simpler." 
] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.474830Z", "iopub.status.busy": "2024-01-18T17:20:48.474745Z", "iopub.status.idle": "2024-01-18T17:20:48.476216Z", "shell.execute_reply": "2024-01-18T17:20:48.475959Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import rich_output" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.477811Z", "iopub.status.busy": "2024-01-18T17:20:48.477694Z", "iopub.status.idle": "2024-01-18T17:20:48.903734Z", "shell.execute_reply": "2024-01-18T17:20:48.903364Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""my_sqrt"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "5\n", "Assign\n", "\n", "\n", "\n", "0--5\n", "\n", "\n", "\n", "\n", "10\n", "Assign\n", "\n", "\n", "\n", "0--10\n", "\n", "\n", "\n", "\n", "21\n", "While\n", "\n", "\n", "\n", "0--21\n", "\n", "\n", "\n", "\n", "58\n", "Return\n", "\n", "\n", "\n", "0--58\n", "\n", "\n", "\n", "\n", "3\n", "arg\n", "\n", "\n", "\n", "2--3\n", "\n", "\n", "\n", "\n", "4\n", ""x"\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "6\n", "Name\n", "\n", "\n", "\n", "5--6\n", "\n", "\n", "\n", "\n", "9\n", "Constant\n", "\n", "\n", "\n", "5--9\n", "\n", "\n", "\n", "\n", "7\n", ""approx"\n", "\n", "\n", "\n", "6--7\n", "\n", "\n", "\n", "\n", "8\n", "Store\n", "\n", "\n", "\n", "6--8\n", "\n", "\n", "\n", "\n", "11\n", "Name\n", "\n", "\n", "\n", "10--11\n", "\n", "\n", "\n", "\n", "14\n", "BinOp\n", "\n", "\n", "\n", "10--14\n", "\n", "\n", "\n", "\n", "12\n", ""guess"\n", "\n", "\n", "\n", "11--12\n", "\n", "\n", "\n", "\n", "13\n", "Store\n", "\n", "\n", "\n", "11--13\n", "\n", "\n", "\n", 
"\n", "15\n", "Name\n", "\n", "\n", "\n", "14--15\n", "\n", "\n", "\n", "\n", "18\n", "Div\n", "\n", "\n", "\n", "14--18\n", "\n", "\n", "\n", "\n", "19\n", "Constant\n", "\n", "\n", "\n", "14--19\n", "\n", "\n", "\n", "\n", "16\n", ""x"\n", "\n", "\n", "\n", "15--16\n", "\n", "\n", "\n", "\n", "17\n", "Load\n", "\n", "\n", "\n", "15--17\n", "\n", "\n", "\n", "\n", "20\n", "2\n", "\n", "\n", "\n", "19--20\n", "\n", "\n", "\n", "\n", "22\n", "Compare\n", "\n", "\n", "\n", "21--22\n", "\n", "\n", "\n", "\n", "30\n", "Assign\n", "\n", "\n", "\n", "21--30\n", "\n", "\n", "\n", "\n", "37\n", "Assign\n", "\n", "\n", "\n", "21--37\n", "\n", "\n", "\n", "\n", "23\n", "Name\n", "\n", "\n", "\n", "22--23\n", "\n", "\n", "\n", "\n", "26\n", "NotEq\n", "\n", "\n", "\n", "22--26\n", "\n", "\n", "\n", "\n", "27\n", "Name\n", "\n", "\n", "\n", "22--27\n", "\n", "\n", "\n", "\n", "24\n", ""approx"\n", "\n", "\n", "\n", "23--24\n", "\n", "\n", "\n", "\n", "25\n", "Load\n", "\n", "\n", "\n", "23--25\n", "\n", "\n", "\n", "\n", "28\n", ""guess"\n", "\n", "\n", "\n", "27--28\n", "\n", "\n", "\n", "\n", "29\n", "Load\n", "\n", "\n", "\n", "27--29\n", "\n", "\n", "\n", "\n", "31\n", "Name\n", "\n", "\n", "\n", "30--31\n", "\n", "\n", "\n", "\n", "34\n", "Name\n", "\n", "\n", "\n", "30--34\n", "\n", "\n", "\n", "\n", "32\n", ""approx"\n", "\n", "\n", "\n", "31--32\n", "\n", "\n", "\n", "\n", "33\n", "Store\n", "\n", "\n", "\n", "31--33\n", "\n", "\n", "\n", "\n", "35\n", ""guess"\n", "\n", "\n", "\n", "34--35\n", "\n", "\n", "\n", "\n", "36\n", "Load\n", "\n", "\n", "\n", "34--36\n", "\n", "\n", "\n", "\n", "38\n", "Name\n", "\n", "\n", "\n", "37--38\n", "\n", "\n", "\n", "\n", "41\n", "BinOp\n", "\n", "\n", "\n", "37--41\n", "\n", "\n", "\n", "\n", "39\n", ""guess"\n", "\n", "\n", "\n", "38--39\n", "\n", "\n", "\n", "\n", "40\n", "Store\n", "\n", "\n", "\n", "38--40\n", "\n", "\n", "\n", "\n", "42\n", "BinOp\n", "\n", "\n", "\n", "41--42\n", "\n", "\n", "\n", "\n", "55\n", "Div\n", 
"\n", "\n", "\n", "41--55\n", "\n", "\n", "\n", "\n", "56\n", "Constant\n", "\n", "\n", "\n", "41--56\n", "\n", "\n", "\n", "\n", "43\n", "Name\n", "\n", "\n", "\n", "42--43\n", "\n", "\n", "\n", "\n", "46\n", "Add\n", "\n", "\n", "\n", "42--46\n", "\n", "\n", "\n", "\n", "47\n", "BinOp\n", "\n", "\n", "\n", "42--47\n", "\n", "\n", "\n", "\n", "44\n", ""approx"\n", "\n", "\n", "\n", "43--44\n", "\n", "\n", "\n", "\n", "45\n", "Load\n", "\n", "\n", "\n", "43--45\n", "\n", "\n", "\n", "\n", "48\n", "Name\n", "\n", "\n", "\n", "47--48\n", "\n", "\n", "\n", "\n", "51\n", "Div\n", "\n", "\n", "\n", "47--51\n", "\n", "\n", "\n", "\n", "52\n", "Name\n", "\n", "\n", "\n", "47--52\n", "\n", "\n", "\n", "\n", "49\n", ""x"\n", "\n", "\n", "\n", "48--49\n", "\n", "\n", "\n", "\n", "50\n", "Load\n", "\n", "\n", "\n", "48--50\n", "\n", "\n", "\n", "\n", "53\n", ""approx"\n", "\n", "\n", "\n", "52--53\n", "\n", "\n", "\n", "\n", "54\n", "Load\n", "\n", "\n", "\n", "52--54\n", "\n", "\n", "\n", "\n", "57\n", "2\n", "\n", "\n", "\n", "56--57\n", "\n", "\n", "\n", "\n", "59\n", "Name\n", "\n", "\n", "\n", "58--59\n", "\n", "\n", "\n", "\n", "60\n", ""approx"\n", "\n", "\n", "\n", "59--60\n", "\n", "\n", "\n", "\n", "61\n", "Load\n", "\n", "\n", "\n", "59--61\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "if rich_output():\n", " import showast\n", " showast.show_ast(my_sqrt_ast)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The function `ast.unparse()` converts such a tree back into the more familiar textual Python code representation. 
Comments are gone, and there may be more parentheses than before, but the result has the same semantics:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.905611Z", "iopub.status.busy": "2024-01-18T17:20:48.905475Z", "iopub.status.idle": "2024-01-18T17:20:48.937486Z", "shell.execute_reply": "2024-01-18T17:20:48.937191Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(ast.unparse(my_sqrt_ast), '.py')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Annotating Functions with Given Types\n", "\n", "Let us now go and transform these trees to add type annotations. We start with a helper function `parse_type(name)` which parses a type name into an AST." 
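, "\n",
"\n",
"For orientation: `ast.parse()` wraps an expression string in a `Module` whose body holds a single `Expr` statement, and the node we are after is that statement's `value`. A minimal sketch, independent of the helper (the variable names are ours, for illustration only):\n",
"\n",
"```python\n",
"import ast\n",
"\n",
"tree = ast.parse('int')      # a Module with one statement\n",
"statement = tree.body[0]     # the Expr statement wrapping the expression\n",
"type_node = statement.value  # the Name node for `int`\n",
"print(ast.dump(type_node))\n",
"```\n",
"\n",
"The `parse_type()` helper retrieves the same node by means of a visitor:"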
] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.939243Z", "iopub.status.busy": "2024-01-18T17:20:48.939125Z", "iopub.status.idle": "2024-01-18T17:20:48.941429Z", "shell.execute_reply": "2024-01-18T17:20:48.941116Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def parse_type(name):\n", "    class ValueVisitor(ast.NodeVisitor):\n", "        def visit_Expr(self, node):\n", "            self.value_node = node.value\n", "\n", "    tree = ast.parse(name)\n", "    name_visitor = ValueVisitor()\n", "    name_visitor.visit(tree)\n", "    return name_visitor.value_node" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.943050Z", "iopub.status.busy": "2024-01-18T17:20:48.942934Z", "iopub.status.idle": "2024-01-18T17:20:48.944888Z", "shell.execute_reply": "2024-01-18T17:20:48.944574Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name(id='int', ctx=Load())\n" ] } ], "source": [ "print(ast.dump(parse_type('int')))" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.946552Z", "iopub.status.busy": "2024-01-18T17:20:48.946415Z", "iopub.status.idle": "2024-01-18T17:20:48.948332Z", "shell.execute_reply": "2024-01-18T17:20:48.948056Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "List(elts=[Name(id='object', ctx=Load())], ctx=Load())\n" ] } ], "source": [ "print(ast.dump(parse_type('[object]')))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We now define a helper class that actually adds type annotations to a function AST. The `TypeTransformer` class builds on the Python standard library `ast.NodeTransformer` infrastructure.
It would be called as\n", "\n", "```python\n", " TypeTransformer({'x': 'int'}, 'float').visit(my_sqrt_ast)\n", "```\n", "\n", "to annotate the argument `x` of `my_sqrt()` with `int`, and its return type with `float`. The returned AST can then be unparsed, compiled or analyzed." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.949861Z", "iopub.status.busy": "2024-01-18T17:20:48.949752Z", "iopub.status.idle": "2024-01-18T17:20:48.951702Z", "shell.execute_reply": "2024-01-18T17:20:48.951426Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TypeTransformer(ast.NodeTransformer):\n", " def __init__(self, argument_types, return_type=None):\n", " self.argument_types = argument_types\n", " self.return_type = return_type\n", " super().__init__()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The core of `TypeTransformer` is the method `visit_FunctionDef()`, which is called for every function definition in the AST. Its argument `node` is the subtree of the function definition to be transformed. Our implementation accesses the individual arguments and invokes `annotate_arg()` on each of them; it also sets the return type in the `returns` attribute of the node." 
] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.953212Z", "iopub.status.busy": "2024-01-18T17:20:48.953103Z", "iopub.status.idle": "2024-01-18T17:20:48.955691Z", "shell.execute_reply": "2024-01-18T17:20:48.955431Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TypeTransformer(TypeTransformer):\n", " def visit_FunctionDef(self, node):\n", " \"\"\"Add annotation to function\"\"\"\n", " # Set argument types\n", " new_args = []\n", " for arg in node.args.args:\n", " new_args.append(self.annotate_arg(arg))\n", "\n", " new_arguments = ast.arguments(\n", " node.args.posonlyargs,\n", " new_args,\n", " node.args.vararg,\n", " node.args.kwonlyargs,\n", " node.args.kw_defaults,\n", " node.args.kwarg,\n", " node.args.defaults\n", " )\n", "\n", " # Set return type\n", " if self.return_type is not None:\n", " node.returns = parse_type(self.return_type)\n", "\n", " return ast.copy_location(\n", " ast.FunctionDef(node.name, new_arguments, \n", " node.body, node.decorator_list,\n", " node.returns), node)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Each argument gets its own annotation, taken from the types originally passed to the class:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.957237Z", "iopub.status.busy": "2024-01-18T17:20:48.957132Z", "iopub.status.idle": "2024-01-18T17:20:48.959108Z", "shell.execute_reply": "2024-01-18T17:20:48.958858Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TypeTransformer(TypeTransformer):\n", " def annotate_arg(self, arg):\n", " \"\"\"Add annotation to single function argument\"\"\"\n", " arg_name = arg.arg\n", " if arg_name in self.argument_types:\n", " arg.annotation = parse_type(self.argument_types[arg_name])\n", " return arg" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Does this work? Let us annotate the AST from `my_sqrt()` with types for the arguments and return types:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.960768Z", "iopub.status.busy": "2024-01-18T17:20:48.960644Z", "iopub.status.idle": "2024-01-18T17:20:48.962463Z", "shell.execute_reply": "2024-01-18T17:20:48.962191Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "new_ast = TypeTransformer({'x': 'int'}, 'float').visit(my_sqrt_ast)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When we unparse the new AST, we see that the annotations actually are present:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.963914Z", "iopub.status.busy": "2024-01-18T17:20:48.963813Z", "iopub.status.idle": "2024-01-18T17:20:48.997070Z", "shell.execute_reply": "2024-01-18T17:20:48.996755Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ 
"print_content(ast.unparse(new_ast), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Similarly, we can annotate the `hello()` function from above:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:48.998802Z", "iopub.status.busy": "2024-01-18T17:20:48.998684Z", "iopub.status.idle": "2024-01-18T17:20:49.000918Z", "shell.execute_reply": "2024-01-18T17:20:49.000636Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "hello_source = inspect.getsource(hello)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.002477Z", "iopub.status.busy": "2024-01-18T17:20:49.002365Z", "iopub.status.idle": "2024-01-18T17:20:49.004103Z", "shell.execute_reply": "2024-01-18T17:20:49.003805Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "hello_ast = ast.parse(hello_source)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.005689Z", "iopub.status.busy": "2024-01-18T17:20:49.005569Z", "iopub.status.idle": "2024-01-18T17:20:49.007450Z", "shell.execute_reply": "2024-01-18T17:20:49.007158Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "new_ast = TypeTransformer({'name': 'str'}, 'None').visit(hello_ast)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.009188Z", "iopub.status.busy": "2024-01-18T17:20:49.009047Z", "iopub.status.idle": "2024-01-18T17:20:49.040658Z", "shell.execute_reply": "2024-01-18T17:20:49.040380Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mhello\u001b[39;49;00m(name: \u001b[36mstr\u001b[39;49;00m) -> 
\u001b[34mNone\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mHello,\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, name)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(ast.unparse(new_ast), '.py')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Annotating Functions with Mined Types\n", "\n", "Let us now annotate functions with types mined at runtime. We start with a simple function `type_string()` that determines the appropriate type of a given value (as a string):" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.042225Z", "iopub.status.busy": "2024-01-18T17:20:49.042129Z", "iopub.status.idle": "2024-01-18T17:20:49.043902Z", "shell.execute_reply": "2024-01-18T17:20:49.043653Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def type_string(value):\n", " return type(value).__name__" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.045731Z", "iopub.status.busy": "2024-01-18T17:20:49.045601Z", "iopub.status.idle": "2024-01-18T17:20:49.047732Z", "shell.execute_reply": "2024-01-18T17:20:49.047457Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'int'" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type_string(4)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.049140Z", "iopub.status.busy": "2024-01-18T17:20:49.049031Z", "iopub.status.idle": "2024-01-18T17:20:49.050913Z", "shell.execute_reply": "2024-01-18T17:20:49.050660Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'list'" ] }, "execution_count": 71, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "type_string([])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For composite structures, `type_string()` does not examine element types; hence, the type of `[3]` is simply `list` instead of, say, `list[int]`. For now, `list` will do fine." ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.052380Z", "iopub.status.busy": "2024-01-18T17:20:49.052270Z", "iopub.status.idle": "2024-01-18T17:20:49.054198Z", "shell.execute_reply": "2024-01-18T17:20:49.053958Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'list'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type_string([3])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`type_string()` will be used to infer the types of argument values found at runtime, as returned by `CallTracker.calls()`:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:20:49.055640Z", "iopub.status.busy": "2024-01-18T17:20:49.055541Z", "iopub.status.idle": "2024-01-18T17:20:49.057226Z", "shell.execute_reply": "2024-01-18T17:20:49.056972Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "with CallTracker() as tracker:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(2.0)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "button": false, "code_folding": [], "execution": { "iopub.execute_input": "2024-01-18T17:20:49.058791Z", "iopub.status.busy": "2024-01-18T17:20:49.058682Z", "iopub.status.idle": "2024-01-18T17:20:49.060767Z", "shell.execute_reply": "2024-01-18T17:20:49.060518Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { 
"slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{'my_sqrt': [([('x', 25.0)], 5.0), ([('x', 2.0)], 1.414213562373095)]}" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tracker.calls()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The function `annotate_types()` takes such a list of calls and annotates each function listed:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.062209Z", "iopub.status.busy": "2024-01-18T17:20:49.062109Z", "iopub.status.idle": "2024-01-18T17:20:49.064077Z", "shell.execute_reply": "2024-01-18T17:20:49.063747Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def annotate_types(calls):\n", " annotated_functions = {}\n", " \n", " for function_name in calls:\n", " try:\n", " annotated_functions[function_name] = annotate_function_with_types(function_name, calls[function_name])\n", " except KeyError:\n", " continue\n", "\n", " return annotated_functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For each function, we get the source and its AST and then get to the actual annotation in `annotate_function_ast_with_types()`:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.065558Z", "iopub.status.busy": "2024-01-18T17:20:49.065460Z", "iopub.status.idle": "2024-01-18T17:20:49.067246Z", "shell.execute_reply": "2024-01-18T17:20:49.067006Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def annotate_function_with_types(function_name, function_calls):\n", " function = globals()[function_name] # May raise KeyError for internal functions\n", " function_code = inspect.getsource(function)\n", " function_ast = ast.parse(function_code)\n", " return 
annotate_function_ast_with_types(function_ast, function_calls)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The function `annotate_function_ast_with_types()` iterates over the calls seen; for each call, it determines the types of the arguments and of the return value. It then invokes the `TypeTransformer` to annotate the AST with these types. The universal type `Any` is used when we encounter type conflicts, which we will discuss below." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.068916Z", "iopub.status.busy": "2024-01-18T17:20:49.068661Z", "iopub.status.idle": "2024-01-18T17:20:49.070387Z", "shell.execute_reply": "2024-01-18T17:20:49.070116Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from typing import Any" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.071723Z", "iopub.status.busy": "2024-01-18T17:20:49.071640Z", "iopub.status.idle": "2024-01-18T17:20:49.074362Z", "shell.execute_reply": "2024-01-18T17:20:49.074024Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def annotate_function_ast_with_types(function_ast, function_calls):\n", " parameter_types = {}\n", " return_type = None\n", "\n", " for calls_seen in function_calls:\n", " args, return_value = calls_seen\n", " if return_value is not None:\n", " if return_type is not None and return_type != type_string(return_value):\n", " return_type = 'Any'\n", " else:\n", " return_type = type_string(return_value)\n", " \n", " \n", " for parameter, value in args:\n", " try:\n", " different_type = parameter_types[parameter] != type_string(value)\n", " except KeyError:\n", " different_type = False\n", " \n", " if different_type:\n", " parameter_types[parameter] = 'Any'\n", " else:\n", " parameter_types[parameter] = type_string(value)\n", " \n", " annotated_function_ast = 
TypeTransformer(parameter_types, return_type).visit(function_ast)\n", " return annotated_function_ast" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is `my_sqrt()` annotated with the types recorded using the tracker above." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.076014Z", "iopub.status.busy": "2024-01-18T17:20:49.075911Z", "iopub.status.idle": "2024-01-18T17:20:49.108013Z", "shell.execute_reply": "2024-01-18T17:20:49.107651Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x: \u001b[36mfloat\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(ast.unparse(annotate_types(tracker.calls())['my_sqrt']), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### All-in-one Annotation\n", "\n", "Let us bring all of this together in a single class `TypeAnnotator` that first tracks calls of functions and then allows accessing the AST (and the source code form) of the tracked functions annotated with types. 
The method `typed_functions()` returns the annotated functions as a string; `typed_functions_ast()` returns their AST." ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.109850Z", "iopub.status.busy": "2024-01-18T17:20:49.109713Z", "iopub.status.idle": "2024-01-18T17:20:49.111532Z", "shell.execute_reply": "2024-01-18T17:20:49.111244Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class TypeTracker(CallTracker):\n", " pass" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.113281Z", "iopub.status.busy": "2024-01-18T17:20:49.113151Z", "iopub.status.idle": "2024-01-18T17:20:49.115794Z", "shell.execute_reply": "2024-01-18T17:20:49.115529Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TypeAnnotator(TypeTracker):\n", " def typed_functions_ast(self, function_name=None):\n", " if function_name is None:\n", " return annotate_types(self.calls())\n", "\n", " return annotate_function_with_types(function_name, \n", " self.calls(function_name))\n", "\n", " def typed_functions(self, function_name=None):\n", " if function_name is None:\n", " functions = ''\n", " for f_name in self.calls():\n", " try:\n", " f_text = ast.unparse(self.typed_functions_ast(f_name))\n", " except KeyError:\n", " f_text = ''\n", " functions += f_text\n", " return functions\n", "\n", " return ast.unparse(self.typed_functions_ast(function_name))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is how to use `TypeAnnotator`. 
We first track a series of calls:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.117160Z", "iopub.status.busy": "2024-01-18T17:20:49.117057Z", "iopub.status.idle": "2024-01-18T17:20:49.118754Z", "shell.execute_reply": "2024-01-18T17:20:49.118507Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with TypeAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After tracking, we can immediately retrieve an annotated version of the functions tracked:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.120608Z", "iopub.status.busy": "2024-01-18T17:20:49.120476Z", "iopub.status.idle": "2024-01-18T17:20:49.153193Z", "shell.execute_reply": "2024-01-18T17:20:49.152880Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x: \u001b[36mfloat\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.typed_functions(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": 
"fragment" } }, "source": [ "This also works for multiple and diverse functions. One could go and implement an automatic type annotator for Python files based on the types seen during execution." ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.154897Z", "iopub.status.busy": "2024-01-18T17:20:49.154805Z", "iopub.status.idle": "2024-01-18T17:20:49.156896Z", "shell.execute_reply": "2024-01-18T17:20:49.156638Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello, type annotations\n" ] } ], "source": [ "with TypeAnnotator() as annotator:\n", " hello('type annotations')\n", " y = my_sqrt(1.0)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.158336Z", "iopub.status.busy": "2024-01-18T17:20:49.158228Z", "iopub.status.idle": "2024-01-18T17:20:49.188709Z", "shell.execute_reply": "2024-01-18T17:20:49.188437Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mhello\u001b[39;49;00m(name: \u001b[36mstr\u001b[39;49;00m):\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(\u001b[33m'\u001b[39;49;00m\u001b[33mHello,\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m, name)\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x: \u001b[36mfloat\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = 
guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.typed_functions(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Content such as the above could now be sent to a type checker, which would detect any type inconsistency between callers and callees. Likewise, type annotations such as the ones above greatly benefit symbolic code analysis (as in the chapter on [symbolic fuzzing](SymbolicFuzzer.ipynb)), as they effectively constrain the set of values that arguments and variables can take." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Multiple Types\n", "\n", "Let us now resolve the role of the magic `Any` type in `annotate_function_ast_with_types()`. If we see multiple types for the same argument, we set its type to `Any`. 
For `my_sqrt()`, this makes sense, as its arguments can be integers as well as floats:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:20:49.190312Z", "iopub.status.busy": "2024-01-18T17:20:49.190226Z", "iopub.status.idle": "2024-01-18T17:20:49.191977Z", "shell.execute_reply": "2024-01-18T17:20:49.191745Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "with CallTracker() as tracker:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(4)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.193447Z", "iopub.status.busy": "2024-01-18T17:20:49.193363Z", "iopub.status.idle": "2024-01-18T17:20:49.224320Z", "shell.execute_reply": "2024-01-18T17:20:49.224061Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x: Any) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(ast.unparse(annotate_types(tracker.calls())['my_sqrt']), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The following function `sum3()` can be called with 
floating-point numbers as arguments, resulting in the parameters getting a `float` type:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.225817Z", "iopub.status.busy": "2024-01-18T17:20:49.225730Z", "iopub.status.idle": "2024-01-18T17:20:49.227704Z", "shell.execute_reply": "2024-01-18T17:20:49.227422Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def sum3(a, b, c):\n", " return a + b + c" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.229220Z", "iopub.status.busy": "2024-01-18T17:20:49.229123Z", "iopub.status.idle": "2024-01-18T17:20:49.231427Z", "shell.execute_reply": "2024-01-18T17:20:49.231108Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "6.0" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with TypeAnnotator() as annotator:\n", " y = sum3(1.0, 2.0, 3.0)\n", "y" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.233219Z", "iopub.status.busy": "2024-01-18T17:20:49.233095Z", "iopub.status.idle": "2024-01-18T17:20:49.263320Z", "shell.execute_reply": "2024-01-18T17:20:49.263069Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a: \u001b[36mfloat\u001b[39;49;00m, b: \u001b[36mfloat\u001b[39;49;00m, c: \u001b[36mfloat\u001b[39;49;00m) -> \u001b[36mfloat\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.typed_functions(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we call `sum3()` with integers, though, the arguments 
get an `int` type:" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.264860Z", "iopub.status.busy": "2024-01-18T17:20:49.264773Z", "iopub.status.idle": "2024-01-18T17:20:49.267082Z", "shell.execute_reply": "2024-01-18T17:20:49.266838Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with TypeAnnotator() as annotator:\n", " y = sum3(1, 2, 3)\n", "y" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.268525Z", "iopub.status.busy": "2024-01-18T17:20:49.268446Z", "iopub.status.idle": "2024-01-18T17:20:49.298535Z", "shell.execute_reply": "2024-01-18T17:20:49.298262Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a: \u001b[36mint\u001b[39;49;00m, b: \u001b[36mint\u001b[39;49;00m, c: \u001b[36mint\u001b[39;49;00m) -> \u001b[36mint\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.typed_functions(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And we can also call `sum3()` with strings, giving the arguments a `str` type:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.300144Z", "iopub.status.busy": "2024-01-18T17:20:49.300061Z", "iopub.status.idle": "2024-01-18T17:20:49.302304Z", "shell.execute_reply": "2024-01-18T17:20:49.302058Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'onetwothree'" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" 
} ], "source": [ "with TypeAnnotator() as annotator:\n", " y = sum3(\"one\", \"two\", \"three\")\n", "y" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.303696Z", "iopub.status.busy": "2024-01-18T17:20:49.303612Z", "iopub.status.idle": "2024-01-18T17:20:49.333382Z", "shell.execute_reply": "2024-01-18T17:20:49.333073Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a: \u001b[36mstr\u001b[39;49;00m, b: \u001b[36mstr\u001b[39;49;00m, c: \u001b[36mstr\u001b[39;49;00m) -> \u001b[36mstr\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.typed_functions(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we have multiple calls, but with different types, `TypeAnnotator()` will assign an `Any` type to both arguments and return values:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.335013Z", "iopub.status.busy": "2024-01-18T17:20:49.334927Z", "iopub.status.idle": "2024-01-18T17:20:49.336740Z", "shell.execute_reply": "2024-01-18T17:20:49.336496Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with TypeAnnotator() as annotator:\n", " y = sum3(1, 2, 3)\n", " y = sum3(\"one\", \"two\", \"three\")" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.338118Z", "iopub.status.busy": "2024-01-18T17:20:49.338022Z", "iopub.status.idle": "2024-01-18T17:20:49.339916Z", "shell.execute_reply": "2024-01-18T17:20:49.339670Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "typed_sum3_def = 
annotator.typed_functions('sum3')" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.341320Z", "iopub.status.busy": "2024-01-18T17:20:49.341227Z", "iopub.status.idle": "2024-01-18T17:20:49.370690Z", "shell.execute_reply": "2024-01-18T17:20:49.370413Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a: Any, b: Any, c: Any) -> Any:\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(typed_sum3_def, '.py')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A type `Any` makes it explicit that an object can, indeed, have any type; it will not be type-checked at runtime or statically. To some extent, this defeats the power of type checking; but it also preserves some of the type flexibility that many Python programmers enjoy. Besides `Any`, the `typing` module supports several additional ways to define ambiguous types; we will keep this in mind for a later exercise." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Specifying and Checking Invariants\n", "\n", "Besides basic data types, we can check several further properties of arguments. We can, for instance, check whether an argument can be negative, zero, or positive; or that one argument should be smaller than the second; or that the result should be the sum of two arguments – properties that cannot be expressed in a (Python) type.\n", "\n", "Such properties are called *invariants*, as they hold across all invocations of a function. Specifically, invariants come as _pre_- and _postconditions_ – conditions that always hold at the beginning and at the end of a function. 
(There are also _data_ and _object_ invariants that express always-holding properties over the state of data or objects, but we do not consider these in this book.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "### Annotating Functions with Pre- and Postconditions\n", "\n", "The classical means to specify pre- and postconditions is via _assertions_, which we have introduced in the [chapter on testing](Intro_Testing.ipynb). A precondition checks whether the arguments to a function satisfy the expected properties; a postcondition does the same for the result. We can express and check both using assertions as follows:" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.372283Z", "iopub.status.busy": "2024-01-18T17:20:49.372195Z", "iopub.status.idle": "2024-01-18T17:20:49.373972Z", "shell.execute_reply": "2024-01-18T17:20:49.373743Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt_with_invariants(x):\n", " assert x >= 0 # Precondition\n", " \n", " ...\n", " \n", " assert result * result == x # Postcondition\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "A nicer way, however, is to syntactically separate invariants from the function at hand. Using appropriate decorators, we could specify pre- and postconditions as follows:\n", "\n", "```python\n", "@precondition(lambda x: x >= 0)\n", "@postcondition(lambda return_value, x: return_value * return_value == x)\n", "def my_sqrt_with_invariants(x):\n", " # normal code without assertions\n", " ...\n", "```\n", "\n", "The decorators `@precondition` and `@postcondition` would run the given functions (specified as anonymous `lambda` functions) before and after the decorated function, respectively. 
If the functions return `False`, the condition is violated. `@precondition` gets the function arguments as arguments; `@postcondition` additionally gets the return value as first argument." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "It turns out that implementing such decorators is not hard at all. Our implementation builds on a [code snippet from StackOverflow](https://stackoverflow.com/questions/12151182/python-precondition-postcondition-for-member-function-how):" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.375505Z", "iopub.status.busy": "2024-01-18T17:20:49.375423Z", "iopub.status.idle": "2024-01-18T17:20:49.376956Z", "shell.execute_reply": "2024-01-18T17:20:49.376718Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import functools" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.378616Z", "iopub.status.busy": "2024-01-18T17:20:49.378521Z", "iopub.status.idle": "2024-01-18T17:20:49.381238Z", "shell.execute_reply": "2024-01-18T17:20:49.380948Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def condition(precondition=None, postcondition=None):\n", " def decorator(func):\n", " @functools.wraps(func) # preserves name, docstring, etc\n", " def wrapper(*args, **kwargs):\n", " if precondition is not None:\n", " assert precondition(*args, **kwargs), \"Precondition violated\"\n", "\n", " retval = func(*args, **kwargs) # call original function or method\n", " if postcondition is not None:\n", " assert postcondition(retval, *args, **kwargs), \"Postcondition violated\"\n", "\n", " return retval\n", " return wrapper\n", " return decorator\n", "\n", "def precondition(check):\n", " return condition(precondition=check)\n", "\n", "def postcondition(check):\n", " return condition(postcondition=check)" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With these, we can now start decorating `my_sqrt()`:" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.382720Z", "iopub.status.busy": "2024-01-18T17:20:49.382636Z", "iopub.status.idle": "2024-01-18T17:20:49.384415Z", "shell.execute_reply": "2024-01-18T17:20:49.384182Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@precondition(lambda x: x > 0)\n", "def my_sqrt_with_precondition(x):\n", " return my_sqrt(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This catches arguments violating the precondition:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.385805Z", "iopub.status.busy": "2024-01-18T17:20:49.385726Z", "iopub.status.idle": "2024-01-18T17:20:49.387507Z", "shell.execute_reply": "2024-01-18T17:20:49.387285Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2353876897.py\", line 2, in \n", " my_sqrt_with_precondition(-1.0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 6, in wrapper\n", " assert precondition(*args, **kwargs), \"Precondition violated\"\n", "AssertionError: Precondition violated (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_sqrt_with_precondition(-1.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Likewise, we can provide a postcondition:" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.388941Z", "iopub.status.busy": 
"2024-01-18T17:20:49.388854Z", "iopub.status.idle": "2024-01-18T17:20:49.390324Z", "shell.execute_reply": "2024-01-18T17:20:49.390086Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "EPSILON = 1e-5" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.391707Z", "iopub.status.busy": "2024-01-18T17:20:49.391630Z", "iopub.status.idle": "2024-01-18T17:20:49.393413Z", "shell.execute_reply": "2024-01-18T17:20:49.393125Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@postcondition(lambda ret, x: ret * ret - x < EPSILON)\n", "def my_sqrt_with_postcondition(x):\n", " return my_sqrt(x)" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.394967Z", "iopub.status.busy": "2024-01-18T17:20:49.394883Z", "iopub.status.idle": "2024-01-18T17:20:49.397157Z", "shell.execute_reply": "2024-01-18T17:20:49.396904Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1.414213562373095" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = my_sqrt_with_postcondition(2.0)\n", "y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we have a buggy implementation of $\\sqrt{x}$, this gets caught quickly:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.398687Z", "iopub.status.busy": "2024-01-18T17:20:49.398572Z", "iopub.status.idle": "2024-01-18T17:20:49.400412Z", "shell.execute_reply": "2024-01-18T17:20:49.400160Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@postcondition(lambda ret, x: ret * ret - x < EPSILON)\n", "def buggy_my_sqrt_with_postcondition(x):\n", " return my_sqrt(x) + 0.1" ] }, { "cell_type": "code", "execution_count": 107, 
"metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.401858Z", "iopub.status.busy": "2024-01-18T17:20:49.401755Z", "iopub.status.idle": "2024-01-18T17:20:49.403497Z", "shell.execute_reply": "2024-01-18T17:20:49.403268Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/1985029262.py\", line 2, in \n", " y = buggy_my_sqrt_with_postcondition(2.0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 10, in wrapper\n", " assert postcondition(retval, *args, **kwargs), \"Postcondition violated\"\n", "AssertionError: Postcondition violated (expected)\n" ] } ], "source": [ "with ExpectError():\n", " y = buggy_my_sqrt_with_postcondition(2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "While checking pre- and postconditions is a great way to catch errors, specifying them can be cumbersome. Let us try to see whether we can (again) _mine_ some of them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" }, "toc-hr-collapsed": true, "toc-nb-collapsed": true }, "source": [ "## Mining Invariants\n", "\n", "To _mine_ invariants, we can use the same tracking functionality as before; instead of saving values for individual variables, though, we now check whether the values satisfy specific _properties_ or not. For instance, if all values of `x` seen satisfy the condition `x > 0`, then we make `x > 0` an invariant of the function. If we see positive, zero, and negative values of `x`, though, then there is no property of `x` left to talk about.\n", "\n", "The general idea is thus:\n", "\n", "1. Check all variable values observed against a set of predefined properties; and\n", "2. Keep only those properties that hold for all runs observed." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Defining Properties\n", "\n", "What precisely do we mean by properties? Here is a small collection of value properties that would frequently be used in invariants. All these properties would be evaluated with the _metavariables_ `X`, `Y`, and `Z` (actually, any upper-case identifier) being replaced with the names of function parameters: " ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.405007Z", "iopub.status.busy": "2024-01-18T17:20:49.404914Z", "iopub.status.idle": "2024-01-18T17:20:49.406552Z", "shell.execute_reply": "2024-01-18T17:20:49.406314Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES = [\n", " \"X < 0\",\n", " \"X <= 0\",\n", " \"X > 0\",\n", " \"X >= 0\",\n", " \"X == 0\",\n", " \"X != 0\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When `my_sqrt(x)` is called as, say, `my_sqrt(5.0)`, we see that `x = 5.0` holds. The above properties would then all be checked for `x`. Only the properties `X > 0`, `X >= 0`, and `X != 0` hold for the call seen; hence, `x > 0`, `x >= 0`, and `x != 0` would make potential preconditions for `my_sqrt(x)`." 
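, "\n", "\n", "As a minimal sketch of this check (an illustration only, not the mining code developed below: it assumes that each property is evaluated directly with `eval()`, with the metavariable `X` bound to the observed value), one could write:\n", "\n", "```python\n", "x = 5.0  # value observed in the call my_sqrt(5.0)\n", "properties = [\"X < 0\", \"X <= 0\", \"X > 0\", \"X >= 0\", \"X == 0\", \"X != 0\"]\n", "\n", "# Keep only those properties that hold when X is bound to x\n", "holding = [p for p in properties if eval(p, {}, {\"X\": x})]\n", "print(holding)  # ['X > 0', 'X >= 0', 'X != 0']\n", "```"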
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can check for many more properties, such as relations between two arguments:" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.408107Z", "iopub.status.busy": "2024-01-18T17:20:49.408006Z", "iopub.status.idle": "2024-01-18T17:20:49.409609Z", "shell.execute_reply": "2024-01-18T17:20:49.409380Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES += [\n", " \"X == Y\",\n", " \"X > Y\",\n", " \"X < Y\",\n", " \"X >= Y\",\n", " \"X <= Y\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Types can also be checked using properties. For any function parameter `X`, typically only one of these will hold; a `bool` value is the exception, as it also satisfies `isinstance(X, int)` because `bool` is a subclass of `int`:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.411296Z", "iopub.status.busy": "2024-01-18T17:20:49.411174Z", "iopub.status.idle": "2024-01-18T17:20:49.412905Z", "shell.execute_reply": "2024-01-18T17:20:49.412659Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES += [\n", " \"isinstance(X, bool)\",\n", " \"isinstance(X, int)\",\n", " \"isinstance(X, float)\",\n", " \"isinstance(X, list)\",\n", " \"isinstance(X, dict)\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can check for arithmetic properties:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.414473Z", "iopub.status.busy": "2024-01-18T17:20:49.414374Z", "iopub.status.idle": "2024-01-18T17:20:49.416036Z", "shell.execute_reply": "2024-01-18T17:20:49.415760Z" }, 
"slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES += [\n", " \"X == Y + 
Z\",\n", " \"X == Y * Z\",\n", " \"X == Y - Z\",\n", " \"X == Y / Z\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here are relations over three values, a Python specialty:" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.417580Z", "iopub.status.busy": "2024-01-18T17:20:49.417481Z", "iopub.status.idle": "2024-01-18T17:20:49.419018Z", "shell.execute_reply": "2024-01-18T17:20:49.418779Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES += [\n", " \"X < Y < Z\",\n", " \"X <= Y <= Z\",\n", " \"X > Y > Z\",\n", " \"X >= Y >= Z\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Finally, we can also check for list or string properties. Again, this is just a tiny selection." ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.420450Z", "iopub.status.busy": "2024-01-18T17:20:49.420375Z", "iopub.status.idle": "2024-01-18T17:20:49.421955Z", "shell.execute_reply": "2024-01-18T17:20:49.421714Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "INVARIANT_PROPERTIES += [\n", " \"X == len(Y)\",\n", " \"X == sum(Y)\",\n", " \"X.startswith(Y)\",\n", "]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Extracting Meta-Variables\n", "\n", "Let us first introduce a few _helper functions_ before we can get to the actual mining. `metavars()` extracts the list of meta-variables (`X`, `Y`, `Z`, etc.) from a property. To this end, we parse the property as a Python expression and then visit the identifiers." 
] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.423446Z", "iopub.status.busy": "2024-01-18T17:20:49.423363Z", "iopub.status.idle": "2024-01-18T17:20:49.425479Z", "shell.execute_reply": "2024-01-18T17:20:49.425180Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def metavars(prop):\n", " metavar_list = []\n", " \n", " class ArgVisitor(ast.NodeVisitor):\n", " def visit_Name(self, node):\n", " if node.id.isupper():\n", " metavar_list.append(node.id)\n", "\n", " ArgVisitor().visit(ast.parse(prop))\n", " return metavar_list" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.427007Z", "iopub.status.busy": "2024-01-18T17:20:49.426910Z", "iopub.status.idle": "2024-01-18T17:20:49.428804Z", "shell.execute_reply": "2024-01-18T17:20:49.428540Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert metavars(\"X < 0\") == ['X']" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.430322Z", "iopub.status.busy": "2024-01-18T17:20:49.430234Z", "iopub.status.idle": "2024-01-18T17:20:49.431975Z", "shell.execute_reply": "2024-01-18T17:20:49.431755Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert metavars(\"X.startswith(Y)\") == ['X', 'Y']" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.433378Z", "iopub.status.busy": "2024-01-18T17:20:49.433302Z", "iopub.status.idle": "2024-01-18T17:20:49.434828Z", "shell.execute_reply": "2024-01-18T17:20:49.434603Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert metavars(\"isinstance(X, str)\") == ['X']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Instantiating 
Properties\n", "\n", "To produce a property as an invariant, we need to be able to _instantiate_ it with variable names. The instantiation of `X > 0` with `X` being instantiated to `a`, for instance, gets us `a > 0`. To this end, the function `instantiate_prop()` takes a property and a collection of variable names and instantiates the meta-variables left-to-right with the corresponding variable names in the collection." ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.436212Z", "iopub.status.busy": "2024-01-18T17:20:49.436121Z", "iopub.status.idle": "2024-01-18T17:20:49.438646Z", "shell.execute_reply": "2024-01-18T17:20:49.438407Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def instantiate_prop_ast(prop, var_names):\n", " class NameTransformer(ast.NodeTransformer):\n", " def visit_Name(self, node):\n", " if node.id not in mapping:\n", " return node\n", " return ast.Name(id=mapping[node.id], ctx=ast.Load())\n", "\n", " meta_variables = metavars(prop)\n", " assert len(meta_variables) == len(var_names)\n", "\n", " mapping = {}\n", " for i in range(0, len(meta_variables)):\n", " mapping[meta_variables[i]] = var_names[i]\n", "\n", " prop_ast = ast.parse(prop, mode='eval')\n", " new_ast = NameTransformer().visit(prop_ast)\n", "\n", " return new_ast" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.439987Z", "iopub.status.busy": "2024-01-18T17:20:49.439910Z", "iopub.status.idle": "2024-01-18T17:20:49.441802Z", "shell.execute_reply": "2024-01-18T17:20:49.441573Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def instantiate_prop(prop, var_names):\n", " prop_ast = instantiate_prop_ast(prop, var_names)\n", " prop_text = ast.unparse(prop_ast).strip()\n", " while prop_text.startswith('(') and prop_text.endswith(')'):\n", " prop_text = prop_text[1:-1]\n", " return 
prop_text" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.443391Z", "iopub.status.busy": "2024-01-18T17:20:49.443306Z", "iopub.status.idle": "2024-01-18T17:20:49.445240Z", "shell.execute_reply": "2024-01-18T17:20:49.444917Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert instantiate_prop(\"X > Y\", ['a', 'b']) == 'a > b'" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.446754Z", "iopub.status.busy": "2024-01-18T17:20:49.446664Z", "iopub.status.idle": "2024-01-18T17:20:49.448473Z", "shell.execute_reply": "2024-01-18T17:20:49.448206Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert instantiate_prop(\"X.startswith(Y)\", ['x', 'y']) == 'x.startswith(y)'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Evaluating Properties\n", "\n", "To actually _evaluate_ properties, we do not need to instantiate them. 
Instead, we simply convert them into a boolean function, using `lambda`:" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.449944Z", "iopub.status.busy": "2024-01-18T17:20:49.449857Z", "iopub.status.idle": "2024-01-18T17:20:49.451680Z", "shell.execute_reply": "2024-01-18T17:20:49.451458Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def prop_function_text(prop):\n", " return \"lambda \" + \", \".join(metavars(prop)) + \": \" + prop\n", "\n", "def prop_function(prop):\n", " return eval(prop_function_text(prop))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is a simple example:" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.453026Z", "iopub.status.busy": "2024-01-18T17:20:49.452943Z", "iopub.status.idle": "2024-01-18T17:20:49.454989Z", "shell.execute_reply": "2024-01-18T17:20:49.454748Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'lambda X, Y: X > Y'" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "prop_function_text(\"X > Y\")" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.456369Z", "iopub.status.busy": "2024-01-18T17:20:49.456289Z", "iopub.status.idle": "2024-01-18T17:20:49.458300Z", "shell.execute_reply": "2024-01-18T17:20:49.458055Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = prop_function(\"X > Y\")\n", "p(100, 1)" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.459713Z", "iopub.status.busy": "2024-01-18T17:20:49.459633Z", 
"iopub.status.idle": "2024-01-18T17:20:49.461749Z", "shell.execute_reply": "2024-01-18T17:20:49.461470Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p(1, 100)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Checking Invariants\n", "\n", "To extract invariants from an execution, we need to check them on all possible instantiations of arguments. If the function to be checked has two arguments `a` and `b`, we instantiate the property `X < Y` both as `a < b` and `b < a` and check each of them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To get all combinations, we use the Python `permutations()` function:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.463332Z", "iopub.status.busy": "2024-01-18T17:20:49.463224Z", "iopub.status.idle": "2024-01-18T17:20:49.464724Z", "shell.execute_reply": "2024-01-18T17:20:49.464486Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import itertools" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.466073Z", "iopub.status.busy": "2024-01-18T17:20:49.465996Z", "iopub.status.idle": "2024-01-18T17:20:49.467888Z", "shell.execute_reply": "2024-01-18T17:20:49.467646Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1.0, 2.0)\n", "(1.0, 3.0)\n", "(2.0, 1.0)\n", "(2.0, 3.0)\n", "(3.0, 1.0)\n", "(3.0, 2.0)\n" ] } ], "source": [ "for combination in itertools.permutations([1.0, 2.0, 3.0], 2):\n", " print(combination)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The function 
`true_property_instantiations()` takes a property and a list of tuples (`var_name`, `value`). It then produces all instantiations of the property with the given values and returns those that evaluate to True." ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.469300Z", "iopub.status.busy": "2024-01-18T17:20:49.469214Z", "iopub.status.idle": "2024-01-18T17:20:49.471712Z", "shell.execute_reply": "2024-01-18T17:20:49.471473Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def true_property_instantiations(prop, vars_and_values, log=False):\n", " instantiations = set()\n", " p = prop_function(prop)\n", "\n", " len_metavars = len(metavars(prop))\n", " for combination in itertools.permutations(vars_and_values, len_metavars):\n", " args = [value for var_name, value in combination]\n", " var_names = [var_name for var_name, value in combination]\n", "\n", " try:\n", " result = p(*args)\n", " except:\n", " result = None\n", "\n", " if log:\n", " print(prop, combination, result)\n", " if result:\n", " instantiations.add((prop, tuple(var_names)))\n", "\n", " return instantiations" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here is an example. 
If `x == -1` and `y == 1`, the property `X < Y` holds for `x < y`, but not for `y < x`:" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.473135Z", "iopub.status.busy": "2024-01-18T17:20:49.473057Z", "iopub.status.idle": "2024-01-18T17:20:49.475448Z", "shell.execute_reply": "2024-01-18T17:20:49.475220Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X < Y (('x', -1), ('y', 1)) True\n", "X < Y (('y', 1), ('x', -1)) False\n" ] }, { "data": { "text/plain": [ "{('X < Y', ('x', 'y'))}" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invs = true_property_instantiations(\"X < Y\", [('x', -1), ('y', 1)], log=True)\n", "invs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The instantiation retrieves the short form:" ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.477564Z", "iopub.status.busy": "2024-01-18T17:20:49.477452Z", "iopub.status.idle": "2024-01-18T17:20:49.479330Z", "shell.execute_reply": "2024-01-18T17:20:49.479086Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x < y\n" ] } ], "source": [ "for prop, var_names in invs:\n", " print(instantiate_prop(prop, var_names))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Likewise, with values for `x` and `y` as above, the property `X < 0` only holds for `x`, but not for `y`:" ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.480838Z", "iopub.status.busy": "2024-01-18T17:20:49.480736Z", "iopub.status.idle": "2024-01-18T17:20:49.482573Z", "shell.execute_reply": "2024-01-18T17:20:49.482346Z" }, 
"slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X < 0 (('x', -1),) True\n", "X < 0 (('y', 1),) False\n" ] } ], "source": [ "invs = true_property_instantiations(\"X < 0\", [('x', -1), ('y', 1)], log=True)" ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.484008Z", "iopub.status.busy": "2024-01-18T17:20:49.483911Z", "iopub.status.idle": "2024-01-18T17:20:49.485690Z", "shell.execute_reply": "2024-01-18T17:20:49.485460Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x < 0\n" ] } ], "source": [ "for prop, var_names in invs:\n", " print(instantiate_prop(prop, var_names))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Extracting Invariants\n", "\n", "Let us now run the above invariant extraction on function arguments and return values as observed during a function execution. To this end, we extend the `CallTracker` class into an `InvariantTracker` class, which automatically computes invariants for all functions and all calls observed during tracking." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By default, an `InvariantTracker` uses the properties as defined above; however, one can specify alternate sets of properties." 
] }, { "cell_type": "code", "execution_count": 133, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.487148Z", "iopub.status.busy": "2024-01-18T17:20:49.487053Z", "iopub.status.idle": "2024-01-18T17:20:49.488832Z", "shell.execute_reply": "2024-01-18T17:20:49.488602Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class InvariantTracker(CallTracker):\n", " def __init__(self, props=None, **kwargs):\n", " if props is None:\n", " props = INVARIANT_PROPERTIES\n", "\n", " self.props = props\n", " super().__init__(**kwargs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The key method of the `InvariantTracker` is the `invariants()` method. This iterates over the calls observed and checks which properties hold. Only the intersection of properties – that is, the set of properties that hold for all calls – is preserved, and eventually returned. The special variable `return_value` is set to hold the return value." 
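, "\n", "\n", "The intersection step can be sketched in isolation as follows. This is an illustration under stated assumptions, not the implementation below: the observed `(x, return_value)` pairs and the three candidate properties are hypothetical, and the properties are evaluated directly with `eval()`, with `X` bound to the return value and `Y` bound to `x`.\n", "\n", "```python\n", "# Hypothetical observations: (x, return_value) pairs from two calls\n", "runs = [(25.0, 5.0), (0.01, 0.1)]\n", "candidates = [\"X > 0\", \"X <= Y\", \"X >= Y\"]  # X: return value, Y: x\n", "\n", "invariants = None\n", "for x, return_value in runs:\n", "    holding = {p for p in candidates\n", "               if eval(p, {}, {\"X\": return_value, \"Y\": x})}\n", "    # Keep only those properties that held in every run so far\n", "    invariants = holding if invariants is None else invariants & holding\n", "\n", "print(invariants)  # {'X > 0'}\n", "```\n", "\n", "`X <= Y` holds only for the first run and `X >= Y` only for the second, so neither survives the intersection."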
] }, { "cell_type": "code", "execution_count": 134, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.490258Z", "iopub.status.busy": "2024-01-18T17:20:49.490155Z", "iopub.status.idle": "2024-01-18T17:20:49.491638Z", "shell.execute_reply": "2024-01-18T17:20:49.491434Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "RETURN_VALUE = 'return_value'" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.492960Z", "iopub.status.busy": "2024-01-18T17:20:49.492855Z", "iopub.status.idle": "2024-01-18T17:20:49.495570Z", "shell.execute_reply": "2024-01-18T17:20:49.495316Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class InvariantTracker(InvariantTracker):\n", " def invariants(self, function_name=None):\n", " if function_name is None:\n", " return {function_name: self.invariants(function_name) for function_name in self.calls()}\n", "\n", " invariants = None\n", " for variables, return_value in self.calls(function_name):\n", " vars_and_values = variables + [(RETURN_VALUE, return_value)]\n", "\n", " s = set()\n", " for prop in self.props:\n", " s |= true_property_instantiations(prop, vars_and_values, self._log)\n", " if invariants is None:\n", " invariants = s\n", " else:\n", " invariants &= s\n", "\n", " return invariants" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here's an example of how to use `invariants()`. We run the tracker on a small set of calls." 
] }, { "cell_type": "code", "execution_count": 136, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.497061Z", "iopub.status.busy": "2024-01-18T17:20:49.496973Z", "iopub.status.idle": "2024-01-18T17:20:49.499368Z", "shell.execute_reply": "2024-01-18T17:20:49.499140Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "{'my_sqrt': [([('x', 25.0)], 5.0), ([('x', 10.0)], 3.162277660168379)]}" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with InvariantTracker() as tracker:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(10.0)\n", "\n", "tracker.calls()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `invariants()` method produces a set of properties that hold for the observed runs, together with their instantiations over function arguments." ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.500791Z", "iopub.status.busy": "2024-01-18T17:20:49.500708Z", "iopub.status.idle": "2024-01-18T17:20:49.505345Z", "shell.execute_reply": "2024-01-18T17:20:49.505093Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "{('X != 0', ('return_value',)),\n", " ('X != 0', ('x',)),\n", " ('X < Y', ('return_value', 'x')),\n", " ('X <= Y', ('return_value', 'x')),\n", " ('X > 0', ('return_value',)),\n", " ('X > 0', ('x',)),\n", " ('X > Y', ('x', 'return_value')),\n", " ('X >= 0', ('return_value',)),\n", " ('X >= 0', ('x',)),\n", " ('X >= Y', ('x', 'return_value')),\n", " ('isinstance(X, float)', ('return_value',)),\n", " ('isinstance(X, float)', ('x',))}" ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invs = tracker.invariants('my_sqrt')\n", "invs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As before, the 
actual instantiations are easier to read:\n" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.506721Z", "iopub.status.busy": "2024-01-18T17:20:49.506639Z", "iopub.status.idle": "2024-01-18T17:20:49.508388Z", "shell.execute_reply": "2024-01-18T17:20:49.508163Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def pretty_invariants(invariants):\n", " props = []\n", " for (prop, var_names) in invariants:\n", " props.append(instantiate_prop(prop, var_names))\n", " return sorted(props)" ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.509730Z", "iopub.status.busy": "2024-01-18T17:20:49.509653Z", "iopub.status.idle": "2024-01-18T17:20:49.512270Z", "shell.execute_reply": "2024-01-18T17:20:49.512034Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['isinstance(return_value, float)',\n", " 'isinstance(x, float)',\n", " 'return_value != 0',\n", " 'return_value < x',\n", " 'return_value <= x',\n", " 'return_value > 0',\n", " 'return_value >= 0',\n", " 'x != 0',\n", " 'x > 0',\n", " 'x > return_value',\n", " 'x >= 0',\n", " 'x >= return_value']" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pretty_invariants(invs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see that both `x` and the return value have a `float` type. We also see that both are greater than zero in all observed runs. These are properties that may make useful pre- and postconditions, notably for symbolic analysis." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "However, there's also an invariant which does _not_ universally hold, namely `return_value <= x`, as the following example shows:" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.513873Z", "iopub.status.busy": "2024-01-18T17:20:49.513776Z", "iopub.status.idle": "2024-01-18T17:20:49.515642Z", "shell.execute_reply": "2024-01-18T17:20:49.515405Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.1" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_sqrt(0.01)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Clearly, 0.1 > 0.01 holds, so `return_value <= x` is violated here. This is a case of us not learning from sufficiently diverse inputs. As soon as we have a call including `x = 0.01`, though, the invariant `return_value <= x` is eliminated:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.517161Z", "iopub.status.busy": "2024-01-18T17:20:49.517050Z", "iopub.status.idle": "2024-01-18T17:20:49.522853Z", "shell.execute_reply": "2024-01-18T17:20:49.522628Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['isinstance(return_value, float)',\n", " 'isinstance(x, float)',\n", " 'return_value != 0',\n", " 'return_value > 0',\n", " 'return_value >= 0',\n", " 'x != 0',\n", " 'x > 0',\n", " 'x >= 0']" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with InvariantTracker() as tracker:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(10.0)\n", " y = my_sqrt(0.01)\n", " \n", "pretty_invariants(tracker.invariants('my_sqrt'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We will discuss 
later how to ensure sufficient diversity in inputs. (Hint: This involves test generation.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try out our invariant tracker on `sum3()`. We see that all types are well-defined; the property that all arguments are non-zero, however, is specific to the calls observed." ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.524304Z", "iopub.status.busy": "2024-01-18T17:20:49.524221Z", "iopub.status.idle": "2024-01-18T17:20:49.529347Z", "shell.execute_reply": "2024-01-18T17:20:49.529143Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['a != 0',\n", " 'b != 0',\n", " 'c != 0',\n", " 'isinstance(a, int)',\n", " 'isinstance(b, int)',\n", " 'isinstance(c, int)',\n", " 'isinstance(return_value, int)',\n", " 'return_value != 0']" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with InvariantTracker() as tracker:\n", " y = sum3(1, 2, 3)\n", " y = sum3(-4, -5, -6)\n", " \n", "pretty_invariants(tracker.invariants('sum3'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we invoke `sum3()` with strings instead, we get different invariants. Notably, we obtain the postcondition that the return value starts with the value of `a` – a universal postcondition if strings are used." 
] }, { "cell_type": "code", "execution_count": 143, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.530928Z", "iopub.status.busy": "2024-01-18T17:20:49.530819Z", "iopub.status.idle": "2024-01-18T17:20:49.536098Z", "shell.execute_reply": "2024-01-18T17:20:49.535866Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['a != 0',\n", " 'a < return_value',\n", " 'a <= return_value',\n", " 'b != 0',\n", " 'c != 0',\n", " 'return_value != 0',\n", " 'return_value > a',\n", " 'return_value >= a',\n", " 'return_value.startswith(a)']" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with InvariantTracker() as tracker:\n", " y = sum3('a', 'b', 'c')\n", " y = sum3('f', 'e', 'd')\n", " \n", "pretty_invariants(tracker.invariants('sum3'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we invoke `sum3()` with both strings and numbers (and zeros, too), there are no properties left that would hold across all calls. That's the price of flexibility." 
] }, { "cell_type": "code", "execution_count": 144, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.537530Z", "iopub.status.busy": "2024-01-18T17:20:49.537449Z", "iopub.status.idle": "2024-01-18T17:20:49.544699Z", "shell.execute_reply": "2024-01-18T17:20:49.544450Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with InvariantTracker() as tracker:\n", " y = sum3('a', 'b', 'c')\n", " y = sum3('c', 'b', 'a')\n", " y = sum3(-4, -5, -6)\n", " y = sum3(0, 0, 0)\n", " \n", "pretty_invariants(tracker.invariants('sum3'))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": true }, "source": [ "### Converting Mined Invariants to Annotations\n", "\n", "As with types, above, we would like to have some functionality where we can add the mined invariants as annotations to existing functions. To this end, we introduce the `InvariantAnnotator` class, extending `InvariantTracker`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We start with a helper method. `params()` returns a comma-separated list of parameter names as observed during calls." 
] }, { "cell_type": "code", "execution_count": 145, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.546138Z", "iopub.status.busy": "2024-01-18T17:20:49.546058Z", "iopub.status.idle": "2024-01-18T17:20:49.547919Z", "shell.execute_reply": "2024-01-18T17:20:49.547685Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantTracker):\n", " def params(self, function_name):\n", " arguments, return_value = self.calls(function_name)[0]\n", " return \", \".join(arg_name for (arg_name, arg_value) in arguments)" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.549305Z", "iopub.status.busy": "2024-01-18T17:20:49.549208Z", "iopub.status.idle": "2024-01-18T17:20:49.550829Z", "shell.execute_reply": "2024-01-18T17:20:49.550592Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = sum3(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.552135Z", "iopub.status.busy": "2024-01-18T17:20:49.552061Z", "iopub.status.idle": "2024-01-18T17:20:49.553999Z", "shell.execute_reply": "2024-01-18T17:20:49.553782Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'x'" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annotator.params('my_sqrt')" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.555338Z", "iopub.status.busy": "2024-01-18T17:20:49.555265Z", "iopub.status.idle": "2024-01-18T17:20:49.557298Z", "shell.execute_reply": "2024-01-18T17:20:49.557079Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'c, b, a'" ] }, "execution_count": 148, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "annotator.params('sum3')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now for the actual annotation. `preconditions()` returns the preconditions from the mined invariants (i.e., those properties that do not depend on the return value) as a list of annotation strings:" ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.558815Z", "iopub.status.busy": "2024-01-18T17:20:49.558715Z", "iopub.status.idle": "2024-01-18T17:20:49.560718Z", "shell.execute_reply": "2024-01-18T17:20:49.560488Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantAnnotator):\n", " def preconditions(self, function_name):\n", " conditions = []\n", "\n", " for inv in pretty_invariants(self.invariants(function_name)):\n", " if inv.find(RETURN_VALUE) >= 0:\n", " continue # Postcondition\n", "\n", " cond = \"@precondition(lambda \" + self.params(function_name) + \": \" + inv + \")\"\n", " conditions.append(cond)\n", "\n", " return conditions" ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.562158Z", "iopub.status.busy": "2024-01-18T17:20:49.562057Z", "iopub.status.idle": "2024-01-18T17:20:49.563726Z", "shell.execute_reply": "2024-01-18T17:20:49.563506Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(0.01)\n", " y = sum3(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.565076Z", "iopub.status.busy": "2024-01-18T17:20:49.564998Z", "iopub.status.idle": "2024-01-18T17:20:49.569471Z", "shell.execute_reply": "2024-01-18T17:20:49.569209Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { 
"data": { "text/plain": [ "['@precondition(lambda x: isinstance(x, float))',\n", " '@precondition(lambda x: x != 0)',\n", " '@precondition(lambda x: x > 0)',\n", " '@precondition(lambda x: x >= 0)']" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annotator.preconditions('my_sqrt')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`postconditions()` does the same for postconditions:" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.570860Z", "iopub.status.busy": "2024-01-18T17:20:49.570781Z", "iopub.status.idle": "2024-01-18T17:20:49.573023Z", "shell.execute_reply": "2024-01-18T17:20:49.572814Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantAnnotator):\n", " def postconditions(self, function_name):\n", " conditions = []\n", "\n", " for inv in pretty_invariants(self.invariants(function_name)):\n", " if inv.find(RETURN_VALUE) < 0:\n", " continue # Precondition\n", "\n", " cond = (\"@postcondition(lambda \" + \n", " RETURN_VALUE + \", \" + self.params(function_name) + \": \" + inv + \")\")\n", " conditions.append(cond)\n", "\n", " return conditions" ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.574461Z", "iopub.status.busy": "2024-01-18T17:20:49.574380Z", "iopub.status.idle": "2024-01-18T17:20:49.576091Z", "shell.execute_reply": "2024-01-18T17:20:49.575855Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(0.01)\n", " y = sum3(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.577566Z", "iopub.status.busy": "2024-01-18T17:20:49.577475Z", "iopub.status.idle": 
"2024-01-18T17:20:49.581966Z", "shell.execute_reply": "2024-01-18T17:20:49.581727Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['@postcondition(lambda return_value, x: isinstance(return_value, float))',\n", " '@postcondition(lambda return_value, x: return_value != 0)',\n", " '@postcondition(lambda return_value, x: return_value > 0)',\n", " '@postcondition(lambda return_value, x: return_value >= 0)']" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annotator.postconditions('my_sqrt')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With these, we can take a function and add both pre- and postconditions as annotations:" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.583421Z", "iopub.status.busy": "2024-01-18T17:20:49.583341Z", "iopub.status.idle": "2024-01-18T17:20:49.585661Z", "shell.execute_reply": "2024-01-18T17:20:49.585426Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantAnnotator):\n", " def functions_with_invariants(self):\n", " functions = \"\"\n", " for function_name in self.invariants():\n", " try:\n", " function = self.function_with_invariants(function_name)\n", " except KeyError:\n", " continue\n", " functions += function\n", " return functions\n", "\n", " def function_with_invariants(self, function_name):\n", " function = globals()[function_name] # Can throw KeyError\n", " source = inspect.getsource(function)\n", " return \"\\n\".join(self.preconditions(function_name) + \n", " self.postconditions(function_name)) + '\\n' + source" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Here comes `function_with_invariants()` in all its glory:" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "execution": { 
"iopub.execute_input": "2024-01-18T17:20:49.587146Z", "iopub.status.busy": "2024-01-18T17:20:49.587047Z", "iopub.status.idle": "2024-01-18T17:20:49.588746Z", "shell.execute_reply": "2024-01-18T17:20:49.588518Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(0.01)\n", " y = sum3(1, 2, 3)" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.590100Z", "iopub.status.busy": "2024-01-18T17:20:49.590020Z", "iopub.status.idle": "2024-01-18T17:20:49.625489Z", "shell.execute_reply": "2024-01-18T17:20:49.625232Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: \u001b[36misinstance\u001b[39;49;00m(x, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.function_with_invariants('my_sqrt'), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Quite a lot of invariants, isn't it? Further below (and in the exercises), we will discuss how to focus on the most relevant properties." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Some Examples\n", "\n", "Here's another example. `list_length()` recursively computes the length of a Python list. 
Let us see whether we can mine its invariants:" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.627012Z", "iopub.status.busy": "2024-01-18T17:20:49.626930Z", "iopub.status.idle": "2024-01-18T17:20:49.628717Z", "shell.execute_reply": "2024-01-18T17:20:49.628486Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def list_length(L):\n", " if L == []:\n", " length = 0\n", " else:\n", " length = 1 + list_length(L[1:])\n", " return length" ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.630161Z", "iopub.status.busy": "2024-01-18T17:20:49.630082Z", "iopub.status.idle": "2024-01-18T17:20:49.675519Z", "shell.execute_reply": "2024-01-18T17:20:49.675242Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m L: L != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m L: \u001b[36misinstance\u001b[39;49;00m(L, \u001b[36mlist\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, L: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, L: return_value == \u001b[36mlen\u001b[39;49;00m(L))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, L: return_value >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mlist_length\u001b[39;49;00m(L):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m L == []:\u001b[37m\u001b[39;49;00m\n", " length = 
\u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " length = \u001b[34m1\u001b[39;49;00m + list_length(L[\u001b[34m1\u001b[39;49;00m:])\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m length\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " length = list_length([1, 2, 3])\n", "\n", "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Almost all these properties (except for the very first) are relevant. Of course, the reason the invariants are so neat (notably, that the return value is equal to `len(L)`) is that `X == len(Y)` is part of the list of properties to be checked." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The next example is a very simple function:" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.677250Z", "iopub.status.busy": "2024-01-18T17:20:49.677128Z", "iopub.status.idle": "2024-01-18T17:20:49.678871Z", "shell.execute_reply": "2024-01-18T17:20:49.678617Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def sum2(a, b):\n", " return a + b" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.680749Z", "iopub.status.busy": "2024-01-18T17:20:49.680607Z", "iopub.status.idle": "2024-01-18T17:20:49.682492Z", "shell.execute_reply": "2024-01-18T17:20:49.682231Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " sum2(31, 45)\n", " sum2(0, 0)\n", " sum2(-1, -5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The invariants all capture the relationship between `a`, 
`b`, and the return value as `return_value == a + b` in all its variations." ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.684053Z", "iopub.status.busy": "2024-01-18T17:20:49.683951Z", "iopub.status.idle": "2024-01-18T17:20:49.725760Z", "shell.execute_reply": "2024-01-18T17:20:49.725487Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value - b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value - a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a + b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b + a)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msum2\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": 
"fragment" } }, "source": [ "If we have a function without return value, the return value is `None`, and we can only mine preconditions. (Well, we get a \"postcondition\" that the return value is non-zero, which holds for `None`)." ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.727591Z", "iopub.status.busy": "2024-01-18T17:20:49.727471Z", "iopub.status.idle": "2024-01-18T17:20:49.729122Z", "shell.execute_reply": "2024-01-18T17:20:49.728854Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def print_sum(a, b):\n", " print(a + b)" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.730587Z", "iopub.status.busy": "2024-01-18T17:20:49.730482Z", "iopub.status.idle": "2024-01-18T17:20:49.732412Z", "shell.execute_reply": "2024-01-18T17:20:49.732172Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "76\n", "0\n", "-6\n" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " print_sum(31, 45)\n", " print_sum(0, 0)\n", " print_sum(-1, -5)" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.734029Z", "iopub.status.busy": "2024-01-18T17:20:49.733924Z", "iopub.status.idle": "2024-01-18T17:20:49.862212Z", "shell.execute_reply": "2024-01-18T17:20:49.861781Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mprint_sum\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(a + b)\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Checking Specifications\n", "\n", "A function with invariants, as above, can be fed into the Python interpreter, such that all pre- and postconditions are checked. We create a function `my_sqrt_annotated()` which includes all the invariants mined above." ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.864004Z", "iopub.status.busy": "2024-01-18T17:20:49.863879Z", "iopub.status.idle": "2024-01-18T17:20:49.865688Z", "shell.execute_reply": "2024-01-18T17:20:49.865456Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with InvariantAnnotator() as annotator:\n", " y = my_sqrt(25.0)\n", " y = my_sqrt(0.01)" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.867201Z", "iopub.status.busy": "2024-01-18T17:20:49.867088Z", "iopub.status.idle": "2024-01-18T17:20:49.876146Z", "shell.execute_reply": "2024-01-18T17:20:49.875899Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "my_sqrt_def = annotator.functions_with_invariants()\n", "my_sqrt_def = my_sqrt_def.replace('my_sqrt', 'my_sqrt_annotated')" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.877812Z", "iopub.status.busy": "2024-01-18T17:20:49.877682Z", "iopub.status.idle": "2024-01-18T17:20:49.908681Z", "shell.execute_reply": 
"2024-01-18T17:20:49.908409Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: \u001b[36misinstance\u001b[39;49;00m(x, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt_annotated\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " 
guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(my_sqrt_def, '.py')" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.910295Z", "iopub.status.busy": "2024-01-18T17:20:49.910187Z", "iopub.status.idle": "2024-01-18T17:20:49.912199Z", "shell.execute_reply": "2024-01-18T17:20:49.911928Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "exec(my_sqrt_def)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The \"annotated\" version checks against invalid arguments – or more precisely, against arguments with properties that have not been observed yet:" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.913948Z", "iopub.status.busy": "2024-01-18T17:20:49.913834Z", "iopub.status.idle": "2024-01-18T17:20:49.915654Z", "shell.execute_reply": "2024-01-18T17:20:49.915423Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/3390953352.py\", line 2, in \n", " my_sqrt_annotated(-1.0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 8, in wrapper\n", " retval = func(*args, **kwargs) # call original function or method\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 8, in wrapper\n", " retval = func(*args, **kwargs) # call original function or method\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 6, in wrapper\n", " assert precondition(*args, **kwargs), \"Precondition violated\"\n", 
"AssertionError: Precondition violated (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_sqrt_annotated(-1.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is in contrast to the original version, which just hangs on negative values:" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:49.917153Z", "iopub.status.busy": "2024-01-18T17:20:49.917053Z", "iopub.status.idle": "2024-01-18T17:20:50.919792Z", "shell.execute_reply": "2024-01-18T17:20:50.919489Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2605599394.py\", line 2, in \n", " my_sqrt(-1.0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2661069967.py\", line 5, in my_sqrt\n", " while approx != guess:\n", " File \"/Users/zeller/Projects/fuzzingbook/notebooks/Timeout.ipynb\", line 43, in timeout_handler\n", " raise TimeoutError()\n", "TimeoutError (expected)\n" ] } ], "source": [ "with ExpectTimeout(1):\n", " my_sqrt(-1.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we make changes to the function definition such that the properties of the return value change, such _regressions_ are caught as violations of the postconditions. Let us illustrate this by simply negating the result, returning $-2$ as the square root of 4." 
] }, { "cell_type": "code", "execution_count": 172, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:50.921474Z", "iopub.status.busy": "2024-01-18T17:20:50.921356Z", "iopub.status.idle": "2024-01-18T17:20:50.923107Z", "shell.execute_reply": "2024-01-18T17:20:50.922858Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "my_sqrt_def = my_sqrt_def.replace('my_sqrt_annotated', 'my_sqrt_negative')\n", "my_sqrt_def = my_sqrt_def.replace('return approx', 'return -approx')" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:50.924595Z", "iopub.status.busy": "2024-01-18T17:20:50.924493Z", "iopub.status.idle": "2024-01-18T17:20:50.956089Z", "shell.execute_reply": "2024-01-18T17:20:50.955812Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: \u001b[36misinstance\u001b[39;49;00m(x, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m x: x >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mfloat\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: 
return_value > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, x: return_value >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt_negative\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m -approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(my_sqrt_def, '.py')" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:50.957665Z", "iopub.status.busy": "2024-01-18T17:20:50.957576Z", "iopub.status.idle": "2024-01-18T17:20:50.959459Z", "shell.execute_reply": "2024-01-18T17:20:50.959224Z" }, "slideshow": { "slide_type": "subslide" }, "tags": [] }, "outputs": [], "source": [ "exec(my_sqrt_def)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Technically speaking, $-2$ _is_ a square root of 4, since $(-2)^2 = 4$ holds. 
Yet, such a change may be unexpected by callers of `my_sqrt()`, and hence, this would be caught with the first call:" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:50.961107Z", "iopub.status.busy": "2024-01-18T17:20:50.960998Z", "iopub.status.idle": "2024-01-18T17:20:50.963258Z", "shell.execute_reply": "2024-01-18T17:20:50.962968Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2428286732.py\", line 2, in \n", " my_sqrt_negative(2.0) # type: ignore\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 8, in wrapper\n", " retval = func(*args, **kwargs) # call original function or method\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 8, in wrapper\n", " retval = func(*args, **kwargs) # call original function or method\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 8, in wrapper\n", " retval = func(*args, **kwargs) # call original function or method\n", " [Previous line repeated 4 more times]\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 10, in wrapper\n", " assert postcondition(retval, *args, **kwargs), \"Postcondition violated\"\n", "AssertionError: Postcondition violated (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_sqrt_negative(2.0) # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see how pre- and postconditions, as well as types, can serve as *oracles* during testing. In particular, once we have mined them for a set of functions, we can check them again and again with test generators – especially after code changes. 
The more checks we have, and the more specific they are, the more likely it is we can detect unwanted effects of changes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mining Specifications from Generated Tests\n", "\n", "Mined specifications can only be as good as the executions they were mined from. If we only see a single call to, say, `sum2()` as defined above, we will be faced with several mined pre- and postconditions that _overspecialize_ towards the values seen:" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:50.964882Z", "iopub.status.busy": "2024-01-18T17:20:50.964760Z", "iopub.status.idle": "2024-01-18T17:20:51.007299Z", "shell.execute_reply": "2024-01-18T17:20:51.007009Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a <= b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a == b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a >= b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b <= a)\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b == a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b >= a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a < return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a <= b <= return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a <= return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value - b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value / b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b < return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b <= a <= return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, 
b, a: b <= return_value)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value - a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value / a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a * b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a + b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b * a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b + a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= a >= b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= b >= a)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msum2\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " y = sum2(2, 2)\n", "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The mined precondition `a == b`, for instance, only holds for the single call observed; the same holds for the mined postcondition `return_value == a * b`. Yet, `sum2()` can obviously be successfully called with other values that do not satisfy these conditions." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To get out of this trap, we have to _learn from more and more diverse runs_. 
If we have a few more calls of `sum2()`, we see how the set of invariants quickly gets smaller:" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.009078Z", "iopub.status.busy": "2024-01-18T17:20:51.008964Z", "iopub.status.idle": "2024-01-18T17:20:51.052671Z", "shell.execute_reply": "2024-01-18T17:20:51.052392Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value - b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value - a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a + b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b + a)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msum2\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " length = sum2(1, 2)\n", " length = sum2(-1, -2)\n", " length = sum2(0, 0)\n", "\n", 
"print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "But where to we get such diverse runs from? This is the job of generating software tests. A simple grammar for calls of `sum2()` will easily resolve the problem." ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.054476Z", "iopub.status.busy": "2024-01-18T17:20:51.054348Z", "iopub.status.idle": "2024-01-18T17:20:51.150393Z", "shell.execute_reply": "2024-01-18T17:20:51.150093Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from GrammarFuzzer import GrammarFuzzer # minor dependency\n", "from Grammars import is_valid_grammar, crange # minor dependency\n", "from Grammars import convert_ebnf_grammar, Grammar # minor dependency" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.152194Z", "iopub.status.busy": "2024-01-18T17:20:51.152104Z", "iopub.status.idle": "2024-01-18T17:20:51.154128Z", "shell.execute_reply": "2024-01-18T17:20:51.153872Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "SUM2_EBNF_GRAMMAR: Grammar = {\n", " \"\": [\"\"],\n", " \"\": [\"sum2(, )\"],\n", " \"\": [\"<_int>\"],\n", " \"<_int>\": [\"(-)?*\", \"0\"],\n", " \"\": crange('1', '9'),\n", " \"\": crange('0', '9')\n", "}" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.155658Z", "iopub.status.busy": "2024-01-18T17:20:51.155556Z", "iopub.status.idle": "2024-01-18T17:20:51.157184Z", "shell.execute_reply": "2024-01-18T17:20:51.156919Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert is_valid_grammar(SUM2_EBNF_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "execution": { "iopub.execute_input": 
"2024-01-18T17:20:51.158566Z", "iopub.status.busy": "2024-01-18T17:20:51.158484Z", "iopub.status.idle": "2024-01-18T17:20:51.160192Z", "shell.execute_reply": "2024-01-18T17:20:51.159887Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "sum2_grammar = convert_ebnf_grammar(SUM2_EBNF_GRAMMAR)" ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.161866Z", "iopub.status.busy": "2024-01-18T17:20:51.161727Z", "iopub.status.idle": "2024-01-18T17:20:51.166587Z", "shell.execute_reply": "2024-01-18T17:20:51.166320Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['sum2(60, 3)',\n", " 'sum2(-4, 0)',\n", " 'sum2(-579, 34)',\n", " 'sum2(3, 0)',\n", " 'sum2(-8, 0)',\n", " 'sum2(0, 8)',\n", " 'sum2(3, -9)',\n", " 'sum2(0, 0)',\n", " 'sum2(0, 5)',\n", " 'sum2(-3181, 0)']" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum2_fuzzer = GrammarFuzzer(sum2_grammar)\n", "[sum2_fuzzer.fuzz() for i in range(10)]" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.168272Z", "iopub.status.busy": "2024-01-18T17:20:51.168162Z", "iopub.status.idle": "2024-01-18T17:20:51.363022Z", "shell.execute_reply": "2024-01-18T17:20:51.362683Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, 
\u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value - b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value - a)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m))\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value != \u001b[34m0\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a + b)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b + a)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msum2\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " for i in range(10):\n", " eval(sum2_fuzzer.fuzz())\n", "\n", "print_content(annotator.function_with_invariants('sum2'), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "But then, writing tests (or a test driver) just to derive a set of pre- and postconditions may possibly be too much effort – in particular, since tests can easily be derived from given pre- and postconditions in the first place. Hence, it would be wiser to first specify invariants and then let test generators or program provers do the job." 
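,
"\n",
"\n",
"The remark that tests can be derived from pre- and postconditions can be illustrated with a minimal sketch. The names `precondition_holds()`, `postcondition_holds()`, and `random_test()` are hypothetical and not part of this chapter's code, and the two conditions are hand-written assumptions for a square root function rather than mined ones. Random inputs that violate the precondition are skipped, and the postcondition serves as the oracle:\n",
"\n",
"```python\n",
"import random\n",
"\n",
"def precondition_holds(x):\n",
"    # Assumed precondition of a square root function\n",
"    return isinstance(x, float) and x > 0\n",
"\n",
"def postcondition_holds(return_value, x):\n",
"    # Assumed postcondition: non-negative result whose square is close to x\n",
"    error = abs(return_value * return_value - x)\n",
"    return return_value >= 0 and error <= 1e-9 * max(1.0, x)\n",
"\n",
"def random_test(function, runs=100, seed=0):\n",
"    # Generate random inputs, keep those the precondition admits,\n",
"    # and check the postcondition on each result\n",
"    rng = random.Random(seed)\n",
"    checked = 0\n",
"    for _ in range(runs):\n",
"        x = rng.uniform(-100.0, 100.0)\n",
"        if not precondition_holds(x):\n",
"            continue  # input ruled out by the specification\n",
"        result = function(x)\n",
"        assert postcondition_holds(result, x), f'Postcondition violated for {x}'\n",
"        checked += 1\n",
"    return checked\n",
"```\n",
"\n",
"Under these assumptions, `random_test(lambda x: x ** 0.5)` should complete without an assertion failure, whereas a function that returns the negated root should trip the postcondition on the first admitted input."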
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Also, an API grammar, such as above, will have to be set up such that it actually respects preconditions – in our case, we invoke `sqrt()` with positive numbers only, already assuming its precondition. In some way, one thus needs a specification (a model, a grammar) to mine another specification – a chicken-and-egg problem." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, there is one way out of this problem: If one can automatically generate tests at the system level, then one has an _infinite source of executions_ to learn invariants from. In each of these executions, all functions would be called with values that satisfy the (implicit) precondition, allowing us to mine invariants for these functions. This holds, because at the system level, invalid inputs must be rejected by the system in the first place. The meaningful precondition at the system level, ensuring that only valid inputs get through, thus gets broken down into a multitude of meaningful preconditions (and subsequent postconditions) at the function level." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The big requirement for this, though, is that one needs good test generators at the system level. In [the next part](05_Domain-Specific_Fuzzing.ipynb), we will discuss how to automatically generate tests for a variety of domains, from configuration to graphical user interfaces." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis\n", "\n", "This chapter provides two classes that automatically extract specifications from a function and a set of inputs:\n", "\n", "* `TypeAnnotator` for _types_, and\n", "* `InvariantAnnotator` for _pre-_ and _postconditions_.\n", "\n", "Both work by _observing_ a function and its invocations within a `with` clause. Here is an example for the type annotator:" ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.364939Z", "iopub.status.busy": "2024-01-18T17:20:51.364815Z", "iopub.status.idle": "2024-01-18T17:20:51.366557Z", "shell.execute_reply": "2024-01-18T17:20:51.366296Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def sum(a, b):\n", " return a + b" ] }, { "cell_type": "code", "execution_count": 185, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.368046Z", "iopub.status.busy": "2024-01-18T17:20:51.367938Z", "iopub.status.idle": "2024-01-18T17:20:51.369718Z", "shell.execute_reply": "2024-01-18T17:20:51.369439Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "with TypeAnnotator() as type_annotator:\n", " sum(1, 2)\n", " sum(-4, -5)\n", " sum(0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The `typed_functions()` method will return a representation of `sum()` annotated with types observed during execution." 
] }, { "cell_type": "code", "execution_count": 186, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.371303Z", "iopub.status.busy": "2024-01-18T17:20:51.371192Z", "iopub.status.idle": "2024-01-18T17:20:51.373234Z", "shell.execute_reply": "2024-01-18T17:20:51.372988Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def sum(a: int, b: int) -> int:\n", " return a + b\n" ] } ], "source": [ "print(type_annotator.typed_functions())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The invariant annotator works similarly:" ] }, { "cell_type": "code", "execution_count": 187, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.374734Z", "iopub.status.busy": "2024-01-18T17:20:51.374631Z", "iopub.status.idle": "2024-01-18T17:20:51.376348Z", "shell.execute_reply": "2024-01-18T17:20:51.376093Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with InvariantAnnotator() as inv_annotator:\n", " sum(1, 2)\n", " sum(-4, -5)\n", " sum(0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The `functions_with_invariants()` method will return a representation of `sum()` annotated with inferred pre- and postconditions that all hold for the observed values." 
] }, { "cell_type": "code", "execution_count": 188, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.378017Z", "iopub.status.busy": "2024-01-18T17:20:51.377897Z", "iopub.status.idle": "2024-01-18T17:20:51.391489Z", "shell.execute_reply": "2024-01-18T17:20:51.391194Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "@precondition(lambda b, a: isinstance(a, int))\n", "@precondition(lambda b, a: isinstance(b, int))\n", "@postcondition(lambda return_value, b, a: a == return_value - b)\n", "@postcondition(lambda return_value, b, a: b == return_value - a)\n", "@postcondition(lambda return_value, b, a: isinstance(return_value, int))\n", "@postcondition(lambda return_value, b, a: return_value == a + b)\n", "@postcondition(lambda return_value, b, a: return_value == b + a)\n", "def sum(a, b):\n", " return a + b\n", "\n" ] } ], "source": [ "print(inv_annotator.functions_with_invariants())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Such type specifications and invariants can be helpful as _oracles_ (to detect deviations from a given set of runs) as well as for all kinds of _symbolic code analyses_. The chapter gives details on how to customize the properties checked for." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* Type annotations and explicit invariants allow for _checking_ arguments and results for expected data types and other properties.\n", "* One can automatically _mine_ data types and invariants by observing arguments and results at runtime.\n", "* The quality of mined invariants depends on the diversity of values observed during executions; this variety can be increased by generating tests." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "This chapter concludes the [part on semantic fuzzing techniques](04_Semantical_Fuzzing.ipynb). In the next part, we will explore [domain-specific fuzzing techniques](05_Domain-Specific_Fuzzing.ipynb) from configurations and APIs to graphical user interfaces." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "The [DAIKON dynamic invariant detector](https://plse.cs.washington.edu/daikon/) can be considered the mother of function specification miners. Continuously maintained and extended for more than 20 years, it mines likely invariants in the style of this chapter for a variety of languages, including C, C++, C#, Eiffel, F#, Java, Perl, and Visual Basic. On top of the functionality discussed above, it holds a rich catalog of patterns for likely invariants, supports data invariants, can eliminate invariants that are implied by others, and determines statistical confidence to disregard unlikely invariants. The corresponding paper \\cite{Ernst2001} is one of the seminal and most-cited papers of Software Engineering. A multitude of works have been published based on DAIKON and detecting invariants; see this [curated list](http://plse.cs.washington.edu/daikon/pubs/) for details." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The interaction between test generators and invariant detection is already discussed in \\cite{Ernst2001} (incidentally also using grammars). 
The Eclat tool \\cite{Pacheco2005} is a model example of tight interaction between a unit-level test generator and DAIKON-style invariant mining, where the mined invariants are used to produce oracles and to systematically guide the test generator towards fault-revealing inputs." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Mining specifications is not restricted to pre- and postconditions. The paper \"Mining Specifications\" \\cite{Ammons2002} is another classic in the field, learning state protocols from executions. Grammar mining, as described in [our chapter with the same name](GrammarMiner.ipynb) can also be seen as a specification mining approach, this time learning specifications of input formats." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When it comes to adding type annotations to existing code, the blog post [\"The state of type hints in Python\"](https://www.bernat.tech/the-state-of-type-hints-in-python/) gives a great overview of how Python type hints can be used and checked. To add type annotations, there are two important tools available that also implement our above approach:\n", "\n", "* [MonkeyType](https://instagram-engineering.com/let-your-code-type-hint-itself-introducing-open-source-monkeytype-a855c7284881) implements the above approach of tracing executions and annotating Python 3 arguments, returns, and variables with type hints.\n", "* [PyAnnotate](https://github.com/dropbox/pyannotate) does a similar job, focusing on code in Python 2. It does not produce Python 3-style annotations, but instead produces annotations as comments that can be processed by static type checkers.\n", "\n", "These tools have been created by engineers at Facebook and Dropbox, respectively, assisting them in checking millions of lines of code for type issues." 
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises\n", "\n", "Our code for mining types and invariants is in no way complete. There are dozens of ways to extend our implementations, some of which we discuss in the exercises below." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1: Union Types\n", "\n", "The Python `typing` module allows expressing that an argument can have multiple types. For `my_sqrt(x)`, this allows expressing that `x` can be an `int` or a `float`:" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.393348Z", "iopub.status.busy": "2024-01-18T17:20:51.393230Z", "iopub.status.idle": "2024-01-18T17:20:51.394948Z", "shell.execute_reply": "2024-01-18T17:20:51.394640Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from typing import Union, Optional" ] }, { "cell_type": "code", "execution_count": 190, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.396626Z", "iopub.status.busy": "2024-01-18T17:20:51.396509Z", "iopub.status.idle": "2024-01-18T17:20:51.398400Z", "shell.execute_reply": "2024-01-18T17:20:51.398122Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt_with_union_type(x: Union[int, float]) -> float:  # type: ignore\n", "    ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "solution2": "hidden", "solution2_first": true }, "source": [ "Extend the `TypeAnnotator` such that it supports union types for arguments and return values. Use `Optional[X]` as a shorthand for `Union[X, None]`."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader. Hint: extend `type_string()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 2: Types for Local Variables\n", "\n", "In Python, one can annotate not only arguments with types, but also local and global variables, such as `approx` and `guess` in our `my_sqrt()` implementation:" ] }, { "cell_type": "code", "execution_count": 191, "metadata": { "button": false, "execution": { "iopub.execute_input": "2024-01-18T17:20:51.400009Z", "iopub.status.busy": "2024-01-18T17:20:51.399912Z", "iopub.status.idle": "2024-01-18T17:20:51.401939Z", "shell.execute_reply": "2024-01-18T17:20:51.401701Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_sqrt_with_local_types(x: Union[int, float]) -> float:\n", "    \"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\n", "    approx: Optional[float] = None\n", "    guess: float = x / 2\n", "    while approx != guess:\n", "        approx = guess\n", "        guess = (approx + x / approx) / 2\n", "    return approx" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "Extend the `TypeAnnotator` such that it also annotates local variables with types. Search the function AST for assignments, determine the type of the assigned value, and make it an annotation on the left-hand side." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 3: Verbose Invariant Checkers\n", "\n", "Our implementation of invariant checkers does not tell the user which pre- or postcondition failed." ] }, { "cell_type": "code", "execution_count": 192, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.403699Z", "iopub.status.busy": "2024-01-18T17:20:51.403576Z", "iopub.status.idle": "2024-01-18T17:20:51.405328Z", "shell.execute_reply": "2024-01-18T17:20:51.405093Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "@precondition(lambda s: len(s) > 0)\n", "def remove_first_char(s):\n", "    return s[1:]" ] }, { "cell_type": "code", "execution_count": 193, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.406714Z", "iopub.status.busy": "2024-01-18T17:20:51.406618Z", "iopub.status.idle": "2024-01-18T17:20:51.408334Z", "shell.execute_reply": "2024-01-18T17:20:51.408105Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2212034949.py\", line 2, in \n", "    remove_first_char('')\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/906718213.py\", line 6, in wrapper\n", "    assert precondition(*args, **kwargs), \"Precondition violated\"\n", "AssertionError: Precondition violated (expected)\n" ] } ], "source": [ "with ExpectError():\n", "    remove_first_char('')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The following implementation adds an optional `doc` keyword argument, which is included in the error message if the invariant is violated:" ] }, { "cell_type": "code", "execution_count": 194, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.409791Z", "iopub.status.busy":
"2024-01-18T17:20:51.409687Z", "iopub.status.idle": "2024-01-18T17:20:51.412037Z", "shell.execute_reply": "2024-01-18T17:20:51.411778Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def verbose_condition(precondition=None, postcondition=None, doc='Unknown'):\n", "    def decorator(func):\n", "        @functools.wraps(func)  # preserves name, docstring, etc\n", "        def wrapper(*args, **kwargs):\n", "            if precondition is not None:\n", "                assert precondition(*args, **kwargs), \"Precondition violated: \" + doc\n", "\n", "            retval = func(*args, **kwargs)  # call original function or method\n", "            if postcondition is not None:\n", "                assert postcondition(retval, *args, **kwargs), \"Postcondition violated: \" + doc\n", "\n", "            return retval\n", "        return wrapper\n", "    return decorator" ] }, { "cell_type": "code", "execution_count": 195, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.413781Z", "iopub.status.busy": "2024-01-18T17:20:51.413650Z", "iopub.status.idle": "2024-01-18T17:20:51.415536Z", "shell.execute_reply": "2024-01-18T17:20:51.415289Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def verbose_precondition(check, **kwargs):  # type: ignore\n", "    return verbose_condition(precondition=check, doc=kwargs.get('doc', 'Unknown'))" ] }, { "cell_type": "code", "execution_count": 196, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.417060Z", "iopub.status.busy": "2024-01-18T17:20:51.416954Z", "iopub.status.idle": "2024-01-18T17:20:51.418624Z", "shell.execute_reply": "2024-01-18T17:20:51.418366Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def verbose_postcondition(check, **kwargs):  # type: ignore\n", "    return verbose_condition(postcondition=check, doc=kwargs.get('doc', 'Unknown'))" ] }, { "cell_type": "code", "execution_count": 197, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.420018Z", "iopub.status.busy":
"2024-01-18T17:20:51.419934Z", "iopub.status.idle": "2024-01-18T17:20:51.422377Z", "shell.execute_reply": "2024-01-18T17:20:51.422103Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'bc'" ] }, "execution_count": 197, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@verbose_precondition(lambda s: len(s) > 0, doc=\"len(s) > 0\")  # type: ignore\n", "def remove_first_char(s):\n", "    return s[1:]\n", "\n", "remove_first_char('abc')" ] }, { "cell_type": "code", "execution_count": 198, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.423808Z", "iopub.status.busy": "2024-01-18T17:20:51.423711Z", "iopub.status.idle": "2024-01-18T17:20:51.425413Z", "shell.execute_reply": "2024-01-18T17:20:51.425165Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/2212034949.py\", line 2, in \n", "    remove_first_char('')\n", "  File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/860932556.py\", line 6, in wrapper\n", "    assert precondition(*args, **kwargs), \"Precondition violated: \" + doc\n", "AssertionError: Precondition violated: len(s) > 0 (expected)\n" ] } ], "source": [ "with ExpectError():\n", "    remove_first_char('')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" }, "solution2": "hidden", "solution2_first": true }, "source": [ "Extend `InvariantAnnotator` such that it includes the conditions in the generated pre- and postconditions."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Here's a simple solution:" ] }, { "cell_type": "code", "execution_count": 199, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.426977Z", "iopub.status.busy": "2024-01-18T17:20:51.426874Z", "iopub.status.idle": "2024-01-18T17:20:51.429095Z", "shell.execute_reply": "2024-01-18T17:20:51.428800Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantAnnotator):\n", "    def preconditions(self, function_name):\n", "        conditions = []\n", "\n", "        for inv in pretty_invariants(self.invariants(function_name)):\n", "            if inv.find(RETURN_VALUE) >= 0:\n", "                continue  # Postcondition\n", "\n", "            cond = \"@verbose_precondition(lambda \" + self.params(function_name) + \": \" + inv + ', doc=' + repr(inv) + \")\"\n", "            conditions.append(cond)\n", "\n", "        return conditions" ] }, { "cell_type": "code", "execution_count": 200, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.430806Z", "iopub.status.busy": "2024-01-18T17:20:51.430689Z", "iopub.status.idle": "2024-01-18T17:20:51.432909Z", "shell.execute_reply": "2024-01-18T17:20:51.432675Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class InvariantAnnotator(InvariantAnnotator):\n", "    def postconditions(self, function_name):\n", "        conditions = []\n", "\n", "        for inv in pretty_invariants(self.invariants(function_name)):\n", "            if inv.find(RETURN_VALUE) < 0:\n", "                continue  # Precondition\n", "\n", "            cond = (\"@verbose_postcondition(lambda \" +\n", "                    RETURN_VALUE + \", \" + self.params(function_name) + \": \" + inv + ', doc=' + repr(inv) + \")\")\n", "            conditions.append(cond)\n", "\n", "        return conditions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "The resulting
annotations are harder to read, but easier to diagnose:" ] }, { "cell_type": "code", "execution_count": 201, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.434370Z", "iopub.status.busy": "2024-01-18T17:20:51.434271Z", "iopub.status.idle": "2024-01-18T17:20:51.477391Z", "shell.execute_reply": "2024-01-18T17:20:51.477045Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a != \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma != 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a <= b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma <= b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a == b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma == b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a > \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma > 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a >= \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma >= 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: a >= b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma >= b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b != \u001b[34m0\u001b[39;49;00m, 
doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb != 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b <= a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb <= a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b == a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb == a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b > \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb > 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b >= \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb >= 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: b >= a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb >= a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m), doc=\u001b[33m'\u001b[39;49;00m\u001b[33misinstance(a, int)\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_precondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m b, a: \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m), doc=\u001b[33m'\u001b[39;49;00m\u001b[33misinstance(b, int)\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a < return_value, 
doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma < return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a <= b <= return_value, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma <= b <= return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a <= return_value, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma <= return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value - b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma == return_value - b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: a == return_value / b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33ma == return_value / b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b < return_value, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb < return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b <= a <= return_value, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb <= a <= return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b <= return_value, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb <= return_value\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value - a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb == return_value - a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: b == return_value / a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mb == return_value / a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m), doc=\u001b[33m'\u001b[39;49;00m\u001b[33misinstance(return_value, int)\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value != \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value != 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a * b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value == a * b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == a + b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value == a + b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b * a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value == b * a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", 
"\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value == b + a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value == b + a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value > 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value > a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value > b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value > b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= \u001b[34m0\u001b[39;49;00m, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value >= 0\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value >= a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= a >= b, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value >= a >= b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= b, 
doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value >= b\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[90m@verbose_postcondition\u001b[39;49;00m(\u001b[34mlambda\u001b[39;49;00m return_value, b, a: return_value >= b >= a, doc=\u001b[33m'\u001b[39;49;00m\u001b[33mreturn_value >= b >= a\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\u001b[37m\u001b[39;49;00m\n", "\u001b[34mdef\u001b[39;49;00m \u001b[32msum2\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with InvariantAnnotator() as annotator:\n", " y = sum2(2, 2)\n", "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "As an alternative, one may be able to use `inspect.getsource()` on the lambda expression or unparse it. This is left to the reader." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 4: Save Initial Values\n", "\n", "If the value of an argument changes during function execution, this can easily confuse our implementation: The values are tracked at the beginning of the function, but checked only when it returns. Extend the `InvariantAnnotator` and the infrastructure it uses such that\n", "\n", "* it saves argument values both at the beginning and at the end of a function invocation;\n", "* postconditions can be expressed over both _initial_ values of arguments as well as the _final_ values of arguments;\n", "* the mined postconditions refer to both these values as well." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** To be added." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 5: Implications\n", "\n", "Several mined invariants are actually _implied_ by others: If `x > 0` holds, then this implies `x >= 0` and `x != 0`. Extend the `InvariantAnnotator` such that implications between properties are explicitly encoded, and such that implied properties are no longer listed as invariants. See \\cite{Ernst2001} for ideas." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 6: Local Variables\n", "\n", "Postconditions may also refer to the values of local variables. Consider extending `InvariantAnnotator` and its infrastructure such that the values of local variables at the end of the execution are also recorded and made part of the invariant inference mechanism." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 7: Exploring Invariant Alternatives\n", "\n", "After mining a first set of invariants, have a [concolic fuzzer](ConcolicFuzzer.ipynb) generate tests that systematically attempt to invalidate pre- and postconditions. How far can you generalize?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** To be added."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "### Exercise 8: Grammar-Generated Properties\n", "\n", "The larger the set of properties to be checked, the more potential invariants can be discovered. Create a _grammar_ that systematically produces a large set of properties. See \\cite{Ernst2001} for possible patterns." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Left to the reader." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" }, "solution": "hidden", "solution2": "hidden", "solution2_first": true, "solution_first": true, "toc-hr-collapsed": false }, "source": [ "### Exercise 9: Embedding Invariants as Assertions\n", "\n", "Rather than producing invariants as annotations for pre- and postconditions, insert them as `assert` statements into the function code, as in:\n", "\n", "```python\n", "def my_sqrt(x):\n", "    'Computes the square root of x, using the Newton-Raphson method'\n", "    assert isinstance(x, int), 'violated precondition'\n", "    assert (x > 0), 'violated precondition'\n", "    approx = None\n", "    guess = (x / 2)\n", "    while (approx != guess):\n", "        approx = guess\n", "        guess = ((approx + (x / approx)) / 2)\n", "    return_value = approx\n", "    assert (return_value < x), 'violated postcondition'\n", "    assert isinstance(return_value, float), 'violated postcondition'\n", "    return approx\n", "```\n", "\n", "Such a formulation may make it easier for test generators and symbolic analysis to access and interpret pre- and postconditions."
] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" }, "solution": "hidden", "solution2": "hidden" }, "source": [ "**Solution.** Here is a tentative implementation that inserts invariants into function ASTs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden", "toc-hr-collapsed": true }, "source": [ "Part 1: Embedding Invariants into Functions" ] }, { "cell_type": "code", "execution_count": 202, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.479810Z", "iopub.status.busy": "2024-01-18T17:20:51.479513Z", "iopub.status.idle": "2024-01-18T17:20:51.482559Z", "shell.execute_reply": "2024-01-18T17:20:51.482316Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class EmbeddedInvariantAnnotator(InvariantTracker):\n", "    def functions_with_invariants_ast(self, function_name=None):\n", "        if function_name is None:\n", "            # Annotate all observed functions; see annotate_invariants() below\n", "            return annotate_invariants(self.invariants())\n", "\n", "        return annotate_function_with_invariants(function_name, self.invariants(function_name))\n", "\n", "    def functions_with_invariants(self, function_name=None):\n", "        if function_name is None:\n", "            functions = ''\n", "            for f_name in self.invariants():\n", "                try:\n", "                    f_text = ast.unparse(self.functions_with_invariants_ast(f_name))\n", "                except KeyError:\n", "                    f_text = ''\n", "                functions += f_text\n", "            return functions\n", "\n", "        return ast.unparse(self.functions_with_invariants_ast(function_name))\n", "\n", "    def function_with_invariants(self, function_name):\n", "        return self.functions_with_invariants(function_name)\n", "\n", "    def function_with_invariants_ast(self, function_name):\n", "        return self.functions_with_invariants_ast(function_name)" ] }, { "cell_type": "code", "execution_count": 203, "metadata": { "execution": { "iopub.execute_input":
"2024-01-18T17:20:51.483967Z", "iopub.status.busy": "2024-01-18T17:20:51.483886Z", "iopub.status.idle": "2024-01-18T17:20:51.486003Z", "shell.execute_reply": "2024-01-18T17:20:51.485729Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "def annotate_invariants(invariants):\n", "    annotated_functions = {}\n", "\n", "    for function_name in invariants:\n", "        try:\n", "            annotated_functions[function_name] = annotate_function_with_invariants(function_name, invariants[function_name])\n", "        except KeyError:\n", "            continue\n", "\n", "    return annotated_functions" ] }, { "cell_type": "code", "execution_count": 204, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.487500Z", "iopub.status.busy": "2024-01-18T17:20:51.487420Z", "iopub.status.idle": "2024-01-18T17:20:51.489372Z", "shell.execute_reply": "2024-01-18T17:20:51.489136Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "def annotate_function_with_invariants(function_name, function_invariants):\n", "    function = globals()[function_name]\n", "    function_code = inspect.getsource(function)\n", "    function_ast = ast.parse(function_code)\n", "    return annotate_function_ast_with_invariants(function_ast, function_invariants)" ] }, { "cell_type": "code", "execution_count": 205, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.490811Z", "iopub.status.busy": "2024-01-18T17:20:51.490727Z", "iopub.status.idle": "2024-01-18T17:20:51.492514Z", "shell.execute_reply": "2024-01-18T17:20:51.492248Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "def annotate_function_ast_with_invariants(function_ast, function_invariants):\n", "    annotated_function_ast = EmbeddedInvariantTransformer(function_invariants).visit(function_ast)\n", "    return annotated_function_ast" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2":
"hidden" }, "source": [ "Part 2: Preconditions" ] }, { "cell_type": "code", "execution_count": 206, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.494051Z", "iopub.status.busy": "2024-01-18T17:20:51.493956Z", "iopub.status.idle": "2024-01-18T17:20:51.497256Z", "shell.execute_reply": "2024-01-18T17:20:51.496954Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class PreconditionTransformer(ast.NodeTransformer):\n", "    def __init__(self, invariants):\n", "        self.invariants = invariants\n", "        super().__init__()\n", "\n", "    def preconditions(self):\n", "        preconditions = []\n", "        for (prop, var_names) in self.invariants:\n", "            assertion = \"assert \" + instantiate_prop(prop, var_names) + ', \"violated precondition\"'\n", "            assertion_ast = ast.parse(assertion)\n", "\n", "            if assertion.find(RETURN_VALUE) < 0:\n", "                preconditions += assertion_ast.body\n", "\n", "        return preconditions\n", "\n", "    def insert_assertions(self, body):\n", "        preconditions = self.preconditions()\n", "        # A docstring is a leading expression statement holding a string constant\n", "        first = body[0] if body else None\n", "        has_docstring = (isinstance(first, ast.Expr)\n", "                         and isinstance(first.value, ast.Constant)\n", "                         and isinstance(first.value.value, str))\n", "\n", "        if has_docstring:\n", "            return [body[0]] + preconditions + body[1:]\n", "        else:\n", "            return preconditions + body\n", "\n", "    def visit_FunctionDef(self, node):\n", "        \"\"\"Add invariants to function\"\"\"\n", "        # print(ast.dump(node))\n", "        node.body = self.insert_assertions(node.body)\n", "        return node" ] }, { "cell_type": "code", "execution_count": 207, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.498841Z", "iopub.status.busy": "2024-01-18T17:20:51.498750Z", "iopub.status.idle": "2024-01-18T17:20:51.500335Z", "shell.execute_reply": "2024-01-18T17:20:51.500073Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class EmbeddedInvariantTransformer(PreconditionTransformer):\n", "    pass" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "execution": {
"iopub.execute_input": "2024-01-18T17:20:51.501810Z", "iopub.status.busy": "2024-01-18T17:20:51.501724Z", "iopub.status.idle": "2024-01-18T17:20:51.503340Z", "shell.execute_reply": "2024-01-18T17:20:51.503120Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", " my_sqrt(5)" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.504665Z", "iopub.status.busy": "2024-01-18T17:20:51.504590Z", "iopub.status.idle": "2024-01-18T17:20:51.538056Z", "shell.execute_reply": "2024-01-18T17:20:51.537784Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x >= \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(x, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x > \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / 
\u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "code", "execution_count": 210, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.539576Z", "iopub.status.busy": "2024-01-18T17:20:51.539484Z", "iopub.status.idle": "2024-01-18T17:20:51.541440Z", "shell.execute_reply": "2024-01-18T17:20:51.541212Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", "    y = sum3(3, 4, 5)\n", "    y = sum3(-3, -4, -5)\n", "    y = sum3(0, 0, 0)" ] }, { "cell_type": "code", "execution_count": 211, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.542850Z", "iopub.status.busy": "2024-01-18T17:20:51.542760Z", "iopub.status.idle": "2024-01-18T17:20:51.582457Z", "shell.execute_reply": "2024-01-18T17:20:51.582051Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a, b, c):\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(c, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m 
\u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "Part 3: Postconditions\n", "\n", "We make a few simplifying assumptions:\n", "\n", "* Variables do not change during execution.\n", "* There is a single `return` statement at the end of the function." ] }, { "cell_type": "code", "execution_count": 212, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.584277Z", "iopub.status.busy": "2024-01-18T17:20:51.584162Z", "iopub.status.idle": "2024-01-18T17:20:51.587190Z", "shell.execute_reply": "2024-01-18T17:20:51.586935Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class EmbeddedInvariantTransformer(PreconditionTransformer):\n", "    def postconditions(self):\n", "        postconditions = []\n", "\n", "        for (prop, var_names) in self.invariants:\n", "            assertion = \"assert \" + instantiate_prop(prop, var_names) + ', \"violated postcondition\"'\n", "            assertion_ast = ast.parse(assertion)\n", "\n", "            if assertion.find(RETURN_VALUE) >= 0:\n", "                postconditions += assertion_ast.body\n", "\n", "        return postconditions\n", "\n", "    def insert_assertions(self, body):\n", "        new_body = super().insert_assertions(body)\n", "        postconditions = self.postconditions()\n", "\n", "        body_ends_with_return = isinstance(new_body[-1], ast.Return)\n", "        if body_ends_with_return:\n", "            saver = RETURN_VALUE + \" = \" + ast.unparse(new_body[-1].value)\n", "        else:\n", "            saver = RETURN_VALUE + \" = None\"\n", "\n", "        # ast.parse() returns a Module; splice in its statements rather than the Module node itself\n", "        saver_ast = ast.parse(saver)\n", "        postconditions = saver_ast.body + postconditions\n", "\n",
"        if body_ends_with_return:\n", "            return new_body[:-1] + postconditions + [new_body[-1]]\n", "        else:\n", "            return new_body + postconditions" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.588701Z", "iopub.status.busy": "2024-01-18T17:20:51.588615Z", "iopub.status.idle": "2024-01-18T17:20:51.590202Z", "shell.execute_reply": "2024-01-18T17:20:51.589973Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", "    my_sqrt(5)" ] }, { "cell_type": "code", "execution_count": 214, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.591590Z", "iopub.status.busy": "2024-01-18T17:20:51.591510Z", "iopub.status.idle": "2024-01-18T17:20:51.597335Z", "shell.execute_reply": "2024-01-18T17:20:51.597005Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "my_sqrt_def = annotator.functions_with_invariants()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "Here's the full definition with included assertions:" ] }, { "cell_type": "code", "execution_count": 215, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.599263Z", "iopub.status.busy": "2024-01-18T17:20:51.599126Z", "iopub.status.idle": "2024-01-18T17:20:51.631460Z", "shell.execute_reply": "2024-01-18T17:20:51.631163Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mmy_sqrt\u001b[39;49;00m(x):\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[33m\"\"\"Computes the square root of x, using the Newton-Raphson method\"\"\"\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x >= \u001b[34m0\u001b[39;49;00m, 
\u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(x, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x > \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " approx = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " guess = x / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mwhile\u001b[39;49;00m approx != guess:\u001b[37m\u001b[39;49;00m\n", " approx = guess\u001b[37m\u001b[39;49;00m\n", " guess = (approx + x / approx) / \u001b[34m2\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " return_value = approx\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value <= x, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value > \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mfloat\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value < x, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated 
postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x > return_value, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value >= \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m x >= return_value, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m approx\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(my_sqrt_def, '.py')" ] }, { "cell_type": "code", "execution_count": 216, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.633160Z", "iopub.status.busy": "2024-01-18T17:20:51.633051Z", "iopub.status.idle": "2024-01-18T17:20:51.634908Z", "shell.execute_reply": "2024-01-18T17:20:51.634626Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "exec(my_sqrt_def.replace('my_sqrt', 'my_sqrt_annotated'))" ] }, { "cell_type": "code", "execution_count": 217, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.636324Z", "iopub.status.busy": "2024-01-18T17:20:51.636228Z", "iopub.status.idle": "2024-01-18T17:20:51.637959Z", "shell.execute_reply": "2024-01-18T17:20:51.637708Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File 
\"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_78422/1337162511.py\", line 2, in \n", " my_sqrt_annotated(-1)\n", " File \"\", line 3, in my_sqrt_annotated\n", "AssertionError: violated precondition (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_sqrt_annotated(-1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "Here come some more examples:" ] }, { "cell_type": "code", "execution_count": 218, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.639457Z", "iopub.status.busy": "2024-01-18T17:20:51.639360Z", "iopub.status.idle": "2024-01-18T17:20:51.641037Z", "shell.execute_reply": "2024-01-18T17:20:51.640822Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", " y = sum3(3, 4, 5)\n", " y = sum3(-3, -4, -5)\n", " y = sum3(0, 0, 0)" ] }, { "cell_type": "code", "execution_count": 219, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.642418Z", "iopub.status.busy": "2024-01-18T17:20:51.642338Z", "iopub.status.idle": "2024-01-18T17:20:51.680860Z", "shell.execute_reply": "2024-01-18T17:20:51.680514Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32msum3\u001b[39;49;00m(a, b, c):\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(c, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " 
\u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " return_value = a + b + c\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m a + b + c\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "code", "execution_count": 220, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.682550Z", "iopub.status.busy": "2024-01-18T17:20:51.682438Z", "iopub.status.idle": "2024-01-18T17:20:51.722210Z", "shell.execute_reply": "2024-01-18T17:20:51.721932Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m \u001b[32mlist_length\u001b[39;49;00m(L):\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m L != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(L, \u001b[36mlist\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mif\u001b[39;49;00m L == []:\u001b[37m\u001b[39;49;00m\n", " length = \u001b[34m0\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34melse\u001b[39;49;00m:\u001b[37m\u001b[39;49;00m\n", " length = \u001b[34m1\u001b[39;49;00m + list_length(L[\u001b[34m1\u001b[39;49;00m:])\u001b[37m\u001b[39;49;00m\n", " 
return_value = length\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value == \u001b[36mlen\u001b[39;49;00m(L), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(return_value, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value >= \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34mreturn\u001b[39;49;00m length\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", "    length = list_length([1, 2, 3])\n", "\n", "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "code", "execution_count": 221, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.723766Z", "iopub.status.busy": "2024-01-18T17:20:51.723675Z", "iopub.status.idle": "2024-01-18T17:20:51.725675Z", "shell.execute_reply": "2024-01-18T17:20:51.725420Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "76\n" ] } ], "source": [ "with EmbeddedInvariantAnnotator() as annotator:\n", "    print_sum(31, 45)" ] }, { "cell_type": "code", "execution_count": 222, "metadata": { "execution": { "iopub.execute_input": "2024-01-18T17:20:51.727209Z", "iopub.status.busy": "2024-01-18T17:20:51.727116Z", "iopub.status.idle": "2024-01-18T17:20:51.834994Z", "shell.execute_reply": "2024-01-18T17:20:51.834693Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mdef\u001b[39;49;00m 
\u001b[32mprint_sum\u001b[39;49;00m(a, b):\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m a > \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m a != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(a, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m b > \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m a <= b, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m a >= \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m b != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m \u001b[36misinstance\u001b[39;49;00m(b, \u001b[36mint\u001b[39;49;00m), \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m b >= \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m a < b, 
\u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m b >= a, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m b > a, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated precondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[36mprint\u001b[39;49;00m(a + b)\u001b[37m\u001b[39;49;00m\n", " return_value = \u001b[34mNone\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", " \u001b[34massert\u001b[39;49;00m return_value != \u001b[34m0\u001b[39;49;00m, \u001b[33m'\u001b[39;49;00m\u001b[33mviolated postcondition\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(annotator.functions_with_invariants(), '.py')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "And we're done!" ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }