{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Asserting Expectations\n", "\n", "In the previous chapters on [tracing](Tracer.ipynb) and [interactive debugging](Debugger.ipynb), we have seen how to observe executions. By checking our observations against our expectations, we can find out when and how the program state is faulty. So far, we have assumed that this check would be done by _humans_ – that is, us. However, having this check done by a _computer_, for instance as part of the execution, is infinitely more rigorous and efficient. In this chapter, we introduce techniques to _specify_ our expectations and to check them at runtime, enabling us to detect faults _right as they occur_." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.279611Z", "iopub.status.busy": "2023-11-12T12:40:00.277814Z", "iopub.status.idle": "2023-11-12T12:40:00.359527Z", "shell.execute_reply": "2023-11-12T12:40:00.358191Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bookutils import YouTubeVideo\n", "YouTubeVideo(\"9mI9sbKFkwU\")" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* You should have read the [chapter on tracing executions](Tracer.ipynb)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "button": false, "execution": { "iopub.execute_input": "2023-11-12T12:40:00.388862Z", "iopub.status.busy": "2023-11-12T12:40:00.387424Z", "iopub.status.idle": "2023-11-12T12:40:00.391750Z", "shell.execute_reply": "2023-11-12T12:40:00.390945Z" }, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.406470Z", "iopub.status.busy": "2023-11-12T12:40:00.406316Z", "iopub.status.idle": "2023-11-12T12:40:00.413327Z", "shell.execute_reply": "2023-11-12T12:40:00.412355Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import quiz" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.419221Z", "iopub.status.busy": "2023-11-12T12:40:00.419075Z", "iopub.status.idle": "2023-11-12T12:40:00.539889Z", "shell.execute_reply": "2023-11-12T12:40:00.539581Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import Tracer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Synopsis\n", "\n", "\n", "To [use the code provided in this chapter](Importing.ipynb), write\n", "\n", "```python\n", ">>> from debuggingbook.Assertions import \n", "```\n", "\n", "and then make use of the following features.\n", "\n", "\n", "This chapter discusses _assertions_ to define _assumptions_ on function inputs and results:\n", "\n", "```python\n", ">>> def my_square_root(x): # type: ignore\n", ">>> assert x >= 0\n", ">>> y = square_root(x)\n", ">>> assert math.isclose(y * y, x)\n", ">>> return y\n", "```\n", "Notably, assertions detect _violations_ of these assumptions at runtime:\n", "\n", "```python\n", ">>> with ExpectError():\n", ">>> y = my_square_root(-1)\n", "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/76616918.py\", line 2, in \n", " y = my_square_root(-1)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2617682038.py\", line 2, in my_square_root\n", " assert x >= 0\n", "AssertionError (expected)\n", "\n", "```\n", "_System assertions_ help to detect invalid memory operations.\n", "\n", "```python\n", ">>> managed_mem = ManagedMemory()\n", ">>> managed_mem\n", "```\n", "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|-1|0| | | | | | | | |\n", "\n", "```\n", ">>> with ExpectError():\n", ">>> x = managed_mem[2]\n", "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1296110967.py\", line 2, in \n", " x = managed_mem[2]\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2465984283.py\", line 3, in __getitem__\n", " return self.read(address)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2898840933.py\", line 9, in read\n", " assert self.allocated[address], \\\n", "AssertionError: Reading from unallocated memory (expected)\n", "\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Introducing Assertions\n", "\n", "[Tracers](Tracer.ipynb) and [Interactive Debuggers](Debugger.ipynb) are very flexible tools that allow you to observe precisely what happens during a program execution. It is still _you_, however, who has to check program states and traces against your expectations. There is nothing wrong with that – except that checking hundreds of statements or variables can quickly become a pretty boring and tedious task.\n", "\n", "Processing and checking large amounts of data is actually precisely what _computers were invented for_. Hence, we should aim to _delegate such checking tasks to our computers_ as much as we can. This automates another essential part of debugging – maybe even _the_ most essential part." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assertions\n", "\n", "The standard tool for having the computer check specific conditions at runtime is called an _assertion_. An assertion takes the form\n", "\n", "```python\n", "assert condition\n", "```\n", "\n", "and states that, at runtime, the computer should check that `condition` holds, e.g. evaluates to True. If the condition holds, then nothing happens:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.542153Z", "iopub.status.busy": "2023-11-12T12:40:00.541997Z", "iopub.status.idle": "2023-11-12T12:40:00.543787Z", "shell.execute_reply": "2023-11-12T12:40:00.543494Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "assert True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If the condition evaluates to _False_, however, then the assertion _fails_, indicating an internal error." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.545347Z", "iopub.status.busy": "2023-11-12T12:40:00.545227Z", "iopub.status.idle": "2023-11-12T12:40:00.546876Z", "shell.execute_reply": "2023-11-12T12:40:00.546571Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.548451Z", "iopub.status.busy": "2023-11-12T12:40:00.548333Z", "iopub.status.idle": "2023-11-12T12:40:00.550360Z", "shell.execute_reply": "2023-11-12T12:40:00.550064Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2715578531.py\", line 2, in \n", " assert False\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " assert False" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A common usage for assertions is for _testing_. For instance, we can test a square root function as" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.551978Z", "iopub.status.busy": "2023-11-12T12:40:00.551857Z", "iopub.status.idle": "2023-11-12T12:40:00.553592Z", "shell.execute_reply": "2023-11-12T12:40:00.553301Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def test_square_root() -> None:\n", " assert square_root(4) == 2\n", " assert square_root(9) == 3\n", " ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "and `test_square_root()` will fail if `square_root()` returns a wrong value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Assertions are available in all programming languages. You can even go and implement assertions yourself:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.555299Z", "iopub.status.busy": "2023-11-12T12:40:00.555108Z", "iopub.status.idle": "2023-11-12T12:40:00.557117Z", "shell.execute_reply": "2023-11-12T12:40:00.556760Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_own_assert(cond: bool) -> None:\n", " if not cond:\n", " raise AssertionError" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... and get (almost) the same functionality:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.558919Z", "iopub.status.busy": "2023-11-12T12:40:00.558773Z", "iopub.status.idle": "2023-11-12T12:40:00.560705Z", "shell.execute_reply": "2023-11-12T12:40:00.560453Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1450148856.py\", line 2, in \n", " my_own_assert(2 + 2 == 5)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/3374119957.py\", line 3, in my_own_assert\n", " raise AssertionError\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_own_assert(2 + 2 == 5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assertion Diagnostics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In most languages, _built-in assertions_ offer a bit more functionality than what can be obtained with self-defined functions. Most notably, built-in assertions \n", "\n", "* frequently tell _which condition_ failed (`2 + 2 == 5`)\n", "* frequently tell _where_ the assertion failed (`line 2`), and\n", "* are _optional_ – that is, they can be turned off to save computation time." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "C and C++, for instance, provide an `assert()` function that does all this:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.562226Z", "iopub.status.busy": "2023-11-12T12:40:00.562127Z", "iopub.status.idle": "2023-11-12T12:40:00.564139Z", "shell.execute_reply": "2023-11-12T12:40:00.563863Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "open('testassert.c', 'w').write(r'''\n", "#include \n", "#include \"assert.h\"\n", "\n", "int main(int argc, char *argv[]) {\n", " assert(2 + 2 == 5);\n", " printf(\"Foo\\n\");\n", "}\n", "''');" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.565807Z", "iopub.status.busy": "2023-11-12T12:40:00.565685Z", "iopub.status.idle": "2023-11-12T12:40:00.567283Z", "shell.execute_reply": "2023-11-12T12:40:00.567042Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from bookutils import print_content" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.568704Z", "iopub.status.busy": "2023-11-12T12:40:00.568604Z", "iopub.status.idle": "2023-11-12T12:40:00.620062Z", "shell.execute_reply": "2023-11-12T12:40:00.619776Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\"assert.h\"\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[32mmain\u001b[39;49;00m(\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00margc,\u001b[37m \u001b[39;49;00m\u001b[36mchar\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*argv[])\u001b[37m \u001b[39;49;00m{\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00massert(\u001b[34m2\u001b[39;49;00m\u001b[37m \u001b[39;49;00m+\u001b[37m \u001b[39;49;00m\u001b[34m2\u001b[39;49;00m\u001b[37m \u001b[39;49;00m==\u001b[37m \u001b[39;49;00m\u001b[34m5\u001b[39;49;00m);\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00mprintf(\u001b[33m\"\u001b[39;49;00m\u001b[33mFoo\u001b[39;49;00m\u001b[33m\\n\u001b[39;49;00m\u001b[33m\"\u001b[39;49;00m);\u001b[37m\u001b[39;49;00m\n", "}\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('testassert.c').read(), '.h')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we compile this function and execute it, the assertion (expectedly) fails:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:00.621721Z", "iopub.status.busy": "2023-11-12T12:40:00.621599Z", "iopub.status.idle": "2023-11-12T12:40:01.030484Z", "shell.execute_reply": "2023-11-12T12:40:01.030033Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -g -o testassert testassert.c" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.032740Z", "iopub.status.busy": "2023-11-12T12:40:01.032520Z", "iopub.status.idle": "2023-11-12T12:40:01.308516Z", "shell.execute_reply": "2023-11-12T12:40:01.308064Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Assertion failed: (2 + 2 == 5), function main, file testassert.c, line 6.\r\n" ] } ], "source": [ "!./testassert" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How would the C `assert()` function be able to report the condition and the current location? In fact, `assert()` is commonly implemented as a _macro_ that besides checking the condition, also turns it into a _string_ for a potential error message. Additional macros such as `__FILE__` and `__LINE__` expand into the current location and line, which can then all be used in the assertion error message." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A very simple definition of `assert()` that provides the above diagnostics looks like this:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.310769Z", "iopub.status.busy": "2023-11-12T12:40:01.310632Z", "iopub.status.idle": "2023-11-12T12:40:01.312945Z", "shell.execute_reply": "2023-11-12T12:40:01.312690Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "open('assert.h', 'w').write(r'''\n", "#include \n", "#include \n", "\n", "#ifndef NDEBUG\n", "#define assert(cond) \\\n", " if (!(cond)) { \\\n", " fprintf(stderr, \"Assertion failed: %s, function %s, file %s, line %d\", \\\n", " #cond, __func__, __FILE__, __LINE__); \\\n", " exit(1); \\\n", " }\n", "#else\n", "#define assert(cond) ((void) 0)\n", "#endif\n", "''');" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.314519Z", "iopub.status.busy": "2023-11-12T12:40:01.314408Z", "iopub.status.idle": "2023-11-12T12:40:01.347012Z", "shell.execute_reply": "2023-11-12T12:40:01.346725Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36mifndef NDEBUG\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36mdefine assert(cond) \\\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m if (!(cond)) { \\\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m fprintf(stderr, \"Assertion failed: %s, function %s, file %s, line %d\", \\\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m #cond, __func__, __FILE__, __LINE__); \\\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m exit(1); \\\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m }\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36melse\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36mdefine assert(cond) ((void) 0)\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[36m#\u001b[39;49;00m\u001b[36mendif\u001b[39;49;00m\u001b[36m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('assert.h').read(), '.h')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "(If you think that this is cryptic, you should have a look at an [_actual_ `` header file](https://github.com/bminor/glibc/blob/master/assert/assert.h).)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This header file reveals another important property of assertions – they can be _turned off_. In C and C++, defining the preprocessor variable `NDEBUG` (\"no debug\") turns off assertions, replacing them with a statement that does nothing. The `NDEBUG` variable can be set during compilation:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.349448Z", "iopub.status.busy": "2023-11-12T12:40:01.349311Z", "iopub.status.idle": "2023-11-12T12:40:01.576000Z", "shell.execute_reply": "2023-11-12T12:40:01.575484Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -DNDEBUG -g -o testassert testassert.c" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And, as you can see, the assertion has no effect anymore:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.578132Z", "iopub.status.busy": "2023-11-12T12:40:01.578004Z", "iopub.status.idle": "2023-11-12T12:40:01.862784Z", "shell.execute_reply": "2023-11-12T12:40:01.862322Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Foo\r\n" ] } ], "source": [ "!./testassert" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Python, assertions can also be turned off, by invoking the `python` interpreter with the `-O` (\"optimize\") flag:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:01.864982Z", "iopub.status.busy": "2023-11-12T12:40:01.864852Z", "iopub.status.idle": "2023-11-12T12:40:02.010735Z", "shell.execute_reply": "2023-11-12T12:40:02.010326Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Traceback (most recent call last):\r\n", " File \"\", line 1, in \r\n", "AssertionError\r\n" ] } ], "source": [ "!python -c 'assert 2 + 2 == 5; print(\"Foo\")'" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.012788Z", "iopub.status.busy": "2023-11-12T12:40:02.012663Z", "iopub.status.idle": "2023-11-12T12:40:02.162102Z", "shell.execute_reply": "2023-11-12T12:40:02.161686Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Foo\r\n" ] } ], "source": [ "!python -O -c 'assert 2 + 2 == 5; print(\"Foo\")'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In comparison, which language wins in the amount of assertion diagnostics? Have a look at the information Python provides. If, after defining `fun()` as" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.164216Z", "iopub.status.busy": "2023-11-12T12:40:02.164068Z", "iopub.status.idle": "2023-11-12T12:40:02.166139Z", "shell.execute_reply": "2023-11-12T12:40:02.165857Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def fun() -> None:\n", " assert 2 + 2 == 5" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.168036Z", "iopub.status.busy": "2023-11-12T12:40:02.167873Z", "iopub.status.idle": "2023-11-12T12:40:02.172983Z", "shell.execute_reply": "2023-11-12T12:40:02.172663Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

If we invoke fun() and the assertion fails, which information do we get?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"If we invoke `fun()` and the assertion fails,\"\n", " \" which information do we get?\",\n", " [\n", " \"The failing condition (`2 + 2 == 5`)\",\n", " \"The location of the assertion in the program\",\n", " \"The list of callers\",\n", " \"All of the above\"\n", " ], '123456789 % 5')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, a failed assertion (like any exception in Python) provides us with lots of debugging information, even including the source code:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.174810Z", "iopub.status.busy": "2023-11-12T12:40:02.174676Z", "iopub.status.idle": "2023-11-12T12:40:02.176794Z", "shell.execute_reply": "2023-11-12T12:40:02.176493Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2842303881.py\", line 2, in \n", " fun()\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/3649916634.py\", line 2, in fun\n", " assert 2 + 2 == 5\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " fun()" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Checking Preconditions\n", "\n", "Assertions show their true power when they are not used in a test, but used in a program instead, because that is when they can check not only _one_ run, but actually _all_ runs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The classic example for the use of assertions is a _square root_ program, implementing the function $\\sqrt{x}$. (Let's assume for a moment that the environment does not already have one.) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We want to ensure that `square_root()` is always called with correct arguments. For this purpose, we set up an assertion:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.178809Z", "iopub.status.busy": "2023-11-12T12:40:02.178686Z", "iopub.status.idle": "2023-11-12T12:40:02.180453Z", "shell.execute_reply": "2023-11-12T12:40:02.180163Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert x >= 0\n", " ... # compute square root in y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This assertion is called the _precondition_. A precondition is checked at the beginning of a function. It checks whether all the conditions for using the function are met. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So, if we call `square_root()` with an bad argument, we will get an exception. This holds for _any_ call, ever." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.182066Z", "iopub.status.busy": "2023-11-12T12:40:02.181944Z", "iopub.status.idle": "2023-11-12T12:40:02.183874Z", "shell.execute_reply": "2023-11-12T12:40:02.183612Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/3162937341.py\", line 2, in \n", " square_root(-1)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1081921329.py\", line 2, in square_root\n", " assert x >= 0\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " square_root(-1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For a dynamically typed language like Python, an assertion could actually also check that the argument has the correct type. For `square_root()`, we could ensure that `x` actually has a numeric type:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.185474Z", "iopub.status.busy": "2023-11-12T12:40:02.185361Z", "iopub.status.idle": "2023-11-12T12:40:02.187148Z", "shell.execute_reply": "2023-11-12T12:40:02.186900Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert isinstance(x, (int, float))\n", " assert x >= 0\n", " ... # compute square root in y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And while calls with the correct types just work..." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.188818Z", "iopub.status.busy": "2023-11-12T12:40:02.188575Z", "iopub.status.idle": "2023-11-12T12:40:02.190556Z", "shell.execute_reply": "2023-11-12T12:40:02.190175Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "square_root(4) " ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.192425Z", "iopub.status.busy": "2023-11-12T12:40:02.192270Z", "iopub.status.idle": "2023-11-12T12:40:02.194196Z", "shell.execute_reply": "2023-11-12T12:40:02.193912Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "square_root(4.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... a call with an illegal type will raise a revealing diagnostic:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.195862Z", "iopub.status.busy": "2023-11-12T12:40:02.195749Z", "iopub.status.idle": "2023-11-12T12:40:02.197823Z", "shell.execute_reply": "2023-11-12T12:40:02.197489Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2953341793.py\", line 2, in \n", " square_root('4')\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1840548601.py\", line 2, in square_root\n", " assert isinstance(x, (int, float))\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " square_root('4')" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.199742Z", "iopub.status.busy": "2023-11-12T12:40:02.199604Z", "iopub.status.idle": "2023-11-12T12:40:02.203119Z", "shell.execute_reply": "2023-11-12T12:40:02.202820Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

If we did not check for the type of x, would the assertion x >= 0 still catch a bad call?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"If we did not check for the type of `x`, \"\n", " \"would the assertion `x >= 0` still catch a bad call?\",\n", " [\n", " \"Yes, since `>=` is only defined between numbers\",\n", " \"No, because an empty list or string would evaluate to 0\"\n", " ], '0b10 - 0b01')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Fortunately (for us Python users), the assertion `x >= 0` would already catch a number of invalid types, because (in contrast to, say, JavaScript), Python has no implicit conversion of strings or structures to integers:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.204639Z", "iopub.status.busy": "2023-11-12T12:40:02.204529Z", "iopub.status.idle": "2023-11-12T12:40:02.206249Z", "shell.execute_reply": "2023-11-12T12:40:02.205996Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2976301596.py\", line 2, in \n", " '4' >= 0 # type: ignore\n", "TypeError: '>=' not supported between instances of 'str' and 'int' (expected)\n" ] } ], "source": [ "with ExpectError():\n", " '4' >= 0 # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Checking Results\n", "\n", "While a precondition ensures that the _argument_ to a function is correct, a _postcondition_ checks the _result_ of this very function (assuming the precondition held in the first place). For our `square_root()` function, we can check that the result $y = \\sqrt{x}$ is correct by checking that $y^2 = x$ holds:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.209270Z", "iopub.status.busy": "2023-11-12T12:40:02.209124Z", "iopub.status.idle": "2023-11-12T12:40:02.211781Z", "shell.execute_reply": "2023-11-12T12:40:02.211365Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert x >= 0\n", " ... # compute square root in y\n", " assert y * y == x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In practice, we might encounter problems with this assertion. What might these be?" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.213792Z", "iopub.status.busy": "2023-11-12T12:40:02.213624Z", "iopub.status.idle": "2023-11-12T12:40:02.217336Z", "shell.execute_reply": "2023-11-12T12:40:02.217051Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

Why could the assertion fail despite square_root() being correct?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Why could the assertion fail despite `square_root()` being correct?\",\n", " [\n", " \"We need to compute `y ** 2`, not `y * y`\",\n", " \"We may encounter rounding errors\",\n", " \"The value of `x` may have changed during computation\",\n", " \"The interpreter / compiler may be buggy\"\n", " ], '0b110011 - 0o61')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Technically speaking, there could be many things that _also_ could cause the assertion to fail (cosmic radiation, operating system bugs, secret service bugs, anything) – but the by far most important reason is indeed rounding errors. Here's a simple example, using the Python built-in square root function:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.218900Z", "iopub.status.busy": "2023-11-12T12:40:02.218780Z", "iopub.status.idle": "2023-11-12T12:40:02.220437Z", "shell.execute_reply": "2023-11-12T12:40:02.220163Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import math" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.221845Z", "iopub.status.busy": "2023-11-12T12:40:02.221750Z", "iopub.status.idle": "2023-11-12T12:40:02.223953Z", "shell.execute_reply": "2023-11-12T12:40:02.223670Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2.0000000000000004" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math.sqrt(2.0) * math.sqrt(2.0)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.225844Z", "iopub.status.busy": "2023-11-12T12:40:02.225702Z", "iopub.status.idle": "2023-11-12T12:40:02.228113Z", "shell.execute_reply": "2023-11-12T12:40:02.227841Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math.sqrt(2.0) * math.sqrt(2.0) == 2.0" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If you want to compare two floating-point values, you need to provide an _epsilon value_ denoting the margin of error." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.229703Z", "iopub.status.busy": "2023-11-12T12:40:02.229612Z", "iopub.status.idle": "2023-11-12T12:40:02.231520Z", "shell.execute_reply": "2023-11-12T12:40:02.231263Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert x >= 0\n", " ... # compute square root in y\n", " epsilon = 0.000001\n", " assert abs(y * y - x) < epsilon" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Python, the function `math.isclose(x, y)` also does the job, by default ensuring that the two values are the same within about 9 decimal digits:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.233144Z", "iopub.status.busy": "2023-11-12T12:40:02.233014Z", "iopub.status.idle": "2023-11-12T12:40:02.235362Z", "shell.execute_reply": "2023-11-12T12:40:02.235098Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "math.isclose(math.sqrt(2.0) * math.sqrt(2.0), 2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So let's use `math.isclose()` for our revised postcondition:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.236958Z", "iopub.status.busy": "2023-11-12T12:40:02.236843Z", "iopub.status.idle": "2023-11-12T12:40:02.238596Z", "shell.execute_reply": "2023-11-12T12:40:02.238331Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert x >= 0\n", " ... # compute square root in y\n", " assert math.isclose(y * y, x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us try out this postcondition by using an actual implementation. The [Newton–Raphson method](https://en.wikipedia.org/wiki/Newton%27s_method) is an efficient way to compute square roots:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.240015Z", "iopub.status.busy": "2023-11-12T12:40:02.239912Z", "iopub.status.idle": "2023-11-12T12:40:02.241881Z", "shell.execute_reply": "2023-11-12T12:40:02.241591Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def square_root(x): # type: ignore\n", " assert x >= 0 # precondition\n", "\n", " approx = None\n", " guess = x / 2\n", " while approx != guess:\n", " approx = guess\n", " guess = (approx + x / approx) / 2\n", "\n", " assert math.isclose(approx * approx, x)\n", " return approx" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Apparently, this implementation does the job:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.243521Z", "iopub.status.busy": "2023-11-12T12:40:02.243350Z", "iopub.status.idle": "2023-11-12T12:40:02.245333Z", "shell.execute_reply": "2023-11-12T12:40:02.245069Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "square_root(4.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, it is not just this call that produces the correct result – _all_ calls will produce the correct result. (If the postcondition assertion does not fail, that is.) So, a call like" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.246812Z", "iopub.status.busy": "2023-11-12T12:40:02.246625Z", "iopub.status.idle": "2023-11-12T12:40:02.248671Z", "shell.execute_reply": "2023-11-12T12:40:02.248355Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "111.1080555135405" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "square_root(12345.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "does not require us to _manually_ check the result – the postcondition assertion already has done that for us, and will continue to do so forever." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assertions and Tests\n", "\n", "Having assertions right in the code gives us an easy means to _test_ it – if we can feed sufficiently many inputs into the code without the postcondition ever failing, we can increase our confidence. Let us try this out with our `square_root()` function:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.250399Z", "iopub.status.busy": "2023-11-12T12:40:02.250292Z", "iopub.status.idle": "2023-11-12T12:40:02.260210Z", "shell.execute_reply": "2023-11-12T12:40:02.259924Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "for x in range(1, 10000):\n", " y = square_root(x)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note again that we do not have to check the value of `y` – the `square_root()` postcondition already did that for us." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Instead of enumerating input values, we could also use random (non-negative) numbers; even totally random numbers could work if we filter out those tests where the precondition already fails. If you are interested in such _test generation techniques_, the [Fuzzing Book](fuzzingbook.org) is a great reference for you." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Modern program verification tools even can _prove_ that your program will always meet its assertions. But for all this, you need to have _explicit_ and _formal_ assertions in the first place." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For those interested in testing and verification, here is a quiz for you:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.261907Z", "iopub.status.busy": "2023-11-12T12:40:02.261792Z", "iopub.status.idle": "2023-11-12T12:40:02.265077Z", "shell.execute_reply": "2023-11-12T12:40:02.264774Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

Is there a value for x that satisfies the precondition, but fails the postcondition?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Is there a value for x that satisfies the precondition, \"\n", " \"but fails the postcondition?\",\n", " [ \n", " \"Yes\",\n", " \"No\"\n", " ], 'int(\"Y\" in \"Yes\")')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is indeed something a test generator or program verifier might be able to find with _zero_ effort." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Partial Checks\n", "\n", "In the case of `square_root()`, our postcondition is _total_ – if it passes, then the result is correct (within the `epsilon` boundaries, that is). In practice, however, it is not always easy to provide such a total check. As an example, consider our `remove_html_markup()` function from the [Introduction to Debugging](Intro_Debugging.ipynb):" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.266653Z", "iopub.status.busy": "2023-11-12T12:40:02.266545Z", "iopub.status.idle": "2023-11-12T12:40:02.268798Z", "shell.execute_reply": "2023-11-12T12:40:02.268511Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def remove_html_markup(s): # type: ignore\n", " tag = False\n", " quote = False\n", " out = \"\"\n", "\n", " for c in s:\n", " if c == '<' and not quote:\n", " tag = True\n", " elif c == '>' and not quote:\n", " tag = False\n", " elif c == '\"' or c == \"'\" and tag:\n", " quote = not quote\n", " elif not tag:\n", " out = out + c\n", "\n", " return out" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.270344Z", "iopub.status.busy": "2023-11-12T12:40:02.270228Z", "iopub.status.idle": "2023-11-12T12:40:02.272365Z", "shell.execute_reply": "2023-11-12T12:40:02.272023Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "'I am a text with HTML markup'" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup(\"I am a text with HTML markup\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The precondition for `remove_html_markup()` is trivial – it accepts any string. (Strictly speaking, a precondition `assert isinstance(s, str)` could prevent it from being called with some other collection such as a list.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The challenge, however, is the _postcondition_. How do we check that `remove_html_markup()` produces the correct result?\n", "\n", "* We could check it against some other implementation that removes HTML markup – but if we already do have such a \"golden\" implementation, why bother implementing it again?\n", "\n", "* After a change, we could also check it against some earlier version to prevent _regression_ – that is, losing functionality that was there before. But how would we know the earlier version was correct? (And if it was, why change it?)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we do not aim for ensuring full correctness, our postcondition can also check for _partial properties_. For instance, a postcondition for `remove_html_markup()` may simply ensure that the result no longer contains any markup:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.274200Z", "iopub.status.busy": "2023-11-12T12:40:02.274096Z", "iopub.status.idle": "2023-11-12T12:40:02.276664Z", "shell.execute_reply": "2023-11-12T12:40:02.276368Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def remove_html_markup(s): # type: ignore\n", " tag = False\n", " quote = False\n", " out = \"\"\n", "\n", " for c in s:\n", " if c == '<' and not quote:\n", " tag = True\n", " elif c == '>' and not quote:\n", " tag = False\n", " elif c == '\"' or c == \"'\" and tag:\n", " quote = not quote\n", " elif not tag:\n", " out = out + c\n", "\n", " # postcondition\n", " assert '<' not in out and '>' not in out\n", "\n", " return out" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Besides doing a good job at checking results, the postcondition also does a good job in documenting what `remove_html_markup()` actually does." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.278376Z", "iopub.status.busy": "2023-11-12T12:40:02.278263Z", "iopub.status.idle": "2023-11-12T12:40:02.281895Z", "shell.execute_reply": "2023-11-12T12:40:02.281592Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

Which of these inputs causes the assertion to fail?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Which of these inputs causes the assertion to fail?\",\n", " [\n", " '`bar`',\n", " '`\"foo\"`',\n", " '`>foo<`',\n", " '`\"x > y\"`'\n", " ], '1 + 1 -(-1) + (1 * -1) + 1 ** (1 - 1) + 1')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed. Our (partial) assertion does _not_ detect this error:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.283777Z", "iopub.status.busy": "2023-11-12T12:40:02.283677Z", "iopub.status.idle": "2023-11-12T12:40:02.285863Z", "shell.execute_reply": "2023-11-12T12:40:02.285620Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'foo'" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "remove_html_markup('\"foo\"')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But it detects this one:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.287348Z", "iopub.status.busy": "2023-11-12T12:40:02.287268Z", "iopub.status.idle": "2023-11-12T12:40:02.289178Z", "shell.execute_reply": "2023-11-12T12:40:02.288920Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1913183346.py\", line 2, in \n", " remove_html_markup('\"x > y\"')\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2717035104.py\", line 17, in remove_html_markup\n", " assert '<' not in out and '>' not in out\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " remove_html_markup('\"x > y\"')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assertions and Documentation\n", "\n", "In contrast to \"standard\" documentation such as\n", "\n", "> `square_root()` expects a non-negative number `x`; its result is $\\sqrt{x}$.\n", "\n", "assertions have a big advantage: They are _formal_ – and thus have an unambiguous semantics. Notably, we can understand what a function does _uniquely by reading its pre- and postconditions_. Here is an example:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.290671Z", "iopub.status.busy": "2023-11-12T12:40:02.290580Z", "iopub.status.idle": "2023-11-12T12:40:02.292784Z", "shell.execute_reply": "2023-11-12T12:40:02.292485Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def some_obscure_function(x: int, y: int, z: int) -> int:\n", " result = int(...) # type: ignore\n", " assert x == y == z or result > min(x, y, z)\n", " assert x == y == z or result < max(x, y, z)\n", " return result" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.294370Z", "iopub.status.busy": "2023-11-12T12:40:02.294282Z", "iopub.status.idle": "2023-11-12T12:40:02.297678Z", "shell.execute_reply": "2023-11-12T12:40:02.297398Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

What does this function do?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"What does this function do?\",\n", " [\n", " \"It returns the minimum value out of `x`, `y`, `z`\",\n", " \"It returns the middle value out of `x`, `y`, `z`\",\n", " \"It returns the maximum value out of `x`, `y`, `z`\",\n", " ], 'int(0.5 ** math.cos(math.pi))', globals())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Indeed, this would be a useful (and bug-revealing!) postcondition for one of our showcase functions in the [chapter on statistical debugging](StatisticalDebugger.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Using Assertions to Trivially Locate Defects\n", "\n", "The final benefit of assertions, and possibly even the most important in the context of this book, is _how much assertions help_ with locating defects.\n", "Indeed, with proper assertions, it is almost trivial to locate the one function that is responsible for a failure." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Consider the following situation. Assume I have\n", "* a function `f()` whose precondition is satisfied, calling\n", "* a function `g()` whose precondition is violated and raises an exception." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.299134Z", "iopub.status.busy": "2023-11-12T12:40:02.299048Z", "iopub.status.idle": "2023-11-12T12:40:02.302878Z", "shell.execute_reply": "2023-11-12T12:40:02.302612Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", "
\n", "

Quiz

\n", "

\n", "

Which function is faulty here?
\n", "

\n", "

\n", "

\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", " \n", "
\n", " \n", "
\n", "

\n", " \n", " \n", "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "quiz(\"Which function is faulty here?\",\n", " [\n", " \"`g()` because it raises an exception\",\n", " \"`f()`\"\n", " \" because it violates the precondition of `g()`\",\n", " \"Both `f()` and `g()`\"\n", " \" because they are incompatible\",\n", " \"None of the above\"\n", " ], 'math.factorial(int(math.tau / math.pi))', globals())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The rule is very simple: If some function `func()` is called with its preconditions satisfied, and the postcondition of `func()` fail, then the fault in the program state must have originated at some point between these two events. Assuming that all functions called by `func()` also are correct (because their postconditions held), the defect _can only be in the code of `func()`._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What pre- and postconditions imply is actually often called a _contract_ between caller and callee:\n", "\n", "* The caller promises to satisfy the _precondition_ of the callee, \n", "whereas\n", "* the callee promises to satisfy its own _postcondition_, delivering a correct result.\n", "\n", "In the above setting, `f()` is the caller, and `g()` is the callee; but as `f()` violates the precondition of `g()`, it has not kept its promises. Hence, `f()` violates the contract and is at fault. `f()` thus needs to be fixed." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Checking Data Structures\n", "\n", "Let us get back to debugging. In debugging, assertions serve two purposes: \n", "\n", "* They immediately detect bugs (if they fail)\n", "* They immediately rule out _specific parts of code and state_ (if they pass)\n", "\n", "This latter part is particularly interesting, as it allows us to focus our search on the lesser checked aspects of code and state. \n", "\n", "When we say \"code _and_ state\", what do we mean? Actually, assertions can not quickly check several executions of a function, but also _large amounts of data_, detecting faults in data _at the moment they are introduced_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Times and Time Bombs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us illustrate this by an example. Let's assume we want a `Time` class that represents the time of day. Its constructor takes the current time using hours, minutes, and seconds. (Note that this is a deliberately simple example – real-world classes for representing time are way more complex.)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.304528Z", "iopub.status.busy": "2023-11-12T12:40:02.304436Z", "iopub.status.idle": "2023-11-12T12:40:02.306249Z", "shell.execute_reply": "2023-11-12T12:40:02.305987Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Time:\n", " def __init__(self, hours: int = 0, minutes: int = 0, seconds: int = 0) -> None:\n", " self._hours = hours\n", " self._minutes = minutes\n", " self._seconds = seconds" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To access the individual elements, we introduce a few getters:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.307827Z", "iopub.status.busy": "2023-11-12T12:40:02.307727Z", "iopub.status.idle": "2023-11-12T12:40:02.309952Z", "shell.execute_reply": "2023-11-12T12:40:02.309658Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Time(Time):\n", " def hours(self) -> int:\n", " return self._hours\n", "\n", " def minutes(self) -> int:\n", " return self._minutes\n", "\n", " def seconds(self) -> int:\n", " return self._seconds" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We allow printing out time, using the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601):" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.311693Z", "iopub.status.busy": "2023-11-12T12:40:02.311590Z", "iopub.status.idle": "2023-11-12T12:40:02.313572Z", "shell.execute_reply": "2023-11-12T12:40:02.313296Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Time(Time):\n", " def __repr__(self) -> str:\n", " return f\"{self.hours():02}:{self.minutes():02}:{self.seconds():02}\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Three minutes to midnight can thus be represented as" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.315105Z", "iopub.status.busy": "2023-11-12T12:40:02.314990Z", "iopub.status.idle": "2023-11-12T12:40:02.317174Z", "shell.execute_reply": "2023-11-12T12:40:02.316859Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "23:57:00" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = Time(23, 57, 0)\n", "t" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Unfortunately, there's nothing in our `Time` class that prevents blatant misuse. We can easily set up a time with negative numbers, for instance:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.320056Z", "iopub.status.busy": "2023-11-12T12:40:02.319907Z", "iopub.status.idle": "2023-11-12T12:40:02.322384Z", "shell.execute_reply": "2023-11-12T12:40:02.322053Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "-1:00:00" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = Time(-1, 0, 0)\n", "t" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Such a thing _may_ have some semantics (relative time, maybe?), but it's not exactly conforming to ISO format." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Even worse, we can even _construct_ a `Time` object with strings as numbers." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.324264Z", "iopub.status.busy": "2023-11-12T12:40:02.324105Z", "iopub.status.idle": "2023-11-12T12:40:02.326121Z", "shell.execute_reply": "2023-11-12T12:40:02.325759Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "t = Time(\"High noon\") # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "and this will be included verbatim the moment we try to print it:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.328004Z", "iopub.status.busy": "2023-11-12T12:40:02.327835Z", "iopub.status.idle": "2023-11-12T12:40:02.329738Z", "shell.execute_reply": "2023-11-12T12:40:02.329445Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "High noon:00:00\n" ] } ], "source": [ "with ExpectError(): # This fails in Python 3.9\n", " print(t)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For now, everything will be fine - but what happens when some other program tries to parse this time? Or processes a log file with a timestamp like this?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In fact, what we have here is a _time bomb_ – a fault in the program state that can sleep for ages until someone steps on it. These are hard to debug, because one has to figure out when the time bomb was set – which can be thousands or millions of lines earlier in the program. Since in the absence of type checking, _any_ assignment to a `Time` object could be the culprit – so good luck with the search." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is again where assertions save the day. What you need is an _assertion that checks whether the data is correct_. For instance, we could revise our constructor such that it checks for correct arguments:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.331754Z", "iopub.status.busy": "2023-11-12T12:40:02.331644Z", "iopub.status.idle": "2023-11-12T12:40:02.333734Z", "shell.execute_reply": "2023-11-12T12:40:02.333486Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Time(Time):\n", " def __init__(self, hours: int = 0, minutes: int = 0, seconds: int = 0) -> None:\n", " assert 0 <= hours <= 23\n", " assert 0 <= minutes <= 59\n", " assert 0 <= seconds <= 60 # Includes leap seconds (ISO8601)\n", "\n", " self._hours = hours\n", " self._minutes = minutes\n", " self._seconds = seconds" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "These conditions check whether `hours`, `minutes`, and `seconds` are within the right range. They are called _data invariants_ (or short _invariants_) because they hold for the given data (notably, the internal attributes) at all times.\n", "\n", "Note the unusual syntax for range checks (this is a Python special), and the fact that seconds can range from 0 to 60. That's because there's not only leap years, but also leap seconds." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this revised constructor, we now get errors as soon as we pass an invalid parameter:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.335340Z", "iopub.status.busy": "2023-11-12T12:40:02.335233Z", "iopub.status.idle": "2023-11-12T12:40:02.337024Z", "shell.execute_reply": "2023-11-12T12:40:02.336803Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/283236387.py\", line 2, in \n", " t = Time(-23, 0, 0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/559831374.py\", line 3, in __init__\n", " assert 0 <= hours <= 23\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " t = Time(-23, 0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, _any_ attempt to set _any_ Time object to an illegal state will be immediately detected. In other words, the time bomb defuses itself at the moment it is being set." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This means that when we are debugging, and search for potential faults in the state that could have caused the current failure, we can now rule out `Time` as a culprit, allowing us to focus on other parts of the state." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The more of the state we have checked with invariants,\n", "\n", "* the less state we have to examine, \n", "* the fewer possible causes we have to investigate,\n", "* the faster we are done with determining the defect." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Invariant Checkers\n", "\n", "For invariants to be effective, they have to be checked at all times. If we introduce a method that changes the state, then this method will also have to ensure that the invariant is satisfied:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.339037Z", "iopub.status.busy": "2023-11-12T12:40:02.338922Z", "iopub.status.idle": "2023-11-12T12:40:02.340785Z", "shell.execute_reply": "2023-11-12T12:40:02.340474Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Time(Time):\n", " def set_hours(self, hours: int) -> None:\n", " assert 0 <= hours <= 23\n", " self._hours = hours" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This also implies that state changes should go through methods, not direct accesses to attributes. If some code changes the attributes of your object directly, without going through the method that could check for consistency, then it will be much harder for you to a) detect the source of the problem and b) even detect that a problem exists." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Excursion: Checked Getters and Setters in Python" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In Python, the `@property` decorator offers a handy way to implement checkers, even for otherwise direct accesses to attributes. It allows defining specific \"getter\" and \"setter\" functions for individual properties that would even be invoked when a (seemingly) attribute is accessed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Using `@property`, our `Time` class could look like this:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.342923Z", "iopub.status.busy": "2023-11-12T12:40:02.342800Z", "iopub.status.idle": "2023-11-12T12:40:02.344783Z", "shell.execute_reply": "2023-11-12T12:40:02.344488Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class MyTime(Time):\n", " @property # type: ignore\n", " def hours(self) -> int:\n", " return self._hours\n", "\n", " @hours.setter\n", " def hours(self, new_hours: int) -> None:\n", " assert 0 <= new_hours <= 23\n", " self._hours = new_hours" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To access the current hour, we no longer go through a specific \"getter\" function; instead, we access a synthesized attribute that – behind the scenes – invokes the \"getter\" function marked with `@property`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.346332Z", "iopub.status.busy": "2023-11-12T12:40:02.346224Z", "iopub.status.idle": "2023-11-12T12:40:02.348523Z", "shell.execute_reply": "2023-11-12T12:40:02.348203Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_time = MyTime(11, 30, 0)\n", "my_time.hours" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If we \"assign\" to the attribute, the \"setter\" function is called in the background;" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.350119Z", "iopub.status.busy": "2023-11-12T12:40:02.350009Z", "iopub.status.idle": "2023-11-12T12:40:02.351556Z", "shell.execute_reply": "2023-11-12T12:40:02.351284Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "my_time.hours = 12 # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see this immediately when trying to assign an illegal value:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.353139Z", "iopub.status.busy": "2023-11-12T12:40:02.353028Z", "iopub.status.idle": "2023-11-12T12:40:02.354930Z", "shell.execute_reply": "2023-11-12T12:40:02.354668Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/91299477.py\", line 2, in \n", " my_time.hours = 25 # type: ignore\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2658705698.py\", line 8, in hours\n", " assert 0 <= new_hours <= 23\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " my_time.hours = 25 # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you build large infrastructures in Python, you can use these features to implement \n", "\n", "* attributes that are _checked_ every time they are accessed or changed; \n", "* attributes that are easier to remember than a large slew of getter and setter functions.\n", "\n", "In this book, we do not have that many attributes, and we try to use not too many Python-specific features, so we usually go without `@property`. But for Python aficionados, and especially those who care about runtime checks, checked property accesses are a boon." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we have several methods that can alter an object, it can be helpful to factor out invariant checking into its own method. Such a method can also be called to check for inconsistencies that might have been introduced without going through one of the methods – e.g. by direct object access, memory manipulation, or memory corruption." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "By convention, methods that check invariants have the name `repOK()`, since they check whether the internal representation is okay, and return True if so." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here's a `repOK()` method for `Time`:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.356699Z", "iopub.status.busy": "2023-11-12T12:40:02.356577Z", "iopub.status.idle": "2023-11-12T12:40:02.358517Z", "shell.execute_reply": "2023-11-12T12:40:02.358230Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Time(Time):\n", " def repOK(self) -> bool:\n", " assert 0 <= self.hours() <= 23\n", " assert 0 <= self.minutes() <= 59\n", " assert 0 <= self.seconds() <= 60\n", " return True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can integrate this method right into our constructor and our setter:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.360200Z", "iopub.status.busy": "2023-11-12T12:40:02.360074Z", "iopub.status.idle": "2023-11-12T12:40:02.362263Z", "shell.execute_reply": "2023-11-12T12:40:02.362013Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Time(Time):\n", " def __init__(self, hours: int = 0, minutes: int = 0, seconds: int = 0) -> None:\n", " self._hours = hours\n", " self._minutes = minutes\n", " self._seconds = seconds\n", " assert self.repOK()\n", "\n", " def set_hours(self, hours: int) -> None:\n", " self._hours = hours\n", " assert self.repOK()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.363670Z", "iopub.status.busy": "2023-11-12T12:40:02.363569Z", "iopub.status.idle": "2023-11-12T12:40:02.365495Z", "shell.execute_reply": "2023-11-12T12:40:02.365237Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/283236387.py\", line 2, in \n", " t = Time(-23, 0, 0)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/346828762.py\", line 6, in __init__\n", " assert self.repOK()\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/478898990.py\", line 3, in repOK\n", " assert 0 <= self.hours() <= 23\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " t = Time(-23, 0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Having a single method that checks everything can be beneficial, as it may explicitly check for more faulty states. For instance, it is still permissible to pass a floating-point number for hours and minutes, again breaking the Time representation:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.367042Z", "iopub.status.busy": "2023-11-12T12:40:02.366926Z", "iopub.status.idle": "2023-11-12T12:40:02.369108Z", "shell.execute_reply": "2023-11-12T12:40:02.368790Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "1.5:00:00" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Time(1.5) # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "(Strictly speaking, ISO 8601 _does_ allow fractional parts for seconds and even for hours and minutes – but still wants two leading digits before the fraction separator. Plus, the comma is the \"preferred\" fraction separator. In short, you won't be making too many friends using times formatted like the one above.)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can extend our `repOK()` method to check for correct types, too." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.370810Z", "iopub.status.busy": "2023-11-12T12:40:02.370703Z", "iopub.status.idle": "2023-11-12T12:40:02.372823Z", "shell.execute_reply": "2023-11-12T12:40:02.372577Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Time(Time):\n", " def repOK(self) -> bool:\n", " assert isinstance(self.hours(), int)\n", " assert isinstance(self.minutes(), int)\n", " assert isinstance(self.seconds(), int)\n", "\n", " assert 0 <= self.hours() <= 23\n", " assert 0 <= self.minutes() <= 59\n", " assert 0 <= self.seconds() <= 60\n", " return True" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.374541Z", "iopub.status.busy": "2023-11-12T12:40:02.374411Z", "iopub.status.idle": "2023-11-12T12:40:02.376825Z", "shell.execute_reply": "2023-11-12T12:40:02.376532Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "14:00:00" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Time(14, 0, 0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This now also catches other type errors:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.378611Z", "iopub.status.busy": "2023-11-12T12:40:02.378481Z", "iopub.status.idle": "2023-11-12T12:40:02.380466Z", "shell.execute_reply": "2023-11-12T12:40:02.380230Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/119323666.py\", line 2, in \n", " t = Time(\"After midnight\") # type: ignore\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/346828762.py\", line 6, in __init__\n", " assert self.repOK()\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1587012707.py\", line 3, in repOK\n", " assert isinstance(self.hours(), int)\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " t = Time(\"After midnight\") # type: ignore" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Our `repOK()` method can also be used in combination with pre- and postconditions. Typically, you'd like to make it part of the pre- and postcondition checks." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Assume you want to implement an `advance()` method that adds a number of seconds to the current time. The preconditions and postconditions can be easily defined:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.382018Z", "iopub.status.busy": "2023-11-12T12:40:02.381915Z", "iopub.status.idle": "2023-11-12T12:40:02.384067Z", "shell.execute_reply": "2023-11-12T12:40:02.383822Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Time(Time):\n", " def seconds_since_midnight(self) -> int:\n", " return self.hours() * 3600 + self.minutes() * 60 + self.seconds()\n", "\n", " def advance(self, seconds_offset: int) -> None:\n", " old_seconds = self.seconds_since_midnight()\n", "\n", " ... # Advance the clock\n", "\n", " assert (self.seconds_since_midnight() ==\n", " (old_seconds + seconds_offset) % (24 * 60 * 60))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "But you'd really like `advance()` to check the state before _and_ after its execution – again using `repOK()`:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.385628Z", "iopub.status.busy": "2023-11-12T12:40:02.385497Z", "iopub.status.idle": "2023-11-12T12:40:02.387634Z", "shell.execute_reply": "2023-11-12T12:40:02.387358Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class BetterTime(Time):\n", " def advance(self, seconds_offset: int) -> None:\n", " assert self.repOK()\n", " old_seconds = self.seconds_since_midnight()\n", "\n", " ... # Advance the clock\n", "\n", " assert (self.seconds_since_midnight() ==\n", " (old_seconds + seconds_offset) % (24 * 60 * 60))\n", " assert self.repOK()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The first postcondition ensures that `advance()` produces the desired result; the second one ensures that the internal state is still okay." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Large Data Structures\n", "\n", "Invariants are especially useful if you have a large, complex data structure which is very hard to track in a conventional debugger." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let's assume you have a [red-black search tree](https://en.wikipedia.org/wiki/Red–black_tree) for storing and searching data. Red-black trees are among the most efficient data structures for representing associative arrays (also known as mappings); they are self-balancing and guarantee search, insertion, and deletion in logarithmic time. They also are among the most ugly to debug." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What is a red-black tree? Here is an example from Wikipedia:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "![Red-Black Tree](https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As you can see, there are red nodes and black nodes (giving the tree its name). We can define a class `RedBlackTrees` and implement all the necessary operations." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.389429Z", "iopub.status.busy": "2023-11-12T12:40:02.389193Z", "iopub.status.idle": "2023-11-12T12:40:02.391060Z", "shell.execute_reply": "2023-11-12T12:40:02.390790Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class RedBlackTree:\n", " RED = 'red'\n", " BLACK = 'black'\n", " ..." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.392740Z", "iopub.status.busy": "2023-11-12T12:40:02.392616Z", "iopub.status.idle": "2023-11-12T12:40:02.394482Z", "shell.execute_reply": "2023-11-12T12:40:02.394236Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "\n", "# A few dummy methods to make the static type checker happy. Ignore.\n", "class RedBlackNode:\n", " def __init__(self) -> None:\n", " self.parent = None\n", " self.color = RedBlackTree.BLACK\n", " pass" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.395974Z", "iopub.status.busy": "2023-11-12T12:40:02.395870Z", "iopub.status.idle": "2023-11-12T12:40:02.398183Z", "shell.execute_reply": "2023-11-12T12:40:02.397832Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "\n", "# More dummy methods to make the static type checker happy. Ignore.\n", "class RedBlackTree(RedBlackTree):\n", " def redNodesHaveOnlyBlackChildren(self) -> bool:\n", " return True\n", "\n", " def equalNumberOfBlackNodesOnSubtrees(self) -> bool:\n", " return True\n", "\n", " def treeIsAcyclic(self) -> bool:\n", " return True\n", "\n", " def parentsAreConsistent(self) -> bool:\n", " return True\n", "\n", " def __init__(self) -> None:\n", " self._root = RedBlackNode()\n", " self._root.parent = None\n", " self._root.color = self.BLACK" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "However, before we start coding, it would be a good idea to _first_ reason about the invariants of a red-black tree. Indeed, a red-black tree has a number of important properties that hold at all times – for instance, that the root node be black or that the tree be balanced. When we implement a red-black tree, these _invariants_ can be encoded into a `repOK()` method:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.419709Z", "iopub.status.busy": "2023-11-12T12:40:02.419559Z", "iopub.status.idle": "2023-11-12T12:40:02.421886Z", "shell.execute_reply": "2023-11-12T12:40:02.421566Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class RedBlackTree(RedBlackTree):\n", " def repOK(self) -> bool:\n", " assert self.rootHasNoParent()\n", " assert self.rootIsBlack()\n", " assert self.redNodesHaveOnlyBlackChildren()\n", " assert self.equalNumberOfBlackNodesOnSubtrees()\n", " assert self.treeIsAcyclic()\n", " assert self.parentsAreConsistent()\n", " return True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Each of these helper methods are checkers in their own right:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.423460Z", "iopub.status.busy": "2023-11-12T12:40:02.423354Z", "iopub.status.idle": "2023-11-12T12:40:02.425394Z", "shell.execute_reply": "2023-11-12T12:40:02.425065Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class RedBlackTree(RedBlackTree):\n", " def rootHasNoParent(self) -> bool:\n", " return self._root.parent is None\n", "\n", " def rootIsBlack(self) -> bool:\n", " return self._root.color == self.BLACK\n", " ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With all these helpers, our `repOK()` method will become very rigorous – but all this rigor is very much needed. Just for fun, check out the [description of red-black trees on Wikipedia](https://en.wikipedia.org/wiki/Red–black_tree). The description of how insertion or deletion work is 4 to 5 pages long (each!), with dozens of special cases that all have to be handled properly. If you ever face the task of implementing such a data structure, be sure to (1) write a `repOK()` method such as the above, and (2) call it before and after each method that alters the tree:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.427095Z", "iopub.status.busy": "2023-11-12T12:40:02.426973Z", "iopub.status.idle": "2023-11-12T12:40:02.428670Z", "shell.execute_reply": "2023-11-12T12:40:02.428398Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Any, List" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.430136Z", "iopub.status.busy": "2023-11-12T12:40:02.430031Z", "iopub.status.idle": "2023-11-12T12:40:02.432039Z", "shell.execute_reply": "2023-11-12T12:40:02.431772Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class RedBlackTree(RedBlackTree):\n", " def insert(self, item: Any) -> None:\n", " assert self.repOK()\n", " ... # four pages of code\n", " assert self.repOK()\n", "\n", " def delete(self, item: Any) -> None:\n", " assert self.repOK()\n", " ... # five pages of code\n", " assert self.repOK()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Such checks will make your tree run much slower – essentially, instead of logarithmic time complexity, we now have linear time complexity, as the entire tree is traversed with each change – but you will find any bugs much, much faster. Once your tree goes in production, you can deactivate `repOK()` by default, using some debugging switch to turn it on again should the need ever arise:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.433644Z", "iopub.status.busy": "2023-11-12T12:40:02.433535Z", "iopub.status.idle": "2023-11-12T12:40:02.435663Z", "shell.execute_reply": "2023-11-12T12:40:02.435401Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class RedBlackTree(RedBlackTree):\n", " def __init__(self, checkRepOK: bool = False) -> None:\n", " ...\n", " self.checkRepOK = checkRepOK\n", "\n", " def repOK(self) -> bool:\n", " if not self.checkRepOK:\n", " return True\n", "\n", " assert self.rootHasNoParent()\n", " assert self.rootIsBlack()\n", " ...\n", " return True" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Just don't delete it – future maintainers of your code will be forever grateful that you have documented your assumptions and given them a means to quickly check their code." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## System Invariants\n", "\n", "When interacting with the operating system, there are a number of rules that programs must follow, lest they get themselves (or the system) in some state where they cannot execute properly anymore.\n", "\n", "* If you work with files, every file that you open also must be closed; otherwise, you will deplete resources.\n", "* If you create temporary files, be sure to delete them after use; otherwise, you will consume disk space.\n", "* If you work with locks, be sure to release locks after use; otherwise, your system may end up in a deadlock.\n", "\n", "One area in which it is particularly easy to make mistakes is _memory usage_. In Python, memory is maintained by the Python interpreter, and all memory accesses are checked at runtime. Accessing a non-existing element of a string, for instance, will raise a memory error:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.437170Z", "iopub.status.busy": "2023-11-12T12:40:02.437070Z", "iopub.status.idle": "2023-11-12T12:40:02.438706Z", "shell.execute_reply": "2023-11-12T12:40:02.438486Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/295675708.py\", line 3, in \n", " \"foo\"[index]\n", "IndexError: string index out of range (expected)\n" ] } ], "source": [ "with ExpectError():\n", " index = 10\n", " \"foo\"[index]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The very same expression in a C program, though, will yield _undefined behavior_ – which means that anything can happen. Let us explore a couple of C programs with undefined behavior." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The C Memory Model\n", "\n", "We start with a simple C program which uses the same invalid index as our Python expression, above. What does this program do?" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.440206Z", "iopub.status.busy": "2023-11-12T12:40:02.440098Z", "iopub.status.idle": "2023-11-12T12:40:02.442235Z", "shell.execute_reply": "2023-11-12T12:40:02.441974Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "open('testoverflow.c', 'w').write(r'''\n", "#include \n", "\n", "// Access memory out of bounds\n", "int main(int argc, char *argv[]) {\n", " int index = 10;\n", " return \"foo\"[index]; // BOOM\n", "}\n", "''');" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.443803Z", "iopub.status.busy": "2023-11-12T12:40:02.443687Z", "iopub.status.idle": "2023-11-12T12:40:02.526170Z", "shell.execute_reply": "2023-11-12T12:40:02.525808Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#include\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "//\u001b[37m \u001b[39;49;00mAccess\u001b[37m \u001b[39;49;00mmemory\u001b[37m \u001b[39;49;00m\u001b[34mout\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[34mof\u001b[39;49;00m\u001b[37m \u001b[39;49;00mbounds\u001b[37m\u001b[39;49;00m\n", "\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00mmain(\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00margc,\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[32mchar\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*argv\u001b[04m\u001b[91m[\u001b[39;49;00m\u001b[04m\u001b[91m]\u001b[39;49;00m)\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[91m{\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[34mindex\u001b[39;49;00m\u001b[37m \u001b[39;49;00m=\u001b[37m \u001b[39;49;00m\u001b[34m10\u001b[39;49;00m;\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[34mreturn\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[33m\"foo\"\u001b[39;49;00m[index];\u001b[37m \u001b[39;49;00m//\u001b[37m \u001b[39;49;00mBOOM\u001b[37m\u001b[39;49;00m\n", "\u001b[04m\u001b[91m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('testoverflow.c').read())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In our example, the program will read from a random chunk or memory, which may exist or not. In most cases, nothing at all will happen – which is a bad thing, because you won't realize that your program has a defect." ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.527875Z", "iopub.status.busy": "2023-11-12T12:40:02.527761Z", "iopub.status.idle": "2023-11-12T12:40:02.718123Z", "shell.execute_reply": "2023-11-12T12:40:02.717619Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -g -o testoverflow testoverflow.c" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.720374Z", "iopub.status.busy": "2023-11-12T12:40:02.720217Z", "iopub.status.idle": "2023-11-12T12:40:02.979778Z", "shell.execute_reply": "2023-11-12T12:40:02.979305Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!./testoverflow" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To see what is going on behind the scenes, let us have a look at the C memory model." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Excursion: A C Memory Model Simulator" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We build a little simulation of C memory. A `Memory` item stands for a block of continuous memory, which we can access by address using `read()` and `write()`. The `__repr__()` method shows memory contents as a string." ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.982223Z", "iopub.status.busy": "2023-11-12T12:40:02.982067Z", "iopub.status.idle": "2023-11-12T12:40:02.984809Z", "shell.execute_reply": "2023-11-12T12:40:02.984535Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Memory:\n", " def __init__(self, size: int = 10) -> None:\n", " self.size: int = size\n", " self.memory: List[Any] = [None for i in range(size)]\n", "\n", " def read(self, address: int) -> Any:\n", " return self.memory[address]\n", "\n", " def write(self, address: int, item: Any) -> None:\n", " self.memory[address] = item\n", "\n", " def __repr__(self) -> str:\n", " return repr(self.memory)" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.986446Z", "iopub.status.busy": "2023-11-12T12:40:02.986328Z", "iopub.status.idle": "2023-11-12T12:40:02.988013Z", "shell.execute_reply": "2023-11-12T12:40:02.987749Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "mem: Memory = Memory()" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.989453Z", "iopub.status.busy": "2023-11-12T12:40:02.989334Z", "iopub.status.idle": "2023-11-12T12:40:02.991834Z", "shell.execute_reply": "2023-11-12T12:40:02.991519Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[None, None, None, None, None, None, None, None, None, None]" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.993517Z", "iopub.status.busy": "2023-11-12T12:40:02.993371Z", "iopub.status.idle": "2023-11-12T12:40:02.995130Z", "shell.execute_reply": "2023-11-12T12:40:02.994848Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "mem.write(0, 'a')" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.996586Z", "iopub.status.busy": "2023-11-12T12:40:02.996483Z", "iopub.status.idle": "2023-11-12T12:40:02.998449Z", "shell.execute_reply": "2023-11-12T12:40:02.998206Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['a', None, None, None, None, None, None, None, None, None]" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:02.999868Z", "iopub.status.busy": "2023-11-12T12:40:02.999758Z", "iopub.status.idle": "2023-11-12T12:40:03.001792Z", "shell.execute_reply": "2023-11-12T12:40:03.001552Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem.read(0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We introduce `[index]` syntax for easy read and write:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.003317Z", "iopub.status.busy": "2023-11-12T12:40:03.003197Z", "iopub.status.idle": "2023-11-12T12:40:03.005216Z", "shell.execute_reply": "2023-11-12T12:40:03.004932Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Memory(Memory):\n", " def __getitem__(self, address: int) -> Any:\n", " return self.read(address)\n", "\n", " def __setitem__(self, address: int, item: Any) -> None:\n", " self.write(address, item)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.006752Z", "iopub.status.busy": "2023-11-12T12:40:03.006633Z", "iopub.status.idle": "2023-11-12T12:40:03.008355Z", "shell.execute_reply": "2023-11-12T12:40:03.008073Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "mem_with_index: Memory = Memory()\n", "mem_with_index[1] = 'a'" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.010086Z", "iopub.status.busy": "2023-11-12T12:40:03.009954Z", "iopub.status.idle": "2023-11-12T12:40:03.012133Z", "shell.execute_reply": "2023-11-12T12:40:03.011861Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[None, 'a', None, None, None, None, None, None, None, None]" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem_with_index" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.013756Z", "iopub.status.busy": "2023-11-12T12:40:03.013544Z", "iopub.status.idle": "2023-11-12T12:40:03.015664Z", "shell.execute_reply": "2023-11-12T12:40:03.015392Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem_with_index[1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here are some more advanced methods to show memory cntents. The `repr()` and `_repr_markdown_()` methods display memory as a table. In a notebook, we can simply evaluate the memory to see the table." ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.017122Z", "iopub.status.busy": "2023-11-12T12:40:03.016999Z", "iopub.status.idle": "2023-11-12T12:40:03.018762Z", "shell.execute_reply": "2023-11-12T12:40:03.018482Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.display import display, Markdown, HTML" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.020322Z", "iopub.status.busy": "2023-11-12T12:40:03.020206Z", "iopub.status.idle": "2023-11-12T12:40:03.023210Z", "shell.execute_reply": "2023-11-12T12:40:03.022946Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Memory(Memory):\n", " def show_header(self) -> str:\n", " out = \"|Address|\"\n", " for address in range(self.size):\n", " out += f\"{address}|\"\n", " return out + '\\n'\n", "\n", " def show_sep(self) -> str:\n", " out = \"|:---|\"\n", " for address in range(self.size):\n", " out += \":---|\"\n", " return out + '\\n'\n", "\n", " def show_contents(self) -> str:\n", " out = \"|Content|\"\n", " for address in range(self.size):\n", " contents = self.memory[address]\n", " if contents is not None:\n", " out += f\"{repr(contents)}|\"\n", " else:\n", " out += \" |\"\n", " return out + '\\n'\n", "\n", " def __repr__(self) -> str:\n", " return self.show_header() + self.show_sep() + self.show_contents()\n", "\n", " def _repr_markdown_(self) -> str:\n", " return repr(self)" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.024799Z", "iopub.status.busy": "2023-11-12T12:40:03.024630Z", "iopub.status.idle": "2023-11-12T12:40:03.027058Z", "shell.execute_reply": "2023-11-12T12:40:03.026788Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|0|10|20|30|40|50|60|70|80|90|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|0|10|20|30|40|50|60|70|80|90|" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem_with_table: Memory = Memory()\n", "for i in range(mem_with_table.size):\n", " mem_with_table[i] = 10 * i\n", "mem_with_table" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In C, memory comes as a single block of bytes at continuous addresses. Let us assume we have a memory of only 20 bytes (duh!) and the string \"foo\" is stored at address 5:" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.028556Z", "iopub.status.busy": "2023-11-12T12:40:03.028446Z", "iopub.status.idle": "2023-11-12T12:40:03.030817Z", "shell.execute_reply": "2023-11-12T12:40:03.030560Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content| | | | | |'f'|'o'|'o'| | | | | | | | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content| | | | | |'f'|'o'|'o'| | | | | | | | | | | | |" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mem_with_table: Memory = Memory(20)\n", "mem_with_table[5] = 'f'\n", "mem_with_table[6] = 'o'\n", "mem_with_table[7] = 'o'\n", "mem_with_table" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When we try to access `\"foo\"[10]`, we try to read the memory location at address 15 – which may exist (or not), and which may have arbitrary contents, based on whatever previous instructions left there. From there on, the behavior of our program is undefined." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Such _buffer overflows_ can also come as _writes_ into memory locations – and thus overwrite the item that happens to be at the location of interest. If the item at address 15 happens to be, say, a flag controlling administrator access, then setting it to a non-zero value can come handy for an attacker." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Dynamic Memory\n", "\n", "A second source of errors in C programs is the use of dynamic memory – that is, memory allocated and deallocated at run-time. In C, the function `malloc()` returns a continuous block of memory of a given size; the function `free()` returns it back to the system.\n", "\n", "After a block has been `free()`'d, it must no longer be used, as the memory might already be in use by some other function (or program!) again. Here's a piece of code that violates this assumption:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.032632Z", "iopub.status.busy": "2023-11-12T12:40:03.032513Z", "iopub.status.idle": "2023-11-12T12:40:03.034562Z", "shell.execute_reply": "2023-11-12T12:40:03.034284Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "open('testuseafterfree.c', 'w').write(r'''\n", "#include \n", "\n", "// Access a chunk of memory after it has been given back to the system\n", "int main(int argc, char *argv[]) {\n", " int *array = malloc(100 * sizeof(int));\n", " free(array);\n", " return array[10]; // BOOM\n", "}\n", "''');" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.036009Z", "iopub.status.busy": "2023-11-12T12:40:03.035907Z", "iopub.status.idle": "2023-11-12T12:40:03.048277Z", "shell.execute_reply": "2023-11-12T12:40:03.048016Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m// Access a chunk of memory after it has been given back to the system\u001b[39;49;00m\n", "\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[32mmain\u001b[39;49;00m(\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00margc,\u001b[37m \u001b[39;49;00m\u001b[36mchar\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*argv[])\u001b[37m \u001b[39;49;00m{\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*array\u001b[37m \u001b[39;49;00m=\u001b[37m \u001b[39;49;00mmalloc(\u001b[34m100\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*\u001b[37m \u001b[39;49;00m\u001b[34msizeof\u001b[39;49;00m(\u001b[36mint\u001b[39;49;00m));\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00mfree(array);\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[34mreturn\u001b[39;49;00m\u001b[37m \u001b[39;49;00marray[\u001b[34m10\u001b[39;49;00m];\u001b[37m \u001b[39;49;00m\u001b[37m// BOOM\u001b[39;49;00m\n", "}\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('testuseafterfree.c').read())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Again, if we compile and execute this program, nothing tells us that we have just entered undefined behavior:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.049871Z", "iopub.status.busy": "2023-11-12T12:40:03.049760Z", "iopub.status.idle": "2023-11-12T12:40:03.235227Z", "shell.execute_reply": "2023-11-12T12:40:03.234705Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -g -o testuseafterfree testuseafterfree.c" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.237463Z", "iopub.status.busy": "2023-11-12T12:40:03.237235Z", "iopub.status.idle": "2023-11-12T12:40:03.509541Z", "shell.execute_reply": "2023-11-12T12:40:03.509019Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!./testuseafterfree" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "What's going on behind the scenes here?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Excursion: Dynamic Memory in C" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`DynamicMemory` introduces dynamic memory allocation `allocate()` and deallocation `free()`, using a list of allocated blocks." ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.512081Z", "iopub.status.busy": "2023-11-12T12:40:03.511935Z", "iopub.status.idle": "2023-11-12T12:40:03.516383Z", "shell.execute_reply": "2023-11-12T12:40:03.516051Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DynamicMemory(Memory):\n", " # Address at which our list of blocks starts\n", " BLOCK_LIST_START = 0\n", "\n", " def __init__(self, *args: Any) -> None:\n", " super().__init__(*args)\n", "\n", " # Before each block, we reserve two items:\n", " # One pointing to the next block (-1 = END)\n", " self.memory[self.BLOCK_LIST_START] = -1\n", " # One giving the length of the current block (<0: freed)\n", " self.memory[self.BLOCK_LIST_START + 1] = 0\n", "\n", " def allocate(self, block_size: int) -> int:\n", " \"\"\"Allocate a block of memory\"\"\"\n", " # traverse block list \n", " # until we find a free block of appropriate size\n", " chunk = self.BLOCK_LIST_START\n", "\n", " while chunk < self.size:\n", " next_chunk = self.memory[chunk]\n", " chunk_length = self.memory[chunk + 1]\n", "\n", " if chunk_length < 0 and abs(chunk_length) >= block_size:\n", " # Reuse this free block\n", " self.memory[chunk + 1] = abs(chunk_length)\n", " return chunk + 2\n", "\n", " if next_chunk < 0:\n", " # End of list - allocate new block\n", " next_chunk = chunk + block_size + 2\n", " if next_chunk >= self.size:\n", " break\n", "\n", " self.memory[chunk] = next_chunk\n", " self.memory[chunk + 1] = block_size\n", " self.memory[next_chunk] = -1\n", " self.memory[next_chunk + 1] = 0\n", " base = chunk + 2\n", " return base\n", "\n", " # Go to next block\n", " chunk = next_chunk\n", "\n", " raise MemoryError(\"Out of Memory\")\n", "\n", " def free(self, base: int) -> None:\n", " \"\"\"Free a block of memory\"\"\"\n", " # Mark block as available\n", " chunk = base - 2\n", " self.memory[chunk + 1] = -abs(self.memory[chunk + 1])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In our table, we highlight free blocks in gray:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.518035Z", "iopub.status.busy": "2023-11-12T12:40:03.517912Z", "iopub.status.idle": "2023-11-12T12:40:03.520797Z", "shell.execute_reply": "2023-11-12T12:40:03.520522Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class DynamicMemory(DynamicMemory):\n", " def show_header(self) -> str:\n", " out = \"|Address|\"\n", " color = \"black\"\n", " chunk = self.BLOCK_LIST_START\n", " allocated = False\n", "\n", " # States and colors\n", " for address in range(self.size):\n", " if address == chunk:\n", " color = \"blue\"\n", " next_chunk = self.memory[address]\n", " elif address == chunk + 1:\n", " color = \"blue\"\n", " allocated = self.memory[address] > 0\n", " chunk = next_chunk\n", " elif allocated:\n", " color = \"black\"\n", " else:\n", " color = \"lightgrey\"\n", "\n", " item = f'{address}'\n", " out += f\"{item}|\"\n", " return out + '\\n'" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.522182Z", "iopub.status.busy": "2023-11-12T12:40:03.522080Z", "iopub.status.idle": "2023-11-12T12:40:03.523861Z", "shell.execute_reply": "2023-11-12T12:40:03.523542Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "dynamic_mem: DynamicMemory = DynamicMemory(10)" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.525504Z", "iopub.status.busy": "2023-11-12T12:40:03.525373Z", "iopub.status.idle": "2023-11-12T12:40:03.527710Z", "shell.execute_reply": "2023-11-12T12:40:03.527434Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|-1|0| | | | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|-1|0| | | | | | | | |" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.529189Z", "iopub.status.busy": "2023-11-12T12:40:03.529083Z", "iopub.status.idle": "2023-11-12T12:40:03.531189Z", "shell.execute_reply": "2023-11-12T12:40:03.530919Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem.allocate(2)" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.532631Z", "iopub.status.busy": "2023-11-12T12:40:03.532522Z", "iopub.status.idle": "2023-11-12T12:40:03.534626Z", "shell.execute_reply": "2023-11-12T12:40:03.534385Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |-1|0| | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |-1|0| | | | |" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.536076Z", "iopub.status.busy": "2023-11-12T12:40:03.535967Z", "iopub.status.idle": "2023-11-12T12:40:03.537853Z", "shell.execute_reply": "2023-11-12T12:40:03.537606Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem.allocate(2)" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.539288Z", "iopub.status.busy": "2023-11-12T12:40:03.539192Z", "iopub.status.idle": "2023-11-12T12:40:03.541330Z", "shell.execute_reply": "2023-11-12T12:40:03.540974Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |8|2| | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |8|2| | |-1|0|" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.543022Z", "iopub.status.busy": "2023-11-12T12:40:03.542792Z", "iopub.status.idle": "2023-11-12T12:40:03.544490Z", "shell.execute_reply": "2023-11-12T12:40:03.544231Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dynamic_mem.free(2)" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.545915Z", "iopub.status.busy": "2023-11-12T12:40:03.545804Z", "iopub.status.idle": "2023-11-12T12:40:03.547876Z", "shell.execute_reply": "2023-11-12T12:40:03.547639Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|-2| | |8|2| | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|-2| | |8|2| | |-1|0|" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.549277Z", "iopub.status.busy": "2023-11-12T12:40:03.549161Z", "iopub.status.idle": "2023-11-12T12:40:03.551209Z", "shell.execute_reply": "2023-11-12T12:40:03.550948Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem.allocate(1)" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.552623Z", "iopub.status.busy": "2023-11-12T12:40:03.552509Z", "iopub.status.idle": "2023-11-12T12:40:03.554609Z", "shell.execute_reply": "2023-11-12T12:40:03.554334Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |8|2| | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|4|2| | |8|2| | |-1|0|" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.556116Z", "iopub.status.busy": "2023-11-12T12:40:03.555993Z", "iopub.status.idle": "2023-11-12T12:40:03.557941Z", "shell.execute_reply": "2023-11-12T12:40:03.557681Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/381644293.py\", line 2, in \n", " dynamic_mem.allocate(1)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2922652606.py\", line 45, in allocate\n", " raise MemoryError(\"Out of Memory\")\n", "MemoryError: Out of Memory (expected)\n" ] } ], "source": [ "with ExpectError():\n", " dynamic_mem.allocate(1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### End of Excursion" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Dynamic memory is allocated as part of our main memory. The following table shows unallocated memory in gray and allocated memory in black:" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.559529Z", "iopub.status.busy": "2023-11-12T12:40:03.559423Z", "iopub.status.idle": "2023-11-12T12:40:03.561496Z", "shell.execute_reply": "2023-11-12T12:40:03.561255Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|-1|0| | | | | | | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|-1|0| | | | | | | | | | | |" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem: DynamicMemory = DynamicMemory(13)\n", "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The numbers already stored (-1 and 0) are part of our dynamic memory housekeeping (highlighted in blue); they stand for the next block of memory and the length of the current block, respectively." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us allocate a block of 3 bytes. Our (simulated) allocation mechanism places these at the first continuous block available:" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.562986Z", "iopub.status.busy": "2023-11-12T12:40:03.562879Z", "iopub.status.idle": "2023-11-12T12:40:03.564726Z", "shell.execute_reply": "2023-11-12T12:40:03.564495Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p1 = dynamic_mem.allocate(3)\n", "p1" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We see that a block of 3 items is allocated at address 2. The two numbers before that address (5 and 3) indicate the beginning of the next block as well as the length of the current one." ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.566306Z", "iopub.status.busy": "2023-11-12T12:40:03.566161Z", "iopub.status.idle": "2023-11-12T12:40:03.568259Z", "shell.execute_reply": "2023-11-12T12:40:03.567982Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3| | | |-1|0| | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3| | | |-1|0| | | | | | |" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us allocate some more." ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.569700Z", "iopub.status.busy": "2023-11-12T12:40:03.569598Z", "iopub.status.idle": "2023-11-12T12:40:03.571577Z", "shell.execute_reply": "2023-11-12T12:40:03.571332Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "7" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p2 = dynamic_mem.allocate(4)\n", "p2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can make use of that memory:" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.573001Z", "iopub.status.busy": "2023-11-12T12:40:03.572880Z", "iopub.status.idle": "2023-11-12T12:40:03.574590Z", "shell.execute_reply": "2023-11-12T12:40:03.574291Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "dynamic_mem[p1] = 123\n", "dynamic_mem[p2] = 'x'" ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.576261Z", "iopub.status.busy": "2023-11-12T12:40:03.576139Z", "iopub.status.idle": "2023-11-12T12:40:03.578260Z", "shell.execute_reply": "2023-11-12T12:40:03.577988Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|123| | |11|4|'x'| | | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|123| | |11|4|'x'| | | |-1|0|" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When we free memory, the block is marked as free by giving it a negative length:" ] }, { "cell_type": "code", "execution_count": 128, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.579734Z", "iopub.status.busy": "2023-11-12T12:40:03.579620Z", "iopub.status.idle": "2023-11-12T12:40:03.581231Z", "shell.execute_reply": "2023-11-12T12:40:03.580951Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dynamic_mem.free(p1)" ] }, { "cell_type": "code", "execution_count": 129, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.582681Z", "iopub.status.busy": "2023-11-12T12:40:03.582573Z", "iopub.status.idle": "2023-11-12T12:40:03.584730Z", "shell.execute_reply": "2023-11-12T12:40:03.584456Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|-3|123| | |11|4|'x'| | | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|-3|123| | |11|4|'x'| | | |-1|0|" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that freeing memory does not clear memory; the item at `p1` is still there. And we can also still access it." ] }, { "cell_type": "code", "execution_count": 130, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.586325Z", "iopub.status.busy": "2023-11-12T12:40:03.586108Z", "iopub.status.idle": "2023-11-12T12:40:03.588093Z", "shell.execute_reply": "2023-11-12T12:40:03.587831Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "123" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem[p1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But if, in the meantime, some other part of the program requests more memory and uses it..." ] }, { "cell_type": "code", "execution_count": 131, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.589523Z", "iopub.status.busy": "2023-11-12T12:40:03.589422Z", "iopub.status.idle": "2023-11-12T12:40:03.591920Z", "shell.execute_reply": "2023-11-12T12:40:03.591560Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|'y'| | |11|4|'x'| | | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|'y'| | |11|4|'x'| | | |-1|0|" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p3 = dynamic_mem.allocate(2)\n", "dynamic_mem[p3] = 'y'\n", "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... then the memory at `p1` may simply be overwritten." ] }, { "cell_type": "code", "execution_count": 132, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.593654Z", "iopub.status.busy": "2023-11-12T12:40:03.593441Z", "iopub.status.idle": "2023-11-12T12:40:03.595601Z", "shell.execute_reply": "2023-11-12T12:40:03.595355Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'y'" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem[p1]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "An even worse effect comes into play if one accidentally overwrites the dynamic memory allocation information; this can easily corrupt the entire memory management. In our case, such corrupted memory can lead to an endless loop when trying to allocate more memory:" ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.597064Z", "iopub.status.busy": "2023-11-12T12:40:03.596957Z", "iopub.status.idle": "2023-11-12T12:40:03.598544Z", "shell.execute_reply": "2023-11-12T12:40:03.598299Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectTimeout" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.599951Z", "iopub.status.busy": "2023-11-12T12:40:03.599852Z", "iopub.status.idle": "2023-11-12T12:40:03.601524Z", "shell.execute_reply": "2023-11-12T12:40:03.601274Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "dynamic_mem[p3 + 3] = 0" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.602980Z", "iopub.status.busy": "2023-11-12T12:40:03.602865Z", "iopub.status.idle": "2023-11-12T12:40:03.604994Z", "shell.execute_reply": "2023-11-12T12:40:03.604711Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|'y'| | |0|4|'x'| | | |-1|0|\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|10|11|12|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Content|5|3|'y'| | |0|4|'x'| | | |-1|0|" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dynamic_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When `allocate()` traverses the list of blocks, it will enter an endless loop between the block starting at address 0 (pointing to the next block at 5) and the block at address 5 (pointing back to 0)." ] }, { "cell_type": "code", "execution_count": 136, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:03.606488Z", "iopub.status.busy": "2023-11-12T12:40:03.606382Z", "iopub.status.idle": "2023-11-12T12:40:04.608825Z", "shell.execute_reply": "2023-11-12T12:40:04.608471Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2188947149.py\", line 2, in \n", " dynamic_mem.allocate(1)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2922652606.py\", line 24, in allocate\n", " if chunk_length < 0 and abs(chunk_length) >= block_size:\n", " File \"/Users/zeller/Projects/debuggingbook/notebooks/Timeout.ipynb\", line 43, in timeout_handler\n", " raise TimeoutError()\n", "TimeoutError (expected)\n" ] } ], "source": [ "with ExpectTimeout(1):\n", " dynamic_mem.allocate(1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Real-world `malloc()` and `free()` implementations suffer from similar problems. As stated above: As soon as undefined behavior is reached, anything may happen." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Managed Memory\n", "\n", "The solution to all these problems is to _keep track of memory_, specifically\n", "\n", "* which parts of memory have been _allocated_, and\n", "* which parts of memory have been _initialized_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To this end, we introduce two extra flags for each address:\n", "\n", "* The `allocated` flag tells whether an address has been allocated; the `allocate()` method sets them, and `free()` clears them again.\n", "* The `initialized` flag tells whether an address has been written to. This is cleared as part of `allocate()`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "With these, we can run a number of checks:\n", "\n", "* When _writing_ into memory and _freeing_ memory, we can check whether the address has been _allocated_; and\n", "* When _reading_ from memory, we can check whether the address has been allocated and _initialized_.\n", "\n", "Both of these should effectively prevent memory errors." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Excursion: Managed Memory" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We create a simulation of managed memory. `ManagedMemory` keeps track of every address whether it is initialized and allocated." ] }, { "cell_type": "code", "execution_count": 137, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.611567Z", "iopub.status.busy": "2023-11-12T12:40:04.611396Z", "iopub.status.idle": "2023-11-12T12:40:04.614059Z", "shell.execute_reply": "2023-11-12T12:40:04.613675Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ManagedMemory(DynamicMemory):\n", " def __init__(self, *args: Any) -> None:\n", " super().__init__(*args)\n", " self.initialized = [False for i in range(self.size)]\n", " self.allocated = [False for i in range(self.size)]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This allows memory access functions to run a number of extra checks:" ] }, { "cell_type": "code", "execution_count": 138, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.615973Z", "iopub.status.busy": "2023-11-12T12:40:04.615845Z", "iopub.status.idle": "2023-11-12T12:40:04.618486Z", "shell.execute_reply": "2023-11-12T12:40:04.618178Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ManagedMemory(ManagedMemory):\n", " def write(self, address: int, item: Any) -> None:\n", " assert self.allocated[address], \\\n", " \"Writing into unallocated memory\"\n", " self.memory[address] = item\n", " self.initialized[address] = True\n", "\n", " def read(self, address: int) -> Any:\n", " assert self.allocated[address], \\\n", " \"Reading from unallocated memory\"\n", " assert self.initialized[address], \\\n", " \"Reading from uninitialized memory\"\n", " return self.memory[address]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Dynamic memory functions are set up such that they keep track of these flags." ] }, { "cell_type": "code", "execution_count": 139, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.620221Z", "iopub.status.busy": "2023-11-12T12:40:04.620093Z", "iopub.status.idle": "2023-11-12T12:40:04.622837Z", "shell.execute_reply": "2023-11-12T12:40:04.622509Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ManagedMemory(ManagedMemory):\n", " def allocate(self, block_size: int) -> int:\n", " base = super().allocate(block_size)\n", " for i in range(block_size):\n", " self.allocated[base + i] = True\n", " self.initialized[base + i] = False\n", " return base\n", "\n", " def free(self, base: int) -> None:\n", " assert self.allocated[base], \\\n", " \"Freeing memory that is already freed\"\n", " block_size = self.memory[base - 1]\n", " for i in range(block_size):\n", " self.allocated[base + i] = False\n", " self.initialized[base + i] = False\n", " super().free(base)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us highlight these flags when printing out the table:" ] }, { "cell_type": "code", "execution_count": 140, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.624626Z", "iopub.status.busy": "2023-11-12T12:40:04.624485Z", "iopub.status.idle": "2023-11-12T12:40:04.628292Z", "shell.execute_reply": "2023-11-12T12:40:04.627623Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ManagedMemory(ManagedMemory):\n", " def show_contents(self) -> str:\n", " return (self.show_allocated() + \n", " self.show_initialized() +\n", " DynamicMemory.show_contents(self))\n", "\n", " def show_allocated(self) -> str:\n", " out = \"|Allocated|\"\n", " for address in range(self.size):\n", " if self.allocated[address]:\n", " out += \"Y|\"\n", " else:\n", " out += \" |\"\n", " return out + '\\n'\n", "\n", " def show_initialized(self) -> str:\n", " out = \"|Initialized|\"\n", " for address in range(self.size):\n", " if self.initialized[address]:\n", " out += \"Y|\"\n", " else:\n", " out += \" |\"\n", " return out + '\\n'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### End of Excursion" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here comes a simple simulation of managed memory. After we create memory, all addresses are neither allocated nor initialized:" ] }, { "cell_type": "code", "execution_count": 141, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.630615Z", "iopub.status.busy": "2023-11-12T12:40:04.630483Z", "iopub.status.idle": "2023-11-12T12:40:04.633264Z", "shell.execute_reply": "2023-11-12T12:40:04.632922Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|-1|0| | | | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|-1|0| | | | | | | | |" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "managed_mem: ManagedMemory = ManagedMemory()\n", "managed_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us allocate some elements. We see that the first three bytes are now marked as allocated:" ] }, { "cell_type": "code", "execution_count": 142, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.635200Z", "iopub.status.busy": "2023-11-12T12:40:04.635071Z", "iopub.status.idle": "2023-11-12T12:40:04.637669Z", "shell.execute_reply": "2023-11-12T12:40:04.637350Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | |Y|Y|Y| | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|5|3| | | |-1|0| | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | |Y|Y|Y| | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|5|3| | | |-1|0| | | |" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = managed_mem.allocate(3)\n", "managed_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After writing into memory, the respective addresses are marked as \"initialized\":" ] }, { "cell_type": "code", "execution_count": 143, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.639507Z", "iopub.status.busy": "2023-11-12T12:40:04.639387Z", "iopub.status.idle": "2023-11-12T12:40:04.641275Z", "shell.execute_reply": "2023-11-12T12:40:04.640940Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "managed_mem[p] = 10\n", "managed_mem[p + 1] = 20" ] }, { "cell_type": "code", "execution_count": 144, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.643213Z", "iopub.status.busy": "2023-11-12T12:40:04.643073Z", "iopub.status.idle": "2023-11-12T12:40:04.646040Z", "shell.execute_reply": "2023-11-12T12:40:04.645609Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | |Y|Y|Y| | | | | |\n", "|Initialized| | |Y|Y| | | | | | |\n", "|Content|5|3|10|20| |-1|0| | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | |Y|Y|Y| | | | | |\n", "|Initialized| | |Y|Y| | | | | | |\n", "|Content|5|3|10|20| |-1|0| | | |" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "managed_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Attempting to read uninitialized memory fails:" ] }, { "cell_type": "code", "execution_count": 145, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.648068Z", "iopub.status.busy": "2023-11-12T12:40:04.647895Z", "iopub.status.idle": "2023-11-12T12:40:04.650286Z", "shell.execute_reply": "2023-11-12T12:40:04.649986Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1363131886.py\", line 2, in \n", " x = managed_mem[p + 2]\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2465984283.py\", line 3, in __getitem__\n", " return self.read(address)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2898840933.py\", line 11, in read\n", " assert self.initialized[address], \\\n", "AssertionError: Reading from uninitialized memory (expected)\n" ] } ], "source": [ "with ExpectError():\n", " x = managed_mem[p + 2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "When we free the block again, it is marked as not allocated:" ] }, { "cell_type": "code", "execution_count": 146, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.652076Z", "iopub.status.busy": "2023-11-12T12:40:04.651941Z", "iopub.status.idle": "2023-11-12T12:40:04.654253Z", "shell.execute_reply": "2023-11-12T12:40:04.653951Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|5|-3|10|20| |-1|0| | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|5|-3|10|20| |-1|0| | | |" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "managed_mem.free(p)\n", "managed_mem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And accessing any element of the free'd block will yield an error:" ] }, { "cell_type": "code", "execution_count": 147, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.655997Z", "iopub.status.busy": "2023-11-12T12:40:04.655867Z", "iopub.status.idle": "2023-11-12T12:40:04.658126Z", "shell.execute_reply": "2023-11-12T12:40:04.657785Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/4208287712.py\", line 2, in \n", " managed_mem[p] = 10\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2465984283.py\", line 6, in __setitem__\n", " self.write(address, item)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2898840933.py\", line 3, in write\n", " assert self.allocated[address], \\\n", "AssertionError: Writing into unallocated memory (expected)\n" ] } ], "source": [ "with ExpectError():\n", " managed_mem[p] = 10" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Freeing the same block twice also yields an error:" ] }, { "cell_type": "code", "execution_count": 148, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.660033Z", "iopub.status.busy": "2023-11-12T12:40:04.659909Z", "iopub.status.idle": "2023-11-12T12:40:04.661796Z", "shell.execute_reply": "2023-11-12T12:40:04.661511Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/3645412891.py\", line 2, in \n", " managed_mem.free(p)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/193494573.py\", line 10, in free\n", " assert self.allocated[base], \\\n", "AssertionError: Freeing memory that is already freed (expected)\n" ] } ], "source": [ "with ExpectError():\n", " managed_mem.free(p)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With this, we now have a mechanism in place to fully detect memory issues in languages such as C." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Obviously, keeping track of whether memory is allocated/initialized or not requires some extra memory – and also some extra computation time, as read and write accesses have to be checked first. During testing, however, such effort may quickly pay off, as memory bugs can be quickly discovered." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To detect memory errors, a number of tools have been developed. The first class of tools _interprets_ the instructions of the executable code, tracking all memory accesses. For each memory access, they can check whether the memory accessed _exists_ and has been _initialized_ at some point." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Checking Memory Usage with Valgrind\n", "\n", "The [Valgrind](https://www.valgrind.org) tool allows _interpreting_ executable code, thus tracking each and every memory access. You can use Valgrind to execute any program from the command-line, and it will check all memory accesses during execution. \n", "\n", "\n", "\n", "Here's what happens if we run Valgrind on our `testuseafterfree` program:" ] }, { "cell_type": "code", "execution_count": 149, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.663603Z", "iopub.status.busy": "2023-11-12T12:40:04.663486Z", "iopub.status.idle": "2023-11-12T12:40:04.676067Z", "shell.execute_reply": "2023-11-12T12:40:04.675676Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[36m#\u001b[39;49;00m\u001b[36minclude\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\u001b[36m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "\u001b[37m// Access a chunk of memory after it has been given back to the system\u001b[39;49;00m\n", "\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[32mmain\u001b[39;49;00m(\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00margc,\u001b[37m \u001b[39;49;00m\u001b[36mchar\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*argv[])\u001b[37m \u001b[39;49;00m{\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[36mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*array\u001b[37m \u001b[39;49;00m=\u001b[37m \u001b[39;49;00mmalloc(\u001b[34m100\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*\u001b[37m \u001b[39;49;00m\u001b[34msizeof\u001b[39;49;00m(\u001b[36mint\u001b[39;49;00m));\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00mfree(array);\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[34mreturn\u001b[39;49;00m\u001b[37m \u001b[39;49;00marray[\u001b[34m10\u001b[39;49;00m];\u001b[37m \u001b[39;49;00m\u001b[37m// BOOM\u001b[39;49;00m\n", "}\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('testuseafterfree.c').read())" ] }, { "cell_type": "code", "execution_count": 150, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.677838Z", "iopub.status.busy": "2023-11-12T12:40:04.677694Z", "iopub.status.idle": "2023-11-12T12:40:04.869439Z", "shell.execute_reply": "2023-11-12T12:40:04.868940Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==77== Memcheck, a memory error detector\r\n", "==77== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.\r\n", "==77== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info\r\n", "==77== Command: ./testuseafterfree\r\n", "==77== \r\n", "==77== Invalid read of size 4\r\n", "==77== at 0x1086B7: main (testuseafterfree.c:8)\r\n", "==77== Address 0x522f068 is 40 bytes inside a block of size 400 free'd\r\n", "==77== at 0x4C32D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)\r\n", "==77== by 0x1086B2: main (testuseafterfree.c:7)\r\n", "==77== Block was alloc'd at\r\n", "==77== at 0x4C31B0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)\r\n", "==77== by 0x1086A2: main (testuseafterfree.c:6)\r\n", "==77== \r\n", "==77== \r\n", "==77== HEAP SUMMARY:\r\n", "==77== in use at exit: 0 bytes in 0 blocks\r\n", "==77== total heap usage: 1 allocs, 1 frees, 400 bytes allocated\r\n", "==77== \r\n", "==77== All heap blocks were freed -- no leaks are possible\r\n", "==77== \r\n", "==77== For counts of detected and suppressed errors, rerun with: -v\r\n", "==77== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)\r\n" ] } ], "source": [ "!valgrind ./testuseafterfree" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We see that Valgrind has detected the issue (\"Invalid read of size 4\") during execution; it also reported the current stack trace (and hence the location at which the error occurred). Note that the program continues execution even after the error occurred; should further errors occur, Valgrind will report these, too." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Being an interpreter, Valgrind slows down execution of programs dramatically. However, it requires no recompilation and thus can work on code (and libraries) whose source code is not available." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Valgrind is not perfect, though. For our `testoverflow` program, it fails to detect the illegal access:" ] }, { "cell_type": "code", "execution_count": 151, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.878516Z", "iopub.status.busy": "2023-11-12T12:40:04.872657Z", "iopub.status.idle": "2023-11-12T12:40:04.892408Z", "shell.execute_reply": "2023-11-12T12:40:04.892047Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#include\u001b[37m \u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m\u001b[39;49;00m\n", "//\u001b[37m \u001b[39;49;00mAccess\u001b[37m \u001b[39;49;00mmemory\u001b[37m \u001b[39;49;00m\u001b[34mout\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[34mof\u001b[39;49;00m\u001b[37m \u001b[39;49;00mbounds\u001b[37m\u001b[39;49;00m\n", "\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00mmain(\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00margc,\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[32mchar\u001b[39;49;00m\u001b[37m \u001b[39;49;00m*argv\u001b[04m\u001b[91m[\u001b[39;49;00m\u001b[04m\u001b[91m]\u001b[39;49;00m)\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[91m{\u001b[39;49;00m\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[04m\u001b[32mint\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[34mindex\u001b[39;49;00m\u001b[37m \u001b[39;49;00m=\u001b[37m \u001b[39;49;00m\u001b[34m10\u001b[39;49;00m;\u001b[37m\u001b[39;49;00m\n", "\u001b[37m \u001b[39;49;00m\u001b[34mreturn\u001b[39;49;00m\u001b[37m \u001b[39;49;00m\u001b[33m\"foo\"\u001b[39;49;00m[index];\u001b[37m \u001b[39;49;00m//\u001b[37m \u001b[39;49;00mBOOM\u001b[37m\u001b[39;49;00m\n", "\u001b[04m\u001b[91m}\u001b[39;49;00m\u001b[37m\u001b[39;49;00m" ] } ], "source": [ "print_content(open('testoverflow.c').read())" ] }, { "cell_type": "code", "execution_count": 152, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:04.894210Z", "iopub.status.busy": "2023-11-12T12:40:04.893977Z", "iopub.status.idle": "2023-11-12T12:40:05.049272Z", "shell.execute_reply": "2023-11-12T12:40:05.040904Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==78== Memcheck, a memory error detector\r\n", "==78== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.\r\n", "==78== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info\r\n", "==78== Command: ./testoverflow\r\n", "==78== \r\n", "==78== \r\n", "==78== HEAP SUMMARY:\r\n", "==78== in use at exit: 0 bytes in 0 blocks\r\n", "==78== total heap usage: 0 allocs, 0 frees, 0 bytes allocated\r\n", "==78== \r\n", "==78== All heap blocks were freed -- no leaks are possible\r\n", "==78== \r\n", "==78== For counts of detected and suppressed errors, rerun with: -v\r\n", "==78== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)\r\n" ] } ], "source": [ "!valgrind ./testoverflow" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is because at compile time, the information about the length of the `\"foo\"` string is no longer available – all Valgrind sees is a read access into the static data portion of the executable that may be valid or invalid. To actually detect such errors, we need to hook into the compiler." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Checking Memory Usage with Memory Sanitizer\n", "\n", "The second class of tools to detect memory issues are _address sanitizers_. An address sanitizer injects memory-checking code into the program _during compilation_. This means that every access will be checked – but this time, the code still runs on the processor itself, meaning that the speed is much less reduced." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here is an example of how to use the address sanitizer of the Clang C compiler:" ] }, { "cell_type": "code", "execution_count": 153, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:05.075274Z", "iopub.status.busy": "2023-11-12T12:40:05.074626Z", "iopub.status.idle": "2023-11-12T12:40:05.262660Z", "shell.execute_reply": "2023-11-12T12:40:05.262014Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -fsanitize=address -o testuseafterfree testuseafterfree.c" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "At the very first moment we have an out-of-bounds access, the program aborts with a diagnostic message – in our case already during `read_overflow()`." ] }, { "cell_type": "code", "execution_count": 154, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:05.265199Z", "iopub.status.busy": "2023-11-12T12:40:05.265038Z", "iopub.status.idle": "2023-11-12T12:40:05.753718Z", "shell.execute_reply": "2023-11-12T12:40:05.753368Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=================================================================\r\n", "\u001b[1m\u001b[31m==22366==ERROR: AddressSanitizer: heap-use-after-free on address 0x000105015a68 at pc 0x0001025dbeb8 bp 0x00016d826620 sp 0x00016d826618\r\n", "\u001b[1m\u001b[0m\u001b[1m\u001b[34mREAD of size 4 at 0x000105015a68 thread T0\u001b[1m\u001b[0m\r\n", " #0 0x1025dbeb4 in main+0x94 (testuseafterfree:arm64+0x100003eb4)\r\n", " #1 0x18567d0dc ()\r\n", "\r\n", "\u001b[1m\u001b[32m0x000105015a68 is located 40 bytes inside of 400-byte region [0x000105015a40,0x000105015bd0)\r\n", "\u001b[1m\u001b[0m\u001b[1m\u001b[35mfreed by thread T0 here:\u001b[1m\u001b[0m\r\n", " #0 0x102e86ce0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x52ce0)\r\n", " #1 0x1025dbe58 in main+0x38 (testuseafterfree:arm64+0x100003e58)\r\n", " #2 0x18567d0dc ()\r\n", "\r\n", "\u001b[1m\u001b[35mpreviously allocated by thread T0 here:\u001b[1m\u001b[0m\r\n", " #0 0x102e86ba4 in wrap_malloc+0x94 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x52ba4)\r\n", " #1 0x1025dbe4c in main+0x2c (testuseafterfree:arm64+0x100003e4c)\r\n", " #2 0x18567d0dc ()\r\n", "\r\n", "SUMMARY: AddressSanitizer: heap-use-after-free (testuseafterfree:arm64+0x100003eb4) in main+0x94\r\n", "Shadow bytes around the buggy address:\r\n", " 0x000105015780: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015800: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015880: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015900: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015980: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", "=>0x000105015a00: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m[\u001b[1m\u001b[35mfd\u001b[1m\u001b[0m]\u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m\r\n", " 0x000105015a80: \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m\r\n", " 0x000105015b00: \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m\r\n", " 0x000105015b80: \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015c00: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " 0x000105015c80: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", "Shadow byte legend (one shadow byte represents 8 application bytes):\r\n", " Addressable: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " Partially addressable: \u001b[1m\u001b[0m01\u001b[1m\u001b[0m \u001b[1m\u001b[0m02\u001b[1m\u001b[0m \u001b[1m\u001b[0m03\u001b[1m\u001b[0m \u001b[1m\u001b[0m04\u001b[1m\u001b[0m \u001b[1m\u001b[0m05\u001b[1m\u001b[0m \u001b[1m\u001b[0m06\u001b[1m\u001b[0m \u001b[1m\u001b[0m07\u001b[1m\u001b[0m \r\n", " Heap left redzone: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " Freed heap region: \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m\r\n", " Stack left redzone: \u001b[1m\u001b[31mf1\u001b[1m\u001b[0m\r\n", " Stack mid redzone: \u001b[1m\u001b[31mf2\u001b[1m\u001b[0m\r\n", " Stack right redzone: \u001b[1m\u001b[31mf3\u001b[1m\u001b[0m\r\n", " Stack after return: \u001b[1m\u001b[35mf5\u001b[1m\u001b[0m\r\n", " Stack use after scope: \u001b[1m\u001b[35mf8\u001b[1m\u001b[0m\r\n", " Global redzone: \u001b[1m\u001b[31mf9\u001b[1m\u001b[0m\r\n", " Global init order: \u001b[1m\u001b[36mf6\u001b[1m\u001b[0m\r\n", " Poisoned by user: \u001b[1m\u001b[34mf7\u001b[1m\u001b[0m\r\n", " Container overflow: \u001b[1m\u001b[34mfc\u001b[1m\u001b[0m\r\n", " Array cookie: \u001b[1m\u001b[31mac\u001b[1m\u001b[0m\r\n", " Intra object redzone: \u001b[1m\u001b[33mbb\u001b[1m\u001b[0m\r\n", " ASan internal: \u001b[1m\u001b[33mfe\u001b[1m\u001b[0m\r\n", " Left alloca redzone: \u001b[1m\u001b[34mca\u001b[1m\u001b[0m\r\n", " Right alloca redzone: \u001b[1m\u001b[34mcb\u001b[1m\u001b[0m\r\n", "==22366==ABORTING\r\n" ] } ], "source": [ "!./testuseafterfree" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Likewise, if we apply the address sanitizer on `testoverflow`, we also immediately get an error:" ] }, { "cell_type": "code", "execution_count": 155, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:05.755751Z", "iopub.status.busy": "2023-11-12T12:40:05.755631Z", "iopub.status.idle": "2023-11-12T12:40:05.945262Z", "shell.execute_reply": "2023-11-12T12:40:05.944784Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "!cc -fsanitize=address -o testoverflow testoverflow.c" ] }, { "cell_type": "code", "execution_count": 156, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:05.947232Z", "iopub.status.busy": "2023-11-12T12:40:05.947108Z", "iopub.status.idle": "2023-11-12T12:40:06.347628Z", "shell.execute_reply": "2023-11-12T12:40:06.347212Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=================================================================\r\n", "\u001b[1m\u001b[31m==22373==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00010030fe6a at pc 0x00010030fd60 bp 0x00016faf2640 sp 0x00016faf2638\r\n", "\u001b[1m\u001b[0m\u001b[1m\u001b[34mREAD of size 1 at 0x00010030fe6a thread T0\u001b[1m\u001b[0m\r\n", " #0 0x10030fd5c in main+0x84 (testoverflow:arm64+0x100003d5c)\r\n", " #1 0x18567d0dc ()\r\n", "\r\n", "\u001b[1m\u001b[32m0x00010030fe6a is located 6 bytes after global variable '.str' defined in 'testoverflow.c' (0x10030fe60) of size 4\r\n", "\u001b[1m\u001b[0m '.str' is ascii string 'foo'\r\n", "SUMMARY: AddressSanitizer: global-buffer-overflow (testoverflow:arm64+0x100003d5c) in main+0x84\r\n", "Shadow bytes around the buggy address:\r\n", " 0x00010030fb80: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030fc00: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030fc80: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030fd00: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030fd80: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", "=>0x00010030fe00: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m04\u001b[1m\u001b[0m[\u001b[1m\u001b[31mf9\u001b[1m\u001b[0m]\u001b[1m\u001b[31mf9\u001b[1m\u001b[0m \u001b[1m\u001b[31mf9\u001b[1m\u001b[0m\r\n", " 0x00010030fe80: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030ff00: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x00010030ff80: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x000100310000: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " 0x000100310080: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", "Shadow byte legend (one shadow byte represents 8 application bytes):\r\n", " Addressable: \u001b[1m\u001b[0m00\u001b[1m\u001b[0m\r\n", " Partially addressable: \u001b[1m\u001b[0m01\u001b[1m\u001b[0m \u001b[1m\u001b[0m02\u001b[1m\u001b[0m \u001b[1m\u001b[0m03\u001b[1m\u001b[0m \u001b[1m\u001b[0m04\u001b[1m\u001b[0m \u001b[1m\u001b[0m05\u001b[1m\u001b[0m \u001b[1m\u001b[0m06\u001b[1m\u001b[0m \u001b[1m\u001b[0m07\u001b[1m\u001b[0m \r\n", " Heap left redzone: \u001b[1m\u001b[31mfa\u001b[1m\u001b[0m\r\n", " Freed heap region: \u001b[1m\u001b[35mfd\u001b[1m\u001b[0m\r\n", " Stack left redzone: \u001b[1m\u001b[31mf1\u001b[1m\u001b[0m\r\n", " Stack mid redzone: \u001b[1m\u001b[31mf2\u001b[1m\u001b[0m\r\n", " Stack right redzone: \u001b[1m\u001b[31mf3\u001b[1m\u001b[0m\r\n", " Stack after return: \u001b[1m\u001b[35mf5\u001b[1m\u001b[0m\r\n", " Stack use after scope: \u001b[1m\u001b[35mf8\u001b[1m\u001b[0m\r\n", " Global redzone: \u001b[1m\u001b[31mf9\u001b[1m\u001b[0m\r\n", " Global init order: \u001b[1m\u001b[36mf6\u001b[1m\u001b[0m\r\n", " Poisoned by user: \u001b[1m\u001b[34mf7\u001b[1m\u001b[0m\r\n", " Container overflow: \u001b[1m\u001b[34mfc\u001b[1m\u001b[0m\r\n", " Array cookie: \u001b[1m\u001b[31mac\u001b[1m\u001b[0m\r\n", " Intra object redzone: \u001b[1m\u001b[33mbb\u001b[1m\u001b[0m\r\n", " ASan internal: \u001b[1m\u001b[33mfe\u001b[1m\u001b[0m\r\n", " Left alloca redzone: \u001b[1m\u001b[34mca\u001b[1m\u001b[0m\r\n", " Right alloca redzone: \u001b[1m\u001b[34mcb\u001b[1m\u001b[0m\r\n", "==22373==ABORTING\r\n" ] } ], "source": [ "!./testoverflow" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Since the address sanitizer monitors each and every read and write, as well as usage of `free()`, it will require some effort to create a bug that it won't catch. Also, while Valgrind runs the program ten times slower and more, the performance penalty for memory sanitization is much much lower. Sanitizers can also help in finding data races, memory leaks, and all other sorts of undefined behavior. As [Daniel Lemire puts it](https://lemire.me/blog/2016/04/20/no-more-leaks-with-sanitize-flags-in-gcc-and-clang/):\n", "\n", "> Really, if you are using gcc or clang and you are not using these flags, you are not being serious." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## When Should Invariants be Checked?\n", "\n", "We have seen that during testing and debugging, invariants should be checked _as much as possible_, thus narrowing down the time it takes to detect a violation to a minimum. The easiest way to get there is to have them checked as _postcondition_ in the constructor and any other method that sets the state of an object." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you have means to alter the state of an object outside of these methods – for instance, by directly writing to memory, or by writing to internal attributes –, then you may have to check them even more frequently. Using the [tracing infrastructure](Tracer.ipynb), for instance, you can have the tracer invoke `repOK()` with each and every line executed, thereby again directly pinpointing the moment the state gets corrupted. While this will slow down execution tremendously, it is still better to have the computer do the work than you stepping backwards and forwards through an execution." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Another question is whether assertions should remain active even in production code. Assertions take time, and this may be too much for production." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Assertions are not Production Code\n", "\n", "First of all, assertions are _not_ production code – the rest of the code should not be impacted by any assertion being on or off. If you write code like\n", "\n", "```python\n", "assert map.remove(location)\n", "```\n", "your assertion will have a side effect, namely removing a location from the map. If one turns assertions off, the side effect will be turned off as well. You need to change this into\n", "\n", "```python\n", "locationRemoved = map.remove(location)\n", "assert locationRemoved\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### For System Preconditions, Use Production Code \n", "\n", "Consequently, you should not rely on assertions for _system preconditions_ – that is, conditions that are necessary to keep the system running. System input (or anything that could be controlled by another party) still has to be validated by production code, not assertions. Critical conditions have to be checked by production code, not (only) assertions.\n", "\n", "If you have code such as\n", "```python\n", "assert command in {\"open\", \"close\", \"exit\"}\n", "exec(command)\n", "```\n", "then having the assertion document and check your assumptions is fine. However, if you turn the assertion off in production code, it will only be a matter of time until somebody sets `command` to `'system(\"/bin/sh\")'` and all of a sudden takes control over your system." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Consider Leaving Some Assertions On\n", "\n", "The main reason for turning assertions off is efficiency. However, _failing early is better than having bad data and not failing._ Think carefully which assertions have a high impact on execution time, and turn these off first. Assertions that have little to no impact on resources can be left on." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As an example, here's a piece of code that handles traffic in a simulation. The `light` variable can be either `RED`, `AMBER`, or `GREEN`:\n", "\n", "```python\n", "if light == RED:\n", " traffic.stop()\n", "elif light == AMBER:\n", " traffic.prepare_to_stop()\n", "elif light == GREEN:\n", " traffic.go()\n", "else:\n", " pass # This can't happen!\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Having an assertion\n", "\n", "```python\n", "assert light in [RED, AMBER, GREEN]\n", "```\n", "\n", "in your code will eat some (minor) resources. However, adding a line\n", "\n", "```python\n", "assert False\n", "```\n", "\n", "in the place of the `This can't happen!` line, above, will still catch errors, but require no resources at all." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If you have very critical software, it may be wise to actually pay the extra penalty for assertions (notably system assertions) rather than sacrifice reliability for performance. Keeping a memory sanitizer on even in production can have a small impact on performance, but will catch plenty of errors before some corrupted data (and even some attacks) have bad effects downstream." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Define How Your Application Should Handle Internal Errors\n", "\n", "By default, failing assertions are not exactly user-friendly – the diagnosis they provide is of interest to the code maintainers only. Think of how your application should handle internal errors as discovered by assertions (or the runtime system). Simply exiting (as assertions on C do) may not be the best option for critical software. Think about implementing your own assert functions with appropriate recovery methods." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Synopsis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This chapter discusses _assertions_ to define _assumptions_ on function inputs and results:" ] }, { "cell_type": "code", "execution_count": 157, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.350098Z", "iopub.status.busy": "2023-11-12T12:40:06.349962Z", "iopub.status.idle": "2023-11-12T12:40:06.352278Z", "shell.execute_reply": "2023-11-12T12:40:06.351941Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def my_square_root(x): # type: ignore\n", " assert x >= 0\n", " y = square_root(x)\n", " assert math.isclose(y * y, x)\n", " return y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Notably, assertions detect _violations_ of these assumptions at runtime:" ] }, { "cell_type": "code", "execution_count": 158, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.353997Z", "iopub.status.busy": "2023-11-12T12:40:06.353872Z", "iopub.status.idle": "2023-11-12T12:40:06.355909Z", "shell.execute_reply": "2023-11-12T12:40:06.355659Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/76616918.py\", line 2, in \n", " y = my_square_root(-1)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2617682038.py\", line 2, in my_square_root\n", " assert x >= 0\n", "AssertionError (expected)\n" ] } ], "source": [ "with ExpectError():\n", " y = my_square_root(-1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "_System assertions_ help to detect invalid memory operations." ] }, { "cell_type": "code", "execution_count": 159, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.357655Z", "iopub.status.busy": "2023-11-12T12:40:06.357532Z", "iopub.status.idle": "2023-11-12T12:40:06.360018Z", "shell.execute_reply": "2023-11-12T12:40:06.359717Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/markdown": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|-1|0| | | | | | | | |\n" ], "text/plain": [ "|Address|0|1|2|3|4|5|6|7|8|9|\n", "|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|\n", "|Allocated| | | | | | | | | | |\n", "|Initialized| | | | | | | | | | |\n", "|Content|-1|0| | | | | | | | |" ] }, "execution_count": 159, "metadata": {}, "output_type": "execute_result" } ], "source": [ "managed_mem = ManagedMemory()\n", "managed_mem" ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.361538Z", "iopub.status.busy": "2023-11-12T12:40:06.361427Z", "iopub.status.idle": "2023-11-12T12:40:06.363359Z", "shell.execute_reply": "2023-11-12T12:40:06.363087Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/1296110967.py\", line 2, in \n", " x = managed_mem[2]\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2465984283.py\", line 3, in __getitem__\n", " return self.read(address)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_22305/2898840933.py\", line 9, in read\n", " assert self.allocated[address], \\\n", "AssertionError: Reading from unallocated memory (expected)\n" ] } ], "source": [ "with ExpectError():\n", " x = managed_mem[2]" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Lessons Learned\n", "\n", "* _Assertions_ are powerful tools to have the computer check invariants during execution:\n", " * _Preconditions_ check whether the arguments to a function are correct\n", " * _Postconditions_ check whether the result of a function is correct\n", " * _Data Invariants_ allow checking data structures for integrity\n", "* Since assertions can be turned off for optimization, they should\n", " * not _change correct operation_ in any way\n", " * not do _any work that your application requires for correct operation_\n", " * not be used as a _replacement for errors that can possibly happen_; create permanent checks (and own exceptions) for these\n", "* _System assertions_ are powerful tools to monitor the integrity of the runtime system (notably memory)\n", "* The more assertions,\n", " * the earlier errors are detected\n", " * the easier it is to locate defects\n", " * the better the guidance towards failure causes during debugging" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.364875Z", "iopub.status.busy": "2023-11-12T12:40:06.364768Z", "iopub.status.idle": "2023-11-12T12:40:06.366408Z", "shell.execute_reply": "2023-11-12T12:40:06.366141Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "import os\n", "import shutil" ] }, { "cell_type": "code", "execution_count": 162, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.367841Z", "iopub.status.busy": "2023-11-12T12:40:06.367756Z", "iopub.status.idle": "2023-11-12T12:40:06.372639Z", "shell.execute_reply": "2023-11-12T12:40:06.372363Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# ignore\n", "# Let's cleanup things\n", "for path in [\n", " 'assert.h',\n", " 'testassert',\n", " 'testassert.c',\n", " 'testassert.dSYM',\n", " 'testoverflow',\n", " 'testoverflow.c',\n", " 'testoverflow.dSYM',\n", " 'testuseafterfree',\n", " 'testuseafterfree.c',\n", " 'testuseafterfree.dSYM',\n", " ]:\n", " if os.path.isdir(path):\n", " shutil.rmtree(path)\n", " else:\n", " try:\n", " os.remove(path)\n", " except FileNotFoundError:\n", " pass" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Next Steps\n", "\n", "In the next chapters, we will learn how to\n", "\n", "* [identify performance issues](PerformanceDebugger.ipynb)\n", "* [check flows and dependencies](Slicer.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background\n", "\n", "The usage of _assertions_ goes back to the earliest days of programming. In 1947, Neumann and Goldstine defined _assertion boxes_ that would check the limits of specific variables. In his 1949 talk [\"Checking a Large Routine\"](http://www.turingarchive.org/browse.php/B/8), Alan Turing suggested\n", "\n", "> How can one check a large routine in the sense of making sure that it's right? In order that the man who checks may not have too difficult a task, the programmer should make a number of definite _assertions_ which can be checked individually, and from which the correctness of the whole program easily follows.\n", "\n", "[Valgrind](https://valgrind.org) originated as an academic tool which has seen lots of industrial usage. A [list of papers](https://www.valgrind.org/docs/pubs.html) is available on the Valgrind page.\n", "\n", "The [Address Sanitizer](http://clang.llvm.org/docs/AddressSanitizer.html) discussed in this chapter was developed at Google; the [paper by Serebryany](https://www.usenix.org/system/files/conference/atc12/atc12-final39.pdf) discusses several details." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": true, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Exercise 1 – Storage Assertions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The Python [`shelve`](https://docs.python.org/3/library/shelve.html) module provides a simple interface for permanent storage of Python objects:" ] }, { "cell_type": "code", "execution_count": 163, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.374582Z", "iopub.status.busy": "2023-11-12T12:40:06.374443Z", "iopub.status.idle": "2023-11-12T12:40:06.377205Z", "shell.execute_reply": "2023-11-12T12:40:06.376903Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import shelve" ] }, { "cell_type": "code", "execution_count": 164, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.378882Z", "iopub.status.busy": "2023-11-12T12:40:06.378662Z", "iopub.status.idle": "2023-11-12T12:40:06.382851Z", "shell.execute_reply": "2023-11-12T12:40:06.382594Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "d = shelve.open('mydb')" ] }, { "cell_type": "code", "execution_count": 165, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.384334Z", "iopub.status.busy": "2023-11-12T12:40:06.384221Z", "iopub.status.idle": "2023-11-12T12:40:06.385953Z", "shell.execute_reply": "2023-11-12T12:40:06.385649Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "d['123'] = 123" ] }, { "cell_type": "code", "execution_count": 166, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.387447Z", "iopub.status.busy": "2023-11-12T12:40:06.387354Z", "iopub.status.idle": "2023-11-12T12:40:06.389757Z", "shell.execute_reply": "2023-11-12T12:40:06.389482Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "123" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['123']" ] }, { "cell_type": "code", "execution_count": 167, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.391447Z", "iopub.status.busy": "2023-11-12T12:40:06.391324Z", "iopub.status.idle": "2023-11-12T12:40:06.393152Z", "shell.execute_reply": "2023-11-12T12:40:06.392877Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "d.close()" ] }, { "cell_type": "code", "execution_count": 168, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.394625Z", "iopub.status.busy": "2023-11-12T12:40:06.394508Z", "iopub.status.idle": "2023-11-12T12:40:06.396346Z", "shell.execute_reply": "2023-11-12T12:40:06.396103Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "d = shelve.open('mydb')" ] }, { "cell_type": "code", "execution_count": 169, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.397801Z", "iopub.status.busy": "2023-11-12T12:40:06.397685Z", "iopub.status.idle": "2023-11-12T12:40:06.399738Z", "shell.execute_reply": "2023-11-12T12:40:06.399484Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "123" ] }, "execution_count": 169, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d['123']" ] }, { "cell_type": "code", "execution_count": 170, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.401106Z", "iopub.status.busy": "2023-11-12T12:40:06.401004Z", "iopub.status.idle": "2023-11-12T12:40:06.402585Z", "shell.execute_reply": "2023-11-12T12:40:06.402338Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "d.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Based on `shelve`, we can implement a class `ObjectStorage` that uses a context manager (a `with` block) to ensure the shelve database is always closed - also in presence of exceptions:" ] }, { "cell_type": "code", "execution_count": 171, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.403908Z", "iopub.status.busy": "2023-11-12T12:40:06.403814Z", "iopub.status.idle": "2023-11-12T12:40:06.405475Z", "shell.execute_reply": "2023-11-12T12:40:06.405214Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Sequence, Any, Callable, Optional, Type, Tuple, Any\n", "from typing import Dict, Union, Set, List, FrozenSet, cast" ] }, { "cell_type": "code", "execution_count": 172, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.406848Z", "iopub.status.busy": "2023-11-12T12:40:06.406745Z", "iopub.status.idle": "2023-11-12T12:40:06.408571Z", "shell.execute_reply": "2023-11-12T12:40:06.408223Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from types import TracebackType" ] }, { "cell_type": "code", "execution_count": 173, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.410092Z", "iopub.status.busy": "2023-11-12T12:40:06.409984Z", "iopub.status.idle": "2023-11-12T12:40:06.412900Z", "shell.execute_reply": "2023-11-12T12:40:06.412641Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Storage:\n", " def __init__(self, dbname: str) -> None:\n", " self.dbname = dbname\n", "\n", " def __enter__(self) -> Any:\n", " self.db = shelve.open(self.dbname)\n", " return self\n", "\n", " def __exit__(self, exc_tp: Type, exc_value: BaseException, \n", " exc_traceback: TracebackType) -> Optional[bool]:\n", " self.db.close()\n", " return None\n", "\n", " def __getitem__(self, key: str) -> Any:\n", " return self.db[key]\n", "\n", " def __setitem__(self, key: str, value: Any) -> None:\n", " self.db[key] = value" ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.414343Z", "iopub.status.busy": "2023-11-12T12:40:06.414259Z", "iopub.status.idle": "2023-11-12T12:40:06.416512Z", "shell.execute_reply": "2023-11-12T12:40:06.416260Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "123\n" ] } ], "source": [ "with Storage('mydb') as storage:\n", " print(storage['123'])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "#### Task 1 – Local Consistency\n", "\n", "Extend `Storage` with assertions that ensure that after adding an element, it also can be retrieved with the same value." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** One single assertion suffices:" ] }, { "cell_type": "code", "execution_count": 175, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.417943Z", "iopub.status.busy": "2023-11-12T12:40:06.417841Z", "iopub.status.idle": "2023-11-12T12:40:06.419724Z", "shell.execute_reply": "2023-11-12T12:40:06.419484Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class Storage(Storage):\n", " def __setitem__(self, key: str, value: Any) -> None:\n", " self.db[key] = value\n", " assert self.db[key] == value" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "solution2": "hidden", "solution2_first": true }, "source": [ "#### Task 2 – Global Consistency\n", "\n", "Extend `Storage` with a \"shadow dictionary\" which holds elements in memory storage, too. Have a `repOK()` method that memory storage and `shelve` storage are identical at all times." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "source": [ "**Solution.** Here is a possible implementation:" ] }, { "cell_type": "code", "execution_count": 176, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.421173Z", "iopub.status.busy": "2023-11-12T12:40:06.421071Z", "iopub.status.idle": "2023-11-12T12:40:06.424545Z", "shell.execute_reply": "2023-11-12T12:40:06.424238Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "class ShadowStorage:\n", " def __init__(self, dbname: str) -> None:\n", " self.dbname = dbname\n", "\n", " def __enter__(self) -> Any:\n", " self.db = shelve.open(self.dbname)\n", " self.memdb = {}\n", " for key in self.db.keys():\n", " self.memdb[key] = self.db[key]\n", " assert self.repOK()\n", " return self\n", "\n", " def __exit__(self, exc_tp: Type, exc_value: BaseException, \n", " exc_traceback: TracebackType) -> Optional[bool]:\n", " self.db.close()\n", " return None\n", "\n", " def __getitem__(self, key: str) -> Any:\n", " assert self.repOK()\n", " return self.db[key]\n", "\n", " def __setitem__(self, key: str, value: Any) -> None:\n", " assert self.repOK()\n", " self.memdb[key] = self.db[key] = value\n", " assert self.repOK()\n", "\n", " def repOK(self) -> bool:\n", " assert self.db.keys() == self.memdb.keys(), f\"{self.dbname}: Differing keys\"\n", " for key in self.memdb.keys():\n", " assert self.db[key] == self.memdb[key], \\\n", " f\"{self.dbname}: Differing values for {repr(key)}\"\n", " return True" ] }, { "cell_type": "code", "execution_count": 177, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.426238Z", "iopub.status.busy": "2023-11-12T12:40:06.426138Z", "iopub.status.idle": "2023-11-12T12:40:06.428423Z", "shell.execute_reply": "2023-11-12T12:40:06.428182Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "123\n" ] } ], "source": [ "with ShadowStorage('mydb') as storage:\n", " storage['456'] = 456\n", " print(storage['123'])" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.429760Z", "iopub.status.busy": "2023-11-12T12:40:06.429661Z", "iopub.status.idle": "2023-11-12T12:40:06.431426Z", "shell.execute_reply": "2023-11-12T12:40:06.431151Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "# ignore\n", "try:\n", " os.remove('mydb.db') # on macOS\n", "except FileNotFoundError:\n", " pass" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T12:40:06.432841Z", "iopub.status.busy": "2023-11-12T12:40:06.432742Z", "iopub.status.idle": "2023-11-12T12:40:06.434335Z", "shell.execute_reply": "2023-11-12T12:40:06.434106Z" }, "slideshow": { "slide_type": "skip" }, "solution2": "hidden" }, "outputs": [], "source": [ "# ignore\n", "try:\n", " os.remove('mydb') # on Linux\n", "except FileNotFoundError:\n", " pass" ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib", "toc": true }, "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false, "toc-showmarkdowntxt": false, "vscode": { "interpreter": { "hash": "0af4f07dd039d1b4e562c7a7d0340393b1c66f50605ac6af30beb81aa23b7ef5" } } }, "nbformat": 4, "nbformat_minor": 4 }