{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Second Project" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In this project, you will implement a debugging reducer able to reduce a Python program containing a specific property down to a syntactically correct Python program that *only* contains said property.\n", "To do this, you will be given several Python parsers checking for specific properties as well as Python programs containing these properties.\n", "\n", "The time frame for this project is **2 weeks**, and the Deadline is **January 15th 23:59**." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.798853Z", "iopub.status.busy": "2023-11-12T13:03:28.798732Z", "iopub.status.idle": "2023-11-12T13:03:28.829417Z", "shell.execute_reply": "2023-11-12T13:03:28.829125Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import bookutils.setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.831193Z", "iopub.status.busy": "2023-11-12T13:03:28.831080Z", "iopub.status.idle": "2023-11-12T13:03:28.832856Z", "shell.execute_reply": "2023-11-12T13:03:28.832611Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import inspect\n", "import ast" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.834236Z", "iopub.status.busy": "2023-11-12T13:03:28.834151Z", "iopub.status.idle": "2023-11-12T13:03:28.836007Z", "shell.execute_reply": "2023-11-12T13:03:28.835731Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from bookutils import show_ast" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.837593Z", "iopub.status.busy": "2023-11-12T13:03:28.837483Z", "iopub.status.idle": "2023-11-12T13:03:28.864688Z", "shell.execute_reply": "2023-11-12T13:03:28.864339Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ExpectError import ExpectError" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.866683Z", "iopub.status.busy": "2023-11-12T13:03:28.866564Z", "iopub.status.idle": "2023-11-12T13:03:28.868227Z", "shell.execute_reply": "2023-11-12T13:03:28.867990Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# ignore\n", "from typing import Any, Optional, Set, cast" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Description\n", "\n", "Imagine we want to create a new compiler for Python programs which transforms the code into a binary executable.\n", "To this end, at first we need to implement a parser that parses a given program into a set of individual statements which can then be converted to the bytecode." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Parsing Python \n", "\n", "If we want to parse a programming language like Python, it makes sense to parse the code into an abstract syntax tree (AST) and work with the AST from this point onward.\n", "The reason we might want to do this, is to preserve the syntactic integrity of our program.\n", "Let's start with parsing a simple Python function (`foo`) into an AST using `ast.parse()` and printing it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.869770Z", "iopub.status.busy": "2023-11-12T13:03:28.869665Z", "iopub.status.idle": "2023-11-12T13:03:28.871698Z", "shell.execute_reply": "2023-11-12T13:03:28.871409Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def foo(a, b): # type: ignore\n", " \"\"\"\n", " Checks whether a and b are true\n", " \"\"\"\n", " if a and b:\n", " return 1\n", " return 0" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "We take a string representation of `foo()`..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.873251Z", "iopub.status.busy": "2023-11-12T13:03:28.873145Z", "iopub.status.idle": "2023-11-12T13:03:28.875067Z", "shell.execute_reply": "2023-11-12T13:03:28.874810Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "source = inspect.getsource(foo)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.876490Z", "iopub.status.busy": "2023-11-12T13:03:28.876386Z", "iopub.status.idle": "2023-11-12T13:03:28.878098Z", "shell.execute_reply": "2023-11-12T13:03:28.877864Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def foo(a, b): # type: ignore\n", " \"\"\"\n", " Checks whether a and b are true\n", " \"\"\"\n", " if a and b:\n", " return 1\n", " return 0\n", "\n" ] } ], "source": [ "print(source)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "...and convert it to the AST:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:28.898862Z", "iopub.status.busy": "2023-11-12T13:03:28.898720Z", "iopub.status.idle": "2023-11-12T13:03:29.320485Z", "shell.execute_reply": "2023-11-12T13:03:29.320161Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""foo"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "7\n", "If\n", "\n", "\n", "\n", "0--7\n", "\n", "\n", "\n", "\n", "19\n", "Return\n", "\n", "\n", "\n", "0--19\n", "\n", "\n", "\n", "\n", "3\n", "arg\n", "\n", "\n", "\n", "2--3\n", "\n", "\n", "\n", "\n", "5\n", "arg\n", "\n", "\n", "\n", "2--5\n", "\n", "\n", "\n", "\n", "4\n", ""a"\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "6\n", ""b"\n", "\n", "\n", "\n", "5--6\n", "\n", "\n", "\n", "\n", "8\n", "BoolOp\n", "\n", "\n", "\n", "7--8\n", "\n", "\n", "\n", "\n", "16\n", "Return\n", "\n", "\n", "\n", "7--16\n", "\n", "\n", "\n", "\n", "9\n", "And\n", "\n", "\n", "\n", "8--9\n", "\n", "\n", "\n", "\n", "10\n", "Name\n", "\n", "\n", "\n", "8--10\n", "\n", "\n", "\n", "\n", "13\n", "Name\n", "\n", "\n", "\n", "8--13\n", "\n", "\n", "\n", "\n", "11\n", ""a"\n", "\n", "\n", "\n", "10--11\n", "\n", "\n", "\n", "\n", "12\n", "Load\n", "\n", "\n", "\n", "10--12\n", "\n", "\n", "\n", "\n", "14\n", ""b"\n", "\n", "\n", "\n", "13--14\n", "\n", "\n", "\n", "\n", "15\n", "Load\n", "\n", "\n", "\n", "13--15\n", "\n", "\n", "\n", "\n", "17\n", "Constant\n", "\n", "\n", "\n", "16--17\n", "\n", "\n", "\n", "\n", "18\n", "1\n", "\n", "\n", "\n", "17--18\n", "\n", "\n", "\n", "\n", "20\n", "Constant\n", "\n", "\n", "\n", "19--20\n", "\n", "\n", "\n", "\n", "21\n", "0\n", "\n", "\n", "\n", "20--21\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "source_ast = ast.parse(source)\n", "show_ast(source_ast)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "We can see how the code is structured by inspecting the AST shown above." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To learn the structure of an AST, have a look at the [official Python `ast` reference](http://docs.python.org/3/library/ast) for a list of AST nodes and their attributes.\n", "This reference is complete, but a bit terse; the [\"Green Tree Snakes missing Python AST docs\"](https://greentreesnakes.readthedocs.io/en/latest/manipulating.html) provide a good manual on how to modify the tree." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "If we want to inspect certain elements of the source code, we can now use the `ast.NodeVisitor` to visit every node in the AST.\n", "This can be done by extending this class and implementing *visit-methods* for each type of node in the tree we want to inspect. \n", "So, let's implement a parser that first parses the source code into an AST and then visits each node one after another." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.322332Z", "iopub.status.busy": "2023-11-12T13:03:29.322196Z", "iopub.status.idle": "2023-11-12T13:03:29.325969Z", "shell.execute_reply": "2023-11-12T13:03:29.325551Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class SimpleParser(ast.NodeVisitor):\n", " \"\"\"\n", " A simple parser printing all nodes in order\n", " \"\"\"\n", "\n", " def parse(self, file: str) -> None:\n", " \"\"\"\n", " When the parse function is called, we parse the source code\n", " into a tree and then start visiting all nodes.\n", " \"\"\"\n", " tree = ast.parse(source=file)\n", " self.visit(tree)\n", "\n", " def visit_FunctionDef(self, node: ast.FunctionDef) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_arguments(self, node: ast.arguments) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_If(self, node: ast.If) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_Return(self, node: ast.Return) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_arg(self, node: ast.arg) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_Constant(self, node: ast.Constant) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_Name(self, node: ast.Name) -> None:\n", " print(node)\n", " self.generic_visit(node)\n", "\n", " def visit_BoolOp(self, node: ast.BoolOp) -> None:\n", " print(node)\n", " self.generic_visit(node)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Using this `SimpleParser`, we can visit each node and print the node object itself." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.328031Z", "iopub.status.busy": "2023-11-12T13:03:29.327926Z", "iopub.status.idle": "2023-11-12T13:03:29.330137Z", "shell.execute_reply": "2023-11-12T13:03:29.329702Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "SimpleParser().parse(source)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Now imagine, instead of just parsing the input and printing each node, we start to process the information present in each node.\n", "To notify the user if anything breaks, we introduce a `ParserException` that is thrown when we encounter a parsing error." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.331991Z", "iopub.status.busy": "2023-11-12T13:03:29.331897Z", "iopub.status.idle": "2023-11-12T13:03:29.333443Z", "shell.execute_reply": "2023-11-12T13:03:29.333197Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class ParserException(Exception):\n", " pass" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "For instance, let's consider our parser has troubles processing boolean operators.\n", "We simulate this behavior by extending our parser to throw an exception whenever it encounters a `BoolOp` node.\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.335040Z", "iopub.status.busy": "2023-11-12T13:03:29.334912Z", "iopub.status.idle": "2023-11-12T13:03:29.336980Z", "shell.execute_reply": "2023-11-12T13:03:29.336681Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class SimpleParserPlus(SimpleParser):\n", " \"\"\"\n", " A simple parser printing all nodes in order until it reaches a BoolOp\n", " \"\"\"\n", "\n", " def visit_BoolOp(self, node: ast.BoolOp) -> None:\n", " \"\"\"\n", " If a BoolOp is encountered, we throw an exception\n", " \"\"\"\n", " raise ParserException(\"Something went wrong\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.338409Z", "iopub.status.busy": "2023-11-12T13:03:29.338302Z", "iopub.status.idle": "2023-11-12T13:03:29.341801Z", "shell.execute_reply": "2023-11-12T13:03:29.341356Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/257518062.py\", line 2, in \n", " SimpleParserPlus().parse(source)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2922974276.py\", line 12, in parse\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2922974276.py\", line 16, in visit_FunctionDef\n", " self.generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2922974276.py\", line 24, in visit_If\n", " self.generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1656391407.py\", line 10, in visit_BoolOp\n", " raise ParserException(\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "with ExpectError():\n", " SimpleParserPlus().parse(source)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As we can see here, as soon as we encounter a boolean operation, an exception is thrown." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Reducing Python\n", "\n", "Let's get back to our `SimpleParserPlus` example. This parser parses Python code into an AST and then iterates over all nodes. When it reaches a boolean operation, it encounters an error and crashes. \n", "Of course our example is a very small and artificial one. Furthermore, the input triggering our bug is rather small as well.\n", "However, in reality a similar situation might present itself to you but with a far less obvious error and a far larger input that causes it.\n", "In this case, instead of using the entire input, one might want to reduce the input down to only those parts actually causing the error. \n", "But how do we do that? \n", "We could, for instance, randomly remove some parts of the program.\n", "However, this would likely produce syntactically invalid Python code and hence prevent the parser from reaching the desired code location.\n", "So instead of just removing random parts of the code itself, we will manipulate the AST to preserve the correct syntax.\n", "\n", "How do we manipulate an AST produced by `ast.parse`? We can make use of the `ast.NodeTransformer`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.343458Z", "iopub.status.busy": "2023-11-12T13:03:29.343337Z", "iopub.status.idle": "2023-11-12T13:03:29.345466Z", "shell.execute_reply": "2023-11-12T13:03:29.345078Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class NodeReducer(ast.NodeTransformer):\n", "\n", " def visit_If(self, node: ast.If) -> ast.AST:\n", " \"\"\"\n", " Instead of the `if` node return just its condition\n", " \"\"\"\n", " # Apply other transformations to the children first\n", " super().generic_visit(node)\n", "\n", " # We create an expression around the node test\n", " new_node = ast.Expr(node.test)\n", " return new_node" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "The `ast.NodeTransformer` again uses the visitor pattern to walk over all nodes.\n", "This time, however, we change the resulting AST. In our case, if we encounter an *if-statement* we only return the condition. \n", "\n", "(Keep in mind, `NodeReducer` changes the ast in place, so if you want to preserve the original tree, you need to copy it with `copy.deepcopy(tree)`)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.347158Z", "iopub.status.busy": "2023-11-12T13:03:29.347059Z", "iopub.status.idle": "2023-11-12T13:03:29.746135Z", "shell.execute_reply": "2023-11-12T13:03:29.745786Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""foo"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "7\n", "Expr\n", "\n", "\n", "\n", "0--7\n", "\n", "\n", "\n", "\n", "16\n", "Return\n", "\n", "\n", "\n", "0--16\n", "\n", "\n", "\n", "\n", "3\n", "arg\n", "\n", "\n", "\n", "2--3\n", "\n", "\n", "\n", "\n", "5\n", "arg\n", "\n", "\n", "\n", "2--5\n", "\n", "\n", "\n", "\n", "4\n", ""a"\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "6\n", ""b"\n", "\n", "\n", "\n", "5--6\n", "\n", "\n", "\n", "\n", "8\n", "BoolOp\n", "\n", "\n", "\n", "7--8\n", "\n", "\n", "\n", "\n", "9\n", "And\n", "\n", "\n", "\n", "8--9\n", "\n", "\n", "\n", "\n", "10\n", "Name\n", "\n", "\n", "\n", "8--10\n", "\n", "\n", "\n", "\n", "13\n", "Name\n", "\n", "\n", "\n", "8--13\n", "\n", "\n", "\n", "\n", "11\n", ""a"\n", "\n", "\n", "\n", "10--11\n", "\n", "\n", "\n", "\n", "12\n", "Load\n", "\n", "\n", "\n", "10--12\n", "\n", "\n", "\n", "\n", "14\n", ""b"\n", "\n", "\n", "\n", "13--14\n", "\n", "\n", "\n", "\n", "15\n", "Load\n", "\n", "\n", "\n", "13--15\n", "\n", "\n", "\n", "\n", "17\n", "Constant\n", "\n", "\n", "\n", "16--17\n", "\n", "\n", "\n", "\n", "18\n", "0\n", "\n", "\n", "\n", "17--18\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "NodeReducer().visit(source_ast)\n", "show_ast(source_ast)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "As you can see, after running the `NodeReducer` there is no longer an *if-statement*, but only its condition.\n", "A more sophisticated NodeReducer is presented in the [Reducing Failure-Inducing Inputs chapter](https://www.debuggingbook.org/beta/html/DeltaDebugger.html) in the debugging book." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "To learn the structure of an AST, have a look at the [official Python `ast` reference](http://docs.python.org/3/library/ast) for a list of AST nodes and their attributes. This reference is complete, but a bit terse; the [\"Green Tree Snakes missing Python AST docs\"](https://greentreesnakes.readthedocs.io/en/latest/manipulating.html) provide a good manual on how to modify the tree." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Pro tip**: You can learn the expected structure of any Python fragment by invoking `ast.dump()` on any (parsed) AST." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.748165Z", "iopub.status.busy": "2023-11-12T13:03:29.748042Z", "iopub.status.idle": "2023-11-12T13:03:29.750071Z", "shell.execute_reply": "2023-11-12T13:03:29.749811Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "demo_ast = ast.parse('''\n", "if a == b:\n", " c = d\n", "''')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.751574Z", "iopub.status.busy": "2023-11-12T13:03:29.751461Z", "iopub.status.idle": "2023-11-12T13:03:29.753307Z", "shell.execute_reply": "2023-11-12T13:03:29.753026Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(\n", " body=[\n", " If(\n", " test=Compare(\n", " left=Name(id='a', ctx=Load()),\n", " ops=[\n", " Eq()],\n", " comparators=[\n", " Name(id='b', ctx=Load())]),\n", " body=[\n", " Assign(\n", " targets=[\n", " Name(id='c', ctx=Store())],\n", " value=Name(id='d', ctx=Load()))],\n", " orelse=[])],\n", " type_ignores=[])\n" ] } ], "source": [ "print(ast.dump(demo_ast, indent=4))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The nice thing is that this not only reveals the AST structure, it also tells you how to construct it. If you take the very structure as produced by `ast.dump()` and evaluate it, you obtain the same syntax tree." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.754805Z", "iopub.status.busy": "2023-11-12T13:03:29.754692Z", "iopub.status.idle": "2023-11-12T13:03:29.756416Z", "shell.execute_reply": "2023-11-12T13:03:29.756148Z" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from ast import Module, If, Compare, Load, Eq, Assign, Name, Store" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.757954Z", "iopub.status.busy": "2023-11-12T13:03:29.757846Z", "iopub.status.idle": "2023-11-12T13:03:29.760219Z", "shell.execute_reply": "2023-11-12T13:03:29.759926Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "ast_from_scratch = Module(\n", " body=[\n", " If(\n", " test=Compare(\n", " left=Name(id='a', ctx=Load()),\n", " ops=[\n", " Eq()],\n", " comparators=[\n", " Name(id='b', ctx=Load())]),\n", " body=[\n", " Assign(\n", " targets=[\n", " Name(id='c', ctx=Store())],\n", " value=Name(id='d', ctx=Load()))],\n", " orelse=[])],\n", " type_ignores=[])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "You need to invoke `ast.fix_missing_locations()` to add line numbers and other location info that would be normally added during parsing..." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.761795Z", "iopub.status.busy": "2023-11-12T13:03:29.761682Z", "iopub.status.idle": "2023-11-12T13:03:29.763380Z", "shell.execute_reply": "2023-11-12T13:03:29.763124Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "ast_from_scratch = ast.fix_missing_locations(ast_from_scratch)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "... and with this, you're all set:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.764881Z", "iopub.status.busy": "2023-11-12T13:03:29.764777Z", "iopub.status.idle": "2023-11-12T13:03:29.766499Z", "shell.execute_reply": "2023-11-12T13:03:29.766260Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "if a == b:\n", " c = d\n" ] } ], "source": [ "print(ast.unparse(ast_from_scratch))" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "Hence, if you do not know what kind of AST fragment to produce, just\n", "1. parse its text with ``ast.parse()`` \n", "2. study the expression produced by `ast.dump()`, and\n", "3. build it from the same expression, using `ast.fix_missing_locations()` and `ast.unparse()`" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Implementing a Reducing Debugger\n", "\n", "Having seen how one can access and transform AST structures using a `NodeReducer` in Python, we can now use that knowledge to implement a `ReducingDebugger`\n", "capable of reducing the failure inducing Python code down to only that parts of the code that actually triggers the failure." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.768011Z", "iopub.status.busy": "2023-11-12T13:03:29.767898Z", "iopub.status.idle": "2023-11-12T13:03:29.770581Z", "shell.execute_reply": "2023-11-12T13:03:29.770338Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class ReducingDebugger:\n", " def __init__(self, parser: SimpleParser) -> None:\n", " \"\"\"\n", " We initialize the DebuggingReducer with a parser,\n", " as this is needed to verify whether the failure \n", " is still triggered after the code transformation\n", " \"\"\"\n", " self.parser = parser\n", " self.node_reducer = NodeReducer()\n", "\n", " def minimize(self, code: str) -> str:\n", " \"\"\"\n", " This function takes some Python code as string, reduces\n", " it using the NodeReducer defined earlier and returns the \n", " reduced program.\n", " \"\"\"\n", " # Parse the code to a tree\n", " code_ast = ast.parse(source=code)\n", "\n", " # Use the visit-method of the NodeReducer to reduce if nodes down to their condition\n", " new_code_ast = self.node_reducer.visit(code_ast)\n", "\n", " # After we've updated nodes, we need to fix the tree\n", " ast.fix_missing_locations(new_code_ast)\n", "\n", " # Generate code from the reduced tree\n", " new_code = ast.unparse(new_code_ast)\n", "\n", " # Test, whether the error is still triggered by the reduced code\n", " try:\n", " self.parser.parse(new_code)\n", " # No exception is thrown. This means the new_code does not \n", " # trigger an error anymore. Therefore, we failed in\n", " # reduction and return the initial code.\n", " return code\n", " except ParserException:\n", " # The error is still triggered. Return the reduced code\n", " return new_code" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "The `ReducingDebugger` implemented above takes some Python code as an input and replaces all if-statements with their conditions.\n", "Then, it checks whether the newly generated code still triggers the error. If not, it returns the initial code. If the exception is still triggered, it returns the reduced code.\n", "Let's apply it to the source code of our `foo` function:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.771955Z", "iopub.status.busy": "2023-11-12T13:03:29.771854Z", "iopub.status.idle": "2023-11-12T13:03:29.773850Z", "shell.execute_reply": "2023-11-12T13:03:29.773579Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n" ] } ], "source": [ "reducing_debugger = ReducingDebugger(SimpleParserPlus())\n", "new_source = reducing_debugger.minimize(source)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.775449Z", "iopub.status.busy": "2023-11-12T13:03:29.775341Z", "iopub.status.idle": "2023-11-12T13:03:29.777012Z", "shell.execute_reply": "2023-11-12T13:03:29.776738Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "def foo(a, b):\n", " \"\"\"\n", " Checks whether a and b are true\n", " \"\"\"\n", " a and b\n", " return 0\n" ] } ], "source": [ "print(new_source)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.778465Z", "iopub.status.busy": "2023-11-12T13:03:29.778357Z", "iopub.status.idle": "2023-11-12T13:03:29.780537Z", "shell.execute_reply": "2023-11-12T13:03:29.780284Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3561000985.py\", line 2, in \n", " SimpleParserPlus().parse(new_source)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2922974276.py\", line 12, in parse\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2922974276.py\", line 16, in visit_FunctionDef\n", " self.generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1656391407.py\", line 10, in visit_BoolOp\n", " raise ParserException(\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "with ExpectError():\n", " SimpleParserPlus().parse(new_source)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "As we can see, our `ReducingDebugger` reduces the original function to a smaller input, while still preserving the property of the code that triggers the same error as before.\n", "However, this reducer is very simple: it applies only one transformation to all `if` nodes at once.\n", "Instead, one can consider various transformations of the same node as well as transformations of various node types and apply them one by one (or in batches) to gradually reduce the code." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Keep in mind, that the changes should be made in a way that the code is still accepted by a regular Python interpreter.\n", "Besides, the goal is to obtain a minimal program that still triggers the error.\n", "So, your debugging reducer should not simply reduce all possible nodes, but also regularly check whether the reduced code still triggers the failure." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Therefore, you are required to implement certain modifications that can be done to the AST. In general, these transformations can be one of the following: \n", "\n", "1. Delete a node,\n", "2. Substitute a node with the new node (e.g., _pass_ node),\n", "3. Substitute a node with all of its children,\n", "4. Substitute a node with one of its children." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "More detailed, one can consider, for instance:\n", "\n", "1. Replacing a `BoolOp` node by `True`.\n", "2. Replacing a `BoolOp` node by `False`.\n", "3. Replacing a `BoolOp` node by its left operand.\n", "4. Replacing a `BoolOp` node by its right operand.\n", "5. Replacing an `If` node by its \"then\" body.\n", "6. Replacing an `If` node by its condition.\n", "7. Replacing an `If` node by its \"else\" body.\n", "8. Replacing all instances of a variable by a constant.\n", "9. Replacing expressions by a constant.\n", "10. etc.\n", "\n", "Please, refer to the [official Python `ast` reference](http://docs.python.org/3/library/ast) for a full list of node types." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Implementation\n", "\n", "To succeed in this project, you need to complete the `MyReducer` class." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.782138Z", "iopub.status.busy": "2023-11-12T13:03:29.782035Z", "iopub.status.idle": "2023-11-12T13:03:29.783723Z", "shell.execute_reply": "2023-11-12T13:03:29.783478Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class MyReducer(ReducingDebugger):\n", "\n", " def minimize(self, code: str) -> str:\n", " # TODO: implement this!\n", " return code" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "To this end, you will need to extend the `NodeReducer` with additional transformations and implement a reducer strategy\n", "which collects all possible transformations and applies them to the AST until it is exhaustively minimized.\n", "Feel free to make your own implementation or extend the `DeltaDebugger` from the [Reducing Failure-Inducing Inputs chapter](https://www.debuggingbook.org/beta/html/DeltaDebugger.html) with\n", "the proper strategy." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "slide" } }, "source": [ "## Evaluation\n", "\n", "We evaluate your project based on public as well as secret tests. In this section, we present **five** different Python parsers as well as **five** Python input programs, which should be minified.\n", "These parsers check for a specific property in the code and fail the execution if the property exists. The input programs and parsers in this section make up the public test cases.\n", "If you pass all of those tests **without hardcoding the modifications** you are guaranteed to score 15 points in this project. You can get more points for passing secret tests." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Parsers" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.785196Z", "iopub.status.busy": "2023-11-12T13:03:29.785085Z", "iopub.status.idle": "2023-11-12T13:03:29.787096Z", "shell.execute_reply": "2023-11-12T13:03:29.786858Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Parser(ast.NodeVisitor):\n", " \"\"\"The base class for a parser\"\"\"\n", "\n", " def parse_tree(self, tree: ast.AST) -> None:\n", " self.visit(tree)\n", "\n", " def parse(self, source: str) -> str:\n", " tree = ast.parse(source=source)\n", " self.parse_tree(tree)\n", " return \"The input was successfully parsed.\"" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "`Parser` is the base class for a parser from which all other parsers are derived. It works the same way as the `SimpleParser` introduced earlier." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's define a couple of parsers which model a failure during processing of a certain code feature." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.788574Z", "iopub.status.busy": "2023-11-12T13:03:29.788467Z", "iopub.status.idle": "2023-11-12T13:03:29.790246Z", "shell.execute_reply": "2023-11-12T13:03:29.789991Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Parser1(Parser):\n", " \"\"\"\n", " Contains a boolean operation\n", " \"\"\"\n", "\n", " def visit_BoolOp(self, node: ast.BoolOp) -> None:\n", " raise ParserException(f\"Something went wrong\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we feed this parser with an input program which contains a boolean operation, it fails:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.791665Z", "iopub.status.busy": "2023-11-12T13:03:29.791562Z", "iopub.status.idle": "2023-11-12T13:03:29.793638Z", "shell.execute_reply": "2023-11-12T13:03:29.793370Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2806337694.py\", line 5, in \n", " parser1.parse(input_program)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2980862917.py\", line 7, in visit_BoolOp\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "parser1 = Parser1()\n", "input_program = '''\n", "a = True and False'''\n", "with ExpectError():\n", " parser1.parse(input_program)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "In contrast to the above, an input program _without_ a boolean operation is parsed correctly." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.795128Z", "iopub.status.busy": "2023-11-12T13:03:29.795018Z", "iopub.status.idle": "2023-11-12T13:03:29.797260Z", "shell.execute_reply": "2023-11-12T13:03:29.797014Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'The input was successfully parsed.'" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parser1 = Parser1()\n", "input_program = '''\n", "a = 1\n", "b = True\n", "'''\n", "\n", "parser1.parse(input_program)" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "The other parsers are" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.798796Z", "iopub.status.busy": "2023-11-12T13:03:29.798683Z", "iopub.status.idle": "2023-11-12T13:03:29.800476Z", "shell.execute_reply": "2023-11-12T13:03:29.800228Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Parser2(Parser):\n", " \"\"\"\n", " Fails if an input program contains `if` statement\n", " \"\"\"\n", " def visit_If(self, node: ast.If) -> None:\n", " raise ParserException(f\"Something went wrong\")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.801908Z", "iopub.status.busy": "2023-11-12T13:03:29.801798Z", "iopub.status.idle": "2023-11-12T13:03:29.803721Z", "shell.execute_reply": "2023-11-12T13:03:29.803479Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1476831468.py\", line 9, in \n", " parser2.parse(input_program)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1684278366.py\", line 6, in visit_If\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "parser2 = Parser2()\n", "input_program = '''\n", "if a:\n", " b = 1\n", "else:\n", " b = 2\n", "'''\n", "with ExpectError():\n", " parser2.parse(input_program)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.805258Z", "iopub.status.busy": "2023-11-12T13:03:29.805124Z", "iopub.status.idle": "2023-11-12T13:03:29.807925Z", "shell.execute_reply": "2023-11-12T13:03:29.807684Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Parser3(Parser):\n", " \"\"\"\n", " Fails if an input program contains a special unicode character\n", " \"\"\"\n", "\n", " def __init__(self) -> None:\n", " self.assignment = False\n", " self.steps = 0\n", "\n", " def check_unicode(self, string: str) -> bool:\n", " return string == u'\\u0426'\n", "\n", " def generic_visit(self, node: ast.AST) -> None:\n", " self.steps += 1\n", " super().generic_visit(node)\n", "\n", " def visit_Assign(self, node: ast.Assign) -> None:\n", " self.assignment = True\n", " self.steps = 0\n", " self.generic_visit(node)\n", "\n", " def visit_Constant(self, node: ast.Constant) -> None:\n", " if self.assignment and self.steps == 3:\n", " if isinstance(node.value, str):\n", " if self.check_unicode(node.value):\n", " raise ParserException(f\"Something went wrong\")" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.809287Z", "iopub.status.busy": "2023-11-12T13:03:29.809184Z", "iopub.status.idle": "2023-11-12T13:03:29.811189Z", "shell.execute_reply": "2023-11-12T13:03:29.810947Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/951683174.py\", line 6, in \n", " parser3.parse(input_program)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2904791116.py\", line 15, in generic_visit\n", " super().generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2904791116.py\", line 20, in visit_Assign\n", " self.generic_visit(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2904791116.py\", line 15, in generic_visit\n", " super().generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2904791116.py\", line 26, in visit_Constant\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "parser3 = Parser3()\n", "input_program = '''\n", "a = u'\\u0426'\n", "'''\n", "with ExpectError(ParserException):\n", " parser3.parse(input_program)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.812563Z", "iopub.status.busy": "2023-11-12T13:03:29.812461Z", "iopub.status.idle": "2023-11-12T13:03:29.815275Z", "shell.execute_reply": "2023-11-12T13:03:29.815023Z" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Parser4(Parser):\n", " \"\"\"\n", " Fails if an input program contains a variable which is not defined\n", " \"\"\"\n", "\n", " def __init__(self) -> None:\n", " self.assignment = False\n", " self.steps = 0\n", " self.variables: Set[str] = set()\n", "\n", " def generic_visit(self, node: ast.AST) -> None:\n", " self.steps += 1\n", " super().generic_visit(node)\n", "\n", " def visit_Name(self, node: ast.Name) -> None:\n", " if self.assignment and self.steps == 1:\n", " self.variables.add(node.id)\n", " self.assignment = False\n", " self.generic_visit(node)\n", " elif node.id in self.variables:\n", " self.generic_visit(node)\n", " else:\n", " raise ParserException(f\"Something went wrong\")\n", "\n", " def visit_Assign(self, node: ast.Assign) -> None:\n", " self.assignment = True\n", " self.steps = 0\n", " self.generic_visit(node)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.816766Z", "iopub.status.busy": "2023-11-12T13:03:29.816654Z", "iopub.status.idle": "2023-11-12T13:03:29.818623Z", "shell.execute_reply": "2023-11-12T13:03:29.818376Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1261297070.py\", line 7, in \n", " parser4.parse(input_program)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3413124393.py\", line 13, in generic_visit\n", " super().generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3413124393.py\", line 28, in visit_Assign\n", " self.generic_visit(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3413124393.py\", line 13, in generic_visit\n", " super().generic_visit(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3413124393.py\", line 23, in visit_Name\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "parser4 = Parser4()\n", "input_program = '''\n", "a = 1\n", "b = c\n", "'''\n", "with ExpectError(ParserException):\n", " parser4.parse(input_program)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.820026Z", "iopub.status.busy": "2023-11-12T13:03:29.819904Z", "iopub.status.idle": "2023-11-12T13:03:29.821849Z", "shell.execute_reply": "2023-11-12T13:03:29.821578Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Parser5(Parser):\n", " \"\"\"\n", " Fails if an input program contains a list\n", " \"\"\"\n", "\n", " def visit_List(self, node: ast.List) -> None:\n", " raise ParserException(f\"Something went wrong\") " ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.823355Z", "iopub.status.busy": "2023-11-12T13:03:29.823246Z", "iopub.status.idle": "2023-11-12T13:03:29.825339Z", "shell.execute_reply": "2023-11-12T13:03:29.825103Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/3158638464.py\", line 6, in \n", " parser5.parse(input_program)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/4175947920.py\", line 7, in visit_List\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "parser5 = Parser5()\n", "input_program = '''\n", "a = {x * 2 for x in map(int, ['1', '2', '3'])}\n", "'''\n", "with ExpectError(ParserException):\n", " parser5.parse(input_program)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "For secret tests we may use other parsers crafted in the same way." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Input Programs" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now we present five public tests.\n", "Each test case implements a `get_original()` method that returns an input program which your reducer should minify.\n", "It also has a `parser` field which stores a parser used for a respective input program.\n", "Besides, `get_minimized()` method provides a reference solution." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.827056Z", "iopub.status.busy": "2023-11-12T13:03:29.826941Z", "iopub.status.idle": "2023-11-12T13:03:29.828955Z", "shell.execute_reply": "2023-11-12T13:03:29.828693Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Test1:\n", " parser = Parser1()\n", "\n", " def get_original(self) -> str:\n", " return '''\n", "def original():\n", " a = True\n", " b = not False\n", " c = 30\n", " for i in range(c):\n", " if i == 15:\n", " if a and b:\n", " return 1\n", " return 0\n", "'''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "True and True'''" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.830388Z", "iopub.status.busy": "2023-11-12T13:03:29.830285Z", "iopub.status.idle": "2023-11-12T13:03:29.832170Z", "shell.execute_reply": "2023-11-12T13:03:29.831905Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Test2:\n", " parser = Parser2()\n", " def get_original(self) -> str:\n", " return '''\n", "def original():\n", " a = True\n", " b = not False\n", " c = 30\n", " for i in range(c):\n", " if i == 15:\n", " if a and b:\n", " return 1\n", " return 0\n", " '''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "if True:\n", " return\n", "'''" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.833605Z", "iopub.status.busy": "2023-11-12T13:03:29.833492Z", "iopub.status.idle": "2023-11-12T13:03:29.835365Z", "shell.execute_reply": "2023-11-12T13:03:29.835110Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Test3:\n", " parser = Parser3()\n", " def get_original(self) -> str:\n", " return '''\n", "def original():\n", " a = 1\n", " b = a\n", " c = a - b\n", " if c < a:\n", " d = ''\n", " while a == b:\n", " d = u'\\u0426'\n", " a += 1\n", " return d\n", " return ''\n", "'''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "d = u'\\u0426'\n", "'''" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.836791Z", "iopub.status.busy": "2023-11-12T13:03:29.836677Z", "iopub.status.idle": "2023-11-12T13:03:29.838585Z", "shell.execute_reply": "2023-11-12T13:03:29.838343Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Test4:\n", " parser = Parser4()\n", " def get_original(self) -> str:\n", " return '''\n", "def original():\n", " a = 1\n", " b = a\n", " c = a - b\n", " if c < a:\n", " while a == b:\n", " a += 1\n", " return d\n", " return ''\n", "'''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "d\n", "'''" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.840018Z", "iopub.status.busy": "2023-11-12T13:03:29.839910Z", "iopub.status.idle": "2023-11-12T13:03:29.841800Z", "shell.execute_reply": "2023-11-12T13:03:29.841533Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class Test5:\n", " parser = Parser5()\n", "\n", " def get_original(self) -> str:\n", " return '''\n", "def original():\n", " a = 1\n", " b = 0\n", " while True:\n", " if a < b:\n", " return [1, 2, 3]\n", " else:\n", " return []\n", "'''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "[]\n", "'''" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "For instance, let's take the input program of `Test1`:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.843386Z", "iopub.status.busy": "2023-11-12T13:03:29.843268Z", "iopub.status.idle": "2023-11-12T13:03:29.844936Z", "shell.execute_reply": "2023-11-12T13:03:29.844688Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "def original():\n", " a = True\n", " b = not False\n", " c = 30\n", " for i in range(c):\n", " if i == 15:\n", " if a and b:\n", " return 1\n", " return 0\n", "\n" ] } ], "source": [ "test = Test1()\n", "source = test.get_original()\n", "print(source)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And parse it with the provided parser:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.846320Z", "iopub.status.busy": "2023-11-12T13:03:29.846239Z", "iopub.status.idle": "2023-11-12T13:03:29.848183Z", "shell.execute_reply": "2023-11-12T13:03:29.847921Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1292946598.py\", line 2, in \n", " test.parser.parse(source)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 9, in parse\n", " self.parse_tree(tree)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/1737200860.py\", line 5, in parse_tree\n", " self.visit(tree)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 418, in generic_visit\n", " self.visit(item)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 420, in generic_visit\n", " self.visit(value)\n", " File \"/Users/zeller/.pyenv/versions/3.10.2/lib/python3.10/ast.py\", line 410, in visit\n", " return visitor(node)\n", " File \"/var/folders/n2/xd9445p97rb3xh7m1dfx8_4h0006ts/T/ipykernel_57532/2980862917.py\", line 7, in visit_BoolOp\n", " raise ParserException(f\"Something went wrong\")\n", "ParserException: Something went wrong (expected)\n" ] } ], "source": [ "with ExpectError():\n", " test.parser.parse(source)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The parser failed." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Testing Infrastructure\n", "\n", "The following section introduces the testing infrastructure, which is used to assess your performance in the project. Passing all tests will be enough to complete the project successfully." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "We are not going to directly compare a minimized program with the reference solution (as two strings). Instead, we again transform the code into AST and count the number of nodes in both trees.\n", "If the difference between them is lower than a `THRESHOLD` (and a parser still produces an error), the test is passed.\n", "To count the number of nodes in the AST, we need a helper class `NodeCounter`." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.849707Z", "iopub.status.busy": "2023-11-12T13:03:29.849606Z", "iopub.status.idle": "2023-11-12T13:03:29.851911Z", "shell.execute_reply": "2023-11-12T13:03:29.851680Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class NodeCounter(ast.NodeVisitor):\n", " \"\"\"\n", " This node counter is used to assess\n", " the amount of reductions performed by your reducer.\n", " It counts the number of nodes in the AST.\n", " \"\"\"\n", "\n", " def __init__(self) -> None:\n", " self.num_nodes = 0\n", "\n", " def visit(self, node: ast.AST) -> None:\n", " self.num_nodes += 1\n", " self.generic_visit(node)\n", "\n", " def count(self, source: str) -> int:\n", " tree = ast.parse(source=source)\n", " self.visit(tree)\n", " return self.num_nodes" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "The AST of the input program is as follows:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:29.853486Z", "iopub.status.busy": "2023-11-12T13:03:29.853374Z", "iopub.status.idle": "2023-11-12T13:03:30.244614Z", "shell.execute_reply": "2023-11-12T13:03:30.244304Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "0\n", "FunctionDef\n", "\n", "\n", "\n", "1\n", ""original"\n", "\n", "\n", "\n", "0--1\n", "\n", "\n", "\n", "\n", "2\n", "arguments\n", "\n", "\n", "\n", "0--2\n", "\n", "\n", "\n", "\n", "3\n", "Assign\n", "\n", "\n", "\n", "0--3\n", "\n", "\n", "\n", "\n", "9\n", "Assign\n", "\n", "\n", "\n", "0--9\n", "\n", "\n", "\n", "\n", "17\n", "Assign\n", "\n", "\n", "\n", "0--17\n", "\n", "\n", "\n", "\n", "23\n", "For\n", "\n", "\n", "\n", "0--23\n", "\n", "\n", "\n", "\n", "54\n", "Return\n", "\n", "\n", "\n", "0--54\n", "\n", "\n", "\n", "\n", "4\n", "Name\n", "\n", "\n", "\n", "3--4\n", "\n", "\n", "\n", "\n", "7\n", "Constant\n", "\n", "\n", "\n", "3--7\n", "\n", "\n", "\n", "\n", "5\n", ""a"\n", "\n", "\n", "\n", "4--5\n", "\n", "\n", "\n", "\n", "6\n", "Store\n", "\n", "\n", "\n", "4--6\n", "\n", "\n", "\n", "\n", "8\n", "True\n", "\n", "\n", "\n", "7--8\n", "\n", "\n", "\n", "\n", "10\n", "Name\n", "\n", "\n", "\n", "9--10\n", "\n", "\n", "\n", "\n", "13\n", "UnaryOp\n", "\n", "\n", "\n", "9--13\n", "\n", "\n", "\n", "\n", "11\n", ""b"\n", "\n", "\n", "\n", "10--11\n", "\n", "\n", "\n", "\n", "12\n", "Store\n", "\n", "\n", "\n", "10--12\n", "\n", "\n", "\n", "\n", "14\n", "Not\n", "\n", "\n", "\n", "13--14\n", "\n", "\n", "\n", "\n", "15\n", "Constant\n", "\n", "\n", "\n", "13--15\n", "\n", "\n", "\n", "\n", "16\n", "False\n", "\n", "\n", "\n", "15--16\n", "\n", "\n", "\n", "\n", "18\n", "Name\n", "\n", "\n", "\n", "17--18\n", "\n", "\n", "\n", "\n", "21\n", "Constant\n", "\n", "\n", "\n", "17--21\n", "\n", "\n", "\n", "\n", "19\n", ""c"\n", "\n", "\n", "\n", "18--19\n", "\n", "\n", "\n", "\n", "20\n", "Store\n", "\n", "\n", "\n", "18--20\n", "\n", "\n", "\n", "\n", "22\n", "30\n", "\n", "\n", "\n", "21--22\n", "\n", "\n", "\n", "\n", "24\n", "Name\n", "\n", "\n", "\n", "23--24\n", "\n", "\n", "\n", "\n", "27\n", "Call\n", "\n", "\n", "\n", "23--27\n", "\n", "\n", "\n", "\n", "34\n", "If\n", "\n", "\n", "\n", "23--34\n", "\n", "\n", "\n", "\n", "25\n", ""i"\n", "\n", "\n", "\n", "24--25\n", "\n", "\n", "\n", "\n", "26\n", "Store\n", "\n", "\n", "\n", "24--26\n", "\n", "\n", "\n", "\n", "28\n", "Name\n", "\n", "\n", "\n", "27--28\n", "\n", "\n", "\n", "\n", "31\n", "Name\n", "\n", "\n", "\n", "27--31\n", "\n", "\n", "\n", "\n", "29\n", ""range"\n", "\n", "\n", "\n", "28--29\n", "\n", "\n", "\n", "\n", "30\n", "Load\n", "\n", "\n", "\n", "28--30\n", "\n", "\n", "\n", "\n", "32\n", ""c"\n", "\n", "\n", "\n", "31--32\n", "\n", "\n", "\n", "\n", "33\n", "Load\n", "\n", "\n", "\n", "31--33\n", "\n", "\n", "\n", "\n", "35\n", "Compare\n", "\n", "\n", "\n", "34--35\n", "\n", "\n", "\n", "\n", "42\n", "If\n", "\n", "\n", "\n", "34--42\n", "\n", "\n", "\n", "\n", "36\n", "Name\n", "\n", "\n", "\n", "35--36\n", "\n", "\n", "\n", "\n", "39\n", "Eq\n", "\n", "\n", "\n", "35--39\n", "\n", "\n", "\n", "\n", "40\n", "Constant\n", "\n", "\n", "\n", "35--40\n", "\n", "\n", "\n", "\n", "37\n", ""i"\n", "\n", "\n", "\n", "36--37\n", "\n", "\n", "\n", "\n", "38\n", "Load\n", "\n", "\n", "\n", "36--38\n", "\n", "\n", "\n", "\n", "41\n", "15\n", "\n", "\n", "\n", "40--41\n", "\n", "\n", "\n", "\n", "43\n", "BoolOp\n", "\n", "\n", "\n", "42--43\n", "\n", "\n", "\n", "\n", "51\n", "Return\n", "\n", "\n", "\n", "42--51\n", "\n", "\n", "\n", "\n", "44\n", "And\n", "\n", "\n", "\n", "43--44\n", "\n", "\n", "\n", "\n", "45\n", "Name\n", "\n", "\n", "\n", "43--45\n", "\n", "\n", "\n", "\n", "48\n", "Name\n", "\n", "\n", "\n", "43--48\n", "\n", "\n", "\n", "\n", "46\n", ""a"\n", "\n", "\n", "\n", "45--46\n", "\n", "\n", "\n", "\n", "47\n", "Load\n", "\n", "\n", "\n", "45--47\n", "\n", "\n", "\n", "\n", "49\n", ""b"\n", "\n", "\n", "\n", "48--49\n", "\n", "\n", "\n", "\n", "50\n", "Load\n", "\n", "\n", "\n", "48--50\n", "\n", "\n", "\n", "\n", "52\n", "Constant\n", "\n", "\n", "\n", "51--52\n", "\n", "\n", "\n", "\n", "53\n", "1\n", "\n", "\n", "\n", "52--53\n", "\n", "\n", "\n", "\n", "55\n", "Constant\n", "\n", "\n", "\n", "54--55\n", "\n", "\n", "\n", "\n", "56\n", "0\n", "\n", "\n", "\n", "55--56\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "The number of nodes is 42\n" ] } ], "source": [ "show_ast(ast.parse(source))\n", "print(f\"The number of nodes is {NodeCounter().count(source)}\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "As you can see, the input program of `Test1` has 42 nodes." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The testing framework takes a reducer and allows executing a single test or all tests at once." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:30.246451Z", "iopub.status.busy": "2023-11-12T13:03:30.246330Z", "iopub.status.idle": "2023-11-12T13:03:30.250871Z", "shell.execute_reply": "2023-11-12T13:03:30.250615Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class TestingFramework:\n", " THRESHOLD = 3\n", " test_cases = {\n", " 'test1': Test1(),\n", " 'test2': Test2(),\n", " 'test3': Test3(),\n", " 'test4': Test4(),\n", " 'test5': Test5()\n", " }\n", "\n", " def __init__(self, reducer: Any) -> None:\n", " self.reducer = reducer\n", "\n", " def count_nodes(self, source: Optional[str]) -> int:\n", " if source is None:\n", " return 100000\n", " return NodeCounter().count(source)\n", "\n", " def run_test(self, test: Any) -> bool:\n", " \"\"\"\n", " run a single test\n", " \"\"\"\n", " print(f'Running test {test.__class__.__name__}')\n", " reducer = self.reducer(test.parser)\n", " reduced_code = reducer.minimize(test.get_original())\n", " return (self.has_property(reduced_code, test.parser) and \n", " self.is_minimized(reduced_code, test.get_minimized()))\n", "\n", " def run_tests(self) -> None:\n", " \"\"\"\n", " run all public tests\n", " \"\"\"\n", " passed_tests = 0\n", " for test in self.test_cases.values():\n", " success = self.run_test(test)\n", " if success:\n", " passed_tests += 1\n", "\n", " print(f\"In total {passed_tests} tests passed\")\n", "\n", " def has_property(self, source: str, parser: Any) -> bool:\n", " \"\"\"returns True if the parser fails to parse the source\"\"\"\n", " try:\n", " parser.parse(source)\n", " print(f'HAS PROPERTY: FAIL')\n", " return False\n", " except ParserException:\n", " print(f'HAS PROPERTY: OK')\n", " return True\n", " except Exception as e:\n", " print(f'HAS PROPERTY: FAIL {e}')\n", " return False\n", "\n", " def is_minimized(self, reduced: str, reference: str) -> bool:\n", " \"\"\"\n", " Returns True if the AST of the reduced code contains \n", " no more then the number of nodes in the reference + a THRESHOLD\n", " \"\"\"\n", " count_minimized = self.count_nodes(reduced)\n", " count_reference = self.count_nodes(reference)\n", " if count_minimized <= count_reference + self.THRESHOLD:\n", " print(f'IS MINIMIZED: OK')\n", " return True\n", " else:\n", " print(f'IS MINIMIZED: FAIL')\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "Let's test our `ReducingDebugger` with the following test:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:30.252360Z", "iopub.status.busy": "2023-11-12T13:03:30.252264Z", "iopub.status.idle": "2023-11-12T13:03:30.254115Z", "shell.execute_reply": "2023-11-12T13:03:30.253869Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class Test0:\n", " parser = Parser1()\n", "\n", " def get_original(self) -> str:\n", " return '''\n", "if a and b:\n", " c = 1\n", "else:\n", " c = 2\n", "'''\n", "\n", " def get_minimized(self) -> str:\n", " return '''\n", "a and b'''" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:30.255837Z", "iopub.status.busy": "2023-11-12T13:03:30.255723Z", "iopub.status.idle": "2023-11-12T13:03:30.258094Z", "shell.execute_reply": "2023-11-12T13:03:30.257769Z" }, "jupyter": { "outputs_hidden": false }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running test Test0\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: OK\n", "Success!\n" ] } ], "source": [ "tf = TestingFramework(ReducingDebugger)\n", "if tf.run_test(Test0()):\n", " print(\"Success!\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The input program was successfully minimized." ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "fragment" } }, "source": [ "What if we run all public tests?" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "execution": { "iopub.execute_input": "2023-11-12T13:03:30.259647Z", "iopub.status.busy": "2023-11-12T13:03:30.259529Z", "iopub.status.idle": "2023-11-12T13:03:30.263230Z", "shell.execute_reply": "2023-11-12T13:03:30.262927Z" }, "pycharm": { "name": "#%%\n" }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running test Test1\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: FAIL\n", "Running test Test2\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: FAIL\n", "Running test Test3\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: FAIL\n", "Running test Test4\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: FAIL\n", "Running test Test5\n", "HAS PROPERTY: OK\n", "IS MINIMIZED: FAIL\n", "In total 0 tests passed\n" ] } ], "source": [ "tf.run_tests()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Unfortunately, our parser failed to minimize all input programs from the public test suite." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Grading" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For this project, you can get a total of 30 Points:\n", "\n", "* 15 Points will be awarded for passing the public test (**without hardcoding the minimized solution or the sequence of transformations**) presented in this notebook.\n", "These 15 points mean that you successfully implemented the must-have implementation.\n", "* 5 Points will be awarded for passing secret tests.\n", "* 10 Points will be awarded for efficient implementation (see may-have implementation)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Must-have Features\n", "\n", "Implement a reducer which minifies a given program so that its parser still produces an error.\n", "To this end, collect all possible transformations over the nodes of an AST tree and then apply one change at a time.\n", "These modifications should be repeated until no further updates can be made without triggering a parser exception.\n", "If your reducer passes secret test you are awarded additional 5 points.\n", "\n", "*Note: Implementing the given modifications should be sufficient to successfully complete this project.*" ] }, { "cell_type": "markdown", "metadata": { "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "subslide" } }, "source": [ "### May-have Features\n", "\n", "An implementation only of the must-have features aims for correctness but can be inefficient (if transformations are applied one at a time). So, it can be optimized, for instance, with help of the Delta Debugging approach.\n", "\n", "Implement an AST Delta Debugger which efficiently prunes the nodes.\n", "If your reducer can beat the runtime of our reference implementation (which simply collects all possible transformations and applies them randomly one at a time) you can get up to 10 points (depending on the margin).\n", "\n", "Hint: The [Reducing Failure-Inducing Inputs chapter](https://www.debuggingbook.org/beta/html/DeltaDebugger.html) already tries to optimize the reduction of the AST with help of Delta Debugging. \n", "Hint: For instance, you can find useful _Hierarchical Delta Debugging_ paper from the Background section of **Reducing Failure-Inducing Inputs** Chapter." ] } ], "metadata": { "ipub": { "bibliography": "fuzzingbook.bib" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "toc-autonumbering": false }, "nbformat": 4, "nbformat_minor": 4 }