{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "5729f383", "metadata": {}, "source": [ "# Building a Database Abstraction in xDSL\n", "\n", "In this example, we show how to build a database DSL such that we can interact with SQL-like queries directly from within Python, as well as optimize their execution.\n", "\n", "Say we are given an example like this:\n", "\n", "```SQL\n", "SELECT * FROM T WHERE T.a > 5 + 5\n", "```\n", "\n", "Clearly, this can be optimized using constant folding to:\n", "\n", "```SQL\n", "SELECT * FROM T WHERE T.a > 10\n", "```\n", "\n", "Through xDSL, we can build the necessary abstractions for such a query and implement optimizations, in particular the constant folding one.\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1b5c461b", "metadata": {}, "source": [ "While there are several ways to structure an IR in xDSL, we decide to go with a structure that connects different Operations (abstractions for some form of computation) through SSAValues. These SSAValues need a type, which is a form of compile-time information. This kind of information can be expressed as an Attribute. Therefore, we start by defining an Attribute for bags. These bags need to have some information about what is actually contained in them. This information is again compile-time information. Therefore, we encode it as an Attribute and add it to the bag attribute as a Parameter." ] }, { "cell_type": "code", "execution_count": 1, "id": "7aaf5d7c", "metadata": {}, "outputs": [], "source": [ "from xdsl.ir import *\n", "from xdsl.irdl import *\n", "from xdsl.dialects.builtin import *\n", "from xdsl.dialects.arith import *\n", "from xdsl.dialects.scf import *\n", "from xdsl.pattern_rewriter import *\n", "\n", "\n", "@irdl_attr_definition\n", "class Bag(ParametrizedAttribute):\n", " name = \"sql.bag\"\n", " schema: ParameterDef[Attribute]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c099a196", "metadata": {}, "source": [ "In the textual IR, which xDSL generates out-of-the-box for Attributes and Operations, this looks the following:" ] }, { "cell_type": "code", "execution_count": 2, "id": "497c67b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#sql.bag" ] } ], "source": [ "printer = Printer()\n", "printer.print_attribute(Bag([i32]))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6879da8f", "metadata": {}, "source": [ "Now we want to start abstracting forms of computation. At first, we just want to have an access to the table T, so we define a table operation. This Operation has an attribute encoding the table name and a result. This result is an SSAValue, which we can pass to other operations in the future." ] }, { "cell_type": "code", "execution_count": 3, "id": "e9fad9ec", "metadata": {}, "outputs": [], "source": [ "from xdsl.irdl import attr_def, result_def\n", "\n", "\n", "@irdl_op_definition\n", "class Table(IRDLOperation):\n", " name = \"sql.table\"\n", " table_name: StringAttr = attr_def(StringAttr)\n", " result_bag: OpResult = result_def(Bag)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ba73a528", "metadata": {}, "source": [ "Using this operation, we can create the first full query:\n", "\n", "```sql\n", "SELECT * from T\n", "```" ] }, { "cell_type": "code", "execution_count": 4, "id": "b0b135d3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "%0 = \"sql.table\"() {\"table_name\" = \"T\"} : () -> #sql.bag" ] } ], "source": [ "t = Table.build(attributes={\"table_name\": StringAttr(\"T\")}, result_types=[Bag([(i32)])])\n", "printer.print_op(t)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "4f99665f", "metadata": {}, "source": [ "The top-level object in xDSL is a `ModuleOp`, which is an operation representing a module\n", "of code to compile." ] }, { "cell_type": "code", "execution_count": 5, "id": "bbbccc08", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "builtin.module {\n", " %0 = \"sql.table\"() {\"table_name\" = \"T\"} : () -> #sql.bag\n", "}\n" ] } ], "source": [ "module = ModuleOp([t])\n", "print(module)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f10c513f", "metadata": {}, "source": [ "In order to abstract our goal query, we need an abstraction for selections. Again, this is a form of computation, so abstract it as an operation. The actual condition to filter with is nested inside that operation. The way to go about this in xDSL is using a Region. Additionally, we decide to reuse dialects already defined within xDSL, in particular the arith dialect." ] }, { "cell_type": "code", "execution_count": 6, "id": "ff621a44", "metadata": {}, "outputs": [], "source": [ "@irdl_op_definition\n", "class Selection(IRDLOperation):\n", " name = \"sql.selection\"\n", " input_bag: Operand = operand_def(Bag)\n", " filter: Region = region_def()\n", " result_bag: OpResult = result_def(Bag)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f668a3d1", "metadata": {}, "source": [ "We instantiate this in two steps. First, we build the filter region, then the operation itself:" ] }, { "cell_type": "code", "execution_count": 7, "id": "e38541ac", "metadata": {}, "outputs": [], "source": [ "from xdsl.builder import Builder\n", "\n", "\n", "@Builder.implicit_region((i32,))\n", "def filter(args: tuple[BlockArgument, ...]):\n", " # filter argument\n", " (arg,) = args\n", "\n", " const1 = Constant.from_int_and_width(5, 32)\n", " const2 = Constant.from_int_and_width(5, 32)\n", " add = Addi(const1, const2)\n", " cmp = Cmpi(arg, add, \"sgt\")\n", " # sgt stands for `signed greater than`. In xDSL, this is encoded as a predicate attribute with value 4.\n", " Yield(cmp)" ] }, { "cell_type": "code", "execution_count": 8, "id": "c8d39679", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "%1 = \"sql.selection\"(%0) ({\n", "^0(%2 : i32):\n", " %3 = arith.constant 5 : i32\n", " %4 = arith.constant 5 : i32\n", " %5 = arith.addi %3, %4 : i32\n", " %6 = arith.cmpi sgt, %2, %5 : i32\n", " scf.yield %6 : i1\n", "}) : (#sql.bag) -> #sql.bag" ] } ], "source": [ "sel = Selection.build(result_types=[Bag([i32])], operands=[t], regions=[filter])\n", "\n", "printer.print_op(sel)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "efd67bfc", "metadata": {}, "source": [ "In a next step, we want to rewrite the IR created in the last step using constant folding. For that, we use the xDSL RewriteEngine, which applies RewritePatterns to the IR. As a first step, we define the necessary Pattern:" ] }, { "cell_type": "code", "execution_count": 9, "id": "0b618749", "metadata": {}, "outputs": [], "source": [ "@dataclass\n", "class ConstantFolding(RewritePattern):\n", " @op_type_rewrite_pattern\n", " def match_and_rewrite(self, op: Addi, rewriter: PatternRewriter):\n", " if isinstance(op.lhs.owner, Constant) and isinstance(op.rhs.owner, Constant):\n", " lhs_data = cast(IntegerAttr[IntegerType], op.lhs.owner.value).value.data\n", " rhs_data = cast(IntegerAttr[IntegerType], op.rhs.owner.value).value.data\n", " lhs_type = cast(IntegerAttr[IntegerType], op.lhs.owner.value).type\n", " rewriter.replace_matched_op(\n", " Constant.from_int_and_width(lhs_data + rhs_data, lhs_type)\n", " )" ] }, { "cell_type": "code", "execution_count": 10, "id": "864c39f3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "walker = PatternRewriteWalker(\n", " GreedyRewritePatternApplier([ConstantFolding()]),\n", " walk_regions_first=True,\n", " apply_recursively=True,\n", " walk_reverse=False,\n", ")\n", "\n", "\n", "walker.rewrite_op(sel)" ] }, { "cell_type": "code", "execution_count": 11, "id": "0b653e33", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "%1 = \"sql.selection\"(%0) ({\n", "^0(%2 : i32):\n", " %3 = arith.constant 5 : i32\n", " %4 = arith.constant 5 : i32\n", " %7 = arith.constant 10 : i32\n", " %6 = arith.cmpi sgt, %2, %7 : i32\n", " scf.yield %6 : i1\n", "}) : (#sql.bag) -> #sql.bag" ] } ], "source": [ "printer.print_op(sel)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ac92fb31", "metadata": {}, "source": [ "Now let's remove the left over constants and we are:" ] }, { "cell_type": "code", "execution_count": 12, "id": "21e27fff", "metadata": {}, "outputs": [], "source": [ "@dataclass\n", "class DeadConstantElim(RewritePattern):\n", " @op_type_rewrite_pattern\n", " def match_and_rewrite(self, op: Constant, rewriter: PatternRewriter):\n", " if len(op.result.uses) == 0:\n", " rewriter.erase_matched_op()" ] }, { "cell_type": "code", "execution_count": 13, "id": "c53c8eb5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "walker = PatternRewriteWalker(\n", " GreedyRewritePatternApplier([DeadConstantElim()]),\n", " walk_regions_first=True,\n", " apply_recursively=True,\n", " walk_reverse=False,\n", ")\n", "\n", "walker.rewrite_op(sel)" ] }, { "cell_type": "code", "execution_count": 14, "id": "b76b2bab", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "%1 = \"sql.selection\"(%0) ({\n", "^0(%2 : i32):\n", " %7 = arith.constant 10 : i32\n", " %6 = arith.cmpi sgt, %2, %7 : i32\n", " scf.yield %6 : i1\n", "}) : (#sql.bag) -> #sql.bag" ] } ], "source": [ "printer.print_op(sel)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "14c3bf0a", "metadata": {}, "source": [ "In this example, the SSAValue %1 is just flying around. We want to make sure it is actually bound to somewhere, such that we know what to do with it during compilation. Therefore, we introduce a SinkOp, which returns the data in the bag to the executor of the Query." ] }, { "cell_type": "code", "execution_count": 15, "id": "e8fce9aa", "metadata": {}, "outputs": [], "source": [ "@irdl_op_definition\n", "class SinkOp(IRDLOperation):\n", " name = \"sql.sink\"\n", " bag: Operand = operand_def(Bag)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "38269ef7", "metadata": {}, "source": [ "In xDSL, all IRs need a ModuleOp as the outermost Operation, so we wrap it inside on:" ] }, { "cell_type": "code", "execution_count": 16, "id": "5deac6c5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "builtin.module {\n", " %0 = \"sql.table\"() {\"table_name\" = \"T\"} : () -> #sql.bag\n", " \"sql.sink\"(%1) : (#sql.bag) -> ()\n", "}\n" ] } ], "source": [ "module.body.block.add_op(SinkOp.build(operands=[sel]))\n", "\n", "printer.print(module)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "13368421", "metadata": {}, "source": [ "And now we actually have an abstraction for the query, even with an optimization pass for constant folding." ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 5 }