{ "cells": [
 { "cell_type": "markdown", "metadata": {}, "source": [ "# DaCe with Explicit Dataflow in Python\n", "\n", "In this tutorial, we will use the explicit dataflow specification in Python to construct DaCe programs." ] },
 { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import dace" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Explicit dataflow is a Python-based syntax that is close to defining SDFGs. In explicit `@dace.program`s, the code (Tasklets) and memory movement (Memlets) are specified separately, as we show below.\n", "\n", "## Matrix Transposition\n", "\n", "We begin with a simple example: transposing a matrix (out-of-place).\n", "\n", "First, since we do not know what the matrix sizes will be, we define symbolic sizes:" ] },
 { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "M = dace.symbol('M')\n", "N = dace.symbol('N')" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We now proceed to define the data-centric part of the application (i.e., the part that can be optimized by DaCe). It is a simple function that, when called, invokes the compilation and optimization procedure. It can also be compiled explicitly, as we show in the next example.\n", "\n", "DaCe programs use explicit types so that they can be compiled. We provide a numpy-compatible set of types that can define N-dimensional tensors. For example, `dace.int64` defines a 64-bit signed integer scalar, and `dace.float32[133,8]` defines a 133-by-8 2D array of 32-bit floating-point values." ] }
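,
 { "cell_type": "markdown", "metadata": {}, "source": [ "As a brief aside, these type annotations are ordinary Python objects and can also be constructed outside of a function signature. The following cell is a minimal, optional sketch with hypothetical sizes (the exact printed representations may differ between DaCe versions):" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Data descriptors built programmatically (hypothetical sizes, for illustration only)\n", "scalar_type = dace.int64            # 64-bit signed integer scalar\n", "matrix_type = dace.float32[133, 8]  # 133-by-8 2D array of 32-bit floats\n", "print(scalar_type)\n", "print(matrix_type)" ] }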
,
 { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "@dace.program\n", "def transpose(A: dace.float32[M, N], B: dace.float32[N, M]):\n", "    # Inside the function we will define a tasklet in a map, which is shortened\n", "    # to dace.map. We define the map range in the arguments:\n", "    @dace.map\n", "    def mytasklet(i: _[0:M], j: _[0:N]):\n", "        # Pre-declaring the memlets is required in explicit dataflow: tasklets\n", "        # cannot use any external memory apart from the data flowing to/from them.\n", "        a << A[i,j]  # Input memlet (<<)\n", "        b >> B[j,i]  # Output memlet (>>)\n", "\n", "        # The code\n", "        b = a" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "And that's it! We will now define a regression test using numpy:" ] },
 { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "A = np.random.rand(37, 11).astype(np.float32)\n", "expected = A.transpose()\n", "# Define an array for the output of the dace program\n", "B = np.random.rand(11, 37).astype(np.float32)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Before we call `transpose`, we can inspect the SDFG:" ] },
 { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "SDFG (transpose)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sdfg = transpose.to_sdfg()\n", "sdfg" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We can now call `transpose` directly or through the SDFG we created (we use the SDFG here and sketch the direct call further below). When calling it, we need to feed in the symbol values as well as the arguments, since the arrays are `numpy` arrays rather than symbolically-sized `dace` arrays (see the following tutorials). When prompted for transformations, we just press the \"Enter\" key to skip them." ] },
 { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Casting scalar argument \"M\" from int to \n", "WARNING: Casting scalar argument \"N\" from int to \n" ] } ], "source": [ "sdfg(A=A, B=B, M=A.shape[0], N=A.shape[1])" ] },
 { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Difference: 0.0\n" ] } ], "source": [ "print('Difference:', np.linalg.norm(expected - B))" ] }
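,
 { "cell_type": "markdown", "metadata": {}, "source": [ "The same computation can also be invoked through the `@dace.program` itself rather than the SDFG object. The cell below is a minimal sketch: it reuses the arrays from above, passes the symbol values explicitly, and overwrites `B` (depending on the DaCe version, the symbol values may also be inferred from the array shapes):" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Call the @dace.program directly; this compiles the program (if needed) and runs it\n", "transpose(A=A, B=B, M=A.shape[0], N=A.shape[1])\n", "print('Difference:', np.linalg.norm(expected - B))" ] }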
,
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Query (using Streams)\n", "\n", "In this example, we will use the Stream construct and dace ND arrays to create a simple parallel filter. We first define a symbolic size `N` and statically-allocated arrays to work with:" ] },
 { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "N = dace.symbol('N')\n", "n = 255\n", "\n", "storage = dace.ndarray(shape=[n], dtype=dace.int32)\n", "# The actual size of \"output\" will be less than or equal to N, but we need to\n", "# statically allocate the memory.\n", "output = dace.ndarray(shape=[n], dtype=dace.int32)\n", "# The output size is a scalar\n", "output_size = dace.scalar(dtype=dace.uint32)" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "As with `transpose`, the DaCe program consists of a tasklet nested in a Map. It also includes a Stream (to which we push outputs as necessary) that is directly connected to the output array, as well as a conflict-resolution output (because all tasklet instances in the map write to the same address):" ] },
 { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "@dace.program\n", "def query(data: dace.int32[N], output: dace.int32[N], outsz: dace.int32[1],\n", "          threshold: dace.int32):\n", "    # Define a local, unbounded (buffer_size=0) stream\n", "    S = dace.define_stream(dace.int32, 0)\n", "\n", "    # Filtering tasklet\n", "    @dace.map\n", "    def filter(i: _[0:N]):\n", "        a << data[i]\n", "        # Writing to S (no location necessary) a dynamic number of times (-1)\n", "        out >> S(-1)\n", "        # Writing to outsz dynamically (-1); if there is a conflict, sum the results\n", "        osz >> outsz(-1, lambda a, b: a + b)\n", "\n", "        if a > threshold:\n", "            # Pushing to a stream or writing with a conflict uses the assignment operator\n", "            out = a\n", "            osz = 1\n", "\n", "    # Define a memlet from S to the output\n", "    S >> output" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "We can compile `query` without defining anything further. However, before we call `query`, we will need to set the symbol sizes." ] }
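,
 { "cell_type": "markdown", "metadata": {}, "source": [ "As with `transpose`, we could also inspect the SDFG of `query` before compiling it. This optional cell is a minimal sketch; the resulting graph should contain the stream `S` alongside the arrays and the dynamic-count memlets:" ] },
 { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: build and display the SDFG generated for the query program\n", "query_sdfg = query.to_sdfg()\n", "query_sdfg" ] }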
,
 { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "qfunc = query.compile()" ] },
 { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "thres = 50" ] },
 { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Define some random integers and zero the outputs\n", "import numpy as np\n", "storage[:] = np.random.randint(0, 100, size=n)\n", "output_size[0] = 0\n", "output[:] = np.zeros(n).astype(np.int32)\n", "\n", "# Compute the expected output using numpy\n", "expected = storage[np.where(storage > thres)]" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "Here we call the compiled function through its Python function prototype (using keyword arguments), since we do not invoke it through the SDFG:" ] },
 { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Casting scalar argument \"threshold\" from int to \n" ] }, { "data": { "text/plain": [ "array([121], dtype=uint32)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qfunc(data=storage, output=output, outsz=output_size, threshold=thres, N=n)\n", "output_size" ] },
 { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Difference: 0.0\n" ] } ], "source": [ "filtered_output = output[:output_size[0]]\n", "# Sort both outputs to avoid concurrency-based reordering\n", "print('Difference:', np.linalg.norm(np.sort(expected) - np.sort(filtered_output)))" ] }
 ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.1" } }, "nbformat": 4, "nbformat_minor": 4 }