{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accera Quickstart Example\n", "\n", "In this example, we will:\n", "\n", "* Implement matrix multiplication with a ReLU activation (matmul + ReLU), commonly used in machine learning algorithms\n", "    * Generate two implementations: a naive algorithm and one with loop transformations\n", "* Compare the timings of both implementations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup\n", "\n", "First, we'll install Accera using `pip`.\n", "\n", "#### Optional: if running this notebook locally\n", "\n", "* Linux/macOS: install gcc (for example, `apt install gcc` on Debian or Ubuntu).\n", "* Windows: install Microsoft Visual Studio and run `vcvars64.bat` to set up the Visual Studio tools in your `PATH` before starting the Jupyter environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install accera" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build\n", "\n", "Run the code below to implement `ReLU(C + A @ B)` on arrays `A`, `B`, and `C`.\n", "\n", "We'll build a package called `\"hello_accera\"` that will export both versions as C functions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import accera as acc\n", "\n", "# define placeholder inputs/output\n", "A = acc.Array(role=acc.Role.INPUT, shape=(512, 512))\n", "B = acc.Array(role=acc.Role.INPUT, shape=(512, 512))\n", "C = acc.Array(role=acc.Role.INPUT_OUTPUT, shape=(512, 512))\n", "\n", "# implement the logic for matmul and relu\n", "matmul = acc.Nest(shape=(512, 512, 512))\n", "i1, j1, k1 = matmul.get_indices()\n", "@matmul.iteration_logic\n", "def _():\n", " C[i1, j1] += A[i1, k1] * B[k1, j1]\n", "\n", "relu = acc.Nest(shape=(512, 512))\n", "i2, j2 = relu.get_indices()\n", "@relu.iteration_logic\n", "def _():\n", " C[i2, j2] = acc.max(C[i2, j2], 0.0)\n", "\n", "package = acc.Package()\n", "\n", "# fuse the i and j indices of matmul and relu, add to the package\n", "schedule = acc.fuse(matmul.create_schedule(), relu.create_schedule(), partial=2)\n", "package.add(schedule, args=(A, B, C), base_name=\"matmul_relu_fusion_naive\")\n", "\n", "# transform the schedule, add to the package\n", "# here we will focus only on the j index. 
For a more complete example, see:\n", "# https://microsoft.github.io/Accera/Tutorials/Optimized_MatMul/\n", "tile_size_j = 256\n", "target = acc.Target(category=acc.Target.Category.CPU)\n", "\n", "f, i, j, k = schedule.get_indices()\n", "jj = schedule.split(j, tile_size_j)\n", "jjj = schedule.split(jj, (target.vector_bytes // 4) * 2) # there are 2 vfma execution units, each holding (target.vector_bytes // 4) 32-bit float elements\n", "jjjj = schedule.split(jjj, target.vector_bytes // 4) # each SIMD register holds (target.vector_bytes // 4) 32-bit float elements\n", "\n", "schedule.reorder(j, f, k, i, jj, jjj, jjjj) # reorder the loops\n", "plan = schedule.create_plan(target)\n", "plan.kernelize(unroll_indices=(jjj,), vectorize_indices=jjjj) # unroll and vectorize\n", "package.add(plan, args=(A, B, C), base_name=\"matmul_relu_fusion_transformed\")\n", "\n", "# build a dynamically-linked package (a .dll or .so) that exports both functions\n", "print(package.build(name=\"hello_accera\", format=acc.Package.Format.HAT_DYNAMIC))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Benchmark\n", "\n", "In the previous section, we built a binary (a `.so`, or a `.dll` on Windows) and a header file (`.hat`).\n", "\n", "Next, we will load the package and compare the timings of both implementations." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import hatlib as hat\n", "import numpy as np\n", "\n", "# load the package\n", "_, functions = hat.load(\"hello_accera.hat\")\n", "\n", "# call one of the functions with test inputs\n", "A_test = np.random.rand(512, 512).astype(np.float32)\n", "B_test = np.random.rand(512, 512).astype(np.float32)\n", "C_test = np.zeros((512, 512)).astype(np.float32)\n", "C_numpy = np.maximum(C_test + A_test @ B_test, 0.0)\n", "\n", "matmul_relu = functions[\"matmul_relu_fusion_transformed\"]\n", "matmul_relu(A_test, B_test, C_test)\n", "\n", "# check correctness\n", "np.testing.assert_allclose(C_test, C_numpy, atol=1e-3)\n", "\n", "# benchmark all functions\n", "hat.run_benchmark(\"hello_accera.hat\", batch_size=5, min_time_in_sec=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Next Steps\n", "\n", "The [Manual](https://microsoft.github.io/Accera/Manual/00%20Introduction/) is a good place to start for an introduction to the Accera Python programming model.\n", "\n", "In particular, the [schedule transformations](https://microsoft.github.io/Accera/Manual/03%20Schedules/#schedule-transformations) describe how you can experiment with different loop transformations with just a few lines of Python.\n", "\n", "Finally, the `.hat` format is just a C header file containing metadata. Learn more about the [HAT format](https://github.com/microsoft/hat) and [benchmarking](https://github.com/microsoft/hat/tree/main/tools).\n", "\n", "\n", "## How it works\n", "\n", "In a nutshell, Accera takes the Python code that defines the loop schedule and algorithm and converts it into [MLIR](https://mlir.llvm.org/) intermediate representation (IR). Accera's compiler then takes this IR through a series of MLIR pipelines to perform transformations. The result is a binary library with a C header file. 
The library implements the algorithms defined in Python and is compatible with the target.\n", "\n", "To peek into the stages of IR transformation that Accera performs, replace `format=acc.Package.Format.HAT_DYNAMIC` with `format=acc.Package.Format.MLIR_DYNAMIC` above, re-run the build, and search the `_tmp` subfolder for the intermediate `*.mlir` files. We plan to document these IR constructs in the future.\n", "\n", "\n", "## Documentation\n", "Get to know Accera by reading the [Documentation](https://microsoft.github.io/Accera/).\n", "\n", "You can find more step-by-step examples in the [Tutorials](https://microsoft.github.io/Accera/Tutorials)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 2 }