{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Efficient Python analysis with dynamic C++ and just-in-time compilation\n", "\n", "PyROOT has the goal to combine the convenience of the Python language with the efficiency of C++ implementations. The dynamic C++ bindings of PyROOT powered by the C++ interpreter [cling](https://github.com/root-project/cling) allow to use conveniently efficient implementations in Python." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Welcome to JupyROOT 6.22/00\n" ] } ], "source": [ "import ROOT\n", "import numpy as np\n", "np.random.seed(1234)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Just-in-time compilation of C++ functions\n", "\n", "Just-in-time compilation (jitting) allows to use the power of the C++ compiler to optimize computation heavy operations. Just like numpy implements the actual functionality under the hood in C(++), PyROOT allows you to do the same dynamically." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "ROOT.gInterpreter.Declare('''\n", "float largest_sum(float* v1, float* v2, std::size_t size){\n", " float r = -999.f;\n", " for (size_t i1 = 0; i1 < size; i1++) {\n", " for (size_t i2 = 0; i2 < size; i2++) {\n", " const auto tmp = v1[i1] + v2[i2];\n", " if (tmp > r) r = tmp;\n", " }\n", " }\n", " return r;\n", "}\n", "''');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As example inputs, we generate two numpy arrays with random numbers." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "size = 100\n", "v1 = np.random.randn(size).astype(np.float32)\n", "v2 = np.random.randn(size).astype(np.float32)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And next we benchmark the runtime:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "25.5 µs ± 323 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%%timeit\n", "ROOT.largest_sum(v1, v2, size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How does the C++ kernel compare to a pure Python implementation?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def largest_sum(x1, x2):\n", " r = -999.0\n", " for e1 in x1:\n", " for e2 in x2:\n", " tmp = e1 + e2\n", " if tmp > r: r = tmp\n", " return r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The Python implementation is a factor of 100 slower!**" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.25 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "largest_sum(v1, v2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading of precompiled functions\n", "\n", "Improved C++ performance can be expected by precompiling the functionality and loading interfacing the functions via PyROOT. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Header:\n", "#include \n", "float optimized_largest_sum(float* v1, float* v2, std::size_t size);\n", "\n", "Source:\n", "# include \"analysis.hxx\"\n", "\n", "float optimized_largest_sum(float* v1, float* v2, std::size_t size){\n", " float r = -999.f;\n", " for (size_t i1 = 0; i1 < size; i1++) {\n", " for (size_t i2 = 0; i2 < size; i2++) {\n", " const auto tmp = v1[i1] + v2[i2];\n", " if (tmp > r) r = tmp;\n", " }\n", " }\n", " return r;\n", "}\n" ] } ], "source": [ "%%bash\n", "echo 'Header:'\n", "cat analysis.hxx\n", "echo\n", "echo 'Source:'\n", "cat analysis.cxx\n", "g++ -Ofast -shared -o libanalysis.so analysis.cxx" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can interactively include the header and functionality from the shared library." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "ROOT.gInterpreter.Declare('#include \"analysis.hxx\"')\n", "ROOT.gSystem.Load('libanalysis.so');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The optimized compilation improves the runtime again by a factor of 5!**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.83 µs ± 468 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n" ] } ], "source": [ "%%timeit\n", "ROOT.optimized_largest_sum(v1, v2, size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can show that all implementations come to the same result:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "PyROOT: 4.7567291259765625\n", "Native Python: 4.756729\n", "PyROOT (optimized): 4.7567291259765625\n" ] } ], "source": [ "print('PyROOT:', ROOT.largest_sum(v1, v2, size))\n", "print('Native Python:', largest_sum(v1, v2))\n", "print('PyROOT (optimized):', ROOT.optimized_largest_sum(v1, v2, size))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }