{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Profiling the Compiler\n", "\n", "This notebook describes new features added in version 0.53 to help users measure time spent in the compiler." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from numba.core import config\n", "# This is needed for recording LLVM pass timings used \n", "# later in this notebook.\n", "config.LLVM_PASS_TIMINGS = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tracking Compilation Events" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Numba 0.53, a simple [event API](https://numba.readthedocs.io/en/0.53.0/developer/event_api.html) is implemented to allow users to track specific compilation events. This section will describe how it can be used to record compilation events within a region of code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will measure the following function." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import numba\n", "assert numba.version_info.short >= (0, 53)\n", "from numba import njit" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# The function to record\n", "@njit\n", "def foo(x):\n", " out = [4, 5, 6, 7]\n", " out.sort()\n", " return np.linalg.svd(x), out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To record a specific event for later processing, users can use [`install_recorder()`](https://numba.readthedocs.io/en/0.53.0/developer/event_api.html#numba.core.event.install_recorder)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from numba.core.event import install_recorder\n", "\n", "# Record all compilation events\n", "with install_recorder(\"numba:compile\") as rec:\n", " foo(np.ones((2, 2)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above stores the recorded `numba:compile` events as a `list` into `rec.buffer`. Each item is a 2-tuple of `(timestamp, Event)`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[(1616078243.82092, Event(numba:compile, EventStatus.START, data: dict)),\n", " (1616078243.910625, Event(numba:compile, EventStatus.START, data: dict)),\n", " (1616078243.939735, Event(numba:compile, EventStatus.START, data: dict))]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rec.buffer[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following prepares the data for plotting:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def organize_events(rec):\n", " \"\"\"Combine START and END events.\n", " \"\"\"\n", " events = []\n", " temp = {}\n", " \n", " for ts, ev in rec.buffer:\n", " ed = ev.data\n", " k = ed['dispatcher'], ed['args']\n", " # If it's the start event\n", " if ev.is_start:\n", " # record the dispatcher name, type signature and start time\n", " temp[k] = ev, ts\n", " # Else, it's the end event\n", " else:\n", " # save 4-tuple of (start_event, end_event, start_time, end_time)\n", " ev_start, tt = temp[k]\n", " events.append((ev_start, ev, tt, ts))\n", " return events\n", "\n", "def group_data(events):\n", " \"\"\"Group the event timings into a simple column format in a dict.\n", " \n", " Returns a dict with 3 keys: \"start_times\", \"end_times\", and \"events\".\n", " \"\"\"\n", " names = []\n", " start_times = []\n", " end_times = []\n", " for i, (s, e, ts, te) in enumerate(events):\n", " names.append(f\"{s.data['dispatcher'].py_func.__name__}{s.data['args']}-[{i}]\")\n", " start_times.append(ts) \n", " end_times.append(te)\n", "\n", " # Rebase timestamp to the first compilation event\n", " base = np.min(start_times)\n", " start_times -= base\n", " end_times -= base\n", "\n", " source = {\n", " \"start_times\": start_times,\n", " \"end_times\": end_times,\n", " \"events\": names,\n", " }\n", " return source\n", "\n", "source = group_data(organize_events(rec))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use bokeh to render the compilation events." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(root) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = true;\n", "\n", " if (typeof root._bokeh_onload_callbacks === \"undefined\" || force === true) {\n", " root._bokeh_onload_callbacks = [];\n", " root._bokeh_is_loading = undefined;\n", " }\n", "\n", " var JS_MIME_TYPE = 'application/javascript';\n", " var HTML_MIME_TYPE = 'text/html';\n", " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n", " var CLASS_NAME = 'output_bokeh rendered_html';\n", "\n", " /**\n", " * Render data to the DOM node\n", " */\n", " function render(props, node) {\n", " var script = document.createElement(\"script\");\n", " node.appendChild(script);\n", " }\n", "\n", " /**\n", " * Handle when an output is cleared or removed\n", " */\n", " function handleClearOutput(event, handle) {\n", " var cell = handle.cell;\n", "\n", " var id = cell.output_area._bokeh_element_id;\n", " var server_id = cell.output_area._bokeh_server_id;\n", " // Clean up Bokeh references\n", " if (id != null && id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", "\n", " if (server_id !== undefined) {\n", " // Clean up Bokeh references\n", " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n", " cell.notebook.kernel.execute(cmd, {\n", " iopub: {\n", " output: function(msg) {\n", " var id = msg.content.text.trim();\n", " if (id in Bokeh.index) {\n", " Bokeh.index[id].model.document.clear();\n", " delete Bokeh.index[id];\n", " }\n", " }\n", " }\n", " });\n", " // Destroy server and session\n", " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n", " cell.notebook.kernel.execute(cmd);\n", " }\n", " }\n", "\n", " /**\n", " * Handle when a new output is added\n", " */\n", " function handleAddOutput(event, handle) {\n", " var output_area = handle.output_area;\n", " var output = handle.output;\n", "\n", " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n", " if ((output.output_type != \"display_data\") || (!Object.prototype.hasOwnProperty.call(output.data, EXEC_MIME_TYPE))) {\n", " return\n", " }\n", "\n", " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n", "\n", " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n", " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n", " // store reference to embed id on output_area\n", " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n", " }\n", " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n", " var bk_div = document.createElement(\"div\");\n", " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n", " var script_attrs = bk_div.children[0].attributes;\n", " for (var i = 0; i < script_attrs.length; i++) {\n", " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n", " toinsert[toinsert.length - 1].firstChild.textContent = bk_div.children[0].textContent\n", " }\n", " // store reference to server id on output_area\n", " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n", " }\n", " }\n", "\n", " function register_renderer(events, OutputArea) {\n", "\n", " function append_mime(data, metadata, element) {\n", " // create a DOM node to render to\n", " var toinsert = this.create_output_subarea(\n", " metadata,\n", " CLASS_NAME,\n", " EXEC_MIME_TYPE\n", " );\n", " this.keyboard_manager.register_events(toinsert);\n", " // Render to node\n", " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n", " render(props, toinsert[toinsert.length - 1]);\n", " element.append(toinsert);\n", " return toinsert\n", " }\n", "\n", " /* Handle when an output is cleared or removed */\n", " events.on('clear_output.CodeCell', handleClearOutput);\n", " events.on('delete.Cell', handleClearOutput);\n", "\n", " /* Handle when a new output is added */\n", " events.on('output_added.OutputArea', handleAddOutput);\n", "\n", " /**\n", " * Register the mime type and append_mime function with output_area\n", " */\n", " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n", " /* Is output safe? */\n", " safe: true,\n", " /* Index of renderer in `output_area.display_order` */\n", " index: 0\n", " });\n", " }\n", "\n", " // register the mime type if in Jupyter Notebook environment and previously unregistered\n", " if (root.Jupyter !== undefined) {\n", " var events = require('base/js/events');\n", " var OutputArea = require('notebook/js/outputarea').OutputArea;\n", "\n", " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n", " register_renderer(events, OutputArea);\n", " }\n", " }\n", "\n", " \n", " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n", " root._bokeh_timeout = Date.now() + 5000;\n", " root._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " var el = document.getElementById(\"1002\");\n", " if (el != null) {\n", " el.textContent = \"BokehJS is loading...\";\n", " }\n", " if (root.Bokeh !== undefined) {\n", " if (el != null) {\n", " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n", " }\n", " } else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", "\n", " function run_callbacks() {\n", " try {\n", " root._bokeh_onload_callbacks.forEach(function(callback) {\n", " if (callback != null)\n", " callback();\n", " });\n", " } finally {\n", " delete root._bokeh_onload_callbacks\n", " }\n", " console.debug(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(css_urls, js_urls, callback) {\n", " if (css_urls == null) css_urls = [];\n", " if (js_urls == null) js_urls = [];\n", "\n", " root._bokeh_onload_callbacks.push(callback);\n", " if (root._bokeh_is_loading > 0) {\n", " console.debug(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.debug(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " root._bokeh_is_loading = css_urls.length + js_urls.length;\n", "\n", " function on_load() {\n", " root._bokeh_is_loading--;\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: all BokehJS libraries/stylesheets loaded\");\n", " run_callbacks()\n", " }\n", " }\n", "\n", " function on_error(url) {\n", " console.error(\"failed to load \" + url);\n", " }\n", "\n", " for (let i = 0; i < css_urls.length; i++) {\n", " const url = css_urls[i];\n", " const element = document.createElement(\"link\");\n", " element.onload = on_load;\n", " element.onerror = on_error.bind(null, url);\n", " element.rel = \"stylesheet\";\n", " element.type = \"text/css\";\n", " element.href = url;\n", " console.debug(\"Bokeh: injecting link tag for BokehJS stylesheet: \", url);\n", " document.body.appendChild(element);\n", " }\n", "\n", " const hashes = {\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.0.min.js\": \"HjagQp6T0/7bxYTAXbLotF1MLAGWmhkY5siA1Gc/pcEgvgRPtMsRn0gQtMwGKiw1\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.0.min.js\": \"ZEPPTjL+mdyqgIq+/pl9KTwzji8Kow2NnI3zWY8+sFinWP/SYJ80BnfeJsa45iYj\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.0.min.js\": \"exLqv2ACDRIaV7ZK1iL8aGzGYQvKVuT3U2CT7FsQREBxRah6JrkVCoFy0koY1YqV\"};\n", "\n", " for (let i = 0; i < js_urls.length; i++) {\n", " const url = js_urls[i];\n", " const element = document.createElement('script');\n", " element.onload = on_load;\n", " element.onerror = on_error.bind(null, url);\n", " element.async = false;\n", " element.src = url;\n", " if (url in hashes) {\n", " element.crossOrigin = \"anonymous\";\n", " element.integrity = \"sha384-\" + hashes[url];\n", " }\n", " console.debug(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.head.appendChild(element);\n", " }\n", " };\n", "\n", " function inject_raw_css(css) {\n", " const element = document.createElement(\"style\");\n", " element.appendChild(document.createTextNode(css));\n", " document.body.appendChild(element);\n", " }\n", "\n", " \n", " var js_urls = [\"https://cdn.bokeh.org/bokeh/release/bokeh-2.3.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-widgets-2.3.0.min.js\", \"https://cdn.bokeh.org/bokeh/release/bokeh-tables-2.3.0.min.js\"];\n", " var css_urls = [];\n", " \n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " function(Bokeh) {\n", " \n", " \n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if (root.Bokeh !== undefined || force === true) {\n", " \n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i].call(root, root.Bokeh);\n", " }\n", " if (force === true) {\n", " display_loaded();\n", " }} else if (Date.now() < root._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!root._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " root._bokeh_failed_load = true;\n", " } else if (force !== true) {\n", " var cell = $(document.getElementById(\"1002\")).parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (root._bokeh_is_loading === 0) {\n", " console.debug(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(css_urls, js_urls, function() {\n", " console.debug(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(window));" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from bokeh.io import output_notebook, show\n", "from bokeh.plotting import figure\n", "\n", "output_notebook()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "(function(root) {\n", " function embed_document(root) {\n", " \n", " var docs_json = {\"d2ba439a-265e-4f03-bea5-0d49e3ef7b56\":{\"defs\":[{\"extends\":null,\"module\":null,\"name\":\"DataModel\",\"overrides\":[],\"properties\":[]}],\"roots\":{\"references\":[{\"attributes\":{\"below\":[{\"id\":\"1014\"}],\"center\":[{\"id\":\"1017\"},{\"id\":\"1020\"}],\"left\":[{\"id\":\"1018\"}],\"renderers\":[{\"id\":\"1039\"}],\"title\":{\"id\":\"1004\"},\"toolbar\":{\"id\":\"1028\"},\"toolbar_location\":null,\"width\":800,\"x_range\":{\"id\":\"1006\"},\"x_scale\":{\"id\":\"1010\"},\"y_range\":{\"id\":\"1008\"},\"y_scale\":{\"id\":\"1012\"}},\"id\":\"1003\",\"subtype\":\"Figure\",\"type\":\"Plot\"},{\"attributes\":{},\"id\":\"1045\",\"type\":\"AllLabels\"},{\"attributes\":{\"data\":{\"end_times\":{\"__ndarray__\":\"AAAAAM4zxj8AAAAAUonNPwAAAADdXdA/AAAAAHEh3T8AAAAA30DjPwAAAID9guk/AAAAgL0V6z8AAAAAT/bsPwAAAIBvxO0/AAAAQOM78j8AAADAsBH0PwAAAEByV/c/AAAAgO2E+T8AAABAgo37PwAAAABFgPw/AAAAgNT0/T8AAADAKHv+PwAAAIDoaQBAAAAA4MjKAUAAAACApcEDQA==\",\"dtype\":\"float64\",\"order\":\"little\",\"shape\":[20]},\"events\":[\"make_res(list(int64),)-[0]\",\"GET(list(int64), int64)-[1]\",\"default_lt(int64, int64)-[2]\",\"partition(list(int64), list(int64), int64, int64)-[3]\",\"insertion_sort(list(int64), list(int64), int64, int64)-[4]\",\"run_quicksort(list(int64),)-[5]\",\"make_res(list(int64),)-[6]\",\"GET(list(int64), int64)-[7]\",\"gt(int64, int64)-[8]\",\"partition(list(int64), list(int64), int64, int64)-[9]\",\"insertion_sort(list(int64), list(int64), int64, int64)-[10]\",\"run_quicksort(list(int64),)-[11]\",\"impl(list(int64), omitted(default=None), omitted(default=False))-[12]\",\"_check_finite_matrix(array(float64, 2d, C),)-[13]\",\"impl(array(float64, 2d, C),)-[14]\",\"_handle_err_maybe_convergence_problem(int32,)-[15]\",\"_dummy_liveness_func(list(int64),)-[16]\",\"svd_impl(array(float64, 2d, C), omitted(default=1))-[17]\",\"impl(list(int64), omitted(default=None), omitted(default=False))-[18]\",\"foo(array(float64, 2d, C),)-[19]\"],\"start_times\":{\"__ndarray__\":\"AAAAACxswj8AAAAAYKvKPwAAAAAyq80/AAAAADLmxj8AAAAAGXLdPwAAAACoar4/AAAAgIBE6j8AAACA9TjsPwAAAADt/uw/AAAAAGI86z8AAAAABlDyPwAAAADwh+k/AAAAAOj2tj8AAACA7dn5PwAAAABEkPs/AAAAgL2I/D8AAADACfj9PwAAAABzifk/AAAAAO2AAEAAAAAAAAAAAA==\",\"dtype\":\"float64\",\"order\":\"little\",\"shape\":[20]}},\"selected\":{\"id\":\"1048\"},\"selection_policy\":{\"id\":\"1049\"}},\"id\":\"1035\",\"type\":\"ColumnDataSource\"},{\"attributes\":{},\"id\":\"1046\",\"type\":\"CategoricalTickFormatter\"},{\"attributes\":{\"bottom_units\":\"screen\",\"fill_alpha\":0.5,\"fill_color\":\"lightgrey\",\"left_units\":\"screen\",\"level\":\"overlay\",\"line_alpha\":1.0,\"line_color\":\"black\",\"line_dash\":[4,4],\"line_width\":2,\"right_units\":\"screen\",\"syncable\":false,\"top_units\":\"screen\"},\"id\":\"1027\",\"type\":\"BoxAnnotation\"},{\"attributes\":{\"fill_color\":{\"value\":\"#1f77b4\"},\"height\":{\"value\":0.8},\"left\":{\"field\":\"start_times\"},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"end_times\"},\"y\":{\"field\":\"events\"}},\"id\":\"1037\",\"type\":\"HBar\"},{\"attributes\":{},\"id\":\"1010\",\"type\":\"LinearScale\"},{\"attributes\":{\"text\":\"Compilation events\"},\"id\":\"1004\",\"type\":\"Title\"},{\"attributes\":{\"source\":{\"id\":\"1035\"}},\"id\":\"1040\",\"type\":\"CDSView\"},{\"attributes\":{},\"id\":\"1048\",\"type\":\"Selection\"},{\"attributes\":{\"factors\":[\"make_res(list(int64),)-[0]\",\"GET(list(int64), int64)-[1]\",\"default_lt(int64, int64)-[2]\",\"partition(list(int64), list(int64), int64, int64)-[3]\",\"insertion_sort(list(int64), list(int64), int64, int64)-[4]\",\"run_quicksort(list(int64),)-[5]\",\"make_res(list(int64),)-[6]\",\"GET(list(int64), int64)-[7]\",\"gt(int64, int64)-[8]\",\"partition(list(int64), list(int64), int64, int64)-[9]\",\"insertion_sort(list(int64), list(int64), int64, int64)-[10]\",\"run_quicksort(list(int64),)-[11]\",\"impl(list(int64), omitted(default=None), omitted(default=False))-[12]\",\"_check_finite_matrix(array(float64, 2d, C),)-[13]\",\"impl(array(float64, 2d, C),)-[14]\",\"_handle_err_maybe_convergence_problem(int32,)-[15]\",\"_dummy_liveness_func(list(int64),)-[16]\",\"svd_impl(array(float64, 2d, C), omitted(default=1))-[17]\",\"impl(list(int64), omitted(default=None), omitted(default=False))-[18]\",\"foo(array(float64, 2d, C),)-[19]\"]},\"id\":\"1008\",\"type\":\"FactorRange\"},{\"attributes\":{},\"id\":\"1012\",\"type\":\"CategoricalScale\"},{\"attributes\":{},\"id\":\"1006\",\"type\":\"DataRange1d\"},{\"attributes\":{},\"id\":\"1049\",\"type\":\"UnionRenderers\"},{\"attributes\":{},\"id\":\"1015\",\"type\":\"BasicTicker\"},{\"attributes\":{\"axis\":{\"id\":\"1014\"},\"ticker\":null},\"id\":\"1017\",\"type\":\"Grid\"},{\"attributes\":{\"data_source\":{\"id\":\"1035\"},\"glyph\":{\"id\":\"1037\"},\"hover_glyph\":null,\"muted_glyph\":null,\"nonselection_glyph\":{\"id\":\"1038\"},\"view\":{\"id\":\"1040\"}},\"id\":\"1039\",\"type\":\"GlyphRenderer\"},{\"attributes\":{},\"id\":\"1042\",\"type\":\"AllLabels\"},{\"attributes\":{},\"id\":\"1043\",\"type\":\"BasicTickFormatter\"},{\"attributes\":{},\"id\":\"1026\",\"type\":\"HelpTool\"},{\"attributes\":{\"active_multi\":null,\"tools\":[{\"id\":\"1021\"},{\"id\":\"1022\"},{\"id\":\"1023\"},{\"id\":\"1024\"},{\"id\":\"1025\"},{\"id\":\"1026\"}]},\"id\":\"1028\",\"type\":\"Toolbar\"},{\"attributes\":{\"formatter\":{\"id\":\"1046\"},\"major_label_policy\":{\"id\":\"1045\"},\"ticker\":{\"id\":\"1019\"}},\"id\":\"1018\",\"type\":\"CategoricalAxis\"},{\"attributes\":{\"axis\":{\"id\":\"1018\"},\"dimension\":1,\"ticker\":null},\"id\":\"1020\",\"type\":\"Grid\"},{\"attributes\":{},\"id\":\"1019\",\"type\":\"CategoricalTicker\"},{\"attributes\":{},\"id\":\"1022\",\"type\":\"WheelZoomTool\"},{\"attributes\":{\"overlay\":{\"id\":\"1027\"}},\"id\":\"1023\",\"type\":\"BoxZoomTool\"},{\"attributes\":{},\"id\":\"1021\",\"type\":\"PanTool\"},{\"attributes\":{},\"id\":\"1024\",\"type\":\"SaveTool\"},{\"attributes\":{},\"id\":\"1025\",\"type\":\"ResetTool\"},{\"attributes\":{\"axis_label\":\"Seconds since first event\",\"formatter\":{\"id\":\"1043\"},\"major_label_policy\":{\"id\":\"1042\"},\"ticker\":{\"id\":\"1015\"}},\"id\":\"1014\",\"type\":\"LinearAxis\"},{\"attributes\":{\"fill_alpha\":{\"value\":0.1},\"fill_color\":{\"value\":\"#1f77b4\"},\"height\":{\"value\":0.8},\"left\":{\"field\":\"start_times\"},\"line_alpha\":{\"value\":0.1},\"line_color\":{\"value\":\"#1f77b4\"},\"right\":{\"field\":\"end_times\"},\"y\":{\"field\":\"events\"}},\"id\":\"1038\",\"type\":\"HBar\"}],\"root_ids\":[\"1003\"]},\"title\":\"Bokeh Application\",\"version\":\"2.3.0\"}};\n", " var render_items = [{\"docid\":\"d2ba439a-265e-4f03-bea5-0d49e3ef7b56\",\"root_ids\":[\"1003\"],\"roots\":{\"1003\":\"934aec94-b18f-45ec-b303-a91dd60d8ccc\"}}];\n", " root.Bokeh.embed.embed_items_notebook(docs_json, render_items);\n", "\n", " }\n", " if (root.Bokeh !== undefined) {\n", " embed_document(root);\n", " } else {\n", " var attempts = 0;\n", " var timer = setInterval(function(root) {\n", " if (root.Bokeh !== undefined) {\n", " clearInterval(timer);\n", " embed_document(root);\n", " } else {\n", " attempts++;\n", " if (attempts > 100) {\n", " clearInterval(timer);\n", " console.log(\"Bokeh: ERROR: Unable to run BokehJS code because BokehJS library is missing\");\n", " }\n", " }\n", " }, 10, root)\n", " }\n", "})(window);" ] }, "metadata": { "application/vnd.bokehjs_exec.v0+json": { "id": "1003" } }, "output_type": "display_data" } ], "source": [ "p = figure(y_range=source['events'],\n", " plot_width=800, plot_height=600, toolbar_location=None,\n", " title=\"Compilation events\")\n", "p.hbar(y=\"events\", left='start_times', right='end_times', height=.8, source=source)\n", "p.xaxis.axis_label = \"Seconds since first event\"\n", "\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The plot above shows all the compilation events triggered for `foo()`. We can see that it triggers the compilation of 19 functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The event API is used internally to record time spent in the compiler and LLVM. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Timers: {'llvm_lock': 1.1270671279999966, 'compiler_lock': 2.46969849}\n", "\n", "Total time spent in LLVM: 1.13s\n", "Total time spent in the compiler (includes LLVM time): 2.47s\n", "Percent of time spent in LLVM: 45.6%\n" ] } ], "source": [ "# Print out the timers for compiler_lock\n", "md = foo.get_metadata(foo.signatures[0])\n", "timers = md['timers']\n", "print(f\"Timers: {timers}\")\n", "print()\n", "print(f\"Total time spent in LLVM: {timers['llvm_lock']:.2f}s\")\n", "print(f\"Total time spent in the compiler (includes LLVM time): {timers['compiler_lock']:.2f}s\")\n", "\n", "print(f\"Percent of time spent in LLVM: {timers['llvm_lock'] / timers['compiler_lock'] * 100:.1f}%\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, about half of the time is spent in LLVM. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LLVM Pass Timings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To understand why some functions take a long time in LLVM, we can look at the [LLVM pass timings](https://numba.readthedocs.io/en/0.53.0/developer/llvm_timings.html) to see which LLVM compiler passes are taking a long time.\n", "\n", "This feature is turned off by default because of the overhead incurred in LLVM when recording the pass timings. To turn this on, set `numba.core.config.LLVM_PASS_TIMINGS` to truthy as done at the start of this notebook.\n", "\n", "The pass timings are recorded in the metadata for each compiled version of the function." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Printing pass timings for JITCodeLibrary('foo')\n", "Total time: 0.1698\n", "== #0 Function passes on '_ZN8__main__7foo$241E5ArrayIdLi2E1C7mutable7alignedE'\n", " Percent: 2.1%\n", " Total 0.0035s\n", " Top timings:\n", " 0.0026s ( 74.4%) Detect Dead Lanes #17\n", " 0.0005s ( 14.1%) Early CSE #34\n", " 0.0002s ( 7.1%) Prune NRT refops #31\n", " 0.0001s ( 1.9%) Simplify the CFG #39\n", " 0.0000s ( 1.0%) Normalize NRT refops #35\n", "== #1 Function passes on '.dtor.list.int64'\n", " Percent: 0.1%\n", " Total 0.0001s\n", " Top timings:\n", " 0.0000s ( 29.3%) Early CSE #34\n", " 0.0000s ( 15.5%) Simplify the CFG #39\n", " 0.0000s ( 11.7%) Prune NRT refops #31\n", " 0.0000s ( 10.0%) Post-Dominator Tree Construction #36\n", " 0.0000s ( 8.6%) Dominator Tree Construction #178\n", "== #2 Function passes on '_ZN7cpython8__main__7foo$241E5ArrayIdLi2E1C7mutable7alignedE'\n", " Percent: 0.6%\n", " Total 0.0011s\n", " Top timings:\n", " 0.0005s ( 48.6%) SROA #31\n", " 0.0003s ( 23.5%) Early CSE #35\n", " 0.0001s ( 9.6%) Prune NRT refops #32\n", " 0.0001s ( 8.7%) Simplify the CFG #40\n", " 0.0000s ( 2.5%) Post-Dominator Tree Construction #37\n", "== #3 Function passes on 'cfunc._ZN8__main__7foo$241E5ArrayIdLi2E1C7mutable7alignedE'\n", " Percent: 0.2%\n", " Total 0.0004s\n", " Top timings:\n", " 0.0001s ( 32.6%) SROA #32\n", " 0.0001s ( 31.3%) Early CSE #36\n", " 0.0001s ( 14.0%) Simplify the CFG #41\n", " 0.0000s ( 10.1%) Prune NRT refops #33\n", " 0.0000s ( 2.8%) Normalize NRT refops #36\n", "== #4 Module passes (cheap optimization for refprune)\n", " Percent: 5.6%\n", " Total 0.0095s\n", " Top timings:\n", " 0.0063s ( 66.1%) Combine redundant instructions\n", " 0.0008s ( 8.2%) Function Integration/Inlining\n", " 0.0007s ( 6.9%) Prune NRT refops #3\n", " 0.0002s ( 2.4%) Post-Dominator Tree Construction #3\n", " 0.0002s ( 1.9%) Rotate Loops\n", "== #5 Module passes (full optimization)\n", " Percent: 46.6%\n", " Total 0.0791s\n", " Top timings:\n", " 0.0067s ( 8.5%) Induction Variable Simplification\n", " 0.0050s ( 6.3%) Global Value Numbering\n", " 0.0036s ( 4.6%) SLP Vectorizer\n", " 0.0033s ( 4.2%) Combine redundant instructions #4\n", " 0.0031s ( 3.9%) Combine redundant instructions #7\n", "== #6 Finalize object\n", " Percent: 44.8%\n", " Total 0.0761s\n", " Top timings:\n", " 0.0291s ( 38.3%) X86 DAG->DAG Instruction Selection #19\n", " 0.0077s ( 10.1%) Greedy Register Allocator #18\n", " 0.0063s ( 8.3%) Loop Strength Reduction #11\n", " 0.0027s ( 3.6%) Induction Variable Users #10\n", " 0.0026s ( 3.4%) Machine Instruction Scheduler #21\n" ] } ], "source": [ "md = foo.get_metadata(foo.signatures[0])\n", "# Get a summary of the LLVM pass timings\n", "print(md['llvm_pass_timings'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are multiple sections in LLVM timing information. Each represents a single run of a bundle of LLVM passes for a specific purpose. The name of each bundle gives hints to the purpose. Under `== #0 Function passes on '_ZN8__main__8foo$2426E5ArrayIdLi2E1C7mutable7alignedE'` are the timings for running lightweight pre-optimization passes on `foo()`. These passes took 2.8% of all pass execution time. Under `== #6 Finalize object` are the timings for emitting the machine code. These passes took 44.0% of all pass execution time. The `Top timings` is a descending list of the 5 most expensive passes. In this case the `X86 DAG->DAG Instruction Selection` is the most expensive pass.\n", "\n", "Under `== #5 Module passes (full optimization)` are the timings for the full optimization. At this stage, all referenced functions are linked into the same LLVM module for aggressive optimizations, including function inlining, loop optimizations, and SIMD-vectorization. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also possible to process the timings programmatically. See the [API reference here](https://numba.readthedocs.io/en/0.53.0/developer/llvm_timings.html#api-for-custom-analysis)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "numba.misc.llvm_pass_timings.PassTimingsCollection" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pass_timings = md['llvm_pass_timings']\n", "type(pass_timings)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get pass-bundles in descending order of execution time:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "bundles = pass_timings.list_longest_first()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Longest pass bundle is: 'Module passes (full optimization)'\n", "Top 3 passes:\n", "- 'Induction Variable Simplification' pass took 0.007s\n", "- 'Global Value Numbering' pass took 0.005s\n", "- 'SLP Vectorizer' pass took 0.004s\n" ] } ], "source": [ "print(f\"Longest pass bundle is: {bundles[0].name!r}\")\n", "\n", "print(\"Top 3 passes:\")\n", "for info in bundles[0].timings.list_top(3):\n", " print(f\"- {info.pass_name!r} pass took {info.wall_time:.3f}s\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Total seconds spent in LLVM passes:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.1698" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pass_timings.get_total_time()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, this is a small fraction of the time shown earlier as measured using the events API. The reason is that the LLVM pass timings are showing per-function statistics. In the following section, we will combine both techniques to discover all the LLVM pass timings in all triggered compilations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining Events and LLVM Pass Timings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The event API keeps extra data, including the dispatcher and the type signature for that compilation." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Event(numba:compile, EventStatus.START, data: dict)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ev = rec.buffer[0][1]\n", "ev" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'dispatcher': CPUDispatcher(),\n", " 'args': (array(float64, 2d, C),),\n", " 'return_type': None}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ev.data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can gather all the LLVM pass timings in all the recorded compilation events." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "all_pass_timings = {}\n", "for ts, ev in rec.buffer:\n", " # Use only the START event\n", " if ev.is_start:\n", " # Get the extra data in the event\n", " extra = ev.data\n", " # Get the dispatcher for the event\n", " dispatcher = extra['dispatcher']\n", " md = dispatcher.get_metadata(extra['args'])\n", " # Get the LLVM pass timings for this dispatcher\n", " timings = md['llvm_pass_timings']\n", " all_pass_timings[ev] = timings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get the timings in descending order:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "unordered_timings = []\n", "total_time_in_passes = 0.\n", "for ev, timings in all_pass_timings.items():\n", " curtime = timings.get_total_time()\n", " total_time_in_passes += curtime\n", " unordered_timings.append((curtime, ev.data['dispatcher'].__qualname__))\n", " \n", "ordered_timings = sorted(unordered_timings, key=lambda x: x[0], reverse=True)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21.5% in LLVM passes for 'foo'\n", "11.3% in LLVM passes for 'ol_list_sort..impl'\n", "11.2% in LLVM passes for 'ol_list_sort..impl'\n", "10.5% in LLVM passes for 'make_quicksort_impl..run_quicksort'\n", "10.2% in LLVM passes for 'make_quicksort_impl..run_quicksort'\n", " 7.0% in LLVM passes for 'svd_impl..svd_impl'\n", " 5.9% in LLVM passes for 'make_quicksort_impl..partition'\n", " 5.7% in LLVM passes for 'make_quicksort_impl..partition'\n", " 3.4% in LLVM passes for 'make_quicksort_impl..insertion_sort'\n", " 3.3% in LLVM passes for 'make_quicksort_impl..insertion_sort'\n", " 2.6% in LLVM passes for 'ol_copy_to_fortran_order..impl'\n", " 1.6% in LLVM passes for '_check_finite_matrix'\n", " 1.0% in LLVM passes for '_handle_err_maybe_convergence_problem'\n", " 1.0% in LLVM passes for '_dummy_liveness_func'\n", " 0.9% in LLVM passes for 'make_quicksort_impl..make_res'\n", " 0.8% in LLVM passes for 'make_quicksort_impl..make_res'\n", " 0.6% in LLVM passes for 'gt'\n", " 0.6% in LLVM passes for 'make_quicksort_impl..default_lt'\n", " 0.5% in LLVM passes for 'make_quicksort_impl..GET'\n", " 0.5% in LLVM passes for 'make_quicksort_impl..GET'\n", "total time in LLVM passes: 0.7901000000000001s\n" ] } ], "source": [ "for run_time, func_name in ordered_timings:\n", " print(f\"{run_time / total_time_in_passes * 100:4.1f}% in LLVM passes for {func_name!r}\")\n", "total_time_in_passes = sum(x for x, _ in ordered_timings)\n", "print(f'total time in LLVM passes: {total_time_in_passes}s')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not all of the time reported by the `llvm_lock` event is spent in LLVM passes. The remaining time is spent in other LLVM API calls." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Percent of llvm_lock time not accounted for in pass timings: 29.9%\n" ] } ], "source": [ "total_llvm_time = foo.get_metadata(foo.signatures[0])['timers']['llvm_lock']\n", "\n", "print(f\"Percent of llvm_lock time not accounted for in pass timings: {(total_llvm_time - total_time_in_passes) / total_llvm_time * 100:.1f}%\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }