{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# C Output and Parameter Interfaces\n", "\n", "## Author: Zach Etienne\n", "### Formatting improvements courtesy Brandon Clark\n", "\n", "## Exploring C output and parameter interfaces in NRPy+, this notebook initializes core Python/NRPy+ modules, performs common subexpression elimination (CSE), and generates C code. It further delves into the NRPy+ parameter interface and demonstrates how Single Instruction, Multiple Data (SIMD) paradigms can optimize NRPy+ generated C code.\n", "\n", "### Required reading if you are unfamiliar with programming or [computer algebra systems](https://en.wikipedia.org/wiki/Computer_algebra_system). Otherwise, use for reference; you should be able to pick up the syntax as you follow the tutorial.\n", "+ **[Python Tutorial](https://docs.python.org/3/tutorial/index.html)**\n", "+ **[SymPy Tutorial](http://docs.sympy.org/latest/tutorial/intro.html)**\n", "\n", "### NRPy+ Source Code for this module: \n", "* [outputC.py](../edit/outputC.py)\n", "* [NRPy_param_funcs.py](../edit/NRPy_param_funcs.py)\n", "* [SIMD.py](../edit/SIMD.py)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Table of Contents\n", "$$\\label{toc}$$\n", "\n", "The module is organized as follows:\n", "\n", "1. [Step 1](#initializenrpy): Initialize core Python/NRPy+ modules\n", "1. [Step 2](#sympy_ccode): Common Subexpression Elimination (CSE)\n", "1. [Step 3](#coutput): **Let's generate some C code!** NRPy+'s core C code output routine, `Coutput()`\n", " 1. [Step 3.a](#cfunction): **Wrap it up!** NRPy+'s C function wrapper routine, `outCfunction()`\n", "1. [Step 4](#param): **Oh, the features you'll see!** Parameters in NRPy+\n", " 1. [Step 4.a](#param_func): `NRPy_param_funcs`: The NRPy+ Parameter Interface\n", "1. [Step 5](#simd): **Warp speed!** SIMD (Single Instruction, Multiple Data) in NRPy+-Generated C Code\n", "1. 
[Step 6](#latex_pdf_output): Output this notebook to $\\LaTeX$-formatted PDF file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 1: Initialize core Python/NRPy+ modules \\[Back to [top](#toc)\\]\n", "$$\\label{initializenrpy}$$\n", "\n", "Let's start by importing all the needed modules from Python/NRPy+ for dealing with parameter interfaces and outputting C code. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.359861Z", "iopub.status.busy": "2021-03-07T17:06:49.358149Z", "iopub.status.idle": "2021-03-07T17:06:49.696299Z", "shell.execute_reply": "2021-03-07T17:06:49.695601Z" } }, "outputs": [], "source": [ "# Step 1: Initialize core Python/NRPy+ modules\n", "from outputC import outputC,outCfunction # NRPy+: Core C code output module\n", "import NRPy_param_funcs as par # NRPy+: parameter interface\n", "import sympy as sp # SymPy: The Python computer algebra package upon which NRPy+ depends" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 2: Common Subexpression Elimination (CSE) \\[Back to [top](#toc)\\]\n", "$$\\label{sympy_ccode}$$\n", "\n", "Let's begin with a simple [SymPy](http://www.sympy.org/) worksheet that makes use of SymPy's built in C code generator function, [ccode](http://docs.sympy.org/dev/modules/utilities/codegen.html)(), to evaluate the expression $x = b^2 \\sin (2a) + \\frac{c}{\\sin (2a)}$." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.705899Z", "iopub.status.busy": "2021-03-07T17:06:49.702014Z", "iopub.status.idle": "2021-03-07T17:06:49.798097Z", "shell.execute_reply": "2021-03-07T17:06:49.798600Z" } }, "outputs": [ { "data": { "text/plain": [ "'pow(b, 2)*sin(2*a) + c/sin(2*a)'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2: Common Subexpression Elimination\n", "\n", "# Declare some variables, using SymPy's symbols() function\n", "a,b,c = sp.symbols(\"a b c\")\n", "\n", "# Set x = b^2*sin(2*a) + c/sin(2*a).\n", "x = b**2*sp.sin(2*a) + c/(sp.sin(2*a))\n", "\n", "# Convert the expression into C code\n", "sp.ccode(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computation of this expression in C requires 3 multiplications, one division, two sin() function calls, and one addition. Multiplications, additions, and subtractions typically require one clock cycle per SIMD element on a modern CPU, while divisions can require ~3x longer, and transcendental functions ~20x longer than additions or multiplications (See, e.g., [this page](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX&expand=118), [this page](http://www.agner.org/optimize/microarchitecture.pdf), or [this page](http://nicolas.limare.net/pro/notes/2014/12/16_math_speed/) for more details). \n", "\n", "One goal in generating C codes involving mathematical expressions in NRPy+ is to minimize the number of floating point operations, and SymPy provides a means to do this, known as [common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination), or CSE.\n", "\n", "CSE algorithms search for common patterns within expressions and declare them as new variables, so they need not be computed again. 
To call SymPy's CSE algorithm, we need only pass the expression to [sp.cse()](http://docs.sympy.org/latest/modules/simplify/simplify.html#sympy.simplify.cse_main.cse):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.805403Z", "iopub.status.busy": "2021-03-07T17:06:49.804722Z", "iopub.status.idle": "2021-03-07T17:06:49.807547Z", "shell.execute_reply": "2021-03-07T17:06:49.808012Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "([(x0, sin(2*a))], [b**2*x0 + c/x0])\n" ] } ], "source": [ "print(sp.cse(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, SymPy returned a list with two elements. The first element, $(\\texttt{x0, sin(2*a)})$, indicates that a new variable $\\texttt{x0}$ should be set to $\\texttt{sin(2*a)}$. The second element yields the expression for our original expression $x$ in terms of the original variables, as well as the new variable $\\texttt{x0}$. \n", "\n", "$$\\texttt{x0} = \\sin(2*a)$$ is the common subexpression, so that the final expression $x$ is given by $$x = pow(b,2)*\\texttt{x0} + c/\\texttt{x0}.$$\n", "\n", "Thus, at the cost of a new variable assignment, SymPy's CSE has decreased the computational cost by one multiplication and one sin() function call.\n", "\n", "NRPy+ makes full use of SymPy's CSE algorithm in generating optimized C codes, and in addition automatically adjusts expressions like `pow(x,2)` into `((x)*(x))`.\n", "\n", "*Caveat: In order for a CSE to function optimally, it needs to know something about the cost of basic mathematical operations versus the cost of declaring a new variable. SymPy's CSE algorithm does not make any assumptions about cost, instead opting to declare new variables any time a common pattern is found more than once. 
The degree to which this is suboptimal is unclear.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 3: **Let's generate some C code!** NRPy+'s core C code output routine, `outputC()` \\[Back to [top](#toc)\\]\n", "$$\\label{coutput}$$\n", "\n", "NRPy+'s `outputC()` function provides the core of NRPy+ functionality. It builds upon SymPy's `ccode()` and `cse()` functions and adds the ability to generate [SIMD](https://en.wikipedia.org/wiki/SIMD) [compiler intrinsics](https://software.intel.com/sites/landingpage/IntrinsicsGuide/) for modern Intel and AMD-based CPUs. \n", "\n", "As `outputC()` is at the heart of NRPy+, it will be useful to understand how it is called:\n", "\n", "```python\n", "outputC(sympyexpr, output_varname_str, filename = \"stdout\", params = \"\", prestring = \"\", poststring = \"\")\n", "```\n", "\n", "`outputC()` requires at least two arguments: \n", "+ **sympyexpr** is a SymPy expression or a list of SymPy expressions\n", "+ **output_varname_str** is the name of the variable to which the SymPy expression is assigned, or alternatively the list of variable names to which the SymPy expressions are assigned. If a list is provided, it must be the same length as the list of SymPy expressions.\n", "\n", "Additional, optional arguments to `outputC()` include\n", "+ **filename** (third argument; defaults to \"stdout\" if unspecified): \n", " + \"stdout\" = print to the screen\n", " + \"filename.c\" = output to filename.c\n", " + \"returnstring\" = return C output as a string. I.e., call \n", " + string = outputC(sympyexpr, output_varname_str, filename = \"returnstring\")\n", " + ... and then manipulate the string directly.\n", "+ **params** (fourth argument; defaults to \"\" if unspecified): A comma-separated list of tunable parameters. For example: *params=\"preindent=1,includebraces=False,declareoutputvars=False,SIMD_debug=True\"* Parameters can be listed in any order, and repeats are allowed; the final repeated value will be the value that is set. 
List of parameters:\n", " + *preindent*: (integer, defaults to 0) The number of tab stops to add to C code output\n", " + *includebraces*: (True or False, defaults to True) Wrap the C output expression in curly braces?\n", " + *declareoutputvars*: (True or False, defaults to False) Prepend the output variable with the variable type, thus defining the output variable.\n", " + *outCfileaccess*: (\"w\" or \"a\", defaults to \"w\") Write (\"w\") or append (\"a\") to the C output file.\n", " + *outCverbose*: (True or False, defaults to True) Output a comment block displaying the input SymPy expressions, if set to True.\n", " + *CSE_enable*: (True or False, defaults to True) If set to True, common-subexpression elimination (CSE) will be used.\n", " + *CSE_varprefix*: (Any string without spaces, defaults to \"tmp\") Prefix each temporary variable in the CSE with this string.\n", " + *enable_SIMD*: (True or False, defaults to False) If set to True, C code output exclusively uses SIMD compiler intrinsics, which must be linked to the actual intrinsics for a given compiler/SIMD library through C macros.\n", " + *SIMD_debug*: (True or False, defaults to False) Verify for each expression that the SIMD output matches the input (non-SIMD) expression.\n", "+ **prestring** (fifth argument; defaults to \"\" -- empty string -- if unspecified): Preface C code output with prestring.\n", "+ **poststring** (sixth argument): Same as prestring, but places poststring at the end of the C code output.\n", "\n", "Notice that by default, CSE is enabled (fourth function argument). 
Thus, if we call `outputC()` with only the two required arguments, NRPy+ will process the expression through SymPy's CSE:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.817594Z", "iopub.status.busy": "2021-03-07T17:06:49.816877Z", "iopub.status.idle": "2021-03-07T17:06:49.819957Z", "shell.execute_reply": "2021-03-07T17:06:49.820512Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/*\n", " * Original SymPy expression:\n", " * \"x = b**2*sin(2*a) + c/sin(2*a)\"\n", " */\n", "{\n", " const double tmp_0 = sin(2*a);\n", " x = ((b)*(b))*tmp_0 + c/tmp_0;\n", "}\n", "\n" ] } ], "source": [ "# Step 3: NRPy+'s C code output routine, `outputC()`\n", "\n", "# Declare some variables, using SymPy's symbols() function\n", "a,b,c = sp.symbols(\"a b c\")\n", "\n", "# Set x = b^2*sin(2*a) + c/sin(2*a).\n", "x = b**2*sp.sin(2*a) + c/(sp.sin(2*a))\n", "\n", "outputC(x,\"x\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Step 3.a: **Wrap it up!** NRPy+'s C function wrapper routine, `outCfunction()` \\[Back to [top](#toc)\\]\n", "$$\\label{cfunction}$$\n", "\n", "`outCfunction()` wraps a block of C code, such as the string returned by `outputC()` with `filename=\"returnstring\"`, in a complete C function. In the call below, `desc` sets the comment placed above the function, `name` sets the function name, `params` sets the C argument list, and `body` sets the function body:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.832459Z", "iopub.status.busy": "2021-03-07T17:06:49.829491Z", "iopub.status.idle": "2021-03-07T17:06:49.834572Z", "shell.execute_reply": "2021-03-07T17:06:49.834044Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/*\n", " * Output x(a,b,c) = b^2*sin(2*a) + c/sin(2*a)\n", " */\n", "void output_x_of_a_b_c(const double a,const double b,const double c, double *x) {\n", "\n", " /*\n", " * Original SymPy expression:\n", " * \"*x = b**2*sin(2*a) + c/sin(2*a)\"\n", " */\n", " const double tmp_0 = sin(2*a);\n", " *x = ((b)*(b))*tmp_0 + c/tmp_0;\n", "}\n", "\n" ] } ], "source": [ "# Declare some variables, using SymPy's symbols() function\n", "a,b,c = sp.symbols(\"a b c\")\n", "\n", "# Set 
x = b^2*sin(2*a) + c/sin(2*a).\n", "x = b**2*sp.sin(2*a) + c/(sp.sin(2*a))\n", "\n", "desc=\"Output x(a,b,c) = b^2*sin(2*a) + c/sin(2*a)\"\n", "name=\"output_x_of_a_b_c\"\n", "string = outCfunction(\n", " outfile = \"returnstring\", desc=desc, name=name,\n", " params = \"const double a,const double b,const double c, double *x\",\n", " body = outputC(x,\"*x\",filename=\"returnstring\",params=\"includebraces=False,preindent=1\"),\n", " enableCparameters=False)\n", "print(string)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 4: Oh, the features you'll see! Parameters in NRPy+ \\[Back to [top](#toc)\\]\n", "$$\\label{param}$$\n", "\n", "*TL;DR: When adding new features to NRPy+ or to modules that use NRPy+, it is strongly recommended to take advantage of NRPy+'s parameter interface.*\n", "\n", "As documented above, NRPy+'s `outputC()` routine accepts up to six inputs. Suppose we have a project that makes use of NRPy+ to generate *multiple C codes*. It is reasonable to expect that these six inputs might vary from one C code to the next in the same project. (For example, sometimes a C code will be sufficiently simple that CSE only acts to obfuscate.) Thus we include these six inputs as part of the function call.\n", "\n", "Suppose we wanted to add another feature to `outputC()` that is universal to our project. If `outputC()`'s behavior were only steerable with inputs into the function call, then the number of inputs would balloon with the number of features, making the entire NRPy+ codebase far less manageable. To address this problem, while at the same time making the modules and functions within NRPy+ more easily extensible, we have introduced a parameter interface." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "## Step 4.a: NRPy_param_funcs: The NRPy+ Parameter Interface \\[Back to [top](#toc)\\]\n", "$$\\label{param_func}$$\n", "\n", "The **`NRPy_param_funcs`** module manages the parameter interface in NRPy+, and parameter information is stored in two global data structures defined within this module:\n", "\n", "* glb_params_list\\[\\]: The list of registered parameters. Each item in the list is a [named tuple](https://docs.python.org/2/library/collections.html#collections.namedtuple) of the type `glb_param`, where the type is defined as\n", " * glb_param(\\[parameter type\\],\\[module name\\],\\[parameter name\\],\\[default value\\]).\n", "* glb_paramsvals_list\\[\\]: The list of parameter values. When a new glb_param is appended to the glb_params_list\\[\\], the corresponding element in glb_paramsvals_list\\[\\] is set to the default value. This value can be overwritten with\n", " * parameter files or parameter file overrides *when running NRPy+ in command-line mode*, as follows:\n", " * **python nrpy.py \\[PARAMETER FILE\\] \\[PARAMETER FILE OVERRIDES\\]**, or\n", " * with set_paramsvals_value(\"modulename::variablename = \\[value\\]\") *when running NRPy+ in interactive mode*.\n", "\n", "**Example**: Suppose you write a new module, *mymodule* (in \"mymodule.py\"), that depends on NRPy+ and contains a free integer parameter $n$ (integers are registered in NRPy+ with *type=\"int\"*). A reasonable default value is $n=2$. 
To register this parameter with NRPy+, set the following at the top of your mymodule.py:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.839740Z", "iopub.status.busy": "2021-03-07T17:06:49.838960Z", "iopub.status.idle": "2021-03-07T17:06:49.841095Z", "shell.execute_reply": "2021-03-07T17:06:49.841586Z" } }, "outputs": [], "source": [ "# Step 4.a: NRPy_param_funcs: The NRPy+ Parameter Interface\n", "\n", "par.initialize_param(par.glb_param(type=\"int\", module=\"mymodule\", parname=\"n\", defaultval=2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At any time, you can find the parameter's value via the `par.parval_from_str()` function, which accepts a string in one of two formats: \"`variablename`\" or \"`modulename::variablename`\". *Warning*: If more than one module sets the parameter with variable name `\"n\"`, `par.parval_from_str(\"n\")` will produce an error." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.846589Z", "iopub.status.busy": "2021-03-07T17:06:49.845926Z", "iopub.status.idle": "2021-03-07T17:06:49.849470Z", "shell.execute_reply": "2021-03-07T17:06:49.848932Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "2\n" ] } ], "source": [ "print(par.parval_from_str(\"n\"))\n", "print(par.parval_from_str(\"mymodule::n\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's overwrite the default parameter value of `\"mymodule::n\"` to be 4 instead:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.854156Z", "iopub.status.busy": "2021-03-07T17:06:49.853414Z", "iopub.status.idle": "2021-03-07T17:06:49.856669Z", "shell.execute_reply": "2021-03-07T17:06:49.856160Z" }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4\n" ] } ], 
"source": [ "par.set_paramsvals_value(\"mymodule::n = 4\")\n", "print(par.parval_from_str(\"mymodule::n\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning**: Setting NRPy+ parameters via direct calls to `par.set_paramsvals_value(\"modulename::variablename\")`, when in non-interactive mode (e.g., running in a Jupyter or iPython notebook) is *strongly* discouraged, and in the future may result in an error message." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 5: Warp speed! SIMD (Single Instruction, Multiple Data) in NRPy+-Generated C Code \\[Back to [top](#toc)\\]\n", "$$\\label{simd}$$\n", "\n", "Taking advantage of a CPU's SIMD instruction set can yield very nice performance boosts, but only when the CPU can be used to process a large data set that can be performed in parallel. It enables the computation of multiple parts of the data set at once. \n", "\n", "For example, given the expression \n", "$$\\texttt{double x = a*b},$$ \n", "where $\\texttt{double}$ precision variables $\\texttt{a}$ and $\\texttt{b}$ vary at each point on a computational grid, AVX compiler intrinsics will enable the multiplication computation at *four* grid points *each clock cycle*, *on each CPU core*. Therefore, without these intrinsic, the computation might take four times longer. Compilers can sometimes be smart enough to \"vectorize\" the loops over data, but when the mathematical expressions become too complex (e.g., in the context of numerically solving Einstein's equations of general relativity), the compiler will simply give up and refuse to enable SIMD vectorization.\n", "\n", "As SIMD intrinsics can differ from one CPU to another, and even between compilers, NRPy+ outputs generic C macros for common arithmetic operations and transcendental functions. In this way, the C code's Makefile can decide the most optimal SIMD intrinsics for the given CPU's instruction set and compiler. 
For example, most modern CPUs support [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions), and a majority support up to [AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Advanced_Vector_Extensions_2), while some support up to [AVX512](https://en.wikipedia.org/wiki/AVX-512) instruction sets. For a full list of compiler intrinsics, see the [official Intel SIMD intrinsics documentation](https://software.intel.com/sites/landingpage/IntrinsicsGuide/).\n", "\n", "To see how this works, let's return to our NRPy+ `outputC()` CSE example above, but this time enabling SIMD intrinsics:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.876504Z", "iopub.status.busy": "2021-03-07T17:06:49.875647Z", "iopub.status.idle": "2021-03-07T17:06:49.878859Z", "shell.execute_reply": "2021-03-07T17:06:49.879359Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/*\n", " * Original SymPy expression:\n", " * \"x = b**2*sin(2*a) + c/sin(2*a)\"\n", " */\n", "{\n", " const double tmp_Integer_1 = 1.0;\n", " const REAL_SIMD_ARRAY _Integer_1 = ConstSIMD(tmp_Integer_1);\n", "\n", " const double tmp_Integer_2 = 2.0;\n", " const REAL_SIMD_ARRAY _Integer_2 = ConstSIMD(tmp_Integer_2);\n", "\n", " const double tmp_NegativeOne_ = -1.0;\n", " const REAL_SIMD_ARRAY _NegativeOne_ = ConstSIMD(tmp_NegativeOne_);\n", "\n", " const REAL_SIMD_ARRAY tmp_0 = SinSIMD(MulSIMD(_Integer_2, a));\n", " x = FusedMulAddSIMD(tmp_0, MulSIMD(b, b), DivSIMD(c, tmp_0));\n", "}\n", "\n" ] } ], "source": [ "# Step 5: Taking Advantage of SIMD (Single Instruction, Multiple Data) in NRPy+-Generated C Code\n", "\n", "# Declare some variables, using SymPy's symbols() function\n", "a,b,c = sp.symbols(\"a b c\")\n", "\n", "# Set x = b^2*sin(2*a) + c/sin(2*a).\n", "x = b**2*sp.sin(2*a) + c/(sp.sin(2*a))\n", "\n", "outputC(x,\"x\",params=\"enable_SIMD=True\")" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "The above SIMD code does the following.\n", "* First it fills a constant SIMD array of type `REAL_SIMD_ARRAY `with the integer 2 to the double-precision 2.0. The larger C code in which the above-generated code will be embedded should automatically `#define REAL_SIMD_ARRAY` to e.g., _m256d or _m512d for AVX or AVX512, respectively. In other words, AVX intrinsics will need to set 4 double-precision variables in `REAL_SIMD_ARRAY` to 2.0, and AVX-512 intrinsics will need to set 8.\n", "* Then it changes all arithmetic operations to be in the form of SIMD \"functions\", which are in fact #define'd in the larger C code as compiler intrinsics. \n", "\n", "FusedMulAddSIMD(a,b,c) performs a fused-multiply-add operation (i.e., `FusedMulAddSIMD(a,b,c)`=$a*b+c$), which can be performed on many CPUs nowadays (with FMA or AVX-512 instruction support) with a *single clock cycle*, at nearly the same expense as a single addition or multiplication, \n", "\n", "Note that it is assumed that the SIMD code exists within a suitable set of nested loops, in which the innermost loop increments every 4 in the case of AVX double precision or 8 in the case of AVX-512 double precision.\n", "\n", "As an additional note, NRPy+'s SIMD routines are aware that the C `pow(x,y)` function is exceedingly expensive when $|\\texttt{y}|$ is a small integer. 
They automatically convert such expressions into either repeated multiplications of `x` or the reciprocal of repeated multiplications of `x`, as follows (notice there are no calls to `PowSIMD()` intrinsics!):" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.891791Z", "iopub.status.busy": "2021-03-07T17:06:49.891112Z", "iopub.status.idle": "2021-03-07T17:06:49.894988Z", "shell.execute_reply": "2021-03-07T17:06:49.894160Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/*\n", " * Original SymPy expression:\n", " * \"x = sqrt(a)*c + b**2 + a**(-3)\"\n", " */\n", "{\n", " const double tmp_Integer_1 = 1.0;\n", " const REAL_SIMD_ARRAY _Integer_1 = ConstSIMD(tmp_Integer_1);\n", "\n", " x = FusedMulAddSIMD(b, b, FusedMulAddSIMD(c, SqrtSIMD(a), DivSIMD(_Integer_1, MulSIMD(MulSIMD(a, a), a))));\n", "}\n", "\n" ] } ], "source": [ "# Declare some variables, using SymPy's symbols() function\n", "a,b,c = sp.symbols(\"a b c\")\n", "\n", "# Set x = b^2 + a^(-3) + c*sqrt(a).\n", "x = b**2 + a**(-3) + c*a**(sp.Rational(1,2))\n", "\n", "outputC(x,\"x\", params=\"enable_SIMD=True\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For those who would like to maximize fused-multiply-adds (FMAs) and fused-multiply-subtracts (FMSs), NRPy+ has more advanced pattern matching, which can be enabled via the `params=\"SIMD_find_more_FMAsFMSs=True\"` option. 
**Note that finding more FMAs and FMSs may actually degrade performance, and the default behavior is found to be optimal on x86_64 CPUs.** The example below enables the more advanced pattern matching; for this particular expression, the generated code is identical to the default output above, which already contains two FMAs:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.904671Z", "iopub.status.busy": "2021-03-07T17:06:49.904009Z", "iopub.status.idle": "2021-03-07T17:06:49.906721Z", "shell.execute_reply": "2021-03-07T17:06:49.907187Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "// SIMD_find_more_FMAsFMSs=True:\n", "// searches for more FMAs/FMSs, which has been found to degrade performance on some CPUs:\n", "/*\n", " * Original SymPy expression:\n", " * \"x = sqrt(a)*c + b**2 + a**(-3)\"\n", " */\n", "{\n", " const double tmp_Integer_1 = 1.0;\n", " const REAL_SIMD_ARRAY _Integer_1 = ConstSIMD(tmp_Integer_1);\n", "\n", " x = FusedMulAddSIMD(b, b, FusedMulAddSIMD(c, SqrtSIMD(a), DivSIMD(_Integer_1, MulSIMD(MulSIMD(a, a), a))));\n", "}\n", "\n" ] } ], "source": [ "print(\"// SIMD_find_more_FMAsFMSs=True:\\n// searches for more FMAs/FMSs, which has been found to degrade performance on some CPUs:\")\n", "outputC(x,\"x\", params=\"enable_SIMD=True,SIMD_find_more_FMAsFMSs=True\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Step 6: Output this notebook to $\\LaTeX$-formatted PDF file \\[Back to [top](#toc)\\]\n", "$$\\label{latex_pdf_output}$$\n", "\n", "The following code cell converts this Jupyter notebook into a proper, clickable $\\LaTeX$-formatted PDF file. After the cell is successfully run, the generated PDF may be found in the root NRPy+ tutorial directory, with filename\n", "[Tutorial-Coutput__Parameter_Interface.pdf](Tutorial-Coutput__Parameter_Interface.pdf). 
(Note that clicking on this link may not work; you may need to open the PDF file through another means.)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:06:49.912192Z", "iopub.status.busy": "2021-03-07T17:06:49.911506Z", "iopub.status.idle": "2021-03-07T17:06:53.653656Z", "shell.execute_reply": "2021-03-07T17:06:53.652739Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created Tutorial-Coutput__Parameter_Interface.tex, and compiled LaTeX file\n", " to PDF file Tutorial-Coutput__Parameter_Interface.pdf\n" ] } ], "source": [ "import cmdline_helper as cmd # NRPy+: Multi-platform Python command-line interface\n", "cmd.output_Jupyter_notebook_to_LaTeXed_PDF(\"Tutorial-Coutput__Parameter_Interface\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" } }, "nbformat": 4, "nbformat_minor": 4 }