{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipp as sc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tips, tricks, and anti-patterns\n",
    "## Choose dimensions wisely\n",
    "\n",
    "A good choice of dimensions for representing data goes a long way in making working with Scipp efficient.\n",
    "Consider, e.g., data gathered from detector pixels at certain time intervals.\n",
    "We could represent it as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "npix = 100\n",
    "ntime = 10\n",
    "data = sc.zeros(dims=['pixel', 'time'], shape=[npix, ntime])\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For irregularly spaced detectors this may well be the correct or only choice.\n",
    "If however the pixels are actually forming a regular 2-D image sensor we should probably prefer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nx = 10\n",
    "ny = npix // nx\n",
    "data = sc.zeros(dims=['y', 'x', 'time'], shape=[ny, nx, ntime])\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With this layout we can naturally perform slices, access neighboring pixel rows or columns, or sum over rows or columns.\n",
    "\n",
    "## Choose dimension order wisely\n",
    "\n",
    "In principle the order of dimensions in Scipp can be arbitrary since operations transpose automatically based on dimension labels.\n",
    "In practice however a bad choice of dimension order can lead to performance bottlenecks.\n",
    "This is most obvious when slicing multi-dimensional variables or arrays, where slicing any but the outer dimension yields a slice with gaps between data values, i.e., a very inefficient memory layout.\n",
    "If an application requires slicing (directly or indirectly, e.g., in `groupby` operations) predominantly for a certain dimension, this dimension should be made the *outermost* dimension.\n",
    "For example, for a stack of images the best choice would typically be"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nimage = 13\n",
    "images = sc.zeros(\n",
    "    dims=['image', 'y', 'x'],\n",
    "    shape=[\n",
    "        nimage,\n",
    "        ny,\n",
    "        nx,\n",
    "    ],\n",
    ")\n",
    "images"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slices such as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "images['image', 3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "will then have data for all pixels in a contiguous chunk of memory.\n",
    "Note that in Scipp the first listed dimension in `dims` is always the *outermost* dimension (NumPy's default).\n",
    "\n",
    "## Avoid loops\n",
    "\n",
    "With Scipp, just like with NumPy or Matlab, loops such as `for`-loops should be avoided.\n",
    "Loops typically lead to many small slices or many small array objects and rapidly lead to very inefficient code.\n",
    "If we encounter the need for a loop in a workflow using Scipp we should try and take a step back to understand how it can be avoided.\n",
    "Some tips to do this include:\n",
    "\n",
    "### Use slicing with \"shifts\"\n",
    "\n",
    "When access to neighbor slices is required, replace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(len(images.values) - 1):\n",
    "    images['image', i] -= images['image', i + 1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "with"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "images['image', :-1] -= images['image', 1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that at this point NumPy provides more powerful functions such as [numpy.roll](https://numpy.org/doc/stable/reference/generated/numpy.roll.html).\n",
    "Scipp's toolset for such purposes is not fully developed yet."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Seek advice from NumPy\n",
    "\n",
    "There is a huge amount of information available for NumPy, e.g., on [stackoverflow](https://stackoverflow.com/questions/tagged/numpy?tab=Votes).\n",
    "We can profit in two ways from this.\n",
    "In some cases, the same techniques can be applied to Scipp variables or data arrays, since mechanisms such as slicing and basic operations are very similar.\n",
    "In other cases, e.g., when functionality is not available in Scipp yet, we can resort to processing the raw array accessible through the `values` property:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "var = sc.arange('x', 10.0)\n",
    "var.values = np.roll(var.values, 2)\n",
    "var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `values` property can also be used as the `out` argument that many NumPy functions support:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.exp(var.values, out=var.values)\n",
    "var"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-warning\">\n",
    "    <b>WARNING</b>\n",
    "\n",
    "When applying NumPy functions to the `values` directly we lose handling of units and variances, so this should be used with care.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use helper dimensions or reshaped data\n",
    "\n",
    "Some operations may be difficult to implement without a loop in a certain data layout.\n",
    "If this layout cannot be changed globally, we can still change it temporarily for a certain operation.\n",
    "Even if this requires a copy it may still be faster and more concise than implementing the operation with a loop.\n",
    "For example, we can sum neighboring elements by temporarily reshaping with a helper dimension using `fold`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "var = sc.arange('x', 12.0)\n",
    "var.fold('x', sizes={'x': 4, 'neighbors': 3}).sum('neighbors')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that `fold` returns a view, i.e., the operation is performed without making a copy of the underlying data buffers.\n",
    "The companion operation of `fold` is `flatten`, which provides the reverse operation (see the [section below](#reshaping-data) for more details).\n",
    "\n",
    "## Use in-place operations\n",
    "\n",
    "Allocating memory or copying data is an expensive process and may even be the dominant factor for overall application performance, apart from loading large amounts of data from disk.\n",
    "Therefore, it pays off the avoid copies where possible.\n",
    "\n",
    "Scipp provides two mechanisms for this, in-place arithmetic operators such as `+=`, and `out`-arguments similar to what NumPy provides.\n",
    "Examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "var = var * 2.0  # makes a copy\n",
    "var *= 2.0  # in-place (faster)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "var = sc.sqrt(var)  # makes a copy\n",
    "var = sc.sqrt(var, out=var)  # in-place (faster)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that in-place operations cannot be used if a broadcast is required or a dtype change happens, since in-place operations may only change the data contained in a variable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reshaping data\n",
    "\n",
    "<div id='reshaping-data'></div>\n",
    "\n",
    "The shape of a `Variable` or a `DataArray` can be modified using the [fold](https://scipp.github.io/generated/functions/scipp.fold.html) and\n",
    "[flatten](https://scipp.github.io/generated/functions/scipp.flatten.html) functions.\n",
    "Below are a few examples to illustrate how they work.\n",
    "\n",
    "### Folding\n",
    "\n",
    "In a nutshell, the `fold` operation increases the number of dimensions of the data.\n",
    "We begin with a two-dimensional variable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 4\n",
    "M = 3\n",
    "var = sc.array(dims=['x', 'y'], values=np.random.random([N, M]))\n",
    "sc.show(var)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then fold the `x` dimension into two new dimensions `a` and `b`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "folded_var = sc.fold(var, dim='x', sizes={'a': 2, 'b': 2})\n",
    "sc.show(folded_var)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is a three-dimensional variable with dimensions `(a, b, y)`.\n",
    "\n",
    "A `DataArray` with coordinates can also be folded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = sc.array(dims=['x'], values=np.arange(N))\n",
    "y = sc.array(dims=['y'], values=np.arange(M))\n",
    "da = sc.DataArray(data=var, coords={'x': x, 'y': y})\n",
    "sc.show(da)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "folded_da = sc.fold(da, dim='x', sizes={'a': 2, 'b': 2})\n",
    "sc.show(folded_da)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the dimensions of the `x` coordinate have changed from `(x,)` to `(a, b)`,\n",
    "but the coordinate name has not changed.\n",
    "\n",
    "### Flattening\n",
    "\n",
    "The inverse of the `fold` operation is `flatten`.\n",
    "This is analogous to NumPy's [flatten](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html) method.\n",
    "By default, all dimensions of the input are flattened to a single dimension,\n",
    "whose name is provided by the `to` argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flat_da = sc.flatten(da, to='z')\n",
    "sc.show(flat_da)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is however possible to only flatten selected dimensions, using the `dims` argument.\n",
    "For example, we can flatten the `a` and `b` dimensions of our previously folded (three-dimensional) data to recover a two-dimensional array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flat_ab = sc.flatten(folded_da, dims=['a', 'b'], to='time')\n",
    "sc.show(flat_ab)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Stacking: concatenating into a new dimension\n",
    "\n",
    "Another very common operation is combining multiple arrays together.\n",
    "In NumPy, [stack](https://numpy.org/doc/stable/reference/generated/numpy.stack.html)\n",
    "is used to combine arrays along a new dimension, while\n",
    "[concatenate](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)\n",
    "is used to combine arrays along an existing dimension.\n",
    "\n",
    "Because of its labeled dimensions, Scipp can achieve both operations using\n",
    "[concat](https://scipp.github.io/generated/functions/scipp.concat.html).\n",
    "\n",
    "For example, giving `concat` a `dim` which is not found in the inputs will stack the arrays along a new dimension:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stacked = sc.concat([da, da], dim='z')\n",
    "sc.show(stacked)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Improving performance: Allocators and HugePages\n",
    "\n",
    "### Overview\n",
    "\n",
    "Scipp frequently needs to allocate or deallocate large chunks of memory.\n",
    "In many cases this is a major contribution to the (lack of) performance of Scipp (or any other) applications.\n",
    "Aside from minimizing the number of allocations (which is not always easy or possible), there are two simple ways that have each shown to independently yield up to 5% or 10% performance improvements for certain workloads.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "    \n",
    "The following sections are for *Linux only*.\n",
    "Similar options may be available on other platforms, please consult the appropriate documentation.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the TBB allocation proxy\n",
    "\n",
    "Scipp uses TBB for parallelization.\n",
    "TBB also provides allocators, but Scipp is not using them by default.\n",
    "It is possible to automatically do so by preloading `libtbbmalloc_proxy.so`.\n",
    "For more context refer to the [oneTBB documentation](https://oneapi-src.github.io/oneTBB/main/tbb_userguide/automatically-replacing-malloc.html).\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "    \n",
    "Make sure to install Scipp via `conda` since Scipp's PyPI package does not ship the required `libtbbmalloc_proxy.so`.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Python\n",
    "\n",
    "Use\n",
    "\n",
    "```console\n",
    "$ LD_PRELOAD=libtbbmalloc_proxy.so.2 python\n",
    "```\n",
    "\n",
    "to launch Python or\n",
    "\n",
    "```console\n",
    "$ LD_PRELOAD=libtbbmalloc_proxy.so.2 python myapp.py\n",
    "```\n",
    "\n",
    "to run a Python script.\n",
    "Note that the `.2` suffix may be different depending on the actual version of TBB."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Jupyter\n",
    "\n",
    "Environment variables can be configured for a specific Jupyter kernel.\n",
    "First, obtain a list of kernels:\n",
    "\n",
    "```console\n",
    "$ jupyter kernelspec list\n",
    "```\n",
    "\n",
    "The list may include, e.g., `/home/<username>/mambaforge/envs/myenv/share/jupyter/kernels/python3`.\n",
    "Consider making a copy of this folder, e.g., to `/home/<username>/mambaforge/envs/myenv/share/jupyter/kernels/python3-tbbmalloc_proxy`.\n",
    "Next, edit the `kernel.json` file in this folder, updating the \"display_name\" and adding an \"env\" section:\n",
    "\n",
    "```json\n",
    "{\n",
    " \"argv\": [\n",
    "  \"/home/<username>/mambaforge/envs/myenv/bin/python\",\n",
    "  \"-m\",\n",
    "  \"ipykernel_launcher\",\n",
    "  \"-f\",\n",
    "  \"{connection_file}\"\n",
    " ],\n",
    " \"display_name\": \"Python 3 (ipykernel) with tbbmalloc_proxy\",\n",
    " \"language\": \"python\",\n",
    " \"env\":{\n",
    "   \"LD_PRELOAD\":\"libtbbmalloc_proxy.so.2\"\n",
    " }\n",
    "}\n",
    "```\n",
    "\n",
    "Notes:\n",
    "\n",
    "- The `.2` suffix may be different depending on the actual version of TBB.\n",
    "- File content aside from the \"env\" section may be different depending on your kernel.\n",
    "  The above is an example and not a suggestion to change other settings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using HugePages\n",
    "\n",
    "HugePages are a Linux feature that may decrease the overhead of page handling and translating virtual addresses when dealing with large amounts of memory.\n",
    "This can be beneficial for Scipp since it frequently allocates and accesses a lot of memory.\n",
    "\n",
    "<div class=\"alert alert-warning\">\n",
    "\n",
    "**Warning**\n",
    "    \n",
    "HugePages is a complex topic and this does not aim to be a comprehensive guide.\n",
    "In particular, details on your system may be different from what is described here, please **carefully check any commands given below before executing** them, as they require root privileges.\n",
    "Please contact your system administrator for more information.\n",
    "    \n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using explicit HugePages\n",
    "\n",
    "HugePages can be reserved explicitly, in contrast to the transparent HugePages described below.\n",
    "First, check the number of configured HugePages:\n",
    "\n",
    "```console\n",
    "$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages\n",
    "```\n",
    "\n",
    "By default this is 0 on many systems.\n",
    "We can change this at system runtime (as root),\n",
    "\n",
    "```console\n",
    "# echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages\n",
    "```\n",
    "\n",
    "where 1024 is the numfer of 2 MByte pages to allocate.\n",
    "This will allocate 2 GByte of memory as HugePages.\n",
    "You will likely need more than this &mdash; adjust this number depending on your system's memory and the memory requirements of your application.\n",
    "\n",
    "Unlike transparent HugePages, explicit HugePages are not used by default by an application.\n",
    "The TBB allocator can be configured to use HugePages by setting the `TBB_MALLOC_USE_HUGE_PAGES` environment variable.\n",
    "For example, we can extend the \"env\" section of the Jupyter kernel configuration above to\n",
    "\n",
    "```json\n",
    "{\n",
    " \"argv\": [\n",
    "  \"/home/<username>/mambaforge/envs/myenv/bin/python\",\n",
    "  \"-m\",\n",
    "  \"ipykernel_launcher\",\n",
    "  \"-f\",\n",
    "  \"{connection_file}\"\n",
    " ],\n",
    " \"display_name\": \"Python 3 (ipykernel) with tbbmalloc_proxy\",\n",
    " \"language\": \"python\",\n",
    " \"env\":{\n",
    "   \"LD_PRELOAD\":\"libtbbmalloc_proxy.so.2\",\n",
    "   \"TBB_MALLOC_USE_HUGE_PAGES\":\"1\"\n",
    " }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using HugePages transparently\n",
    "\n",
    "The Linux kernel can be configured to transparently use HugePages for all allocations.\n",
    "We can check the setting using\n",
    "\n",
    "```console\n",
    "$ cat /sys/kernel/mm/transparent_hugepage/enabled\n",
    "always [madvise] never\n",
    "```\n",
    "\n",
    "The `madvise` setting, which is the default on some systems, is not sufficient for Scipp to use HugePages, even when using `libtbbmalloc_proxy.so`.\n",
    "Instead, we need to set the `transparent_hugepage` kernel parameter to `always`:\n",
    "\n",
    "```console\n",
    "# echo always > /sys/kernel/mm/transparent_hugepage/enabled\n",
    "```\n",
    "\n",
    "Notes:\n",
    "\n",
    "- This enables hugepages globally and may adversely affect other applications.\n",
    "- This does not require use of `libtbbmalloc_proxy.so`, but the two can be used together."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Checking if it works\n",
    "\n",
    "Run\n",
    "\n",
    "```console\n",
    "$ grep HugePages /proc/meminfo\n",
    "```\n",
    "\n",
    "to check the current use or\n",
    "\n",
    "```console\n",
    "$ watch grep HugePages /proc/meminfo\n",
    "```\n",
    "\n",
    "to monitor the use while running your application.\n",
    "When using explicit HugePages the size in `HugePages_Total` should correspond to the setting made above.\n",
    "Now run your application:\n",
    "\n",
    "- When using transparent HugePages the size in `AnonHugePages` should increase.\n",
    "  Note that NumPy appears to be using transparent HugePages as well (even when support is set to `madvise` instead of `always`), make sure you are not misled by this.\n",
    "- When using explicit HugePages the size in `HugePages_Free` should decrease."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}