{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipp as sc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tips, tricks, and anti-patterns\n", "## Choose dimensions wisely\n", "\n", "A good choice of dimensions for representing data goes a long way toward making work with Scipp efficient.\n", "Consider, e.g., data gathered from detector pixels at certain time intervals.\n", "We could represent it as" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "npix = 100\n", "ntime = 10\n", "# A flat list of pixels, each with a series of time intervals\n", "data = sc.zeros(dims=['pixel', 'time'], shape=[npix, ntime])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For irregularly spaced detectors this may well be the correct or only choice.\n", "If, however, the pixels form a regular 2-D image sensor, we should probably prefer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nx = 10\n", "ny = npix // nx\n", "# The same pixels arranged as ny rows of nx columns\n", "data = sc.zeros(dims=['y', 'x', 'time'], shape=[ny, nx, ntime])\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this layout we can naturally perform slices, access neighboring pixel rows or columns, or sum over rows or columns.\n", "\n", "## Choose dimension order wisely\n", "\n", "In principle, the order of dimensions in Scipp can be arbitrary, since operations transpose automatically based on dimension labels.\n", "In practice, however, a bad choice of dimension order can lead to performance bottlenecks.\n", "This is most obvious when slicing multi-dimensional variables or arrays, where slicing any dimension other than the outermost yields a slice with gaps between data values, i.e., a very inefficient memory layout.\n", "If an application predominantly slices along a certain dimension (directly or indirectly, e.g., in `groupby` operations), this dimension should be made the 
*outermost* dimension.\n", "For example, for a stack of images the best choice would typically be" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nimage = 13\n", "images = sc.zeros(\n", "    dims=['image', 'y', 'x'],\n", "    shape=[\n", "        nimage,\n", "        ny,\n", "        nx,\n", "    ],\n", ")\n", "images" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Slices such as" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "images['image', 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "will then have data for all pixels in a contiguous chunk of memory.\n", "Note that in Scipp the first dimension listed in `dims` is always the *outermost* dimension (the same convention as NumPy's default row-major layout).\n", "\n", "## Avoid loops\n", "\n", "With Scipp, just like with NumPy or MATLAB, loops such as `for`-loops should be avoided.\n", "Loops typically produce many small slices or many small array objects, which quickly makes code very inefficient.\n", "If we encounter the need for a loop in a workflow using Scipp, we should take a step back and work out how it can be avoided.\n", "Some tips for doing so:\n", "\n", "### Use slicing with \"shifts\"\n", "\n", "When access to neighbor slices is required, replace" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Anti-pattern: one small slice operation per image\n", "for i in range(images.sizes['image'] - 1):\n", "    images['image', i] -= images['image', i + 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "with" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A single operation on two slices shifted by one image\n", "images['image', :-1] -= images['image', 1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that NumPy currently provides more powerful functions for this, such as [numpy.roll](https://numpy.org/doc/stable/reference/generated/numpy.roll.html).\n", "Scipp's toolset for such purposes is not fully developed yet." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Seek advice from NumPy\n", "\n", "There is a huge amount of information available for NumPy, e.g., on [Stack Overflow](https://stackoverflow.com/questions/tagged/numpy?tab=Votes).\n", "We can profit from this in two ways.\n", "In some cases, the same techniques can be applied to Scipp variables or data arrays, since mechanisms such as slicing and basic operations are very similar.\n", "In other cases, e.g., when functionality is not yet available in Scipp, we can resort to processing the raw array accessible through the `values` property:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = sc.arange('x', 10.0)\n", "# Shift the values by two positions with NumPy and write them back\n", "var.values = np.roll(var.values, 2)\n", "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `values` property can also be used as the `out` argument that many NumPy functions support:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Write the result directly into the variable's buffer\n", "np.exp(var.values, out=var.values)\n", "var" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "