{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Computation\n", "\n", "## General concepts and mechanisms\n", "\n", "### Overview\n", "\n", "Binary operations between data arrays or datasets behave as follows:\n", "\n", "| Property | Action |\n", "| --- | --- |\n", "| coord | compare, abort on mismatch |\n", "| data | apply operation |\n", "| mask | combine with `or` |\n", "\n", "### Dimension matching and transposing\n", "\n", "Operations \"align\" variables based on their dimension labels.\n", "That is, an operation between two variables that have a transposed memory layout behave correctly:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "import scipp as sc\n", "\n", "rng = np.random.default_rng(12345)\n", "a = sc.array(\n", " dims=['x', 'y'], values=rng.random((2, 4)), variances=rng.random((2, 4)), unit='m'\n", ")\n", "b = sc.array(\n", " dims=['y', 'x'], values=rng.random((4, 2)), variances=rng.random((4, 2)), unit='s'\n", ")\n", "a / b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Propagation of uncertainties\n", "\n", "If variables have variances, operations correctly propagate uncertainties (the variances), in contrast to a naive implementation using NumPy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = a / b\n", "result.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.values / np.transpose(b.values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.variances" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.variances / np.transpose(b.variances)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The implementation assumes uncorrelated data and is otherwise based on, e.g., [Wikipedia: Propagation of uncertainty](https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulae).\n", "See also [Propagation of uncertainties](../reference/error-propagation.rst) for the concrete equations used for error propagation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note:\n", "\n", "If an operand with variances is also [broadcast](#Broadcasting) in an operation then the resulting values would be correlated.\n", "Scipp has no way of tracking or handling such correlations.\n", "Subsequent operations that combine values of the result, such as computing the mean, would therefore result in *underestimated uncertainties*.\n", "\n", "To avoid such silent underestimation of uncertainties, operations raise [VariancesError](https://scipp.github.io/generated/classes/scipp.VariancesError.html) when an operand with variances is implicitly or explicitly broadcast in an operations.\n", "See our publication [Systematic underestimation of uncertainties by widespread neutron-scattering data-reduction software](https://doi.org/10.3233/JNR-220049) for more background.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Broadcasting\n", "\n", "Missing dimensions in the operands are automatically broadcast.\n", "Consider:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var_xy = sc.array(dims=['x', 'y'], values=np.arange(6).reshape((2, 3)))\n", "print(var_xy.values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var_y = sc.array(dims=['y'], values=np.arange(3))\n", "print(var_y.values)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var_xy -= var_y\n", "print(var_xy.values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since `var_y` did not depend on dimension `'x'` it is considered as \"constant\" along that dimension.\n", "That is, the *same* `var_y` values are subtracted *from all slices of dimension* `'x'` in `var_xy`.\n", "\n", "Given two variables `a` and `b`, we see that broadcasting integrates seamlessly with slicing and transposing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = sc.array(dims=['x', 'y'], values=rng.random((2, 4)), unit='m')\n", "b = sc.array(dims=['y', 'x'], values=rng.random((4, 2)), unit='s')\n", "a.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a -= a['x', 1]\n", "a.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both operands may be broadcast, creating an output with the combination of input dimensions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sc.show(a['x', 1])\n", "sc.show(a['y', 1])\n", "sc.show(a['x', 1] + a['y', 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in-place operations such as `+=` will never change the shape of the left-hand-side.\n", "That is only the right-hand-side operation can be broadcast, and the operation fails of a broadcast of the left-hand-side would be required.\n", "\n", "### Units\n", "\n", "Units are required to be compatible:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", " a + b\n", "except Exception as e:\n", " print(str(e))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coordinate and name matching\n", "\n", "In operations with datasets, data items are paired based on their names when applying operations to datasets.\n", "Operations fail if names do not match:\n", "\n", "- In-place operations such as `+=` accept a right-hand-side operand that omits items that the left-hand-side has.\n", " If the right-hand-side contains items that are not in the left-hand-side the operation fails.\n", "- Non-in-place operations such as `+` return a new dataset with items from the intersection of the inputs.\n", "\n", "Coordinates are compared in operations with datasets or data arrays (or items of datasets).\n", "Operations fail if there is any mismatch in aligned (see below) coord values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "d1 = sc.Dataset(\n", " data={\n", " 'a': sc.array(dims=['x', 'y'], values=rng.random((2, 3))),\n", " 'b': sc.array(dims=['y', 'x'], values=rng.random((3, 2))),\n", " 'c': sc.array(dims=['x', 'y'], values=rng.random((2, 3))),\n", " },\n", " coords={'x': sc.arange('x', 2.0, unit='m'), 'y': sc.arange('y', 3.0, unit='m')},\n", ")\n", "d2 = sc.Dataset(\n", " data={\n", " 'a': sc.array(dims=['x', 'y'], values=rng.random((2, 3))),\n", " 'b': sc.array(dims=['y', 'x'], values=rng.random((3, 2))),\n", " },\n", " coords={'x': sc.arange('x', 2.0, unit='m'), 'y': sc.arange('y', 3.0, unit='m')},\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "d1 += d2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "d2 += d1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "d3 = d1 + d2\n", "for name in d3:\n", " print(name)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "d3['a'] -= d3['b'] # transposing\n", "d3['a'] -= d3['x', 1]['b'] # broadcasting" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "d3['a'] -= d3['x', 1:2]['b'] # fail due to coordinate mismatch" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alignment\n", "\n", "Coordinates are \"aligned\" be default.\n", "This means that they are required to match between operands and if they don't, an exception is raised." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "da1 = sc.DataArray(\n", " sc.arange('x', 8).fold('x', {'x': 4, 'y': 2}),\n", " coords={'x': sc.arange('x', 4), 'y': sc.arange('y', 2)},\n", ")\n", "da2 = sc.DataArray(\n", " sc.arange('x', 4), coords={'x': 10 * sc.arange('x', 4), 'y': sc.arange('y', 2)}\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "da1 + da2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Coordinates can become \"unaligned\" in some operations.\n", "For example, slicing out a single element makes all coordinates in the sliced dimension unaligned:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "da1_sliced = da1['x', 1]\n", "print(da1_sliced.coords['x'].aligned, da1_sliced.coords['y'].aligned)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Range-slices preserve alignment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da1['x', 0:1].coords['x'].aligned" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Unaligned coordinates are not required to match in operations.\n", "They are instead dropped if there is a mismatch:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "da2_sliced = da2['x', 1]\n", "da1_sliced + da2_sliced" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aligned coordinates take precedence over unaligned if both are present:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "da2.coords.set_aligned('x', False)\n", "da1 + da2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Note:\n", "\n", "`sc.identical` and similar comparison functions ignore alignment when applied to variables directly.\n", "They do, however take it into account when comparing coordinates of data arrays and datasets.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "da1 = sc.DataArray(sc.arange('x', 4), coords={'x': sc.arange('x', 4)})\n", "da2 = da1.copy()\n", "da2.coords.set_aligned('x', False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sc.identical(da1.coords['x'], da2.coords['x'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "sc.identical(da1, da2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arithmetic\n", "\n", "The arithmetic operations `+`, `-`, `*`, and `/` and their in-place variants `+=`, `-=`, `*=`, and `/=` are available for variables, data arrays, and datasets.\n", "They can also be combined with [slicing](slicing.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trigonometrics\n", "\n", "Trigonometric functions like `sin` are supported for variables.\n", "Units for angles provide a safeguard and ensure correct operation when working with either degree or radian:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rad = 3.141593 * sc.units.rad\n", "deg = 180.0 * sc.units.deg\n", "print(sc.sin(rad))\n", "print(sc.sin(deg))\n", "try:\n", " rad + deg\n", "except Exception as e:\n", " print(str(e))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other\n", "\n", "See the [list of free functions](../reference/free-functions.rst#free-functions) for an overview of other available operations." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 4 }