{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Masking\n", "\n", "The purpose of masks in Scipp is to *exclude* regions of data from analysis, or to *select* regions of interest (ROIs).\n", "For example, we may mask data from sensors we know to be broken, or we may mask everything outside our ROI in an image.\n", "\n", "In some cases direct *removal* of the bad data or data outside the ROI may be preferred.\n", "Masking provides an alternative solution, which lets us, e.g., modify the mask or remove it later.\n", "\n", "NumPy provides support for masked arrays in the [numpy.ma](https://numpy.org/doc/stable/reference/maskedarray.html) module.\n", "Scipp's masking feature is conceptually similar, but is based on a *dictionary* of masks.\n", "Each mask is a `Variable`, i.e., comes with explicit dimensions.\n", "Scipp can therefore store masks in a very space-efficient manner.\n", "For example, given an image stack with dimensions `('image', 'pixel_y', 'pixel_x')` we may have a mask for \"images\" with dimensions `('image', )` and a second mask defining the ROI with dimensions `('pixel_y', 'pixel_x')`.\n", "The support for multiple masks also enables Scipp to selectively apply or preserve masks.\n", "For example, a sum over the 'image' dimension can preserve the ROI mask." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipp as sc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating and manipulating masks\n", "\n", "Masks are simply variables with `dtype=bool`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mask = sc.array(dims=['x'], values=[False, False, True])\n", "mask" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Boolean operators can be used to manipulate such variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(~mask)\n", "print(mask ^ mask)\n", "print(mask & ~mask)\n", "print(mask | ~mask)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparison operators such as `==`, `!=`, `<`, or `>=` (see also the [list of comparison functions](../reference/free-functions.rst#comparison)) are a common method of defining masks:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var = sc.array(dims=['x'], values=np.random.random(5), unit='m')\n", "mask2 = var < 0.5 * sc.Unit('m')\n", "mask2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Masks in data arrays and items of dataset\n", "\n", "Data arrays and equivalently items of dataset can store arbitrary masks.\n", "Datasets themselves do not support masks.\n", "Masks are accessible using the `masks` keyword-argument and property, which behaves in the same way as `coords`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = sc.DataArray(\n", " data=sc.array(dims=['y', 'x'], values=np.arange(1.0, 7.0).reshape((2, 3))),\n", " coords={'y': sc.arange('y', 2.0, unit='m'), 'x': sc.arange('x', 3.0, unit='m')},\n", " masks={'x': sc.array(dims=['x'], values=[False, False, True])},\n", ")\n", "sc.show(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example above the mask was specified as the data array was created.\n", "However, similar to `coords`, `masks` can also be set on an existing `DataArray` instance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = a.copy()\n", "b.masks['x'].values[1] = True\n", "b.masks['y'] = sc.array(dims=['y'], values=[False, True])\n", "sc.show(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A mask value of `True` means that the mask is on, i.e., the corresponding data value should be ignored.\n", "Note that setting a mask does *not* affect the data.\n", "\n", "Masks of dataset items are accessed using the `masks` property of the item:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds = sc.Dataset(data={'a': a})\n", "ds['a'].masks['x']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations with masked objects\n", "\n", "### Element-wise binary operations\n", "\n", "The result of operations between data arrays or dataset with masks contains the masks of both inputs.\n", "If both inputs contain a mask with the same name, the output mask is the combination of the input masks with an **OR** operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a + b" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Reduction operations\n", "\n", "Operations like `sum` and `mean` over a particular dimension cannot preserve masks that depend on this dimension.\n", "If this is the case, the mask is applied during the operation and is not present in the output:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.sum('x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `mean` operation takes into account that masking is reducing the number of points in the mean, i.e., masked elements are not counted (in contrast to, e.g., treating them as 0):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.mean('x')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If a mask does not depend on the dimension used for the `sum` or `mean` operation, it is preserved.\n", "Here `b` has two masks, one that is applied and one that is preserved:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "b.sum('x')" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "### Binning and resampling operations\n", "\n", "Operations like `bin` and `rebin` over a particular dimension cannot preserve masks that depend on this dimension.\n", "In those cases, just like for reduction operations, masks are applied before the operation:\n", "\n", "- `bin` treats masked bins as empty.\n", "- `rebin` treats masked bins as zero.\n", "\n", "In both cases the masks that are applied during the operation are not included in the masks of the output.\n", "Masks that are independent of the binning dimension(s) are unaffected and included in the output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples and common patterns using masks\n", "\n", "### Masking where a coordinate is negative\n", "\n", "Example data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = sc.DataArray(\n", " data=sc.ones(dims=['x'], shape=(6,)),\n", " coords={'x': sc.linspace('x', -1, 1, 6, unit='m')},\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To mask entries of a data array where a coordinate is negative, create a boolean variable that is true where the coordinate is negative, and set it on the `masks` dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.masks['x is negative'] = a.coords['x'] < sc.scalar(0., unit='m')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Masking nan values\n", "\n", "It is common to mask data where the value of some coordinate is nan.\n", "To mask where a coordinate is `nan`-valued we need to create a boolean variable that is `True` where the coordinate is `nan`.\n", "That is exactly what [`sc.isnan`](/generated/functions/scipp.isnan.html#scipp.isnan) is for:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.masks['x is nan'] = sc.isnan(a.coords['x'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Removing masks\n", "\n", "To remove one or several masks, use the `.drop_masks` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.masks" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "del a.masks['x is negative']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.masks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To remove all masks from the data array, the `.clear` method can be used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.masks.clear()\n", "a.masks # now empty!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }