{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Each Datashader `canvas` function call accepts an `agg` argument which is a `Reduction` that is used to aggregate values in each pixel (histogram bin) to return to the user. Each `Reduction` is in one of two categories:\n", "\n", "1. Mathematical combination of data such as the `count` of data points per pixel or the `mean` of a column of the supplied dataset.\n", "2. Selection of data from a column of the supplied dataset, or the index of the corresponding row in the dataset.\n", "\n", "This notebook explains how to use selection reductions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. `first` and `last` selection reductions\n", "\n", "The simplest selection reduction is the `first` reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.\n", "\n", "Firstly create a sample dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datashader as ds\n", "import pandas as pd\n", "\n", "df = pd.DataFrame(dict(\n", " x = [ 0, 0, 1, 1, 0, 0, 2, 2],\n", " y = [ 0, 0, 0, 0, 1, 1, 1, 1],\n", " value = [ 9, 8, 7, 6, 2, 3, 4, 5],\n", " other = [11, 12, 13, 14, 15, 16, 17, 18],\n", "))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 8 rows in the dataset with columns for `x` and `y` coordinates as well as a `value` and an `other` column.\n", "\n", "Next create a Datashader `canvas` with a height of 2 pixels and a width of 3 pixels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas = ds.Canvas(plot_height=2, plot_width=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two rows of the dataset map to each canvas pixel with the exception of pixels `[0, 2]` and `[1, 1]` which do not have any rows 
mapped to them.\n", "\n", "Now call `canvas.points` using a `first` reduction:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.first('value'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned `xarray.DataArray` is the same shape as the canvas and contains values taken from the `'value'` column corresponding to the first row that maps to each pixel. Pixels which do not have any rows mapped to them contain `NaN` values.\n", "\n", "Here are the results using a `last` selection reduction:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.last('value'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. `max` and `min` selection reductions\n", "\n", "A `max` selection reduction returns, for each pixel in the canvas, the maximum value of the specified column of all rows that map to that pixel. For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.max('value'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The corresponding `min` selection reduction is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.min('value'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. `first_n`, `last_n`, `max_n` and `min_n` selection reductions\n", "\n", "These provide the same functionality as `first`, `last`, `max` and `min` reductions except that they return multiple values per pixel. 
For example, the `max_n` reduction with `n=3` returns the 3 largest values, in descending order, for each pixel:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.max_n('value', n=3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned `xarray.DataArray` has shape `(ny, nx, n)` which is `(2, 3, 3)` in this example. The third dimension contains the maximum `n` values in order for each pixel; where fewer than `n` values are available, the remaining entries are filled with `NaN`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. `where` selection reductions\n", "\n", "A `where` reduction takes two arguments, a `selector` reduction and a `lookup_column` name. The `selector` reduction, such as a `first` or `max`, selects which row of the dataset to return information about for each pixel. However, the value returned comes from the `lookup_column` rather than from the column used by the `selector`.\n", "\n", "Again this is best illustrated by an example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returns, for each pixel, the value of the `'other'` column corresponding to the maximum of the `'value'` column of the data points that map to that pixel.\n", "\n", "Although it is possible to use a `first` or `last` as a `selector` with a `lookup_column`, such as\n", "\n", "```python\n", "ds.where(ds.first('value'), 'other')\n", "```\n", "this is unnecessary as it is identical to the simpler\n", "```python\n", "ds.first('other')\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. `where` selection reductions returning a row index\n", "\n", "The `lookup_column` argument to `where` is optional. 
If not specified, `where` defaults to returning the index of the row in the dataset corresponding to the `selector` for each pixel." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.where(ds.max('value')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 8 rows in the dataframe, so the row indices returned are in the range 0 to 7. An index of -1 is returned for pixels that do not have any data points mapped to them.\n", "\n", "`first` and `last` can be used as `where` reduction `selector`s that return row indices, for example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.where(ds.first('value')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`where` reductions can also use a `selector` that is a `first_n`, `last_n`, `max_n` or `min_n` reduction, for example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "canvas.points(df, 'x', 'y', ds.where(ds.first_n('value', 3)))" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }