{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each Datashader `canvas` function call accepts an `agg` argument which is a `Reduction` that is used to aggregate values in each pixel (histogram bin) to return to the user. Each `Reduction` is in one of two categories:\n",
    "\n",
    "1. Mathematical combination of data such as the `count` of data points per pixel or the `mean` of a column of the supplied dataset.\n",
    "2. Selection of data from a column of the supplied dataset, or the index of the corresponding row in the dataset.\n",
    "\n",
    "This notebook explains how to use selection reductions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. `first` and `last` selection reductions\n",
    "\n",
    "The simplest selection reduction is the `first` reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.\n",
    "\n",
    "Firstly create a sample dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datashader as ds\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(dict(\n",
    "    x     = [ 0,  0,  1,  1,  0,  0,  2,  2],\n",
    "    y     = [ 0,  0,  0,  0,  1,  1,  1,  1],\n",
    "    value = [ 9,  8,  7,  6,  2,  3,  4,  5],\n",
    "    other = [11, 12, 13, 14, 15, 16, 17, 18],\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are 8 rows in the dataset with columns for `x` and `y` coordinates as well as a `value` and an `other` column.\n",
    "\n",
    "Next create a Datashader `canvas` with a height of 2 pixels and a width of 3 pixels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas = ds.Canvas(plot_height=2, plot_width=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two rows of the dataset map to each canvas pixel with the exception of pixels `[0, 2]` and `[1, 1]` which do not have any rows mapped to them.\n",
    "\n",
    "Now call `canvas.line` using a `first` reduction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.first('value'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The returned `xarray.DataArray` is the same shape as the canvas and contains values taken from the `'value'` column corresponding to the first row that maps to each pixel. Pixels which do not have any rows mapped to them contain `NaN` values.\n",
    "\n",
    "Here are the results using a `last` selection reduction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.last('value'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. `max` and `min` selection reductions\n",
    "\n",
    "A `max` selection reduction returns, for each pixel in the canvas, the maximum value of the specified column of all rows that map to that pixel. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.max('value'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The corresponding `min` selection reduction is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.min('value'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. `first_n`, `last_n`, `max_n` and `min_n` selection reductions\n",
    "\n",
    "These provide the same functionality as `first`, `last`, `max` and `min` reductions except that they return multiple values per pixel. For example, the `max_n` reduction with `n=3` returns the 3 largest values, in descending order, for each pixel:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.max_n('value', n=3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The returned `xarray.DataArray` has shape `(ny, nx, n)` which is `(2, 3, 3)` in this example. The third dimension contains the maximum `n` values in order for each pixel, and where there are fewer than `n` values available `nan` is used instead as usual."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. `where` selection reductions\n",
    "\n",
    "A `where` reduction takes two arguments, a `selector` reduction and a `lookup_column` name. The `selector` reduction, such as a `first` or `max`, selects which row of the dataset to return information about for each pixel. But the information returned is that from the `lookup_column` rather than the column used by the `selector`.\n",
    "\n",
    "Again this is best illustrated by an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This returns, for each pixel, the value of the `'other'` column corresponding to the maximum of the `'value'` column of the data points that map to that pixel.\n",
    "\n",
    "Although it is possible to use a `first` or `last` as a `selector` with a `lookup_column`, such as\n",
    "\n",
    "```python\n",
    "ds.where(ds.first('value'), 'other')\n",
    "```\n",
    "this is unnecessary as it is identical to the simpler\n",
    "```python\n",
    "ds.where(ds.first('other'))\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. `where` selection reductions returning a row index\n",
    "\n",
    "The `lookup_column` argument to `where` is optional. If not specified, `where` defaults to returning the index of the row in the dataset corresponding to the `selector` for each pixel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.where(ds.max('value')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are 8 rows in the dataframe so row indices returned are in the range 0 to 7. An index of -1 is returned for pixels that do not have any data points mapped to them.\n",
    "\n",
    "`first` and `last` can be used as `where` reduction `selector`s that return row indexes, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.where(ds.first('value')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`where` reductions can also use a `selector` that is a `first_n`, `last_n`, `max_n` or `min_n` reduction, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "canvas.points(df, 'x', 'y', ds.where(ds.first_n('value', 3)))"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}