{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Datashader](http://datashader.org) renders data into regularly sampled arrays, a process known as [rasterization](https://en.wikipedia.org/wiki/Rasterisation), and then optionally converts that array into a viewable image (with one pixel per element in that array). \n", "\n", "In some cases, your data is *already* rasterized, such as data from imaging experiments, simulations on a regular grid, or other regularly sampled processes. Even so, the rasters you have already are not always the ones you need for a given purpose, having the wrong shape, range, or size to be suitable for overlaying with or comparing against other data, maps, and so on. Datashader provides fast methods for [\"regridding\"](https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/regridding-overview)/[\"re-sampling\"](http://gisgeography.com/raster-resampling/)/\"re-rasterizing\" your regularly gridded datasets, generating new rasters on demand that can be used together with those it generates for any other data types. Rasterizing into a common grid can help you implement complex cross-datatype analyses or visualizations. \n", "\n", "In other cases, your data is stored in a 2D array similar to a raster, but represents values that are not regularly sampled in the underlying coordinate space. Datashader also provides fast methods for rasterizing these more general rectilinear or curvilinear grids, known as [quadmeshes](#quadmesh-Rasterization) as described later below. Fully arbitrary unstructured grids ([Trimeshes](Trimesh.ipynb)) are discussed separately.\n", "\n", "## Re-rasterization\n", "\n", "First, let's explore the regularly gridded case, declaring a small raster using Numpy and wrapping it up as an [xarray](http://xarray.pydata.org) DataArray for us to re-rasterize: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np, datashader as ds, xarray as xr\n", "from datashader import transfer_functions as tf, reductions as rd\n", "\n", "def f(x,y):\n", " return np.cos((x**2+y**2)**2)\n", "\n", "def sample(fn, n=50, range_=(0.0,2.4)):\n", " xs = ys = np.linspace(range_[0], range_[1], n)\n", " x,y = np.meshgrid(xs, ys)\n", " z = fn(x,y)\n", " return xr.DataArray(z, coords=[('y',ys), ('x',xs)])\n", "\n", "da = sample(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are declaring that the first dimension of the ``DataArray`` (the rows) is called ``y`` and corresponds to the indicated continuous coordinate values in the list ``ys``, and similarly for the second dimension (the columns) called ``x``. The coords argument is optional, but without it the default integer indexing from Numpy would be used, which would not match how this data was generated (sampling over each of the ``ys``). \n", "\n", "DataArrays like `da` happen to be the format used for datashader's own rasterized output, so you can now immediately turn this hand-constructed array into an image using ``tf.shade()`` just as you would for points or lines rasterized by datashader:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.shade(da)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolation (upsampling)\n", "\n", "So, what if we want a larger version? We can do that with the Datashader `Canvas.raster` method. 
Note that `Canvas.raster` only supports regularly spaced rectilinear, 2D or 3D ``DataArray``s, and so will not accept the additional dimensions or non-separable coordinate arrays that xarray allows (for which see [Quadmesh](#quadmesh-rasterization), below). Also see the Quadmesh docs below if you want to use a [GPU](https://datashader.org/user_guide/Performance.html#Data-objects) for processing your raster data, because the Datashader `Canvas.raster` implementation does not yet support GPU arrays.\n", "\n", "Assuming you are ready to use `Canvas.raster`, let's try upsampling with either nearest-neighbor or bilinear interpolation (the default):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.Images(tf.shade(ds.Canvas().raster(da, interpolate='linear'), name=\"linear interpolation (default)\"),\n", " tf.shade(ds.Canvas().raster(da, interpolate='nearest'), name=\"nearest-neighbor interpolation\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, linear interpolation works well for smoothly varying values, like those in the lower left of this function, but doesn't help much for regions close to or beyond the sampling limits of the original raster, like those in the upper right.\n", "\n", "We can choose whatever output size we like and sample the grid over any range we like, though of course there won't be any data in regions outside of the original raster grid: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.shade(ds.Canvas(plot_height=200, plot_width=600, x_range=(-2,5), y_range=(-0.1,0.4)).raster(da))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregation (downsampling)\n", "\n", "The previous examples all use upsampling, from a smaller to a larger number of cells per unit distance in X or Y. Downsampling works just as for points and lines in Datashader, supporting various aggregation functions. These aggregation functions determine the result when more than one raster grid cell falls into a given pixel's region of the plane. To illustrate downsampling, let's first render a 500x500 version of the above 50x50 array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da2 = sample(f, n=500)\n", "tf.shade(da2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the version we upsampled to this size from the 50x50 samples is similar to this, but this one is much smoother and more accurately represents the underlying mathematical function, because it has sufficient resolution to avoid aliasing in the high-frequency portions of this function (towards the upper right). 
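", "\n", "\n", "To make that comparison concrete, here is a rough sketch (using the `da` and `da2` arrays defined above) that upsamples the 50x50 array to the same 500x500 size and then shades the absolute difference from the directly sampled version; the `cvs500` and `diff` names below are just for this illustration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: upsample the 50x50 array to 500x500 and compare it against da2.\n", "# The error should be largest where the function varies fastest (upper right).\n", "cvs500 = ds.Canvas(plot_width=500, plot_height=500)\n", "upsampled = cvs500.raster(da, interpolate='linear')\n", "diff = xr.DataArray(np.abs(upsampled.data - da2.data), coords=[('y', da2.y.data), ('x', da2.x.data)])\n", "tf.shade(diff, name=\"abs(upsampled - direct)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "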
\n", "\n", "Now that we have this larger array, we can downsample it using a variety of aggregation functions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cvs = ds.Canvas(plot_width=150, plot_height=150)\n", "tf.Images(tf.shade(cvs.raster(da2), name=\"mean downsampling (default)\"),\n", "          tf.shade(cvs.raster(da2, agg=rd.min()), name=\"min downsampling\"),\n", "          tf.shade(cvs.raster(da2, agg=rd.max()), name=\"max downsampling\"),\n", "          tf.shade(cvs.raster(da2, agg=rd.mode()), name=\"mode downsampling\"),\n", "          tf.shade(cvs.raster(da2, agg=rd.std()), name=\"std downsampling\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the default downsampling function ``mean`` renders a faithful size-reduced version of the original, with all the raster grid points that overlap a given pixel being averaged to create the final pixel value. The ``min`` and ``max`` aggregation functions take the minimum or maximum, respectively, of the values overlapping each pixel, and you can see here that the ``min`` version has larger light-blue regions towards the upper right (with each pixel reflecting the minimum of all grid cells it overlaps), while the ``max`` version has larger dark-blue regions towards the upper right. The ``mode`` version computes the most common value that overlaps each pixel (not very useful for floating-point data as here, but important for categorical data where ``mean`` would not be valid; in that case you can also use `first` or `last` to take the first or last value found for a given pixel). The ``std`` version reports the standard deviation of the grid cells in each pixel, which is low towards the lower left where the function is smooth, and increases towards the upper right, where the function value varies significantly per pixel (i.e., has many samples in the original grid with different values).\n", "\n", "The differences between min and max are clearer if we look at a regime where the function varies so much that it can only barely be faithfully represented in a grid of this size:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da3 = sample(f, n=500, range_=(0,3))\n", "tf.shade(da3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the upper right of this plot, you can see that the function is varying with such high frequency that any downsampled version will fail to represent it properly. The aggregation functions in Datashader can help you see when that is happening:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cvs = ds.Canvas(plot_width=150, plot_height=150)\n", "tf.Images(tf.shade(cvs.raster(da3), name=\"mean downsampling (default)\"),\n", "          tf.shade(cvs.raster(da3, agg=rd.min()), name=\"min downsampling\"),\n", "          tf.shade(cvs.raster(da3, agg=rd.max()), name=\"max downsampling\"),\n", "          tf.shade(cvs.raster(da3, agg=rd.mode()), name=\"mode downsampling\"),\n", "          tf.shade(cvs.raster(da3, agg=rd.std()), name=\"std downsampling\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here you can see that the ``mean`` downsampling looks like a good approximation to the original array, locally averaging the original function values in each portion of the array. However, if you were to zoom in and adjust for contrast, you would be able to see some of the inevitable aliasing artifacts that occur for any such representation in a too-small array.
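", "\n", "\n", "One rough way to locate such undersampled regions, sketched below using the ``cvs`` canvas and ``da3`` array defined above, is to subtract the ``min``-aggregated raster from the ``max``-aggregated one; wherever the result is large, a single output pixel covers grid cells with very different values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: per-pixel max minus min; large values mark poorly resolved regions.\n", "spread = cvs.raster(da3, agg=rd.max()) - cvs.raster(da3, agg=rd.min())\n", "tf.shade(spread, name=\"max - min\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "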
These aliasing effects are clearly visible in the ``min`` and ``max`` aggregation, because they keep the local minimum or maximum rather than averaging out the artifacts. Comparing ``mean`` and ``min`` or ``max`` (or subtracting ``min`` from ``max``) can help find regions of the array that are being poorly represented in the current view.\n", "\n", "## Collections of raster data\n", "\n", "The above examples all use a single 2D DataArray. Datashader's `raster` support can also use a 3D DataArray as long as which \"layer\" along this third dimension is wanted is specified:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def g(x, y, k=1):\n", " return np.cos(k*((x**2+y**2)**2))\n", "\n", "ks = [1.1,3.3,33]\n", "xs = ys = np.linspace(0.0, 2.4, 200)\n", "y,k,x = np.meshgrid(ys, ks, xs, indexing='xy')\n", "\n", "da4 = xr.DataArray(g(x,y,k), coords=[('k',ks), ('y',ys), ('x',xs)])\n", "\n", "tf.Images(tf.shade(ds.Canvas().raster(da4, layer=1.1), name=\"k=1.1\"),\n", " tf.shade(ds.Canvas().raster(da4, layer=3.3), name=\"k=3.3\"),\n", " tf.shade(ds.Canvas().raster(da4, layer=33), name=\"k=33\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the factor \"k\" results in the function being evaluated at increasingly higher frequencies, eventually leading to complex [Moiré patterns](https://en.wikipedia.org/wiki/Moir%C3%A9_pattern) due to undersampling for k=33. 3D xarrays are useful for reading multi-layer image files, such as those from [xarray's rasterio-based support for reading from multilayer TIFFs](http://xarray.pydata.org/en/stable/io.html#rasterio).\n", "\n", "Similarly, Datashader can accept an xarray Dataset (a dictionary-like collection of aligned DataArrays), as long as the aggregation method specifies a suitable DataArray within the Dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def h(x,y): return np.sin((x**2+y**2)**2)\n", "def m(x,y): return np.exp(x+y)\n", "\n", "dd = xr.Dataset({'cos': sample(f, n=150),\n", " 'sin': sample(h, n=150),\n", " 'exp': sample(m, n=150)})\n", "\n", "tf.Images(tf.shade(ds.Canvas().raster(dd, agg=rd.mean('cos')), name='cos ((x^2+y^2)^2)'),\n", " tf.shade(ds.Canvas().raster(dd, agg=rd.mean('sin')), name='sin ((x^2+y^2)^2)'),\n", " tf.shade(ds.Canvas().raster(dd, agg=rd.mean('exp')), name='exp (x+y)'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Scaling up\n", "\n", "The above rasters are all relatively tiny, for illustration purposes, but Datashader's raster support is accelerated using multi-core machine-code generation from Numba, and can re-sample much larger rasters easily. For instance, rendering the above function into a 10000x10000 raster takes a few seconds on a four-core machine using Numpy, but it can then be re-sampled using Datashader in under 0.1 sec:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time da5 = sample(m, n=10000, range_=(0,3))\n", "%time tf.shade(cvs.raster(da5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactivity\n", "\n", "In practice, it is usually a good idea to view your data interactively at many different zoom levels to see all the data available and to detect any sampling or aliasing issues, which you can do with Datashader as described in [3_Interactivity](../getting_started/3_Interactivity.ipynb). 
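", "\n", "\n", "A rough sketch of a single raster call that is prepared for both situations (reusing ``da2`` from above, with arbitrary method choices; the ``cvs2`` name is only for this sketch):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: one raster call specifying both an upsampling method (interpolate)\n", "# and a downsampling reduction (agg); whichever applies for the requested\n", "# canvas size and ranges is the one that gets used.\n", "cvs2 = ds.Canvas(plot_width=300, plot_height=300)\n", "tf.shade(cvs2.raster(da2, interpolate='nearest', agg=rd.mean()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "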
For such cases, you can define both upsampling and downsampling methods at the same time; whichever one is needed for a given zoom level and range of the data will be applied.\n", "\n", "You can see Datashader rasters at work in the [Landsat](https://examples.pyviz.org/landsat/landsat.html) example notebook, which also has examples of reading raster data from TIFF files using xarray and rasterio:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quadmesh Rasterization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `Canvas.raster` method described above supports re-rasterizing grids that are already regularly spaced. For the more general case of rectilinear and curvilinear grids, which are structured (unlike [Trimesh grids](Trimesh.ipynb)) but not necessarily regularly spaced, Datashader provides the `Canvas.quadmesh` method. Despite its similarities with `raster`, `quadmesh` is currently implemented separately from `raster`, and thus has quite different features and limitations. For instance, `quadmesh` supports [GPU](https://datashader.org/user_guide/Performance.html#Data-objects) arrays (including an optimized code path specifically speeding up regularly spaced rasters if they are provided), while `raster` offers optional interpolation and supports distributed computing using Dask, and many other small differences exist (see the docs for each method). So if you have a regularly spaced array, you will probably want to try both `quadmesh` and `raster` and see which one meets your needs. \n", "\n", "### Rectilinear Quadmesh\n", "A rectilinear quadmesh is specified as a 2-dimensional xarray `DataArray` with two 1-dimensional coordinates. The coordinates specify the center position of the quads in each row or column of the grid, allowing the spacing on x and y to be irregular, but leaving the two dimensions orthogonal (and thus always producing an axis-aligned rectangular image out of the provided rectangular array, with each patch also rectangular and axis aligned)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "da = xr.DataArray(\n", " [[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]],\n", " coords=[('y', [1, 6, 7]),\n", " ('x', [1, 2, 7])],\n", " name='Z')\n", "\n", "canvas = ds.Canvas()\n", "tf.shade(canvas.quadmesh(da, x='x', y='y'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the quads may vary in size, a given quadmesh rasterization call may require a combination of upscaling and downscaling at different locations in the grid. `Canvas.raster`, in contrast, is always either upscaling or downscaling in a given call, never both. Quadmesh upscaling is performed with nearest-neighbor interpolation, and downscaling is performed using the aggregation function supplied as the `agg` argument to `Canvas.quadmesh`. If not supplied, the aggregation function defaults to `mean`." 
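, "\n", "\n", "For example, here is a rough sketch that treats the regularly spaced ``da3`` array from above as a (trivially) rectilinear quadmesh and downscales it with two different aggregation functions; the ``da3q`` and ``cvs_q`` names are just for this illustration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch: downscale the same 500x500 array as a quadmesh, with mean vs. max.\n", "da3q = da3.rename('Z')\n", "cvs_q = ds.Canvas(plot_width=150, plot_height=150)\n", "tf.Images(tf.shade(cvs_q.quadmesh(da3q, x='x', y='y', agg=ds.mean('Z')), name=\"quadmesh mean\"),\n", "          tf.shade(cvs_q.quadmesh(da3q, x='x', y='y', agg=ds.max('Z')), name=\"quadmesh max\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell shows quadmesh rasterization of an irregularly spaced grid at several resolutions:"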
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def rect_data(n):\n", " xs = np.linspace(0, 3, n) ** 2\n", " ys = np.arange(n)\n", " zs = np.sin(xs * ys[:, np.newaxis])\n", " da = xr.DataArray(zs, coords=[('y', ys), ('x', xs)], name='Z')\n", " return da\n", "\n", "canvas = ds.Canvas(plot_width=150, plot_height=150)\n", "tf.Images(*[tf.shade(canvas.quadmesh(rect_data(n),x='x', y='y', agg=ds.mean('Z')), name=str(n))\n", " for n in [20, 80, 320, 1280]])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Curvilinear Quadmesh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A curvilinear quadmesh is specified as an 2-dimensional xarray `DataArray` with 2-dimensional coordinates. When upsampling such a mesh, each array element will map to a quadrilateral patch (a \"quad\"), which need not be rectangular or axis aligned. The coordinates specify the center position of each quad in the grid, allowing the quadmesh to map from the underlying 2D array into any arbitrary overall shape and with arbitrarily varying grid spacing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Qy = [[1, 2, 4],\n", " [1, 2, 3],\n", " [1, 2, 3]]\n", "\n", "Qx = [[1, 1, 1],\n", " [2, 2, 2],\n", " [2.5, 3, 3]]\n", "\n", "Z = [[1, 2, 3],\n", " [4, 5, 6],\n", " [7, 8, 9]]\n", "\n", "da = xr.DataArray(Z, name='Z', dims = ['y', 'x'],\n", " coords={'Qy': (['y', 'x'], Qy),\n", " 'Qx': (['y', 'x'], Qx)})\n", "\n", "canvas = ds.Canvas()\n", "tf.shade(canvas.quadmesh(da, x='Qx', y='Qy'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the rectilinear quadmesh, upscaling is performed with nearest-neighbor interpolation, and downscaling is performed using the aggregation function supplied as the `agg` argument to `Canvas.quadmesh` (thus combining the values from multiple quads into a given pixel). If not supplied, the aggregation function defaults to `mean`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def curve_data(n):\n", " coords = np.linspace(-1.5,1.5,n)\n", " X,Y = np.meshgrid(coords, coords);\n", " Qx = np.cos(Y) - np.cos(X)\n", " Qy = np.sin(Y) + np.sin(X)\n", " Z = np.sqrt(X**2 + Y**2)\n", "\n", " return xr.DataArray(Z, name='Z', dims=['Y', 'X'],\n", " coords={'Qx': (['Y', 'X'], Qx),\n", " 'Qy': (['Y', 'X'], Qy)})\n", "\n", "tf.Images(*[tf.shade(canvas.quadmesh(curve_data(n), x='Qx', y='Qy', agg=ds.mean('Z')), name=str(n))\n", " for n in [8, 16, 32, 64]])" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }