{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Glaciers explorer using Datashader\n", "\n", "\n", "\n", "This notebook provides an annotated [HoloViews](https://holoviews.org)+[Panel](https://panel.pyviz.org) implementation of a [dashboard originally developed in Plotly+Dash](https://github.com/OGGM/OGGM-Dash/blob/master/apps/explore/app.py) for viewing data about the Earth's glaciers from the [Open Global Glacier Model](https://oggm.org). To run it, first install the dependencies:\n", "\n", "    conda install -c pyviz pandas=0.24 param=1.9.0 panel=0.5.1 holoviews=1.12.2 geoviews=1.6.2 datashader=0.7.0\n", "\n", "Next, save the [data file](https://cluster.klima.uni-bremen.de/~fmaussion/misc/oggm_glacier_explorer.csv) as `data/oggm_glacier_explorer.csv` (and gzip it if desired).\n", "\n", "The dashboard can then be used here as a cell in the Jupyter notebook, or you can run it as a separate server using:\n", "\n", "    bokeh serve --show GlaciersShaded.ipynb\n", "\n", "This notebook is essentially the same as [Glaciers.ipynb](https://anaconda.org/jbednar/glaciers) but uses unaggregated data that is practical only with [Datashader](http://datashader.org)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os, numpy as np, pandas as pd, cartopy.crs as ccrs, bokeh\n", "import holoviews as hv, geoviews as gv, datashader as ds\n", "\n", "from colorcet import bmy\n", "from holoviews.util import Dynamic\n", "from holoviews.operation.datashader import rasterize, datashade\n", "\n", "hv.extension('bokeh', width=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('data/oggm_glacier_explorer.csv')\n", "df['latdeg'] = df.cenlat\n", "\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot the data\n", "\n", "As you can see in the dataframe, there are a lot of things that could be plotted about this dataset, but following the [previous version](https://github.com/OGGM/OGGM-Dash/blob/master/apps/explore/app.py) let's focus on the lat/lon location, elevation, temperature, and precipitation. We'll use tools from [PyViz](http://pyviz.org), starting with [HoloViews](https://holoviews.org) as an easy way to build interactive [Bokeh](http://bokeh.pydata.org) plots. So that we can use the full glacier database with good performance, we'll have [Datashader](http://datashader.org) pre-render some of the plots as images before they reach the browser.\n", "\n", "To start, let's declare a HoloViews object that captures English-text descriptions of the various columns in the dataframe, in a way that subsequent plots can all inherit without having to repeat that information:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = gv.Points(df, [('cenlon', 'Longitude'), ('cenlat', 'Latitude')],\n", "                     [('avg_prcp', 'Annual Precipitation (mm/yr)'),\n", "                      ('area_km2', 'Area'), ('latdeg', 'Latitude (deg)'),\n", "                      ('avg_temp_at_mean_elev', 'Annual Temperature at avg. altitude'),\n", "                      ('mean_elev', 'Elevation')])\n", "total_area = df.area_km2.sum()\n", "print(data, len(data), total_area)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we've declared that `cenlon` and `cenlat` (the lat,lon location of the center of the glacier) are the [\"key dimensions\"](http://holoviews.org/getting_started/Tabular_Datasets.html#Tabular) (independent values that specify which glacier this is), and the rest are \"value dimensions\" (various dependent values characterizing that particular sample).\n", "\n", "To make it faster to work with this data in plots, let's project all the lat,lon values into a coordinate system that can be displayed on top of tile-based online maps. We'll still need the original latitudes in degrees for some purposes, so we saved those above in a column `latdeg`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = gv.Dataset(gv.operation.project_points(data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's define various options that will control the appearance of our plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geo_kw = dict(aggregator=ds.sum('area_km2'), x_sampling=1000, y_sampling=1000)\n", "elev_kw = dict(cmap='#7d3c98')\n", "temp_kw = dict(num_bins=50, adjoin=False, normed=False, bin_range=data.range('avg_temp_at_mean_elev'))\n", "prcp_kw = dict(num_bins=50, adjoin=False, normed=False, bin_range=data.range('avg_prcp'))\n", "\n", "size_opts = dict(min_height=400, min_width=600, responsive=True)\n", "geo_opts = dict(size_opts, cmap=bmy, global_extent=False, logz=True, colorbar=True)\n", "elev_opts = dict(size_opts, show_grid=True)\n", "temp_opts = dict(size_opts, fill_color='#f1948a', default_tools=[], toolbar=None, alpha=1.0)\n", "prcp_opts = dict(size_opts, fill_color='#85c1e9', default_tools=[], toolbar=None, alpha=1.0)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Using these options with HoloViews and GeoViews, we can plot various combinations of the variables of interest:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geo_bg = gv.tile_sources.EsriImagery.options(alpha=0.6, bgcolor=\"black\")\n", "geopoints = data.to(gv.Points, ['cenlon', 'cenlat'], ['area_km2'], []).options(**geo_opts).redim.range(area_km2=(0, 3000))\n", "\n", "(geo_bg*rasterize(geopoints, **geo_kw).options(**geo_opts) + \n", " datashade(data.to(hv.Scatter, 'mean_elev','latdeg', []), **elev_kw).options(**elev_opts) + \n", " data.hist('avg_temp_at_mean_elev', **temp_kw).options(**temp_opts) +\n", " data.hist('avg_prcp', **prcp_kw).options(**prcp_opts)).cols(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the top left we've overlaid the location centers on a web-based map of the Earth, separately making a scatterplot of those same datapoints in the top right with elevation versus latitude. The bottom rows show histograms of temperature and precipitation for the whole set of glaciers. Of course, these are just some of the many plots that could be constructed from this data; see [holoviews.org](http://holoviews.org) for inspiration.\n", "\n", "## Define plotting functions\n", "\n", "The above plots are useful for understanding the properties of all glaciers worldwide, but what's more interesting is to consider how some particular subset of the glaciers relates to the rest. To explore this, let's capture the above commands into some functions that will accept a dataset and return viewable plots for that particular data. That way we can plot selected subsets of the data and compare them to the plots of the full dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def geo(data): return gv.Points(data, crs=ccrs.GOOGLE_MERCATOR).options(alpha=1)\n", "def elev(data): return data.to(hv.Scatter, 'mean_elev', 'latdeg', [])\n", "def temp(data): return data.hist('avg_temp_at_mean_elev', **temp_kw).options(**temp_opts)\n", "def prcp(data): return data.hist('avg_prcp', **prcp_kw).options(**prcp_opts)\n", "def count(data): return hv.Div('Glaciers selected: {}'.format(len(data)) + \"<br>\" +\n", "                               'Area: {:.0f} km² ({:.1f}%)'.format(np.sum(data['area_km2']), np.sum(data['area_km2']) / total_area * 100)).options(height=40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If called with the full dataset:\n", "\n", "    (geo_bg*rasterize(geo(data), **geo_kw).options(**geo_opts) + datashade(elev(data), **elev_kw).options(**elev_opts) + temp(data) + prcp(data)).cols(2)\n", "\n", "these functions will return static plots just like those above. Let's capture that output as a set of low-opacity (`alpha<0.5`) plots to use as a background on which to show selected subsets of the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "static_geo = rasterize(geo(data), **geo_kw).options(alpha=0.1, tools=['hover', 'box_select'], active_tools=['box_select'], **geo_opts)\n", "static_elev = datashade(elev(data), **elev_kw).options(alpha=0.1, tools=['box_select'], active_tools=['box_select'], **elev_opts)\n", "static_temp = temp(data).options(alpha=0.1)\n", "static_prcp = prcp(data).options(alpha=0.1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we defined some Bokeh tools like `hover` and `box_select` that you'll see below. Meanwhile, we could plot these on their own if we wished:\n", "\n", "    (geo_bg*static_geo + static_elev + static_temp + static_prcp).cols(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selections\n", "\n", "Now that we have some static plots for the background and some functions to create plots dynamically, we can set up selections that are linked across the static plots and that determine which data is shown dynamically. We'll be using [HoloViews streams](http://holoviews.org/user_guide/Custom_Interactivity.html), which allow any of a variety of data producers to be linked to any of a variety of data consumers. 
First, we'll define functions for selecting by 1D or 2D value range:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def combine_selections(**kwargs):\n", "    \"\"\"\n", "    Combines selections on all available plots into a single selection by index.\n", "    \"\"\"\n", "    if all(not v for v in kwargs.values()):\n", "        return slice(None)\n", "    selection = {}\n", "    for key, bounds in kwargs.items():\n", "        if bounds is None:\n", "            continue\n", "        elif len(bounds) == 2:\n", "            selection[key] = bounds\n", "        else:\n", "            xbound, ybound = key.split('__')\n", "            selection[xbound] = bounds[0], bounds[2]\n", "            selection[ybound] = bounds[1], bounds[3]\n", "    return sorted(set(data.select(**selection).data.index))\n", "\n", "def select_data(**kwargs):\n", "    return data.iloc[combine_selections(**kwargs)] if kwargs else data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, `select_data` will accept various keyword arguments that specify 2D or 1D bounds with which to filter the data. A 2D bound uses four values and selects on the values of two columns (here with values in Web Mercator coordinates):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "select_data(cenlon__cenlat=(3000000, 9000000, 5000000, 11000000)).dframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each additional selection will further narrow down the items selected. 
Here we add a 1D bound on temperature, keeping only those glaciers whose average temperature lies between -5 and -2 degrees:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "select_data(cenlon__cenlat=(3000000, 9000000, 5000000, 11000000), avg_temp_at_mean_elev=(-5,-2)).dframe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Separately, we can instantiate some Stream-based selection objects provided by HoloViews that capture user selection/bounds events on each of the static plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from holoviews.streams import Stream, BoundsXY, BoundsX\n", "\n", "geo_bounds = BoundsXY(source=static_geo, rename={'bounds': 'cenlon__cenlat'})\n", "elev_bounds = BoundsXY(source=static_elev, rename={'bounds': 'mean_elev__latdeg'})\n", "temp_bounds = BoundsX( source=static_temp, rename={'boundsx': 'avg_temp_at_mean_elev'})\n", "prcp_bounds = BoundsX( source=static_prcp, rename={'boundsx': 'avg_prcp'})\n", "\n", "selections = [geo_bounds, elev_bounds, temp_bounds, prcp_bounds]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can make dynamic versions of the above static plots that first compute the appropriate subset of the data from all the selection objects, then call the plotting functions above to generate the plot for just that subset of the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dyn_data = hv.DynamicMap(select_data, streams=selections)\n", "\n", "dyn_geo = rasterize(dyn_data.apply(geo), **geo_kw).options(**geo_opts)\n", "dyn_elev = datashade(dyn_data.apply(elev), **elev_kw).options(**elev_opts)\n", "dyn_temp = dyn_data.apply(temp)\n", "dyn_prcp = dyn_data.apply(prcp)\n", "dyn_count = dyn_data.apply(count)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we could view these dynamic plots on their own:\n", "\n", "    (geo_bg*dyn_geo + dyn_elev + dyn_temp + dyn_prcp).cols(2)\n", "\n", "But to show the selections relative to the static plots, we'll overlay the dynamic plots onto the static plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geomap = geo_bg * static_geo * dyn_geo\n", "elevation = static_elev * dyn_elev\n", "temperature = static_temp * dyn_temp\n", "precipitation = static_prcp * dyn_prcp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could now view these four overlaid plots at once and see how all the selections work (being sure to choose the `box-select` tool rather than the default `box-zoom` tool before selecting):\n", "\n", "    (geomap + elevation + temperature + precipitation).cols(2)\n", "\n", "If you create that plot and try it out, you'll see that each time a selection is made, it reduces the set of glaciers included. So we need one more function that will allow us to reset to the initial state by clearing all the selections:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def clear_selections(arg=None):\n", "    geo_bounds.update(bounds=None)\n", "    elev_bounds.update(bounds=None)\n", "    temp_bounds.update(boundsx=None)\n", "    prcp_bounds.update(boundsx=None)\n", "    Stream.trigger(selections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dashboard\n", "\n", "The code and plots above should be fine for exploring this data in a notebook, but let's go further and make a shareable dashboard using [Panel](http://panel.pyviz.org). 
Panel lets us add arbitrary custom functionality, such as a button to reset the selections by calling `clear_selections`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import panel as pn\n", "pn.extension()\n", "\n", "clear_button = pn.widgets.Button(name='Clear selection')\n", "clear_button.param.watch(clear_selections, 'clicks');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can add static text, Markdown, or HTML items like a title, instructions, and logos:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "title = 'World glaciers explorer'\n", "instruction = 'Box-select on each plot to subselect; clear selection to reset.<br>' + \\\n", "              'See the Jupyter notebook source code for how to build apps like this!'\n", "oggm_logo = ''\n", "pn_logo = ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want detailed control over the formatting, you could define these items in a separate [Jinja2 template](http://bokeh.pydata.org/en/latest/docs/user_guide/server.html#building-bokeh-applications). But here, let's put it all together using Panel Row and Column objects, which can display objects and plots from many different libraries, including the HoloViews objects used here. You'll then get an app with widgets and plots usable from within the notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "header = pn.Row(pn.Pane(oggm_logo), pn.layout.Spacer(width=30),\n", "                pn.Column(pn.Pane(title, width=400), pn.Pane(instruction, width=500)),\n", "                pn.layout.HSpacer(), pn.Column(pn.Pane(dyn_count), pn.layout.Spacer(height=20), clear_button),\n", "                pn.Pane(pn_logo, width=140))\n", "\n", "pn.Column(header, pn.Row(geomap, elevation), pn.Row(temperature, precipitation), width_policy='max', height_policy='max').servable()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As long as you are running this notebook \"live\" (in Jupyter, not viewing a website or a static copy on anaconda.org), the above notebook cell should contain the fully operational dashboard here in the notebook. You can also launch the dashboard at a separate port that shows up in a new browser tab, either by changing `.servable()` to `.show()` above and re-executing that cell, or by leaving the cell as it is and running `bokeh serve --show GlaciersShaded.ipynb`.\n", "\n", "Either way, you should get a standalone dashboard like the image at the start of this notebook. You can now select and explore your data to your heart's content, and share it with anyone else interested in this topic! 
Or you can use the above approach to make your own custom dashboard for just about anything you want to visualize, with plots from just about any plotting library and arbitrary custom interactivity for libraries that support it." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }