{ "cells": [ { "cell_type": "markdown", "id": "intro", "metadata": {}, "source": [ "# Interactive Hover for Big Data\n", "When visualizing large datasets with [Datashader](https://datashader.org/), you can easily identify macro-level patterns. However, the aggregation process that converts the data into an image can make it difficult to examine individual data points, especially if they occupy the same pixel. This sets up a challenge: how do you explore individual data points without sacrificing the benefits of aggregation?\n", "\n", "To solve this problem, HoloViews offers the `selector` keyword, which makes it possible for the hover tooltip to include information about the underlying data points when using a Datashader operation (`rasterize` or `datashade`).\n", "\n", "The `selector` mechanism performantly retrieves the specific details on the server side without having to search through the entire dataset or having to send all the data to the browser. This allows users working with large datasets to detect big picture patterns while also accessing information about individual points.\n", "\n", "This notebook demonstrates how to use `selector`, which creates a dynamic hover tool that keeps the interactive experience fast and smooth with very large datasets and makes it easier to explore and understand complex visualizations.\n", "\n", ":::{note}\n", "This notebook uses dynamic updates, which require running a live Jupyter or Bokeh server. When viewed statically, you can still zoom and pan, but the plots will not update and hover information will not be available.\n", ":::\n", "\n", ":::{note}\n", "This functionality requires Bokeh version 3.7 or greater.\n", ":::\n", "\n", "Let's start by creating a Points element with a DataFrame consisting of five datasets combined. Each dataset has random x and y coordinates drawn from a normal distribution centered at a specific (x, y) location, with varying standard deviations. 
The datasets—labeled `d1` through `d5`—represent different clusters:\n", "\n", "- `d1` is tightly clustered around (2, 2) with a small spread of 0.03,\n", "- `d2` is around (2, -2) with a wider spread of 0.10,\n", "- `d3` is around (-2, -2) with even more dispersion at 0.50,\n", "- `d4` is broadly spread around (-2, 2) with a standard deviation of 1.00,\n", "- and `d5` has the widest spread of 3.00 centered at the origin (0, 0).\n", "\n", "Each point also carries a `val` and `cat` column to identify its dataset and category. The total dataset contains 50,000 points, evenly split across the five distributions." ] }, { "cell_type": "code", "execution_count": null, "id": "imports", "metadata": {}, "outputs": [], "source": [ "import datashader as ds\n", "import numpy as np\n", "import pandas as pd\n", "\n", "import holoviews as hv\n", "from holoviews.operation.datashader import datashade, dynspread, rasterize\n", "\n", "hv.extension(\"bokeh\")\n", "\n", "# Set default hover tools on various plot types\n", "hv.opts.defaults(hv.opts.RGB(tools=[\"hover\"]), hv.opts.Image(tools=[\"hover\"]))\n", "\n", "\n", "def create_synthetic_dataset(x, y, s, val, cat):\n", " seed = np.random.default_rng(1)\n", " num = 10_000\n", " return pd.DataFrame(\n", " {\"x\": seed.normal(x, s, num), \"y\": seed.normal(y, s, num), \"s\": s, \"val\": val, \"cat\": cat}\n", " )\n", "\n", "\n", "df = pd.concat(\n", " {\n", " cat: create_synthetic_dataset(x, y, s, val, cat)\n", " for x, y, s, val, cat in [\n", " (2, 2, 0.03, 0, \"d1\"),\n", " (2, -2, 0.10, 1, \"d2\"),\n", " (-2, -2, 0.50, 2, \"d3\"),\n", " (-2, 2, 1.00, 3, \"d4\"),\n", " (0, 0, 3.00, 4, \"d5\"),\n", " ]\n", " },\n", " ignore_index=True,\n", ")\n", "\n", "\n", "points = hv.Points(df)\n", "\n", "# Show a sample from each dataset\n", "df.iloc[[0, 10_000, 20_000, 30_000, 40_000]]" ] }, { "cell_type": "markdown", "id": "datashader_intro", "metadata": {}, "source": [ "## Datashader Operations\n", "\n", "Datashader is used to convert the 
points into a rasterized image. Two common operations are:\n", "\n", "- **`rasterize`**: Converts points into an image grid where each pixel aggregates data. The default is to count the number of points per pixel.\n", "- **`datashade`**: Applies a color map to the rasterized data, outputting RGBA values.\n", "\n", "The default aggregator counts the points per pixel, but you can specify a different aggregator, for example, `ds.mean(\"s\")` to calculate the mean of the `s` column. For more information, see the [Large Data user guide](./15-Large_Data.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "id": "basic_datashade", "metadata": {}, "outputs": [], "source": [ "rasterized = rasterize(points)\n", "shaded = datashade(points)\n", "rasterized + shaded" ] }, { "cell_type": "markdown", "id": "5428cb43-793a-4c49-8674-7444cd3a5851", "metadata": {}, "source": [ "## Selectors are a subtype of Aggregators\n", "\n", "Both `aggregator` and `selector` relate to performing an operation on data points in a pixel, but it's important to understand the difference.\n", "\n", "When multiple data points fall into the same pixel, Datashader needs to get a single value from this collection to form an image. 
This is done with an `aggregator`, which specifies either that the points should be combined (such as the `mean` of a column) or that a single value should be **selected** (such as the `min` of a column).\n", "\n", "\n", "![aggregator](../assets/aggregator.png)\n", "\n", "Let's see a couple of different aggregators in action:" ] }, { "cell_type": "code", "execution_count": null, "id": "6c5e0b4c-413f-4c69-9285-82f37c13b60f", "metadata": {}, "outputs": [], "source": [ "# Combine data points for the aggregation:\n", "rasterized_mean = rasterize(points, aggregator=ds.mean(\"s\")).opts(title=\"Aggregate is Mean of s column\")\n", "\n", "# Select a data point for the aggregation:\n", "rasterized_max = rasterize(points, aggregator=ds.max(\"s\")).opts(title=\"Aggregate is Max of s column\")\n", "\n", "rasterized_mean + rasterized_max" ] }, { "cell_type": "markdown", "id": "50baba6e-a077-4ab2-ad24-caa4e973b2c2", "metadata": {}, "source": [ "### Selectors\n", "Since a `selector` is a subtype of `aggregator`, why do we need a separate `selector` keyword? The answer is that very often, users will want to retain hover tooltip information about a **particular** data point within each pixel (e.g., with `min` or `max`) while aggregating the data to form an image using an approach that **combines** the overlapping data points (e.g., with `count` or `mean`). The following are valid `selector` operations that expose information for the **server-side hover tool** to collect about one of the underlying data points in a pixel:\n", "\n", "- `ds.min()`: Select the row with the minimum value for the column\n", "- `ds.max()`: Select the row with the maximum value for the column\n", "- `ds.first()`: Select the first value in the column\n", "- `ds.last()`: Select the last value in the column\n", "\n", "Under the hood, the selected value has a corresponding row index, which allows the collection and presentation of the entire row in the hover tooltip. 
If no column is set, the selector will use the index to determine the sample.\n", "\n", "![selector](../assets/selector.png)" ] }, { "cell_type": "markdown", "id": "server_side_hover_intro", "metadata": {}, "source": [ "## Server-side HoverTool\n", "\n", "The key idea behind the server-side `HoverTool` is:\n", "\n", "1. **Hover event**: When a user hovers over a plot, the pixel coordinates are sent to the server.\n", "2. **Data lookup**: The server uses these coordinates to look up the corresponding aggregated data from the pre-computed dataset.\n", "3. **Update display**: The hover information is updated and sent back to the front end to display detailed data.\n", "\n", "This design avoids sending all the raw data to the client and only transmits the necessary information on demand. You can enable this by adding a `selector` to a `rasterize` or `datashade` operation.\n", "\n", "By adding a `selector`, the hover tooltip now shows all the dimensions in the DataFrame along with the original hover information. In this example, the new information is `s`, `val`, and `cat`, separated from the original information by a horizontal rule." ] }, { "cell_type": "code", "execution_count": null, "id": "server_hover_example", "metadata": {}, "outputs": [], "source": [ "rasterized_with_selector = rasterize(points, aggregator=ds.mean(\"s\"), selector=ds.min(\"s\"))\n", "rasterized_with_selector" ] }, { "cell_type": "markdown", "id": "90f093c4-3a3a-4c2f-9d5b-bbb6a1963d43", "metadata": {}, "source": [ "You can specify which columns to show in the HoverTool with `hover_tooltips`. You can also rename the label by passing in a tuple with `(label, column)`. The information about the selector itself can be disabled by setting `selector_in_hovertool=False`." 
] }, { "cell_type": "code", "execution_count": null, "id": "7d830584-4066-40b1-bb4d-ddae629fa84d", "metadata": {}, "outputs": [], "source": [ "hover_tooltips=[\"x\", \"y\", (\"s [mean(s)]\", \"x_y s\"), (\"s [min(s)]\", \"s\"), (\"cat [min(s)]\", \"cat\")]\n", "rasterized_with_selector.opts(hover_tooltips=hover_tooltips, selector_in_hovertool=False)" ] }, { "cell_type": "markdown", "id": "214f31a0-16fe-4fc7-acc1-34e0bbe4f5ef", "metadata": {}, "source": [ ":::{note}\n", "When `selector` is enabled, hover tooltips use a predefined grid layout with fixed styling, limiting the customization options that would otherwise be available through `hover_tooltips` in standard plots that don't utilize `selector`.\n", ":::" ] }, { "cell_type": "markdown", "id": "dynspread_explanation", "metadata": {}, "source": [ "Some useful functions are `spread` and `dynspread`, which enhance the visual output by increasing the spread of pixels. This helps make points easier to hover over when zooming in on individual points." ] }, { "cell_type": "code", "execution_count": null, "id": "dynspread_example", "metadata": {}, "outputs": [], "source": [ "dynspreaded = dynspread(rasterized_with_selector)\n", "rasterized_with_selector + dynspreaded" ] }, { "cell_type": "markdown", "id": "8d12adb8-a143-43dd-91c2-a9f54698b6e2", "metadata": {}, "source": [ ":::{seealso}\n", "[Large Data](./15-Large_Data.ipynb): An introduction to Datashader and HoloViews\n", ":::" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }