{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "

Tutorial 4. Interlinked Plots

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "hvPlot allows you to generate a number of different types of plot quickly from a standard API, returning [HoloViews](https://holoviews.org) objects as discussed in the previous notebook. Each initial plot will make some aspects of the data clear, and using the automatic interactive pan, zoom, and hover tools you can find additional trends and outliers at different spatial locations and spatial scales within each plot.\n", "\n", "Beyond what you can discover from each plot individually, how do you understand how the various plots relate to each other? For instance, imagine you have a data frame with columns _u_, _v_, _w_, _z_, and have separate plots of _u_ vs. _v_, _u_ vs. _w_, and _w_ vs. _z_. If you see a few outliers or a clump of unusual datapoints in your _u_ vs. _v_ plot, how can you find out the properties of those points in the _w_ vs. _z_ or other plots? Are those unusual _u_ vs. _v_ points typically high _w_, uniformly distributed along _w_, or some other pattern? \n", "\n", "To help understand multicolumnar and multidimensional datasets like this, scientists will often build complex multi-pane dashboards with custom functionality. HoloViz (and specifically Panel) tools are great for such dashboards, but here we can actually use the fact that hvPlot returns HoloViews objects to get quite sophisticated interlinking ([linked brushing](http://holoviews.org/user_guide/Linked_Brushing.html)) \"for free\", without needing to build any dashboard. HoloViews objects store metadata about what dimensions they cover, and we can use this metadata programmatically to let the user see how any data points in any plot relate across different plots." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see how this works, let us get back to the example we were working on at the end of the last notebook:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "import holoviews as hv\n", "import pandas as pd\n", "import hvplot.pandas # noqa\n", "import colorcet as cc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First let us load the data as before:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%time\n", "df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))\n", "df = df.set_index(df.time)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And filter to the most severe earthquakes (magnitude `>= 7`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "most_severe = df[df.mag >= 7]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linked brushing across elements\n", "\n", "In the previous notebook, we saw how plot axes are automatically linked for panning and zooming when using the `+` operator, provided the dimensions match. 
When dimensions or an underlying index match across multiple plots, we can use a similar principle to achieve linked brushing, where user selections are also linked across plots.\n", "\n", "To illustrate, let us generate two histograms from our `most_severe` DataFrame:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mag_hist = most_severe.hvplot(\n", " y='mag', kind='hist', responsive=True, min_height=150)\n", "\n", "depth_hist = most_severe.hvplot(\n", " y='depth', kind='hist', responsive=True, min_height=150)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These two histograms are plotting two different dimensions of our earthquake dataset (magnitude and depth), derived from the same set of earthquake samples. The samples between these two histograms share an index, and the relationships between these data points can be discovered and exploited programmatically even though they are in different elements. To do this, we can create an object for linking selections across elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls = hv.link_selections.instance()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given some HoloViews objects (elements, layouts, etc.), we can create versions of them linked to this shared linking object by calling `ls` on them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls(depth_hist + mag_hist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try using the first Bokeh tool to select areas of either histogram: you'll then see both the depth and magnitude distributions for the bins you have selected, compared to the overall distribution. 
By default, selections on both histograms are combined so that the selection is the intersection of the two regions selected (data points matching _both_ the constraints on depth and the constraints on magnitude that you select). For instance, try selecting the deepest earthquakes (around 600), and you can see that those are not specific to one particular magnitude. You can then further select a particular magnitude range, and see how that range is distributed in depth over the selected depth range. Linked selections like this make it feasible to look at specific regions of a multidimensional space and see how the properties of those regions compare to the properties of other regions. You can use the Bokeh reset tool (double arrow) to clear your selection.\n", "\n", "Note that these two histograms are derived from the same `DataFrame` and created in the same call to `ls`, but neither of those is necessary to achieve the linked behavior! If linking two different `DataFrame`s, the important thing to check is that any columns with the same name actually do have the same meaning, and that any index columns match, so that the plots you are visualizing make sense when linked together.\n", "\n", "## Linked brushing across element types\n", "\n", "The previous example linked two histograms, but nothing prevents you from linked brushing across different element types. 
Here are our earthquake points, also derived from the same `DataFrame`, where the only change from earlier is that we are using the reversed warm colormap (described in the previous notebook):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "geo = most_severe.hvplot(\n", " 'easting', 'northing', color='mag', kind='points', tiles='EsriUSATopo',\n", " xaxis=None, yaxis=None, responsive=True, height=350, cmap=cc.CET_L4[::-1], framewise=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, we just need to pass our points to a linked-selections object (`ls2`, newly declared here to be independent of `ls` above) to declare the linkage:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls2 = hv.link_selections.instance()\n", "\n", "(ls2(geo + depth_hist)).cols(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can use the box-select tool to select earthquakes on the map and view their corresponding depth distribution, or vice versa. For example, if you select just the earthquakes in Alaska, you can see that they tend not to be very deep underground (though that may be a sampling issue). Other selections will show other properties, in this case typically with no obvious relationship between geographic location and depth distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accessing the data selection\n", "\n", "If you pass your `DataFrame` into the `.filter` method of your linked selection object, you can apply the active filter that you specified interactively:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ls2.filter(most_severe)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Exercise\n", "\n", "Try selecting a small number of earthquakes on the map above and re-running the previous cell. 
You should see that the returned `DataFrame` only includes the earthquakes you have selected. You can use this linked selections feature in your own workflows by selecting a region of your data, then running subsequent analyses only on that subset of the data (or comparing that subset to the whole data set)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "When exploring data it can be convenient to use the `.plot` API to quickly visualize a particular dataset. By calling `.hvplot` to generate different plots over the course of a session, it is possible to gradually build up a mental model of how a particular dataset is structured. Linked selections let you see relationships between your data's dimensions and clusters of datapoints much more directly, so that you can:\n", "\n", "1. Interactively explore high-dimensional data by making selections across different views of the same underlying samples.\n", "2. Turn this interactive exploration into a Python subselection of your data, allowing you to continue your data analysis on a subset of your data that you interactively selected.\n", "\n", "This approach is very general and allows a deeper understanding of high-dimensional data through interactivity. This interactivity is itself built on the very powerful HoloViews 'streams' system, which you can leverage to build your own [Custom Interactivity](./07_Custom_Interactivity.ipynb) (optional, advanced topic) when necessary.\n", "\n", "In the next section we will see how to apply data processing in a pipelined form, allowing us to build interactive visualizations driven by user-defined widgets when we want to have custom control over our data processing and selection." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }