{ "cells": [ { "cell_type": "markdown", "id": "5b2c6d21", "metadata": {}, "source": [ "\n", "\n", "

Tutorial 5. Interactive Pipelines

" ] }, { "cell_type": "markdown", "id": "43cb5143", "metadata": {}, "source": [ "The plots built up over the first few tutorials were all highly interactive in the web browser, with interactivity provided by Bokeh plotting tools within the plots or in some cases by HoloViews generating a Bokeh widget to select for a `groupby` over a categorical variable. However, when you are exploring a dataset, you might want to see how _any_ aspect of the data or plot changes if varied interactively. Luckily, hvPlot makes it almost trivially easy to do this, so that you can very easily explore any parameter or setting in your code. \n", "\n", "hvPlot registers the `.interactive()` method on many of the PyData data structures, e.g. a Pandas or GeoPandas or Dask DataFrame, an Xarray DataSet. Calling `.interactive()` returns an interactive object (e.g. an interactive Pandas DataFrame), that can be used as if it was the original object (e.g. calling regular Pandas methods) and whose *output* (e.g. a DataFrame view) will be re-computed everytime one of its *inputs* change. The *inputs* are widgets (e.g. a drop-down list), that replace values you would usually hard-code and manually update to observe how they affect the output. When such an interactive object is displayed in a notebook, it includes the widgets that you have used together with the regular output.\n" ] }, { "cell_type": "markdown", "id": "1841376d", "metadata": {}, "source": [ "## Panel widgets\n", "\n", "Before using `.interactive()` we will need a widget library, and here we will be using [Panel](https://panel.holoviz.org/) to generate Bokeh widgets under user control, just as hvPlot uses Panel to generate widgets for a `groupby` as shown previously. Let's first get ahold of a Panel widget to see how they work. Here, let's create a Panel floating-point number slider to specify an earthquake magnitude between zero and nine:" ] }, { "cell_type": "code", "execution_count": null, "id": "93fccc4c", "metadata": {}, "outputs": [], "source": [ "import pathlib\n", "\n", "import holoviews as hv\n", "import hvplot.pandas # noqa\n", "import numpy as np\n", "import pandas as pd\n", "import panel as pn\n", "\n", "pn.extension(sizing_mode='stretch_width')" ] }, { "cell_type": "code", "execution_count": null, "id": "382008b1", "metadata": {}, "outputs": [], "source": [ "mag_slider = pn.widgets.FloatSlider(name='Minimum Magnitude', start=0, end=9, value=6)\n", "mag_slider" ] }, { "cell_type": "markdown", "id": "7542d517", "metadata": {}, "source": [ "The widget is a JavaScript object, but there are bidirectional connections between JS and Python that let us see and change the value of this slider using its `value` parameter:" ] }, { "cell_type": "code", "execution_count": null, "id": "2730a410", "metadata": {}, "outputs": [], "source": [ "mag_slider.value" ] }, { "cell_type": "code", "execution_count": null, "id": "31398646", "metadata": {}, "outputs": [], "source": [ "mag_slider.value = 7" ] }, { "cell_type": "markdown", "id": "9f460c98", "metadata": {}, "source": [ "#### Exercise\n", "\n", "Try moving the slider around and rerunning the `mag_slider.value` above to access the current slider value. As you can see, you can easily get the value of any widget to use in subsequent cells, but you'd need to re-run any cell that accesses that value for it to get updated." ] }, { "cell_type": "markdown", "id": "de5819d8", "metadata": {}, "source": [ "## hvPlot .interactive()\n", "\n", "hvPlot provides an easy way to connect widgets directly into an expression you want to control.\n", "\n", "First, let's read in our data:" ] }, { "cell_type": "code", "execution_count": null, "id": "e2565877", "metadata": {}, "outputs": [], "source": [ "%%time\n", "df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))\n", "df = df.set_index('time').tz_localize(None)" ] }, { "cell_type": "markdown", "id": "2b2e47a4", "metadata": {}, "source": [ "Now, let's do a little filtering that we might want to control with such a widget, such as selecting the highest-magnitude events:" ] }, { "cell_type": "code", "execution_count": null, "id": "cd7076f0", "metadata": {}, "outputs": [], "source": [ "WEB_MERCATOR_LIMITS = (-20037508.342789244, 20037508.342789244)\n", "\n", "df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']][df['northing'] < WEB_MERCATOR_LIMITS[1]]\n", "\n", "df2[df2['mag'] > 5].head()" ] }, { "cell_type": "markdown", "id": "04c8c9a9", "metadata": {}, "source": [ "What if instead of '5', we want the output above always to reflect the current value of `mag_slider`? We can do that by using hvPlot's `.interactive()` support, passing in a widget almost anywhere we want in a pipeline:" ] }, { "cell_type": "code", "execution_count": null, "id": "444b4081", "metadata": {}, "outputs": [], "source": [ "dfi = df2.interactive()\n", "\n", "dfi[dfi['mag'] > mag_slider].head()" ] }, { "cell_type": "markdown", "id": "c4f36f1d", "metadata": {}, "source": [ "Here, `.interactive` is a wrapper around your DataFrame or Xarray object that lets you provide Panel widgets almost anywhere you'd otherwise be using a number. Just as importing `hvplot.pandas` provides a `.hvplot()` method or object on your dataframe, it also provides a `.interactive` method or object that gives you a general-purpose *interactive* `Dataframe` driven by widgets. `.interactive` stores a copy of your pipeline (series of method calls or other expressions on your data) and dynamically replays the pipeline whenever that widget changes. \n", "\n", "`.interactive` supports just about any output you might want to get out of such a pipeline, such as text or numbers:" ] }, { "cell_type": "code", "execution_count": null, "id": "0429ed88", "metadata": {}, "outputs": [], "source": [ "dfi[dfi['mag'] > mag_slider].shape" ] }, { "cell_type": "markdown", "id": "d8371799", "metadata": {}, "source": [ "Or Matplotlib plots:" ] }, { "cell_type": "code", "execution_count": null, "id": "8a511401", "metadata": {}, "outputs": [], "source": [ "dfi[dfi['mag'] > mag_slider].plot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))" ] }, { "cell_type": "markdown", "id": "88d1120d", "metadata": {}, "source": [ "Each time you drag the widget, hvPlot replays the pipeline and updates the output shown. \n", "\n", "Of course, `.interactive` also supports `.hvplot()`, here with a new copy of a widget so that it will be independent of the other cells above:" ] }, { "cell_type": "code", "execution_count": null, "id": "d2fd6e7f", "metadata": {}, "outputs": [], "source": [ "mag_slider2 = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)\n", "\n", "dfi[dfi['mag'] > mag_slider2].hvplot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))" ] }, { "cell_type": "markdown", "id": "9626d206", "metadata": {}, "source": [ "You can see that the depth distribution varies dramatically as you vary the minimum magnitude, with the lowest magnitude events apparently only detectable at short depths. There also seems to be some artifact at depth 10, which is the largest bin regardless of the filtering for all but the largest magnitudes." ] }, { "cell_type": "markdown", "id": "74f7fd03", "metadata": {}, "source": [ "## Date widgets\n", "\n", "A `.interactive()` pipeline can contain any number of widgets, including any from the Panel [reference gallery](https://panel.holoviz.org/reference/index.html#widgets). For instance, let's make a widget to specify a date range covering the dates found in this data:" ] }, { "cell_type": "code", "execution_count": null, "id": "183ba0eb", "metadata": {}, "outputs": [], "source": [ "date = pn.widgets.DateRangeSlider(name='Date', start=df.index[0], end=df.index[-1])\n", "date" ] }, { "cell_type": "markdown", "id": "1d2f79bd", "metadata": {}, "source": [ "Now we can access the value of this slider:" ] }, { "cell_type": "code", "execution_count": null, "id": "29ee1496", "metadata": {}, "outputs": [], "source": [ "date.value" ] }, { "cell_type": "markdown", "id": "154ef8fc", "metadata": {}, "source": [ "As this widget is specifying a range, this time the value is returned as a tuple. If you prefer, you can get the components of the tuple directly via the `value_start` and `value_end` parameters respectively:" ] }, { "cell_type": "code", "execution_count": null, "id": "434e259b", "metadata": {}, "outputs": [], "source": [ "f'Start is at {date.value_start} and the end is at {date.value_end}'" ] }, { "cell_type": "markdown", "id": "b586a182", "metadata": {}, "source": [ "Once again, try specifying different ranges with the widgets and rerunning the cell above." ] }, { "cell_type": "markdown", "id": "0ada23d6", "metadata": {}, "source": [ "Now let's use this widget to expand our expression to filter by date as well as magnitude:" ] }, { "cell_type": "code", "execution_count": null, "id": "1068b465", "metadata": {}, "outputs": [], "source": [ "mag = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)\n", "\n", "filtered = dfi[\n", " (dfi['mag'] > mag) &\n", " (dfi.index >= date.param.value_start) &\n", " (dfi.index <= date.param.value_end)]\n", "\n", "filtered.head()" ] }, { "cell_type": "markdown", "id": "524b5c93", "metadata": {}, "source": [ "You can now use either the magnitude or the date range (or both) to filter the data, and the output will update. Note that here you want to move the start date of the range slider rather than the end; otherwise, you may not see the table change because the earthquakes are displayed in date order." ] }, { "cell_type": "markdown", "id": "7e5dd6bb", "metadata": {}, "source": [ "#### Exercise\n", "\n", "To specify the minimum earthquake magnitude, notice that we supplied the whole `mag` widget but `.interactive()` used only the `value` parameter of this widget by default. To be explicit, you may use `mag.param.value` instead if you wish. Try it!" ] }, { "cell_type": "markdown", "id": "1628595b", "metadata": {}, "source": [ "#### Exercise\n", "\n", "For readability, seven columns were chosen before displaying the `DataFrame`. Have a look at `df.columns` and pick a different set of columns for display." ] }, { "cell_type": "markdown", "id": "4f336254", "metadata": {}, "source": [ "## Functions as inputs\n", "\n", "Quite often the data structure you want to explore in a pipeline, may itself be the outcome of another pipeline. It may for instance be a Pandas Dataframe created by extracting and transforming the output of a database or an API call, or it could be the dynamic output of some simulation or pre-processing. With `hvplot.bind` you can start with an arbitrary custom function that returns the data structure you want to explore and then bind that function's argument to widgets. Then when those widgets change, the function will get called to get the updated output.\n", "\n", "To keep this example self-contained we'll illustrate this process using a simple function that filters the earthquakes dataset by event type and returns a DataFrame. Of course, this function could include _any_ computation that returns a DataFrame, including selecting data files on disk or making a query to a database." ] }, { "cell_type": "code", "execution_count": null, "id": "a337d070", "metadata": {}, "outputs": [], "source": [ "def input_function(event_type):\n", " df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']]\n", " return df2[df2['type'] == event_type]" ] }, { "cell_type": "markdown", "id": "c356111e", "metadata": {}, "source": [ "We can then create a Panel `Select` widget with a few options and bind it to the `input_function`. Calling `.interactive()` on the bound object is what allows it to be used in an interactive pipeline, as we previously did with `dfi`." ] }, { "cell_type": "code", "execution_count": null, "id": "553a2766", "metadata": {}, "outputs": [], "source": [ "event_types = pn.widgets.Select(options=['earthquake', 'quarry blast', 'explosion', 'ice quake'])\n", "\n", "inputi = hvplot.bind(input_function, event_types).interactive()" ] }, { "cell_type": "code", "execution_count": null, "id": "32463fdf", "metadata": {}, "outputs": [], "source": [ "inputi[inputi['mag'] > mag].head(2)" ] }, { "cell_type": "markdown", "id": "5434a4df", "metadata": {}, "source": [ "## .interactive() and HoloViews \n", "\n", "`.interactive()` lets you work naturally with the compositional HoloViews plots provided by `.hvplot()`. Here, let's combine such plots using the HoloViews `+` operator:" ] }, { "cell_type": "code", "execution_count": null, "id": "93fa8582", "metadata": {}, "outputs": [], "source": [ "mag_hist = filtered.hvplot(y='mag', kind='hist', width=300)\n", "depth_hist = filtered.hvplot(y='depth', kind='hist', width=300)\n", "\n", "mag_hist + depth_hist" ] }, { "cell_type": "markdown", "id": "2e1b35c2", "metadata": {}, "source": [ "These are the same two histograms we saw earlier, but now we can filter them on data dimensions like `time` that aren't even explicitly shown in the plot, using the Panel widgets.\n", "\n", "## Filtering earthquakes on a map\n", "\n", "To display the earthquakes on a map, we will first create a subset of the data to make it quick to update without needing Datashader.:" ] }, { "cell_type": "code", "execution_count": null, "id": "a765927c", "metadata": {}, "outputs": [], "source": [ "subset_df = df[\n", " (df.northing < WEB_MERCATOR_LIMITS[1]) &\n", " (df.mag > 4) &\n", " (df.index >= pd.Timestamp('2017-01-01')) &\n", " (df.index <= pd.Timestamp('2018-01-01'))]" ] }, { "cell_type": "markdown", "id": "c72a85ef", "metadata": {}, "source": [ "Now we can make a new interactive `DataFrame` from this new subselection:" ] }, { "cell_type": "code", "execution_count": null, "id": "77bbd123", "metadata": {}, "outputs": [], "source": [ "subset_dfi = subset_df.interactive(sizing_mode='stretch_width')" ] }, { "cell_type": "markdown", "id": "314d0b5b", "metadata": {}, "source": [ "And now we can declare our widgets and use them to filter the interactive `DataFrame` as before:" ] }, { "cell_type": "code", "execution_count": null, "id": "7aff1e89", "metadata": {}, "outputs": [], "source": [ "date_subrange = pn.widgets.DateRangeSlider(\n", " name='Date', start=subset_df.index[0], end=subset_df.index[-1])\n", "mag_subrange = pn.widgets.FloatSlider(name='Magnitude', start=3, end=9, value=3)\n", "\n", "filtered_subrange = subset_dfi[\n", " (subset_dfi.mag > mag_subrange) &\n", " (subset_dfi.index >= date_subrange.param.value_start) &\n", " (subset_dfi.index <= date_subrange.param.value_end)]" ] }, { "cell_type": "markdown", "id": "fccd691e", "metadata": {}, "source": [ "Now we can plot the earthquakes on an ESRI tilesource, including the filtering widgets as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "927909c8", "metadata": {}, "outputs": [], "source": [ "geo = filtered_subrange.hvplot(\n", " 'easting', 'northing', color='mag', kind='points',\n", " xaxis=None, yaxis=None, responsive=True, min_height=500, tiles='ESRI')\n", "\n", "geo" ] }, { "cell_type": "markdown", "id": "298d15b2", "metadata": {}, "source": [ "## Terminating methods for `.interactive`\n", "\n", "The examples above all illustrate cases where you can display the output of `.interactive()` and not worry about its type, which is no longer a DataFrame or a HoloViews object, but an `Interactive` object:" ] }, { "cell_type": "code", "execution_count": null, "id": "5de4562d", "metadata": {}, "outputs": [], "source": [ "type(geo)" ] }, { "cell_type": "markdown", "id": "3fc1c9c7", "metadata": {}, "source": [ "What if you need to work with some part of the interactive pipeline, e.g. to feed it to some function or object that does not understand `Interactive` objects? In such a case, you can use what is called a `terminating method` on your Interactive object to get at the underlying object for you to use.\n", "\n", "For instance, let's create magnitude and depth histograms on this subset of the data as in an earlier notebook and see if we can enable linked selections on them:" ] }, { "cell_type": "code", "execution_count": null, "id": "eac31693", "metadata": {}, "outputs": [], "source": [ "mag_subhist = filtered_subrange.hvplot(y='mag', kind='hist', responsive=True, min_height=200)\n", "depth_subhist = filtered_subrange.hvplot(y='depth', kind='hist', responsive=True, min_height=200)\n", "\n", "combined = mag_subhist + depth_subhist\n", "combined" ] }, { "cell_type": "markdown", "id": "acdd7192", "metadata": {}, "source": [ "Note that this looks like a HoloViews layout with some widgets, but this object is *not* a HoloViews object. Instead it is still an `Interactive` object:" ] }, { "cell_type": "code", "execution_count": null, "id": "299ff98f", "metadata": {}, "outputs": [], "source": [ "type(combined)" ] }, { "cell_type": "markdown", "id": "a0478d6f", "metadata": {}, "source": [ "`link_selections` does not currently understand `Interactive` objects, and so it will raise an exception when given one. If we need a HoloViews `Layout`, e.g. for calling `link_selections`, we can build a layout from the constituent objects using the `.holoviews()` terminating method on `Interactive`:" ] }, { "cell_type": "code", "execution_count": null, "id": "6c44539c", "metadata": {}, "outputs": [], "source": [ "layout = mag_subhist.holoviews() + depth_subhist.holoviews()\n", "layout" ] }, { "cell_type": "markdown", "id": "e158e91c", "metadata": {}, "source": [ "This is now a HoloViews object, so we can use it with `link_selections`:" ] }, { "cell_type": "code", "execution_count": null, "id": "aebccfd7", "metadata": {}, "outputs": [], "source": [ "print(type(layout))\n", "\n", "ls = hv.link_selections.instance()\n", "ls(mag_subhist.holoviews()) + ls(depth_subhist.holoviews())" ] }, { "cell_type": "markdown", "id": "c41f6b6b", "metadata": {}, "source": [ "You can use the box selection tool to see how selections compare between these plots. However, you will note that the widgets are no longer displayed. To address this, we can display the widgets separately using a different terminating method, namely `.widgets()`:" ] }, { "cell_type": "code", "execution_count": null, "id": "3914cf40", "metadata": {}, "outputs": [], "source": [ "filtered_subrange.widgets()" ] }, { "cell_type": "markdown", "id": "211e1dcd", "metadata": {}, "source": [ "For reference, the terminating methods for an `Interactive` object are:\n", "\n", "- `.holoviews()`: Give me a HoloViews object\n", "- `.panel()`: Give me a Panel ParamFunction\n", "\n", "- `.widgets()`: Give me a layout of widgets associated with this interactive object\n", "- `.layout()`: Give me the layout of the widgets and display `pn.Column(obj.widgets(), obj.panel())` where `pn.Column` will be described in the [Dashboards notebook](./06_Dashboards.ipynb)." ] }, { "cell_type": "markdown", "id": "0cbfe4eb", "metadata": {}, "source": [ "## Conclusion\n", "\n", "Using the techniques above, you can build up a collection of plots and other outputs with Panel widgets to control individual bits of computation and display. \n", "\n", "What if you want to collect these pieces and put them together into a standalone app or dashboard? If so, then the next tutorial will show you how to do so!" ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }