{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

# Exercise 4: Dynamic Interactions

" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import holoviews as hv\n", "import geoviews as gv\n", "\n", "hv.extension('bokeh')\n", "%opts RGB [width=600 height=600]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "diamonds = pd.read_csv('../data/diamonds.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As should be second nature for us now, we will look at this dataframe before we start doing anything." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "diamonds.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we will display a static plot of 'carat' vs. 'price' as we did in the first exercise, alongside a BoxWhisker plot of the distributions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%opts Scatter [width=600 height=400 logy=True tools=['box_select'] color_index='cut']\n", "%%opts Scatter (size=1.5 cmap='tab20c')\n", "\n", "scatter = hv.Scatter(diamonds.sample(10000), 'carat', ['price', 'cut', 'clarity']).select(carat=(0, 3))\n", "boxwhisker = hv.BoxWhisker(scatter, 'clarity', 'price')\n", "\n", "scatter + boxwhisker" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the ``BoxWhisker`` element here will statically display the whole distribution. But if you try out the \"Box select\" tool, you can select a subset of the Scatter points. Can we link the boxwhisker plot to selections made on the ``Scatter`` plot, so that we can see distributions in that particular region of the data space? Yes, as long as we have these three things:\n", "\n", "1. A stream that collects selection events from the ``scatter`` object\n", "2. 
A callback that constructs a HoloViews element from the given selection and returns it\n", "3. A DynamicMap that runs the callback each time a new selection is available\n", "\n", "For step 1, we provide the ``scatter`` object as the source for a ``Selection1D`` stream that will provide the ``index`` of all the selected nodes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "selection = hv.streams.Selection1D(source=scatter)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For step 2, write a function that can accept the ``index`` values, select those values from the original dataset, and return the appropriate HoloViews element; something like:\n", "\n", "```\n", "def selection_boxwhisker(index):\n", " selection = scatter.iloc[index] if len(index)>0 else scatter\n", " return ...some hv element built from the selection...\n", "```\n", "\n", "Here ``selection_boxwhisker`` should return a ``BoxWhisker`` element for the selection, plotting 'price' against 'clarity'. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For step 3, define a ``DynamicMap`` using the ``selection`` stream and your custom callback and lay it out next to the ``scatter`` object as above.\n", "\n", "Hint\n", "\n", "
\n", "A ``DynamicMap`` takes a callback function as its first argument; streams are supplied as a list through the ``streams`` keyword argument.\n", "
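To make the three pieces concrete, the plain-Python sketch below models the pattern without HoloViews. ``ToyStream`` and ``ToyDynamicMap`` are hypothetical stand-ins, not HoloViews classes. They show that each new event re-runs the callback, with the stream's current parameters (here ``index``) passed in as keyword arguments. The made-up ``prices`` list plays the role of the dataset.

```python
# Hypothetical stand-ins for illustration only; these are NOT HoloViews classes.
class ToyStream:
    # Stores the latest event parameters and notifies subscribers on every event
    def __init__(self, **params):
        self.params = dict(params)
        self.subscribers = []

    def event(self, **params):
        self.params.update(params)
        for notify in self.subscribers:
            notify()


class ToyDynamicMap:
    # Re-runs the callback, passing the current stream parameters as keyword arguments
    def __init__(self, callback, streams):
        self.callback = callback
        self.streams = streams
        self.current = None
        for stream in streams:
            stream.subscribers.append(self.refresh)
        self.refresh()  # render once with the initial parameters

    def refresh(self):
        kwargs = {}
        for stream in self.streams:
            kwargs.update(stream.params)
        self.current = self.callback(**kwargs)


# Made-up prices; the callback summarises the selected rows as (count, max price)
prices = [326, 334, 2757, 4500, 18823]

def summarize(index):
    selected = [prices[i] for i in index] if len(index) > 0 else prices
    return (len(selected), max(selected))

toy_selection = ToyStream(index=[])
toy_dmap = ToyDynamicMap(summarize, streams=[toy_selection])
print(toy_dmap.current)  # (5, 18823): nothing selected, so the whole dataset
toy_selection.event(index=[0, 1])
print(toy_dmap.current)  # (2, 334): only the selected rows
```

The real ``DynamicMap`` also renders and updates the plot. This sketch only captures the event-to-callback flow.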
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solution\n", "\n", "
\n", "
\n", "selection = hv.streams.Selection1D(source=scatter)\n", "\n", "def selection_boxwhisker(index):\n", "    selection = scatter.iloc[index] if len(index) > 0 else scatter\n", "    return hv.BoxWhisker(selection, 'clarity', 'price')\n", "\n", "scatter + hv.DynamicMap(selection_boxwhisker, streams=[selection])\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 2: Streaming Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Exercise 1 used HoloViews streams to collect user interaction events (selections). Here, let's use them to view data sources that themselves are updating over time.\n", "\n", "First, let's set up a (simulated) streaming data source in form of taxi pickup locations. The code below splits the taxi dataset into chunks by hour which will be emitted one by one to emulate a live, streaming data source." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time\n", "import colorcet\n", "from itertools import cycle\n", "from holoviews.operation.datashader import datashade\n", "\n", "def taxi_trips_stream(source='../data/nyc_taxi_wide.parq', frequency='H'):\n", " \"\"\"Generate dataframes grouped by given frequency\"\"\"\n", " def get_group(resampler, key):\n", " try:\n", " df = resampler.get_group(key)\n", " df.reset_index(drop=True)\n", " except KeyError:\n", " df = pd.DataFrame()\n", " return df\n", "\n", " df = pd.read_parquet(source,\n", " infer_datetime_format=True,\n", " usecols=['tpep_pickup_datetime', 'pickup_x', 'pickup_y', 'total_amount'])\n", " df = df.set_index('tpep_pickup_datetime', drop=True)\n", " df = df.sort_index()\n", " r = df.resample(frequency)\n", " chunks = [get_group(r, g) for g in sorted(r.groups)]\n", " indices = cycle(range(len(chunks)))\n", " while True:\n", " yield chunks[next(indices)]\n", "\n", "trips = taxi_trips_stream()\n", "example = next(trips)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual let's start by inspecting the data, in this case the initial chunk emitted above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "example.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To build our streaming visualization, first declare a a map tile 
source for a background plot, and then make a ``Pipe`` stream initialized with the example chunk of data already emitted:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tiles = gv.WMTS('https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png')\n", "pipe = hv.streams.Pipe(example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then you will need to define a callback to use when declaring a ``DynamicMap``. This function will need to accept a chunk of data, then return a ``Points`` object displaying the 'pickup_x' and 'pickup_y' coordinates and a ``label`` indicating the time range being covered. Something like:\n", "\n", "```\n", "def hourly_points(data):\n", " label = '%s - %s' % (str(data.index.min()), str(data.index.max()))\n", " return ...some hv object using the given data...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, use that callback and the ``pipe`` stream to define a ``DynamicMap``, applying the datashade operation to the DynamicMap and then overlaying it on top of the ``tiles``. \n", "\n", "**Warning**: Do not display the ``DynamicMap`` without applying the ``datashade()`` operation, or you run the risk of freezing your browser.\n", "\n", "Hint\n", "\n", "
\n", "To apply datashading, simply call ``datashade(dynamicmap)``.\n", "
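The warning above is about data volume: without datashading, every raw point would be sent to the browser. Conceptually, datashading avoids this by aggregating the points into a fixed-size grid, with one cell per output pixel, and sending only the resulting image. The plain-Python sketch below shows just that aggregation step, assuming a simple per-cell count. ``count_grid`` is a hypothetical helper, not Datashader's API. The real pipeline also handles colormapping and is far faster.

```python
def count_grid(xs, ys, x_range, y_range, width, height):
    # Hypothetical helper: a simplified stand-in for count aggregation onto a pixel grid.
    # Row 0 holds the lowest y values; points outside the ranges are skipped.
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


# Five made-up points; the output is always height x width, however many points go in
xs = [0.1, 0.2, 0.8, 0.9, 0.95]
ys = [0.1, 0.15, 0.9, 0.85, 0.2]
grid = count_grid(xs, ys, (0, 1), (0, 1), width=2, height=2)
print(grid)  # [[2, 1], [0, 2]]
```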
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should now see a map of New York City with the taxi trips on top. Run the next cell to send events to the ``Pipe`` and update the plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for i in range(100):\n", " time.sleep(0.05)\n", " pipe.send(next(trips))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solution\n", "\n", "
\n", "
\n", "%%opts RGB [width=600 height=600]\n", "pipe = hv.streams.Pipe(example)\n", "tiles = gv.WMTS('https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png')\n", "\n", "def hourly_points(data):\n", "    label = '%s - %s' % (str(data.index.min()), str(data.index.max()))\n", "    return hv.Points(data, ['pickup_x', 'pickup_y'], label=label)\n", "\n", "points = hv.DynamicMap(hourly_points, streams=[pipe])\n", "tiles * datashade(points)\n", "
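The two stream types differ in what they keep. The ``Pipe`` used in this solution replaces its data with each chunk sent. The ``Buffer`` stream used in Exercise 3 appends each chunk and, once it holds ``length`` rows, drops the oldest. The plain-Python sketch below mimics only that bookkeeping. ``ToyPipe`` and ``ToyBuffer`` are hypothetical classes, not the HoloViews API, and they store lists rather than DataFrames.

```python
from collections import deque


class ToyPipe:
    # Hypothetical: keeps only the most recently sent chunk
    def __init__(self, data):
        self.data = list(data)

    def send(self, chunk):
        self.data = list(chunk)


class ToyBuffer:
    # Hypothetical: appends every chunk but retains at most `length` of the newest rows
    def __init__(self, data, length):
        self.rows = deque(data, maxlen=length)

    def send(self, chunk):
        self.rows.extend(chunk)  # the deque silently drops the oldest rows once full

    @property
    def data(self):
        return list(self.rows)


toy_pipe = ToyPipe([1, 2])
toy_buffer = ToyBuffer([1, 2], length=5)
for chunk in ([3, 4], [5, 6], [7]):
    toy_pipe.send(chunk)
    toy_buffer.send(chunk)

print(toy_pipe.data)    # [7]
print(toy_buffer.data)  # [3, 4, 5, 6, 7]
```

A ``deque`` with ``maxlen`` gives the drop-oldest behaviour for free, which is why the sketch uses it.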
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the previous exercise we used the ``Pipe`` stream, which emits just the latest chunk. That's a good way to monitor an ongoing stream, but often you'll instead want to accumulate data over time, showing the latest chunk combined with other previous chunks. Here we will stream data using the ``Buffer`` stream, which accumulates data until its length is reached. We will start by defining some options, an example dataframe, and the ``Buffer`` stream with a length of 1,000,000:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%opts Curve [width=800 height=400] (color='black' line_width=1) {+framewise} Scatter (color='red')\n", "from holoviews.operation.timeseries import resample, rolling_outlier_std\n", "example = next(trips)[['fare_amount']]\n", "buffer = hv.streams.Buffer(example, length=1000000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As before, you'll need to complete the callback function so it returns an element. In this case, we need a ``Curve`` plotting the 'fare_amount' against the 'tpep_pickup_datetime', starting something like:\n", "\n", "```\n", "def fare_curve(data):\n", " ...\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again as before, we need to define a ``DynamicMap`` that uses this callback in combination with a stream (``buffer`` in this case). 
Here let's assign it to a variable rather than try to show it right away:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, apply the ``resample`` operation to the DynamicMap object, with ``rule='T'`` and ``function=np.sum`` and then apply the ``rolling_outlier_std`` operation to the output of that. Finally display an overlay of the ``resample`` output and the ``rolling_outlier_std`` output.\n", "\n", "Hint\n", "\n", "
\n", "Operations like ``resample`` and ``rolling_outlier_std`` can be chained, e.g.:\n", "

\n", "resampled = resample(dmap)\n", "outliers = rolling_outlier_std(resampled)\n", "resampled * outliers\n", "\n", "
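This is what ``resample(..., rule='T', function=np.sum)`` computes. ``'T'`` is the pandas offset alias for one-minute bins, so each sample is assigned to the minute containing its timestamp, and the values in each bin are summed. The sketch below reproduces that arithmetic in plain Python on a few made-up fares. ``resample_sum_per_minute`` is a hypothetical helper. It omits minutes that have no samples, whereas a pandas-style resample produces a row for every minute in the range.

```python
from collections import defaultdict
from datetime import datetime


def resample_sum_per_minute(samples):
    # Hypothetical helper: sum values per one-minute bin (minutes without samples are omitted)
    totals = defaultdict(float)
    for timestamp, value in samples:
        minute = timestamp.replace(second=0, microsecond=0)  # floor to the start of the minute
        totals[minute] += value
    return dict(sorted(totals.items()))


# Made-up (timestamp, fare) pairs
samples = [
    (datetime(2015, 1, 1, 0, 0, 5), 7.5),
    (datetime(2015, 1, 1, 0, 0, 40), 12.0),
    (datetime(2015, 1, 1, 0, 1, 10), 6.5),
    (datetime(2015, 1, 1, 0, 3, 59), 20.0),
]
for minute, total in resample_sum_per_minute(samples).items():
    print(minute.strftime('%H:%M'), total)
# 00:00 19.5
# 00:01 6.5
# 00:03 20.0
```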
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you've displayed the plot, let's start sending some data to the buffer, which should start accumulating 1000000 trips:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for i in range(100):\n", " time.sleep(0.1)\n", " buffer.send(next(trips)[['fare_amount']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Solution\n", "\n", "
\n", "
\n", "%%opts Curve [width=800 height=400] (color='black' line_width=1) {+framewise} Scatter (color='red') Overlay [show_legend=False]\n", "\n", "example = next(trips)[['fare_amount']]\n", "buffer = hv.streams.Buffer(example, length=1000000)\n", "\n", "def fare_curve(data):\n", "    return hv.Curve(data, 'tpep_pickup_datetime', 'fare_amount')\n", "\n", "fares = hv.DynamicMap(fare_curve, streams=[buffer])\n", "minutely = resample(fares, rule='T', function=np.sum)\n", "minutely * rolling_outlier_std(minutely, rolling_window=10)\n", "
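``rolling_outlier_std`` roughly marks points that lie more than ``sigma`` standard deviations away from a rolling trend, with ``rolling_window`` controlling the window size. The sketch below swaps in a simpler variant and will not reproduce HoloViews' exact output. In it, each value is compared with the mean and standard deviation of the ``window`` values preceding it. The function name ``rolling_outliers`` and its ``window`` argument are hypothetical. The sketch only illustrates why an isolated spike in the per-minute totals stands out while ordinary fluctuations do not.

```python
from statistics import mean, stdev


def rolling_outliers(values, window=5, sigma=2.0):
    # Hypothetical, simplified variant: compare each value with the mean and
    # standard deviation of the `window` values that precede it (window must be >= 2).
    flagged = []
    for i in range(window, len(values)):
        previous = values[i - window:i]
        centre, spread = mean(previous), stdev(previous)
        if abs(values[i] - centre) > sigma * spread:
            flagged.append(i)
    return flagged


# Made-up per-minute fare totals with one obvious spike at position 6
fares_per_minute = [52.0, 48.5, 50.0, 51.5, 49.0, 50.5, 95.0, 50.0, 49.5, 51.0]
print(rolling_outliers(fares_per_minute))  # [6]
```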
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }