{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "With GPS-enabled devices, it's easy to collect a large quantity of trajectory data, i.e. a connected series of points in 2D or 3D. However, it's not so easy to plot large datasets with most plotting programs, and so people generally downsample the trajectories, which can hide important features of the data. Here we show how to use [datashader](https://github.com/bokeh/datashader) to look at *all* the datapoints even for large datasets, faithfully displaying the data at the highest level, while revealing additional structure when examining small regions of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import datashader as ds\n", "import datashader.transfer_functions as tf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create some fake data\n", "\n", "Here we create a fake trajectory in a 2D space by calculating a random walk with momentum and then adding various perturbations:\n", "\n", "1. a sine-based displacement of the x axis to simulate e.g. a mechanical problem\n", "2. random noise on both x and y to simulate measurement uncertainty\n", "3. a completely arbitrary bogus value at a fixed location, to simulate corrupted data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Constants\n", "np.random.seed(1)\n", "n = 1000000 # Number of points\n", "f = filter_width = 5000 # momentum or smoothing parameter, for a moving average filter\n", "\n", "# filtered random walk\n", "xs = np.convolve(np.random.normal(0, 0.1, size=n), np.ones(f)/f).cumsum()\n", "ys = np.convolve(np.random.normal(0, 0.1, size=n), np.ones(f)/f).cumsum()\n", "\n", "# Add \"mechanical\" wobble on the x axis\n", "xs += 0.1*np.sin(0.1*np.array(range(n-1+f)))\n", "\n", "# Add \"measurement\" noise\n", "xs += np.random.normal(0, 0.005, size=n-1+f)\n", "ys += np.random.normal(0, 0.005, size=n-1+f)\n", "\n", "# Add a completely incorrect value\n", "xs[int(len(xs)/2)] = 100\n", "ys[int(len(xs)/2)] = 0\n", "\n", "# Create a dataframe\n", "df = pd.DataFrame(dict(x=xs,y=ys))\n", "\n", "# Default plot ranges:\n", "x_range = (xs.min(), xs.max())\n", "y_range = (ys.min(), ys.max())\n", "\n", "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Static Plots\n", "\n", "With datashader, it's quick and easy to plot the entire 1-million-point dataset, without any downsampling:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_image(x_range=x_range, y_range=y_range, w=500, h=500):\n", " cvs = ds.Canvas(x_range=x_range, y_range=y_range, plot_height=h, plot_width=w)\n", " agg = cvs.line(df, 'x', 'y', agg=ds.any())\n", " return tf.shade(agg)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%time create_image()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here you can immediately see the long straight line that results from the one stray value. A single stray such value is very unlikely to be discovered in any downsampled plot, but is obvious here.\n", "\n", "Similarly, the plot above suggests that the trajectory is relatively smooth, but if you zoom in further you can see the wobble that's not obvious at the high-level view:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_image(x_range=(95.0,101.0), y_range=(-61,-55))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in even further reveals the low-level noise displacing around each datapoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "create_image(x_range=(98.8,99.8), y_range=(-60.2,-59.2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Downsampling the dataset would cover up each of these problems. If such data were then used for e.g. calculating total trajectory length, the result would be wildly incorrect -- the single incorrect reading would greatly increase the calculated length, while downsampling the \"measurement noise\" and the added wobble would decrease the reported length. Datashading makes the full dataset available for visual inspection, so that issues like this can be detected rather than being covered up by downsampling.\n", "\n", "\n", "### Dynamic Plots\n", "\n", "Specifying hard-coded ranges as above is awkward, so it's much more natural to simply zoom in interactively, which can be done using the `datashade` operation imported from [HoloViews](http://holoviews.org/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from holoviews.operation.datashader import datashade, spread\n", "import holoviews as hv\n", "hv.extension('bokeh')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now it is possible to build an interactive visualization by passing HoloViews `Path` object to the `datashade` operation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "opts = hv.opts.RGB(width=900, height=500, aspect='equal')\n", "spread(datashade(hv.Path(df, kdims=['x','y']), cnorm='linear', aggregator=ds.any())).opts(opts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are looking at a live, running version of this notebook and enable the scroll zoom tool in the upper right of the plot, you can select any region of the plot to examine more closely, with very fast updates as you explore even for large datasets." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 1 }