{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Tutorial A2. Dashboard Workflow" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Workflow for building and deploying interactive visualizations" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**Let's say you want to make it easy to explore some dataset, i.e.:** \n", "\n", "* Make a visualization of the data\n", "* Maybe add some custom widgets to see the effects of some variables\n", "* Then deploy the result as a web app." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**You can definitely do that in Python, but you would expect to:**\n", "* Spend days of effort to get some initial prototype working in a Jupyter notebook, every time\n", "* Work hard to tame the resulting opaque mishmash of domain-specific, widget, and plotting code\n", "* Start over nearly from scratch whenever you need to:\n", "    - Deploy in a standalone server\n", "    - Visualize different aspects of your data\n", "    - Scale up to larger (>100K) datasets" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step-by-step data-science workflow\n", "\n", "Here we'll show a simple, flexible, powerful, step-by-step workflow, explaining which open-source tools solve each of the problems involved:\n", "\n", "- Step 1: Get some data\n", "- Step 2: Prototype a plot in a notebook\n", "- Step 3: Model your domain\n", "- Step 4: Get a widget-based UI for free\n", "- Step 5: Link your domain model to your visualization\n", "- Step 6: Widgets now control your interactive plots\n", "- Step 7: Deploy your dashboard" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import holoviews as hv, geoviews as gv, param, dask.dataframe as dd, cartopy.crs as crs\n", "\n", "from colorcet import cm, fire\n", "from holoviews.operation import decimate\n", "from holoviews.operation.datashader import datashade\n", "from holoviews.streams import RangeXY\n", "from geoviews.tile_sources import EsriImagery" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 1: Get some data\n", "\n", "* Here we'll use a subset of the often-studied NYC Taxi dataset\n", "* About 12 million points of GPS locations from taxis\n", "* Stored in the efficient Parquet format for easy access\n", "* Loaded into a Dask dataframe for multi-core
(and if needed, out-of-core or distributed) computation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If you have less than 8GB of memory or haven't downloaded the full data, substitute `data/.data_stubs/` for `data/` in the following cell:
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "%time df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()\n", "print(len(df))\n", "df.head(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 2: Prototype a plot in a notebook\n", "\n", "* A text-based representation isn't very useful for big datasets like this, so we need to build a plot\n", "* But we don't want to start a software project, so we use HoloViews:\n", " - Simple, declarative way to annotate your data for visualization\n", " - Large library of Elements with associated visual representation\n", " - Elements combine (lay out or overlay) easily\n", "* And we'll want live interactivity, so we'll use a Bokeh plotting extension\n", "* Result:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "hv.extension('bokeh')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "points = hv.Points(df, ['pickup_x', 'pickup_y'])\n", "decimate(points)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Here ``Points`` declares an object wrapping `df`, visualized as a scatterplot of the pickup locations. `decimate` limits how many points will be sent to the browser so it won't crash. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As you can see, HoloViews makes it simple to pop up a visualization of your data, getting *something* on screen with only a few characters of typing. 
But it's not particularly pretty, so let's customize it a bit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "opts = dict(width=700, height=600, xaxis=None, yaxis=None, bgcolor='black')\n", "decimate(points.options(**opts))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "That looks a bit better, but it's still decimating the data nearly beyond recognition, so let's try using Datashader to rasterize it into a fixed-size image to send to the browser:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "taxi_trips = datashade(points, cmap=fire).options(**opts)\n", "taxi_trips" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Ok, that looks good now; there's clearly lots to explore in this dataset. Notice that the aspect ratio changed because Datashader is using *every* point, including some distant outliers. One way to fix the aspect ratio is to indicate that it's geographic data by overlaying it on a map:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire).options(**opts)\n", "EsriImagery * taxi_trips" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We could add lots more visual elements (laying out additional plots left and right, overlaying annotations, etc.), but let's say that this is our basic visualization we'll want to share. 
\n", "\n", "To sum up what we've done so far, here are all 10 lines of code required to generate this geo-located interactive plot of millions of data points in Jupyter:\n", "\n", "```\n", "import holoviews as hv, geoviews as gv, dask.dataframe as dd\n", "from colorcet import fire\n", "from holoviews.operation.datashader import datashade\n", "from geoviews.tile_sources import EsriImagery\n", "hv.extension('bokeh')\n", "df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()\n", "opts = dict(width=700, height=600, xaxis=None, yaxis=None, bgcolor='black')\n", "points = hv.Points(df, ['pickup_x', 'pickup_y'])\n", "taxi_trips = datashade(points, x_sampling=1, y_sampling=1, cmap=fire).options(**opts)\n", "EsriImagery * taxi_trips\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 3: Model your domain\n", "\n", "Now that we've prototyped a nice plot, we could keep editing the code above to explore this data. But at this point we will often want to start sharing our results with people who are not familiar with programming visualizations in this way. \n", "\n", "So the next step is to figure out what we want our intended user to be able to change, and declare those variables or parameters with:\n", "\n", " - type and range checking\n", " - documentation strings\n", " - default values\n", "\n", "The Param library allows declaring Python attributes having these features (and more, such as dynamic values and inheritance), letting you set up a well-defined space for a user (or you!) to explore."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## NYC Taxi Parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "cmaps = ['bgy','bgyw','bmw','bmy','fire','gray','kbc','kgy']\n", "\n", "class NYCTaxiExplorer(param.Parameterized):\n", "    alpha = param.Magnitude(default=0.75, doc=\"Alpha value for the map opacity\")\n", "    plot = param.ObjectSelector(default=\"pickup\", objects=[\"pickup\",\"dropoff\"])\n", "    colormap = param.ObjectSelector(default='fire', objects=cmaps)\n", "    passengers = param.Range(default=(0, 10), bounds=(0, 10), doc=\"\"\"\n", "        Filter for taxi trips by number of passengers\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Each Parameter is a normal Python attribute, but with special checks and functions run automatically when getting or setting.\n", "\n", "Parameters capture your goals and your knowledge about your domain, declaratively."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Class-level parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "NYCTaxiExplorer.alpha" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "NYCTaxiExplorer.alpha = 0.5\n", "NYCTaxiExplorer.alpha" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Validation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# A Magnitude must be a number between 0 and 1, so a string is rejected\n", "try:\n", "    NYCTaxiExplorer.alpha = '0'\n", "except Exception as e:\n", "    print(e)\n", "\n", "# (0, 100) falls outside the declared bounds of (0, 10)\n", "try:\n", "    NYCTaxiExplorer.passengers = (0,100)\n", "except Exception as e:\n", "    print(e)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Instance parameters" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "explorer = NYCTaxiExplorer(alpha=0.6)\n", "explorer.alpha" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "NYCTaxiExplorer.alpha" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 4: Get a widget-based UI for free\n", "\n", "* Parameters are purely declarative, but contain all the information needed to build interactive widgets\n", "* Panel can generate UIs in Jupyter or Bokeh Server from Parameters\n", "* Declaration of parameters is independent of the UI library used\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], 
"source": [ "import panel as pn\n", "\n", "pn.Row(NYCTaxiExplorer)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "NYCTaxiExplorer.passengers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 5: Link your domain model to your visualization\n", "\n", "We've now defined the space that's available for exploration, and the next step is to link up the parameter space with the code that specifies the plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class NYCTaxiExplorer(param.Parameterized):\n", "    alpha = param.Magnitude(default=0.75, doc=\"Alpha value for the map opacity\")\n", "    colormap = param.ObjectSelector(default='fire', objects=cmaps)\n", "    plot = param.ObjectSelector(default=\"pickup\", objects=[\"pickup\",\"dropoff\"])\n", "    passengers = param.Range(default=(0, 10), bounds=(0, 10))\n", "\n", "    def make_view(self, x_range=None, y_range=None, **kwargs):\n", "        # Wrap the selected (pickup or dropoff) coordinates as Points\n", "        points = hv.Points(df, kdims=[self.plot+'_x', self.plot+'_y'], vdims=['passenger_count'])\n", "        # Keep only trips whose passenger count is in the selected range\n", "        selected = points.select(passenger_count=self.passengers)\n", "        # Rasterize with Datashader using the selected colormap\n", "        taxi_trips = datashade(selected, x_sampling=1, y_sampling=1, cmap=cm[self.colormap],\n", "                               width=800, height=475)\n", "        # Overlay the result on map tiles drawn with the selected opacity\n", "        return EsriImagery.clone(crs=crs.GOOGLE_MERCATOR).options(alpha=self.alpha, **opts) * taxi_trips" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Note that the `NYCTaxiExplorer` class is entirely declarative (no widgets), and can be used \"by hand\" to provide range-checked and type-checked plotting for values from the declared parameter space:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "explorer = NYCTaxiExplorer(alpha=0.4, 
plot=\"dropoff\")\n", "explorer.make_view()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 6: Widgets now control your interactive plots\n", "\n", "But in practice, why not pop up the widgets to make it fully interactive?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "explorer = NYCTaxiExplorer()\n", "r = pn.Row(explorer, explorer.make_view)\n", "r" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Step 7: Deploy your dashboard\n", "\n", "OK, now you've got something worth sharing, running inside Jupyter. But if you want to share your interactive app with people who don't use Python, you'll now want to run a server with this same code.\n", "\n", "* Deploy with **Bokeh Server**:\n", "    - Write the above code to a file\n", "    - Add `r.server_doc();` to the end to tell Bokeh Server which object to show in the dashboard\n", "    - Run ``bokeh serve nyc_taxi/main.py``" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Complete dashboard code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "with open('apps/nyc_taxi/main.py', 'r') as f: print(f.read())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Branching out\n", "\n", "The other sections in this tutorial will expand on steps in this workflow, providing more step-by-step instructions for each of the major tasks. These techniques can create much more ambitious apps with very little additional code or effort:\n", "\n", "* Adding more linked or separate subplots of any type;
see [2 - Annotating your data](./02_Annotating_Data.ipynb) and [4 - Working with datasets](./04-Working_with_Datasets.ipynb).\n", "* Declaring code that runs for clicking or selecting *within* the Bokeh plot; see [8 - Custom interactivity](./08_Custom_Interactivity.ipynb).\n", "* Using multiple sets of widgets of many different types; see [Panel](http://panel.pyviz.org).\n", "* Using datasets too big for any one machine, with [Dask.Distributed](https://distributed.readthedocs.io).\n", "* Presenting Jupyter notebooks like this one as slides using [RISE](https://github.com/damianavila/RISE)." ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" } }, "nbformat": 4, "nbformat_minor": 2 }