{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"+ title: Interactive plots of large data sets made easy: Datashader\n",
"+ date: 2019-08-15\n",
"+ modified: 2020-01-08\n",
"+ tags: python, datashader, holoviews, bokeh, maps, holoviz\n",
"+ Slug: interactive-large-data-plots-datashader\n",
"+ Category: Python\n",
"+ Authors: MC\n",
"+ Summary: When you want to plot huge data sets in Python without giving up interactivity, Datashader is an excellent choice. In this post, I demonstrate the abilities of this powerful and convenient library. We use a unique dataset containing a whole year of shared bike usage in Cologne to plot over a million locations on a map."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get this post as an interactive Jupyter Notebook and execute the code via Binder:\n",
" \n",
"*Update*: Now using data from 2019 instead of 2018"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Motivation \n",
"\n",
"In a [previous post]({filename}/kvb_geoviews1.ipynb), we looked at `GeoViews` as a convenient and powerful Python library for visualizing geo data. We saw that it can plot tens of thousands of points on a map while remaining fully interactive. In this post, we will explore another library that is part of the [HoloViz](http://holoviz.org/) initiative. [Datashader](http://datashader.org/) is able to visualize truly large datasets by using an optimized rendering pipeline. [HoloViews'](https://holoviews.org) support for Datashader makes plotting millions of data points pretty easy, even while maintaining interactivity. In the following, we will again use our [Cologne bike rental]({filename}/kvb_part1.md) data to demonstrate Datashader's abilities. Get the data [here](https://data.world/mc51/cologne-rental-bike-nextbike-locations-for-2018) and follow along. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting a whole year of bike locations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our dataset contains all KVB bike locations for the whole year ~~2018~~ 2019. That's going to be a lot of data to display at once. But this is where Datashader's strength comes into play. Other interactive libraries, e.g. Bokeh or Plotly, embed the data as JSON in an `.html` file and let the browser do all the work. In contrast, Datashader renders an image containing the processed data at the appropriate level of detail. Consequently, all the hard work (the computation) is done before the result reaches the browser."
]
},
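To make the idea of rendering an image instead of shipping raw points more concrete, here is a minimal, pure-Python sketch of the aggregation step. It is not Datashader's actual implementation or API: the `rasterize` helper, the grid size, and the synthetic coordinates are assumptions made purely for illustration. The point it shows is that the amount of data handed to the browser depends on the size of the pixel grid, not on the number of points.

```python
import random


def rasterize(points, x_range, y_range, width, height):
    """Count how many (x, y) points fall into each cell of a width x height grid.

    Hypothetical helper for illustration only, not part of Datashader.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # point lies outside the visible area
        # Map the coordinate to a cell index; clamp the upper edge into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


random.seed(0)
# Synthetic stand-in for (lon, lat) positions around Cologne, not the real bike data.
points = [(random.gauss(6.95, 0.03), random.gauss(50.94, 0.02)) for _ in range(100_000)]

grid = rasterize(points, x_range=(6.85, 7.05), y_range=(50.88, 51.00), width=40, height=30)
n_cells = sum(len(row) for row in grid)
n_binned = sum(map(sum, grid))
print(f"{len(points)} points reduced to {n_cells} cells ({n_binned} inside the view)")
```

Whenever the visible range changes, for example after zooming, the same aggregation is simply run again for the new `x_range` and `y_range`. Each cell count would then be mapped to a color to produce the final image.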
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first step, we load the libraries we'll be using:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
| | STT_NR | district | SHAPE_AREA | SHAPE_LEN | geometry | area_km2 |
|---|---|---|---|---|---|---|
| 0 | 909 | Flittard | 7.735915e+06 | 14368.509990 | POLYGON ((7.01364 51.02217, 7.01349 51.02200, ... | 7.735915 |
| 1 | 905 | Dellbrück | 9.946585e+06 | 16722.887925 | POLYGON ((7.06747 50.99363, 7.06758 50.99357, ... | 9.946585 |
| 2 | 807 | Brück | 7.499469e+06 | 12759.921710 | POLYGON ((7.07104 50.95452, 7.07138 50.95388, ... | 7.499469 |
| 3 | 707 | Urbach | 2.291919e+06 | 7008.335906 | POLYGON ((7.09195 50.88922, 7.09249 50.88921, ... | 2.291919 |
| 4 | 501 | Nippes | 2.995158e+06 | 8434.060948 | POLYGON ((6.95471 50.97357, 6.95475 50.97332, ... | 2.995158 |

| | bike_id | scrape_weekday | u_id | lat | lon | scrape_datetime |
|---|---|---|---|---|---|---|
| 0 | 21664 | Tue | 34183 | 50.936540 | 6.948328 | 2019-01-01 00:05:02 |
| 1 | 21619 | Tue | 672578 | 50.953946 | 6.912720 | 2019-01-01 00:05:02 |