{
"cells": [
{
"cell_type": "markdown",
"id": "4f5aceb8",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data Visualization with Bokeh\n",
"\n",
"### PyHEP 2021 (virtual) Workshop\n",
"\n",
"### Author: Bruno Alves | Date: 6 July 2021"
]
},
{
"cell_type": "markdown",
"id": "3a153079",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Disclaimers"
]
},
{
"cell_type": "markdown",
"id": "75e695ec",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"1. This tutorial is heavily opinionated: the definition of the \"best\" plotting library can vary\n",
" - ```bokeh``` is the best for me, and it could be the best for you too"
]
},
{
"cell_type": "markdown",
"id": "7f95a6f2",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"2. I am in no way involved with the development of ```bokeh```; I am simply a user, just like most of you \n",
" - I have been using ```bokeh``` for the past ~3 years"
]
},
{
"cell_type": "markdown",
"id": "6e7b032f",
"metadata": {
"cell_style": "center",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Motivation\n"
]
},
{
"cell_type": "markdown",
"id": "d9c34fb8",
"metadata": {
"cell_style": "center",
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
" 1. Get people to know, enjoy and use ```bokeh```"
]
},
{
"cell_type": "markdown",
"id": "0edbbaae",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - Does not seem to be popular in HEP. However:\n",
" - LHCb uses it for [data quality monitoring](https://cds.cern.ch/record/2298467)\n",
" - It was [mentioned](https://arxiv.org/abs/1811.10309) by the [HEP Software Foundation](https://hepsoftwarefoundation.org/) (but dismissed; fortunately their reasons are now completely outdated)"
]
},
{
"cell_type": "markdown",
"id": "8473cfe0",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - Like other plotting alternatives, it is overshadowed by the ubiquity of ```matplotlib```"
]
},
{
"cell_type": "markdown",
"id": "6334f167",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
" 2. ```bokeh``` code, when compared to ```matplotlib``` (personal opinion, of course):"
]
},
{
"cell_type": "markdown",
"id": "009bb4dd",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - is more readable"
]
},
{
"cell_type": "markdown",
"id": "ebc27a24",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - is easier to write without constantly resorting to the documentation\n",
" - ```mpl```'s docs are unreliable"
]
},
{
"cell_type": "markdown",
"id": "a49ab31c",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - gives simple interactive plots for free"
]
},
{
"cell_type": "markdown",
"id": "ba3c3866",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - makes it easy to create and share complex interactive visualizations and dashboards"
]
},
{
"cell_type": "markdown",
"id": "32477457",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"```matplotlib``` is still more popular because:"
]
},
{
"cell_type": "markdown",
"id": "a5ab366e",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - it is older (started in 2003, vs. 2013 for ```bokeh```) and has more features than current alternatives"
]
},
{
"cell_type": "markdown",
"id": "e3ab6f74",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - people tend to resist change"
]
},
{
"cell_type": "markdown",
"id": "30bf518c",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
" - most default examples for anything on StackOverflow use ```matplotlib```:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d7ce15c",
"metadata": {
"cell_style": "center",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"from bokeh.io import output_notebook, show\n",
"from bokeh.plotting import figure\n",
"output_notebook()\n",
"\n",
"nquestions=[59322, 4355, 767, 127]\n",
"libs=['mpl', 'bokeh', 'altair', 'plotnine']\n",
"\n",
"p = figure(plot_height=600, plot_width=800,\n",
"           title='StackOverflow questions per plotting library',\n",
" x_range=libs)\n",
"p.vbar(x=libs, top=nquestions, width=0.9)\n",
"p.yaxis.axis_label = 'Number of questions posted on SO'\n",
"show(p) "
]
},
{
"cell_type": "markdown",
"id": "b7f69430",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Basic plotting"
]
},
{
"cell_type": "markdown",
"id": "09a47996",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"We start with some definitions shared by the plotting examples below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "09c167ed",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"from types import SimpleNamespace\n",
"\n",
"#data for line plots\n",
"dline = SimpleNamespace( x=[1,2,3,4,5,6,7,8,9], \n",
" y=[6,7,2,8,9,3,4,5,1],\n",
" size=15,\n",
" line_color='blue',\n",
" out_color='red', \n",
" fill_color='orange',\n",
" fill_alpha=1 )\n",
"\n",
"#data for histograms\n",
"mu, sigma, npoints = 0, 0.5, 1000\n",
"nbins = 35\n",
"dhist = np.random.normal(mu, sigma, npoints)\n",
"hist_, edges_ = np.histogram(dhist, density=False, bins=nbins)\n",
"dhist = SimpleNamespace( data=dhist, hist=hist_, edges=edges_, nbins=nbins)"
]
},
{
"cell_type": "markdown",
"id": "7a05af47",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## ```matplotlib```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f85a8f9b",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "12676e38",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Line plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fa32df3a",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"# the following requires ipympl \n",
"# %matplotlib widget\n",
"\n",
"fig = plt.figure(figsize=(8,6))\n",
"ax = fig.add_subplot(111)\n",
"ax.set_ylabel('Y')\n",
"plt.title('Line Plot')\n",
"\n",
"plt_marker_options = dict(s=10*dline.size, color=dline.fill_color, marker='o',\n",
" edgecolor=dline.out_color,\n",
" alpha=dline.fill_alpha)\n",
"\n",
"plt.plot(dline.x, dline.y, color=dline.line_color)\n",
"plt.scatter(dline.x, dline.y, **plt_marker_options)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "1f5e8a81",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Histogram ([multiple APIs](https://matplotlib.org/stable/api/index.html)):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d071bd0c",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"plt.hist(dhist.data, bins=dhist.nbins)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ffb8b4ca",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"# alternatively, create the Figure and Axes instances explicitly\n",
"# (shortcut: fig, ax = plt.subplots(figsize=(5,4)))\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(111)\n",
"ax.hist(dhist.data, bins=dhist.nbins)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f24df995",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"- I find ```matplotlib``` hard to use without constantly going back to the documentation, even for simple tasks"
]
},
{
"cell_type": "markdown",
"id": "2d5f4ed7",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- However, ```matplotlib``` is older, and therefore more mature and complete. In addition, wrappers built on top of it, such as ```mplhep```, provide additional convenient functionality."
]
},
{
"cell_type": "markdown",
"id": "2fd73d97",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- Unless what you want to do only exists in ```matplotlib```, I would suggest using ```bokeh``` for everything, including simple plots."
]
},
{
"cell_type": "markdown",
"id": "bfbc6886",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## ```bokeh```\n",
"\n",
"- built around glyphs\n",
"- relies on a \"layered\" approach (\"grammar of graphics\"), but mostly ignores data transformations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73157af7",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"from bokeh.io import output_notebook, show\n",
"from bokeh.plotting import figure\n",
"output_notebook() # alternatively one could use output_file()"
]
},
{
"cell_type": "markdown",
"id": "2de202c3",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Line plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ca80c6f",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# create a new plot with default tools, using figure\n",
"pline = figure(plot_width=400, plot_height=400)\n",
"\n",
"line_options = dict(line_width=2)\n",
"pline.line(dline.x, dline.y, **line_options)\n",
"\n",
"marker_options = dict(size=dline.size, color=dline.out_color, \n",
" fill_color=dline.fill_color, fill_alpha=dline.fill_alpha)\n",
"circ = pline.circle(dline.x, dline.y, **marker_options)\n",
"\n",
"show(pline)"
]
},
{
"cell_type": "markdown",
"id": "8f0cf23a",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Histogram:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9b0e09d",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"hist_options = dict(fill_color=\"yellow\", line_color=\"black\", alpha=.8)\n",
"\n",
"phist = figure(title='Bokeh Histogram', plot_width=600, plot_height=400,\n",
" background_fill_color=\"#2a4f32\")\n",
"\n",
"phist.quad(top=dhist.hist, bottom=0, left=dhist.edges[:-1], right=dhist.edges[1:], **hist_options)\n",
"phist.ygrid.grid_line_color = None\n",
"\n",
"show(phist)"
]
},
{
"cell_type": "markdown",
"id": "35272923",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Setting properties"
]
},
{
"cell_type": "markdown",
"id": "ff0a7ded",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Figure and object properties can be very easily customised:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fe9b396",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"#set figure properties\n",
"pline.title = 'Line Plot'\n",
"pline.xgrid.grid_line_color = 'red'\n",
"pline.yaxis.axis_label = 'Y Axis'\n",
"pline.outline_line_width = 2\n",
"\n",
"#set glyph properties\n",
"#recall: circ = pline.circle(dline.x, dline.y, **marker_options)\n",
"circ.glyph.line_color = \"indigo\"\n",
"circ.glyph.line_dash = [3,1]\n",
"circ.glyph.line_width = 4\n",
"\n",
"show(pline)"
]
},
{
"cell_type": "markdown",
"id": "95b43242",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One can look up specific properties in the documentation, or list them programmatically:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "89ccb53e",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"from bokeh.models import Axis\n",
"print([x for x in vars(Axis) if x[:1] != \"_\"])"
]
},
{
"cell_type": "markdown",
"id": "d9ced292",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The same idea can be applied to ```Title```, ```Legend```, ```Toolbar```, ... [[more about models](https://docs.bokeh.org/en/latest/docs/reference/models.html)]"
]
},
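{
"cell_type": "markdown",
"id": "c7e1a9d4",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"As a minimal sketch of that idea, the ```vars()``` listing used for ```Axis``` can be looped over several models. Note that ```vars()``` only shows what is defined directly on each class, so inherited properties do not appear. The loop and the name ```public_names``` are illustrative assumptions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b2f80e6",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"from bokeh.models import Title, Legend, Toolbar\n",
"\n",
"# same listing as for Axis, applied to a few other models\n",
"# (public_names is a hypothetical helper name used only here)\n",
"for model in (Title, Legend, Toolbar):\n",
"    public_names = sorted(x for x in vars(model) if x[:1] != \"_\")\n",
"    print(model.__name__, public_names)"
]
},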
{
"cell_type": "markdown",
"id": "e698d817",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Not everything is perfect..."
]
},
{
"cell_type": "markdown",
"id": "8d21b6f1",
"metadata": {},
"source": [
"- Less customisation than ```matplotlib```"
]
},
{
"cell_type": "markdown",
"id": "4a1514c8",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- High-level charts were deprecated. Possible alternatives:\n",
" - [HoloViews](https://holoviews.org/index.html)\n",
" - [Chartify](https://github.com/spotify/chartify) (virtually no documentation, one [tutorial](https://github.com/spotify/chartify/blob/master/examples/Chartify%20Tutorial.ipynb))"
]
},
{
"cell_type": "markdown",
"id": "531c2564",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"\n",
"- The flexibility/time tradeoff might not be optimal in some scenarios (*e.g.* quick interactive plotting)"
]
},
{
"cell_type": "markdown",
"id": "8e4d8d2b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- No native 3D plots available\n",
" - it [can be done](https://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/wrapping.html#userguide-extensions-examples-wrapping), but it is way too cumbersome"
]
},
{
"cell_type": "markdown",
"id": "a3931ad3",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"- No support for inset plots (which ```matplotlib``` [supports](https://matplotlib.org/1.3.1/mpl_toolkits/axes_grid/users/overview.html#insetlocator) ): current [feature request](https://github.com/bokeh/bokeh/issues/3821)"
]
},
{
"cell_type": "markdown",
"id": "c44bb230",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"##### Other ```bokeh``` features not explored in this tutorial:\n",
"\n",
"- [data streaming](https://docs.bokeh.org/en/latest/docs/user_guide/data.html#appending-data-to-a-columndatasource)\n",
"- [mapping geo data](https://docs.bokeh.org/en/latest/docs/user_guide/geo.html)\n",
"- [embed plots in websites](https://docs.bokeh.org/en/latest/docs/user_guide/embed.html)\n",
"- [network graph visualization](https://docs.bokeh.org/en/latest/docs/user_guide/graph.html#userguide-graph)"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "PyHEP21",
"language": "python",
"name": "pyhep21"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}