{ "cells": [ { "cell_type": "markdown", "id": "4f5aceb8", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Visualization with Bokeh\n", "\n", "### PyHEP 2021 (virtual) Workshop\n", "\n", "### Author: Bruno Alves | Date: 6 July 2021" ] }, { "cell_type": "markdown", "id": "3a153079", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Disclaimers" ] }, { "cell_type": "markdown", "id": "75e695ec", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "1. This tutorial is heavily opinionated: the definition of \"best\" plotting library can vary \n", " - ```bokeh``` is the best for me, and it could be the best for you too" ] }, { "cell_type": "markdown", "id": "7f95a6f2", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "2. I am in now way involved with the development of ```bokeh```; I am simply a user, just like most of you \n", " - I have been using ```bokeh``` for the past ~3 years" ] }, { "cell_type": "markdown", "id": "6e7b032f", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "slide" } }, "source": [ "# Motivation\n" ] }, { "cell_type": "markdown", "id": "d9c34fb8", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "subslide" } }, "source": [ " 1. Get people to know, enjoy and use ```bokeh```" ] }, { "cell_type": "markdown", "id": "0edbbaae", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - Does not seem to be popular in HEP. However:\n", " - LHCb uses it for [data quality monitoring](https://cds.cern.ch/record/2298467)\n", " - It was [mentioned](https://arxiv.org/abs/1811.10309) by the [HEP Software Foundation](https://hepsoftwarefoundation.org/) (but dismissed; fortunately their reasons are now completely outdated)" ] }, { "cell_type": "markdown", "id": "8473cfe0", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - As other plotting alternatives, it is shadowed by the ubiquitousness of ```matplotlib``` " ] }, { "cell_type": "markdown", "id": "6334f167", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " 2. ```bokeh``` code, when compared to ```matplotlib``` (personal opinion, of course):" ] }, { "cell_type": "markdown", "id": "009bb4dd", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - is more readable" ] }, { "cell_type": "markdown", "id": "ebc27a24", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - is easier to write without constantly resorting to the documentation\n", " - ```mpl```'s docs are unreliable" ] }, { "cell_type": "markdown", "id": "a49ab31c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - gives simple interactive plots for free" ] }, { "cell_type": "markdown", "id": "ba3c3866", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - can be used for easily creating and sharing complex and virtually unlimited interactive visualizations/dashboards" ] }, { "cell_type": "markdown", "id": "32477457", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```matplotlib``` is still more popular because:" ] }, { "cell_type": "markdown", "id": "a5ab366e", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - it is older (started in 2003, vs. 2013 for ```bokeh```) and has more features than current alternatives" ] }, { "cell_type": "markdown", "id": "e3ab6f74", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - people have the tendency to resist change" ] }, { "cell_type": "markdown", "id": "30bf518c", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " - most default examples for anything on StackOverflow use ```matplotlib```:" ] }, { "cell_type": "code", "execution_count": null, "id": "5d7ce15c", "metadata": { "cell_style": "center", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from bokeh.io import output_notebook, show\n", "from bokeh.plotting import figure\n", "output_notebook()\n", "\n", "nquestions=[59322, 4355, 767, 127]\n", "libs=['mpl', 'bokeh', 'altair', 'plotnine']\n", "\n", "p = figure(plot_height=600, plot_width=800,\n", " title='Histogram', \n", " x_range=libs)\n", "p.vbar(x=libs, top=nquestions, width=0.9)\n", "p.yaxis.axis_label = 'Number of questions posted on SO'\n", "show(p) " ] }, { "cell_type": "markdown", "id": "b7f69430", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Basic plotting" ] }, { "cell_type": "markdown", "id": "09a47996", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "We start by some definitions to be used by multiple libraries:" ] }, { "cell_type": "code", "execution_count": null, "id": "09c167ed", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import numpy as np\n", "from types import SimpleNamespace\n", "\n", "#data for line plots\n", "dline = SimpleNamespace( x=[1,2,3,4,5,6,7,8,9], \n", " y=[6,7,2,8,9,3,4,5,1],\n", " size=15,\n", " line_color='blue',\n", " out_color='red', \n", " fill_color='orange',\n", " fill_alpha=1 )\n", "\n", "#data for histograms\n", "mu, sigma, npoints = 0, 0.5, 1000\n", "nbins = 35\n", "dhist = np.random.normal(mu, sigma, npoints)\n", "hist_, edges_ = np.histogram(dhist, density=False, bins=nbins)\n", "dhist = SimpleNamespace( data=dhist, hist=hist_, edges=edges_, nbins=nbins)" ] }, { "cell_type": "markdown", "id": "7a05af47", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ```matplotlib```" ] }, { "cell_type": "code", "execution_count": null, "id": "f85a8f9b", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "12676e38", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Line plot:" ] }, { "cell_type": "code", "execution_count": null, "id": "fa32df3a", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "%matplotlib inline\n", "# the following requires ipympl \n", "# %matplotlib widget\n", "\n", "fig = plt.figure(figsize=(8,6))\n", "ax = fig.add_subplot(111)\n", "ax.set_ylabel('Y')\n", "plt.title('Histogram')\n", "\n", "plt_marker_options = dict(s=10*dline.size, color=dline.fill_color, marker='o',\n", " edgecolor=dline.out_color,\n", " alpha=dline.fill_alpha)\n", "\n", "plt.plot(dline.x, dline.y, color=dline.line_color)\n", "plt.scatter(dline.x, dline.y, **plt_marker_options)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "1f5e8a81", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Histogram ([multiple APIs](https://matplotlib.org/stable/api/index.html)):" ] }, { "cell_type": "code", "execution_count": null, "id": "d071bd0c", "metadata": { "cell_style": "split" }, "outputs": [], "source": [ "plt.hist(dhist.data, bins=dhist.nbins)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "ffb8b4ca", "metadata": { "cell_style": "split" }, "outputs": [], "source": [ "#fig, ax = plt.subplots(figsize=(5,4))\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111)\n", "ax.hist(dhist.data, bins=dhist.nbins)\n", "plt.show()\n", "#we can create Figure and Axes instances explicitly" ] }, { "cell_type": "markdown", "id": "f24df995", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- I find ```matplotlib``` hard to use without constantly going back to the documentation, even for simple tasks" ] }, { "cell_type": "markdown", "id": "2d5f4ed7", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- However, ```matplotlib``` is more mature and complete, being the oldest. In addition, some wrappers on top of it provide additional convenient functionalities, such as ```mplhep```." ] }, { "cell_type": "markdown", "id": "2fd73d97", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Unless what you want to do only exists in ```matplotlib```, I would suggest using ```bokeh``` for everything, including simple plots." ] }, { "cell_type": "markdown", "id": "bfbc6886", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## ```bokeh```\n", "\n", "- built around glyphs\n", "- relies on a \"layered\" approach (\"grammar of graphics\"), but mostly ignores data transformations" ] }, { "cell_type": "code", "execution_count": null, "id": "73157af7", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from bokeh.io import output_notebook, show\n", "from bokeh.plotting import figure\n", "output_notebook() # alternatively one could use output_file()" ] }, { "cell_type": "markdown", "id": "2de202c3", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Line plot:" ] }, { "cell_type": "code", "execution_count": null, "id": "9ca80c6f", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# create a new plot with default tools, using figure\n", "pline = figure(plot_width=400, plot_height=400)\n", "\n", "line_options = dict(line_width=2)\n", "pline.line(dline.x, dline.y, **line_options)\n", "\n", "marker_options = dict(size=dline.size, color=dline.out_color, \n", " fill_color=dline.fill_color, fill_alpha=dline.fill_alpha)\n", "circ = pline.circle(dline.x, dline.y, **marker_options)\n", "\n", "show(pline)" ] }, { "cell_type": "markdown", "id": "8f0cf23a", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Histogram:" ] }, { "cell_type": "code", "execution_count": null, "id": "d9b0e09d", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "hist_options = dict(fill_color=\"yellow\", line_color=\"black\", alpha=.8)\n", "\n", "phist = figure(title='Bokeh Histogram', plot_width=600, plot_height=400,\n", " background_fill_color=\"#2a4f32\")\n", "\n", "phist.quad(top=dhist.hist, bottom=0, left=dhist.edges[:-1], right=dhist.edges[1:], **hist_options)\n", "phist.ygrid.grid_line_color = None\n", "\n", "show(phist)" ] }, { "cell_type": "markdown", "id": "35272923", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Setting properties" ] }, { "cell_type": "markdown", "id": "ff0a7ded", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Figure and object properties can be very easily customised:" ] }, { "cell_type": "code", "execution_count": null, "id": "5fe9b396", "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "#set figure properties\n", "pline.title = 'Line Plot'\n", "pline.xgrid.grid_line_color = 'red'\n", "pline.yaxis.axis_label = 'Y Axis'\n", "pline.outline_line_width = 2\n", "\n", "#set glyph properties\n", "#recall: circ = p_line.circle(data.x, data.y, **marker_options)\n", "circ.glyph.line_color = \"indigo\"\n", "circ.glyph.line_dash = [3,1]\n", "circ.glyph.line_width = 4\n", "\n", "show(pline)" ] }, { "cell_type": "markdown", "id": "95b43242", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "One can search for specific properties in the documentation or else do:" ] }, { "cell_type": "code", "execution_count": null, "id": "89ccb53e", "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from bokeh.models import Axis\n", "print([x for x in vars(Axis) if x[:1] != \"_\"])" ] }, { "cell_type": "markdown", "id": "d9ced292", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The same idea can be applied to ```Title```, ```Legend```, ```Toolbar```, ... [[more about models](https://docs.bokeh.org/en/latest/docs/reference/models.html)]" ] }, { "cell_type": "markdown", "id": "e698d817", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Not everything is perfect..." ] }, { "cell_type": "markdown", "id": "8d21b6f1", "metadata": {}, "source": [ "- Less customisation than ```matplotlib```" ] }, { "cell_type": "markdown", "id": "4a1514c8", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- High-level charts were deprecated. Possible alternatives:\n", " - [HoloViews](https://holoviews.org/index.html)\n", " - [Chartify](https://github.com/spotify/chartify) (virtually no documentation, one [tutorial](https://github.com/spotify/chartify/blob/master/examples/Chartify%20Tutorial.ipynb))" ] }, { "cell_type": "markdown", "id": "531c2564", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "- The flexibility/time tradeoff might not be optimal in some scenarios (*e.g.* quick interactive plotting)" ] }, { "cell_type": "markdown", "id": "8e4d8d2b", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- No native 3D plots available\n", " - it [can be done](https://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/wrapping.html#userguide-extensions-examples-wrapping), but it is way too cumbersome" ] }, { "cell_type": "markdown", "id": "a3931ad3", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- No support for inset plots (which ```matplotlib``` [supports](https://matplotlib.org/1.3.1/mpl_toolkits/axes_grid/users/overview.html#insetlocator) ): current [feature request](https://github.com/bokeh/bokeh/issues/3821)" ] }, { "cell_type": "markdown", "id": "c44bb230", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "##### Other ```bokeh``` features not explored in this tutorial:\n", "\n", "- [data streaming](https://docs.bokeh.org/en/latest/docs/user_guide/data.html#appending-data-to-a-columndatasource)\n", "- [mapping geo data](https://docs.bokeh.org/en/latest/docs/user_guide/geo.html)\n", "- [embed plots in websites](https://docs.bokeh.org/en/latest/docs/user_guide/embed.html)\n", "- [network graph visualization](https://docs.bokeh.org/en/latest/docs/user_guide/graph.html#userguide-graph)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "PyHEP21", "language": "python", "name": "pyhep21" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }