# <span style="color:green">Data Visualization with Bokeh</span>

### <span style="color:brown">PyHEP 2021 (virtual) Workshop</span>

### <span style="color:salmon">Author:</span> Bruno Alves | <span style="color:salmon">Date:</span> 6 July 2021

# Disclaimers

1. This tutorial is heavily opinionated: the definition of "best" plotting library can vary 
 - ```bokeh``` is the best for me, and it could be the best for you too

2. I am in no way involved with the development of ```bokeh```; I am simply a user, just like most of you
   - I have been using ```bokeh``` for the past ~3 years

# Motivation


 1. Get people to know, enjoy and use ```bokeh```

  - Does not seem to be popular in HEP. However:
      - LHCb uses it for [data quality monitoring](https://cds.cern.ch/record/2298467)
      - It was [mentioned](https://arxiv.org/abs/1811.10309) by the [HEP Software Foundation](https://hepsoftwarefoundation.org/) (but dismissed; fortunately their reasons are now completely outdated)

 - Like other plotting alternatives, it is overshadowed by the ubiquity of ```matplotlib```

 2. ```bokeh``` code, when compared to ```matplotlib``` (personal opinion, of course):

 - is more readable

 - is easier to write without constantly resorting to the documentation
     - ```mpl```'s docs are unreliable

 - gives simple interactive plots for free

 - makes it easy to create and share interactive visualizations/dashboards of virtually unlimited complexity

```matplotlib``` is still more popular because:

  - it is older (started in 2003, vs. 2013 for ```bokeh```) and has more features than current alternatives

   - people have the tendency to resist change

   - most default examples for anything on StackOverflow use ```matplotlib```:

In [None]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()

nquestions=[59322, 4355, 767, 127]
libs=['mpl', 'bokeh', 'altair', 'plotnine']

# width/height replace plot_width/plot_height, which were removed in Bokeh 3.0
p = figure(height=600, width=800,
           title='Questions per plotting library',
           x_range=libs)
p.vbar(x=libs, top=nquestions, width=0.9)
p.yaxis.axis_label = 'Number of questions posted on SO'
show(p) 

# Basic plotting

We start with some definitions shared by the examples for both libraries:

In [None]:
import numpy as np
from types import SimpleNamespace

#data for line plots
dline = SimpleNamespace( x=[1,2,3,4,5,6,7,8,9], 
                         y=[6,7,2,8,9,3,4,5,1],
                         size=15,
                         line_color='blue',
                         out_color='red', 
                         fill_color='orange',
                         fill_alpha=1 )

#data for histograms
mu, sigma, npoints = 0, 0.5, 1000
nbins = 35
dhist = np.random.normal(mu, sigma, npoints)
hist_, edges_ = np.histogram(dhist, density=False, bins=nbins)
dhist = SimpleNamespace( data=dhist, hist=hist_, edges=edges_, nbins=nbins)
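
As a quick sanity check of the histogram inputs defined above, the sketch below shows what `np.histogram` returns: one count per bin, but one more edge than bins. This is why bar-like glyphs take `edges[:-1]` as the left side and `edges[1:]` as the right side of each bin. The seeded generator and the variable names are assumptions made for reproducibility; they are not part of the original cell.

```python
import numpy as np

# Assumption: a seeded generator, used only to make this sketch reproducible
rng = np.random.default_rng(0)
sample = rng.normal(0, 0.5, 1000)  # same mu, sigma and npoints as above

counts, edges = np.histogram(sample, density=False, bins=35)

# One count per bin, but one more edge than bins
print(counts.shape, edges.shape)

# Left and right edges of each bin, as needed by bar-like glyphs
left, right = edges[:-1], edges[1:]
print(left.shape == right.shape == counts.shape)

# With density=False the counts add up to the number of points
print(counts.sum())
```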

## ```matplotlib```

In [None]:
import matplotlib.pyplot as plt

#### Line plot:

In [None]:
%matplotlib inline
# the following requires ipympl 
# %matplotlib widget

fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111)
ax.set_ylabel('Y')
plt.title('Line Plot')

plt_marker_options = dict(s=10*dline.size, color=dline.fill_color, marker='o',
                          edgecolor=dline.out_color,
                          alpha=dline.fill_alpha)

plt.plot(dline.x, dline.y, color=dline.line_color)
plt.scatter(dline.x, dline.y, **plt_marker_options)
plt.show()

#### Histogram ([multiple APIs](https://matplotlib.org/stable/api/index.html)):

In [None]:
plt.hist(dhist.data, bins=dhist.nbins)
plt.show()

In [None]:
# we can also create Figure and Axes instances explicitly
# fig, ax = plt.subplots(figsize=(5,4))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.hist(dhist.data, bins=dhist.nbins)
plt.show()

- I find ```matplotlib``` hard to use without constantly going back to the documentation, even for simple tasks

- However, ```matplotlib``` is older, and therefore more mature and complete. In addition, wrappers built on top of it, such as ```mplhep```, provide convenient extra functionality.

- Unless what you want to do only exists in ```matplotlib```, I would suggest using ```bokeh``` for everything, including simple plots.

## ```bokeh```

- built around glyphs
- relies on a "layered" approach ("grammar of graphics"), but mostly ignores data transformations
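
To make the "layered" idea concrete, here is a minimal sketch with hypothetical toy data (the variable names are illustrative, not taken from this tutorial). It assumes a Bokeh version in which `figure` accepts `width` and `height`. Each glyph method adds one layer to the figure and returns the renderer for that layer.

```python
from bokeh.models import GlyphRenderer
from bokeh.plotting import figure

# Hypothetical toy data, not taken from the tutorial
x = [1, 2, 3, 4]
y = [4, 1, 3, 2]

p = figure(width=400, height=400)

# Each glyph method draws one layer and returns its renderer
line_layer = p.line(x, y, line_width=2)
marker_layer = p.scatter(x, y, size=12, fill_color="orange")

# The figure keeps track of the layers that were added
print(len(p.renderers))
print(all(isinstance(r, GlyphRenderer) for r in p.renderers))

# The renderer gives access to the glyph, which holds the visual properties
print(type(line_layer).__name__)
```

Keeping a reference to the returned renderer is what later allows glyph properties to be changed after the plot has been built.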

In [None]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook() # alternatively one could use output_file()
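
Outside a notebook, the `output_file()` alternative mentioned in the comment above can be sketched as follows. The file name, the temporary directory and the toy data are hypothetical choices for illustration. `save()` is used instead of `show()` so that no browser window is opened.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

p = figure(width=400, height=300, title="Saved to HTML")
p.line([1, 2, 3], [3, 1, 2])

# Hypothetical output location inside a temporary directory
html_path = os.path.join(tempfile.mkdtemp(), "line_plot.html")
output_file(html_path, title="Line plot")

# save() writes the standalone HTML file; show() would also open it in a browser
saved_path = save(p)
print(os.path.exists(saved_path))
```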

#### Line plot:

In [None]:
# create a new plot with default tools, using figure
pline = figure(width=400, height=400)

line_options = dict(line_width=2)
pline.line(dline.x, dline.y, **line_options)

marker_options = dict(size=dline.size, color=dline.out_color, 
                      fill_color=dline.fill_color, fill_alpha=dline.fill_alpha)
circ = pline.circle(dline.x, dline.y, **marker_options)

show(pline)

#### Histogram:

In [None]:
hist_options = dict(fill_color="yellow", line_color="black", alpha=.8)

phist = figure(title='Bokeh Histogram', width=600, height=400,
               background_fill_color="#2a4f32")

phist.quad(top=dhist.hist, bottom=0, left=dhist.edges[:-1], right=dhist.edges[1:], **hist_options)
phist.ygrid.grid_line_color = None

show(phist)

## Setting properties

Figure and object properties can be very easily customised:

In [None]:
#set figure properties
pline.title = 'Line Plot'
pline.xgrid.grid_line_color = 'red'
pline.yaxis.axis_label = 'Y Axis'
pline.outline_line_width = 2

#set glyph properties
#recall: circ = pline.circle(dline.x, dline.y, **marker_options)
circ.glyph.line_color = "indigo"
circ.glyph.line_dash = [3,1]
circ.glyph.line_width = 4

show(pline)

One can look up specific properties in the documentation, or list them programmatically:

In [None]:
from bokeh.models import Axis
print([x for x in vars(Axis) if x[:1] != "_"])

The same idea can be applied to ```Title```, ```Legend```, ```Toolbar```, ... [[more about models](https://docs.bokeh.org/en/latest/docs/reference/models.html)]
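
As a possible alternative to `vars(Axis)`, which may miss properties inherited from parent classes, Bokeh models also expose a `properties()` class method. The sketch below uses it to search for property names by keyword. The prefix being searched for is just an example.

```python
from bokeh.models import Axis, Legend, Title

# properties() is a class method available on Bokeh models
axis_props = sorted(Axis.properties())
print(len(axis_props))
print("axis_label" in axis_props)

# Filter by prefix to narrow the search, e.g. everything about tick labels
print([name for name in axis_props if name.startswith("major_label")])

# The same call works for other models
print("text" in Title.properties(), "location" in Legend.properties())
```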

## Not everything is perfect...

- Fewer customisation options than ```matplotlib```

- High-level charts were deprecated. Possible alternatives:
    - [HoloViews](https://holoviews.org/index.html)
    - [Chartify](https://github.com/spotify/chartify) (virtually no documentation, one [tutorial](https://github.com/spotify/chartify/blob/master/examples/Chartify%20Tutorial.ipynb))


- The flexibility/time tradeoff might not be optimal in some scenarios (*e.g.* quick interactive plotting)

- No native 3D plots available
    - it [can be done](https://docs.bokeh.org/en/latest/docs/user_guide/extensions_gallery/wrapping.html#userguide-extensions-examples-wrapping), but it is way too cumbersome

- No support for inset plots (which ```matplotlib``` [supports](https://matplotlib.org/1.3.1/mpl_toolkits/axes_grid/users/overview.html#insetlocator)): see the current [feature request](https://github.com/bokeh/bokeh/issues/3821)

##### Other ```bokeh``` features not explored in this tutorial:

- [data streaming](https://docs.bokeh.org/en/latest/docs/user_guide/data.html#appending-data-to-a-columndatasource)
- [mapping geo data](https://docs.bokeh.org/en/latest/docs/user_guide/geo.html)
- [embed plots in websites](https://docs.bokeh.org/en/latest/docs/user_guide/embed.html)
- [network graph visualization](https://docs.bokeh.org/en/latest/docs/user_guide/graph.html#userguide-graph)