{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Making Interactive Visualizations with Bokeh\n", "\n", "[Bokeh](http://bokeh.pydata.org/en/latest/) is an interactive Python library for visualizations that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.\n", "\n", " - To get started using Bokeh to make your visualizations, see the [User Guide](http://bokeh.pydata.org/en/latest/docs/user_guide.html#userguide).\n", " - To see examples of how you might use Bokeh with your own data, check out the [Gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html#gallery).\n", " - A complete API reference of Bokeh is at [Reference Guide](http://bokeh.pydata.org/en/latest/docs/reference.html#refguide).\n", "\n", "The following notebook is intended to illustrate some of Bokeh's interactive utilities and is based on a [post](https://demo.bokehplots.com/apps/gapminder) by software engineer and Bokeh developer [Sarah Bird](https://twitter.com/birdsarah).\n", "\n", "\n", "## Recreating Gapminder's \"The Health and Wealth of Nations\" \n", "\n", "Gapminder started as a spin-off from Professor Hans Rosling’s teaching at the Karolinska Institute in Stockholm. Having encountered broad ignorance about the rapid health improvement in Asia, he wanted to measure that lack of awareness among students and professors. He presented the surprising results from his so-called “Chimpanzee Test” in [his first TED-talk](https://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen) in 2006.\n", "\n", "[![The Best Stats You've Never Seen](http://img.youtube.com/vi/hVimVzgtD6w/0.jpg)](http://www.youtube.com/watch?v=hVimVzgtD6w \"The best stats you've ever seen | Hans Rosling\")\n", "\n", "Rosling's interactive [\"Health and Wealth of Nations\" visualization](http://www.gapminder.org/world) has since become an iconic illustration of how our assumptions about ‘first world’ and ‘third world’ countries can betray us. Mike Bostock has [recreated the visualization using D3.js](https://bost.ocks.org/mike/nations/), and in this lab, we will see that it is also possible to use Bokeh to recreate the interactive visualization in Python.\n", "\n", "\n", "### About Bokeh Widgets\n", "Widgets are interactive controls that can be added to Bokeh applications to provide a front end user interface to a visualization. They can drive new computations, update plots, and connect to other programmatic functionality. When used with the [Bokeh server](http://bokeh.pydata.org/en/latest/docs/user_guide/server.html), widgets can run arbitrary Python code, enabling complex applications. Widgets can also be used without the Bokeh server in standalone HTML documents through the browser’s Javascript runtime.\n", "\n", "To use widgets, you must add them to your document and define their functionality. Widgets can be added directly to the document root or nested inside a layout. There are two ways to program a widget’s functionality:\n", "\n", " - Use the CustomJS callback (see [CustomJS for Widgets](http://bokeh.pydata.org/en/0.12.0/docs/user_guide/interaction.html#userguide-interaction-actions-widget-callbacks). This will work in standalone HTML documents.\n", " - Use `bokeh serve` to start the Bokeh server and set up event handlers with `.on_change` (or for some widgets, `.on_click`).\n", " \n", "### Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "from bokeh.layouts import layout\n", "from bokeh.embed import file_html\n", "\n", "from bokeh.io import show\n", "from bokeh.io import output_notebook \n", "\n", "from bokeh.models import Text\n", "from bokeh.models import Plot\n", "from bokeh.models import Slider\n", "from bokeh.models import Circle\n", "from bokeh.models import Range1d\n", "from bokeh.models import CustomJS\n", "from bokeh.models import HoverTool\n", "from bokeh.models import LinearAxis\n", "from bokeh.models import ColumnDataSource\n", "from bokeh.models import SingleIntervalTicker\n", "\n", "from bokeh.palettes import Spectral6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To display Bokeh plots inline in a Jupyter notebook, use the `output_notebook()` function from bokeh.io. When `show()` is called, the plot will be displayed inline in the next notebook output cell. To save your Bokeh plots, you can use the `output_file()` function instead (or in addition)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the data\n", "\n", "Some of Bokeh examples rely on sample data that is not included in the Bokeh GitHub repository or released packages, due to their size. Once Bokeh is installed, the sample data can be obtained by executing the command in the next cell. The location that the sample data is stored can be configured. By default, data is downloaded and stored to a directory $HOME/.bokeh/data. (The directory is created if it does not already exist.) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import bokeh.sampledata\n", "bokeh.sampledata.download()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the data \n", " \n", "In order to create an interactive plot in Bokeh, we need to animate snapshots of the data over time from 1964 to 2013. In order to do this, we can think of each year as a separate static plot. We can then use a JavaScript `Callback` to change the data source that is driving the plot. \n", "\n", "#### JavaScript Callbacks\n", "\n", "Bokeh exposes various [callbacks](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/callbacks.html#userguide-interaction-callbacks), which can be specified from Python, that trigger actions inside the browser’s JavaScript runtime. This kind of JavaScript callback can be used to add interesting interactions to Bokeh documents without the need to use a Bokeh server (but can also be used in conjuction with a Bokeh server). Custom callbacks can be set using a [`CustomJS` object](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/callbacks.html#customjs-for-widgets) and passing it as the callback argument to a `Widget` object.\n", "\n", "As the data we will be using today is not too big, we can pass all the datasets to the JavaScript at once and switch between them on the client side using a slider widget. \n", "\n", "This means that we need to put all of the datasets together build a single data source for each year. First we will load each of the datasets with the `process_data()` function and do a bit of clean up:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def process_data():\n", " from bokeh.sampledata.gapminder import fertility, life_expectancy, population, regions\n", " # Make the column names ints not strings for handling\n", " columns = list(fertility.columns)\n", " years = list(range(int(columns[0]), int(columns[-1])))\n", " rename_dict = dict(zip(columns, years))\n", "\n", " fertility = fertility.rename(columns=rename_dict)\n", " life_expectancy = life_expectancy.rename(columns=rename_dict)\n", " population = population.rename(columns=rename_dict)\n", " regions = regions.rename(columns=rename_dict)\n", "\n", " # Turn population into bubble sizes. Use min_size and factor to tweak.\n", " scale_factor = 200\n", " population_size = np.sqrt(population / np.pi) / scale_factor\n", " min_size = 3\n", " population_size = population_size.where(population_size >= min_size).fillna(min_size)\n", "\n", " # Use pandas categories and categorize & color the regions\n", " regions.Group = regions.Group.astype('category')\n", " regions_list = list(regions.Group.cat.categories)\n", "\n", " def get_color(r):\n", " return Spectral6[regions_list.index(r.Group)]\n", " regions['region_color'] = regions.apply(get_color, axis=1)\n", "\n", " return fertility, life_expectancy, population_size, regions, years, regions_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we will add each of our sources to the `sources` dictionary, where each key is the name of the year (prefaced with an underscore) and each value is a dataframe with the aggregated values for that year.\n", "\n", "_Note that we needed the prefixing as JavaScript objects cannot begin with a number._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fertility_df, life_expectancy_df, population_df_size, regions_df, years, regions = process_data()\n", "\n", "sources = {}\n", "\n", "region_color = regions_df['region_color']\n", "region_color.name = 'region_color'\n", "\n", "for year in years:\n", " fertility = fertility_df[year]\n", " fertility.name = 'fertility'\n", " life = life_expectancy_df[year]\n", " life.name = 'life' \n", " population = population_df_size[year]\n", " population.name = 'population' \n", " new_df = pd.concat([fertility, life, population, region_color], sort=True, axis=1)\n", " sources['_' + str(year)] = ColumnDataSource(new_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see what's in the `sources` dictionary by running the cell below.\n", "\n", "Later we will be able to pass this `sources` dictionary to the JavaScript Callback. In so doing, we will find that in our JavaScript we have objects named by year that refer to a corresponding `ColumnDataSource`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create a corresponding `dictionary_of_sources` object, where the keys are integers and the values are the references to our ColumnDataSources from above: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dictionary_of_sources = dict(zip([x for x in years], ['_%s' % x for x in years]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "js_source_array = str(dictionary_of_sources).replace(\"'\", \"\")\n", "js_source_array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have an object that's storing all of our `ColumnDataSources`, so that we can look them up.\n", "\n", "### Build the plot\n", "\n", "First we need to create a `Plot` object. We'll start with a basic frame, only specifying things like plot height, width, and ranges for the axes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xdr = Range1d(1, 9)\n", "ydr = Range1d(20, 100)\n", "plot = Plot(\n", " x_range=xdr,\n", " y_range=ydr,\n", " plot_width=800,\n", " plot_height=400,\n", " outline_line_color=None,\n", " toolbar_location=None, \n", " min_border=20,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you expect to see when we call `show()` on the plot that we have created so far? Try it out below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build the axes\n", "\n", "Next we can make some stylistic modifications to the plot axes (e.g. by specifying the text font, size, and color, and by adding labels), to make the plot look more like the one in Hans Rosling's video." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "AXIS_FORMATS = dict(\n", " major_label_text_font_size=\"10pt\",\n", " major_label_text_font_style=\"normal\",\n", " axis_label_text_font_size=\"10pt\",\n", "\n", " axis_line_color='#AAAAAA',\n", " major_tick_line_color='#AAAAAA',\n", " major_label_text_color='#666666',\n", "\n", " major_tick_line_cap=\"round\",\n", " axis_line_cap=\"round\",\n", " axis_line_width=1,\n", " major_tick_line_width=1,\n", ")\n", "\n", "xaxis = LinearAxis(ticker=SingleIntervalTicker(interval=1), axis_label=\"Children per woman (total fertility)\", **AXIS_FORMATS)\n", "yaxis = LinearAxis(ticker=SingleIntervalTicker(interval=20), axis_label=\"Life expectancy at birth (years)\", **AXIS_FORMATS) \n", "plot.add_layout(xaxis, 'below')\n", "plot.add_layout(yaxis, 'left')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you expect to see when we call `show()` on the plot that we have created so far? Experiment with running the below cell, modifying some of the parameters above, and then re-running the cell below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add the background year text\n", "\n", "One of the features of Rosling's animation is that the year appears as the text background of the plot. We will add this feature to our plot first so it will be layered below all the other glyphs (will will be incrementally added, layer by layer, on top of each other until we are finished)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text_source = ColumnDataSource({'year': ['%s' % years[0]]})\n", "text = Text(x=2, y=35, text='year', text_font_size='150pt', text_color='#EEEEEE')\n", "plot.add_glyph(text_source, text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test out different versions of the background text and see how it changes the plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add the bubbles and hover\n", "Next we will add the bubbles using Bokeh's [`Circle`](http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh.plotting.figure.Figure.circle) glyph. We start from the first year of data, which is our source that drives the circles (the other sources will be used later). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add the circle\n", "renderer_source = sources['_%s' % years[0]]\n", "circle_glyph = Circle(\n", " x='fertility', y='life', size='population',\n", " fill_color='region_color', fill_alpha=0.8, \n", " line_color='#7c7e71', line_width=0.5, line_alpha=0.5)\n", "\n", "circle_renderer = plot.add_glyph(renderer_source, circle_glyph)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above, `plot.add_glyph` returns the renderer, which we can then pass to the `HoverTool` so that hover only happens for the bubbles on the page and not other glyph elements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add the hover (only against the circle and not other plot elements, hover over will display the population)\n", "tooltips = \"@population\"\n", "plot.add_tools(HoverTool(tooltips=tooltips, renderers=[circle_renderer]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test out different parameters for the `Circle` glyph and see how it changes the plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add the legend\n", "\n", "Next we will manually build a legend for our plot by adding circles and texts to the upper-righthand portion:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text_x = 7\n", "text_y = 95\n", "for i, region in enumerate(regions):\n", " plot.add_glyph(Text(x=text_x, y=text_y, text=[region], text_font_size='10pt', text_color='#666666'))\n", " plot.add_glyph(Circle(x=text_x - 0.1, y=text_y + 2, fill_color=Spectral6[i], size=10, line_color=None, fill_alpha=0.8))\n", " text_y = text_y - 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Experiment with different parameters, and test it out by running the below cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(plot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add the slider and callback\n", "Next we add the slider widget and the JavaScript callback code, which changes the data of the `renderer_source` (powering the bubbles / circles) and the data of the `text_source` (powering our background text). After we've `set()` the data we need to `trigger()` a change. `slider`, `renderer_source`, `text_source` are all available because we add them as args to `Callback`. \n", "\n", "It is the combination of `sources = %s % (js_source_array)` in the JavaScript and `Callback(args=sources...)` that provides the ability to look-up, by year, the JavaScript version of our Python-made `ColumnDataSource`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Add the slider\n", "code = \"\"\"\n", " var year = slider.value,\n", " sources = %s,\n", " new_source_data = sources[year].data;\n", " renderer_source.data = new_source_data;\n", " text_source.data = {'year': [String(year)]};\n", "\"\"\" % js_source_array\n", "\n", "callback = CustomJS(args=sources, code=code)\n", "slider = Slider(start=years[0], end=years[-1], value=1, step=1, title=\"Year\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "callback.args[\"renderer_source\"] = renderer_source\n", "callback.args[\"slider\"] = slider\n", "callback.args[\"text_source\"] = text_source\n", "slider.js_on_change(\"value\", callback)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check out what our slider looks like by itself:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(bokeh.models.Column(slider))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Putting all the pieces together\n", "\n", "Last but not least, we put the chart and the slider together in a layout and display it inline in the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show(layout([[plot], [slider]], sizing_mode='scale_width'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nice work! \n", "\n", "Next try integrating some of the widgets in your own plots. \n", "\n", "For more on Bokeh see the [User Guide](http://bokeh.pydata.org/en/latest/docs/user_guide.html#userguide), check out examples from the [Gallery](http://bokeh.pydata.org/en/latest/docs/gallery.html#gallery), and learn more about Bokeh's API in the [Reference Guide](http://bokeh.pydata.org/en/latest/docs/reference.html#refguide)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 1 }