{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " \n", " \n", "

Bokeh Tutorial

\n", "
\n", "\n", "

10. High Level Charts

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section covers the `bokeh.charts` interface, which is a high-level API that is especially useful for exploratory data analysis (for instance, in a Jupyter notebook). It provides functions for quickly producing many standard chart types, often with a single line of code. We will look at the following types in this notebook:\n", "\n", "* [Scatter Plot](#Scatter-Plot)\n", "* [Bar Chart](#Bar-Chart)\n", "* [Histogram](#Histogram)\n", "* [Box Plot](#Box-Plot)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " Loading BokehJS ...\n", "
" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", "(function(global) {\n", " function now() {\n", " return new Date();\n", " }\n", "\n", " var force = \"1\";\n", "\n", " if (typeof (window._bokeh_onload_callbacks) === \"undefined\" || force !== \"\") {\n", " window._bokeh_onload_callbacks = [];\n", " window._bokeh_is_loading = undefined;\n", " }\n", "\n", "\n", " \n", " if (typeof (window._bokeh_timeout) === \"undefined\" || force !== \"\") {\n", " window._bokeh_timeout = Date.now() + 5000;\n", " window._bokeh_failed_load = false;\n", " }\n", "\n", " var NB_LOAD_WARNING = {'data': {'text/html':\n", " \"
\\n\"+\n", " \"

\\n\"+\n", " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n", " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n", " \"

\\n\"+\n", " \"\\n\"+\n", " \"\\n\"+\n", " \"from bokeh.resources import INLINE\\n\"+\n", " \"output_notebook(resources=INLINE)\\n\"+\n", " \"\\n\"+\n", " \"
\"}};\n", "\n", " function display_loaded() {\n", " if (window.Bokeh !== undefined) {\n", " Bokeh.$(\"#372ab4ab-0122-4887-9191-3081b2834bee\").text(\"BokehJS successfully loaded.\");\n", " } else if (Date.now() < window._bokeh_timeout) {\n", " setTimeout(display_loaded, 100)\n", " }\n", " }\n", "\n", " function run_callbacks() {\n", " window._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n", " delete window._bokeh_onload_callbacks\n", " console.info(\"Bokeh: all callbacks have finished\");\n", " }\n", "\n", " function load_libs(js_urls, callback) {\n", " window._bokeh_onload_callbacks.push(callback);\n", " if (window._bokeh_is_loading > 0) {\n", " console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n", " return null;\n", " }\n", " if (js_urls == null || js_urls.length === 0) {\n", " run_callbacks();\n", " return null;\n", " }\n", " console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n", " window._bokeh_is_loading = js_urls.length;\n", " for (var i = 0; i < js_urls.length; i++) {\n", " var url = js_urls[i];\n", " var s = document.createElement('script');\n", " s.src = url;\n", " s.async = false;\n", " s.onreadystatechange = s.onload = function() {\n", " window._bokeh_is_loading--;\n", " if (window._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: all BokehJS libraries loaded\");\n", " run_callbacks()\n", " }\n", " };\n", " s.onerror = function() {\n", " console.warn(\"failed to load library \" + url);\n", " };\n", " console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n", " document.getElementsByTagName(\"head\")[0].appendChild(s);\n", " }\n", " };var element = document.getElementById(\"372ab4ab-0122-4887-9191-3081b2834bee\");\n", " if (element == null) {\n", " console.log(\"Bokeh: ERROR: autoload.js configured with elementid '372ab4ab-0122-4887-9191-3081b2834bee' but no matching script tag was found. \")\n", " return false;\n", " }\n", "\n", " var js_urls = ['https://cdn.pydata.org/bokeh/dev/bokeh-0.12.2rc3.min.js', 'https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.2rc3.min.js', 'https://cdn.pydata.org/bokeh/dev/bokeh-compiler-0.12.2rc3.min.js'];\n", "\n", " var inline_js = [\n", " function(Bokeh) {\n", " Bokeh.set_log_level(\"info\");\n", " },\n", " \n", " function(Bokeh) {\n", " \n", " Bokeh.$(\"#372ab4ab-0122-4887-9191-3081b2834bee\").text(\"BokehJS is loading...\");\n", " },\n", " function(Bokeh) {\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-0.12.2rc3.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-0.12.2rc3.min.css\");\n", " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.2rc3.min.css\");\n", " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/dev/bokeh-widgets-0.12.2rc3.min.css\");\n", " }\n", " ];\n", "\n", " function run_inline_js() {\n", " \n", " if ((window.Bokeh !== undefined) || (force === \"1\")) {\n", " for (var i = 0; i < inline_js.length; i++) {\n", " inline_js[i](window.Bokeh);\n", " }if (force === \"1\") {\n", " display_loaded();\n", " }} else if (Date.now() < window._bokeh_timeout) {\n", " setTimeout(run_inline_js, 100);\n", " } else if (!window._bokeh_failed_load) {\n", " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n", " window._bokeh_failed_load = true;\n", " } else if (!force) {\n", " var cell = $(\"#372ab4ab-0122-4887-9191-3081b2834bee\").parents('.cell').data().cell;\n", " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n", " }\n", "\n", " }\n", "\n", " if (window._bokeh_is_loading === 0) {\n", " console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n", " run_inline_js();\n", " } else {\n", " load_libs(js_urls, function() {\n", " console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n", " run_inline_js();\n", " });\n", " }\n", "}(this));" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from bokeh.io import output_notebook, show\n", "output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Scatter Plot\n", "\n", "A high-level scatter plot is provided by [`bokeh.charts.Scatter`]().\n", "\n", "For this section will use the \"iris\" data set. First let's import it and take a look at a few rows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sepal_lengthsepal_widthpetal_lengthpetal_widthspecies
05.13.51.40.2setosa
14.93.01.40.2setosa
24.73.21.30.2setosa
34.63.11.50.2setosa
45.03.61.40.2setosa
\n", "
" ], "text/plain": [ " sepal_length sepal_width petal_length petal_width species\n", "0 5.1 3.5 1.4 0.2 setosa\n", "1 4.9 3.0 1.4 0.2 setosa\n", "2 4.7 3.2 1.3 0.2 setosa\n", "3 4.6 3.1 1.5 0.2 setosa\n", "4 5.0 3.6 1.4 0.2 setosa" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bokeh.sampledata.iris import flowers\n", "flowers.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.charts import Scatter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic scatter chart takes the data (in this case a pandas DataFrame) as the first argument, and specifies the `x` and `y` coordinates for the scatter as the names of columns in the data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = Scatter(flowers, x='petal_length', y='petal_width')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By passing a column name for the `color` parameter, you can make `Scatter` automatically color the markers according to the groups in that column. Let's also add a legend by specify its location as the value of a `legend` paramter (in this case `\"top_left\"`)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = Scatter(flowers, x='petal_length', y='petal_width', color='species', legend='top_left')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By passing a column name for the `marker` parameter, you can make `Scatter` automatically vary the marker shapes according to the groups in that column. Let's try that as an exercise." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# EXERCISE: vary the marker shape by passing a column name as the `marker` keyword argument\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bar Chart\n", "\n", "A high-level bar chart is provided by [`bokeh.charts.Bar`]()\n", "\n", "For this section, we will use the \"autompg\" data set. Let's import it and take a quick look:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcyldisplhpweightaccelyroriginname
018.08307.0130350412.0701chevrolet chevelle malibu
115.08350.0165369311.5701buick skylark 320
218.08318.0150343611.0701plymouth satellite
316.08304.0150343312.0701amc rebel sst
417.08302.0140344910.5701ford torino
\n", "
" ], "text/plain": [ " mpg cyl displ hp weight accel yr origin name\n", "0 18.0 8 307.0 130 3504 12.0 70 1 chevrolet chevelle malibu\n", "1 15.0 8 350.0 165 3693 11.5 70 1 buick skylark 320\n", "2 18.0 8 318.0 150 3436 11.0 70 1 plymouth satellite\n", "3 16.0 8 304.0 150 3433 12.0 70 1 amc rebel sst\n", "4 17.0 8 302.0 140 3449 10.5 70 1 ford torino" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from bokeh.sampledata.autompg import autompg\n", "autompg.head()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.charts import Bar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic bar chart takes the data (again a DataFrame) as the first value, as well as column names for:\n", "\n", "* `label` - a column to group to label the x-axis\n", "* `values` - a column to aggregate values for each group, to give the bar heights\n", "* `agg` - the name of an aggregation to perform over the values (e.g., `\"mean\"`, `\"max\"`, etc.)\n", "\n", "A simple example that also specifies some other properties such as `title` and `legend` is shown below:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = Bar(autompg, label='cyl', values='mpg', agg='max', \n", " title=\"Max MPG by CYL\", legend=None, tools='crosshair')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By passing another column name as the `group` parameter, the aggregations can be further subdivided by the groups in that column, and the bars grouped visually. The example below demonstrates this, as well as adding a legend by specifying its location:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = Bar(autompg, label='yr', values='mpg', agg='median', group='origin', \n", " title=\"Median MPG by YR, grouped by ORIGIN\", legend='top_left', tools='crosshair')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, bars for subgroups can be stacked visually, by providing a column name for the `stack` parameter. Let's try that as an exercise." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# EXERCISE: change the chart above to stack the bars with title \"Median MPG by YR, stacked by ORIGIN\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Histogram\n", "\n", "A high-level Histogram is provided by [`bokeh.charts.Histogram`]()\n", "\n", "For this section, we will construct our own synthetic data set that has values generated from two different probability distributions. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
typevalue
995normal-0.301098
996normal-0.740360
997normal0.030623
998normal0.320627
999normal0.049325
0lognormal0.350363
1lognormal0.508560
2lognormal2.078477
3lognormal1.247154
4lognormal0.941148
\n", "
" ], "text/plain": [ " type value\n", "995 normal -0.301098\n", "996 normal -0.740360\n", "997 normal 0.030623\n", "998 normal 0.320627\n", "999 normal 0.049325\n", "0 lognormal 0.350363\n", "1 lognormal 0.508560\n", "2 lognormal 2.078477\n", "3 lognormal 1.247154\n", "4 lognormal 0.941148" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "# build some distributions\n", "mu, sigma = 0, 0.5\n", "normal = pd.DataFrame({'value': np.random.normal(mu, sigma, 1000), 'type': 'normal'})\n", "lognormal = pd.DataFrame({'value': np.random.lognormal(mu, sigma, 1000), 'type': 'lognormal'})\n", "\n", "# create a pandas data frame\n", "df = pd.concat([normal, lognormal])\n", "df[995:1005]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.charts import Histogram" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic histogram takes the data as the first parameter, and a column name as the `values` parameter. Optionally, you can also specify the number of bins to use by giving a value for the `bins` parameter. The example below shows the distribution of ***all*** the values (both the \"normal\" and \"lognormal\" values). " ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "hist = Histogram(df, values='value', bins=30)\n", "show(hist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also possible to generate multiple histograms at once by grouping the data. The column to group by is specified by the `color` parameter (and the histogram for each group is colored differently automatically). Let's try that as an exercise." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# EXERCISE: generate histograms for each \"type\" of distribution, and add a legend to the top left.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Box Plot\n", "\n", "A high-level box plot is provided by [`bokeh.charts.BoxPlot`]()\n", "\n", "For this section we will use the \"iris\" data set again. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from bokeh.charts import BoxPlot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic box plot takes the data as the first value, as well as column names for:\n", "\n", "* `label` - a column to group to label the x-axis\n", "* `values` - a column to aggregate values for each group\n", "\n", "A simple example that also specifies some other properties such as `title` and `legend` is shown below:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "
\n", "
\n", "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "p = BoxPlot(flowers, label='species', values='petal_width', tools='crosshair', color='#aa4444',\n", " xlabel='', ylabel='petal width, mm', title='Distributions of petal widths')\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of a single color, the box and whiskers groups can be colored by grouping one of the columns. This is done by passing a column name as the `color` parameter. Let's try that as an exercise." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# EXERCISE: color the boxes by \"species\" and add a legend to the top left\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Further reading\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/charts/file/\n", "\n", "http://nbviewer.jupyter.org/github/bokeh/bokeh/tree/0.11.1/examples/howto/charts/\n", "\n", "http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts.ipynb\n", "\n", "http://nbviewer.jupyter.org/github/bokeh/bokeh-demos/blob/master/presentations/2016-03-pydata-strata/notebooks/Charts%20Demo.ipynb" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }