{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Bokeh Tutorial\n", "\n", "## 03. Data Sources and Transformations
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Imports and Setup\n", "\n", "First, let's make the standard imports:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bokeh.io import output_notebook, show\n", "from bokeh.plotting import figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "output_notebook()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook uses Bokeh sample data. If you haven't downloaded it already, you can do so by running the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "import bokeh.sampledata\n", "bokeh.sampledata.download()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Overview\n", "\n", "We've seen how Bokeh can work well with Python lists, NumPy arrays, Pandas Series, etc. At lower levels, these inputs are converted to a Bokeh `ColumnDataSource`. This data type is the central data source object used throughout Bokeh. Although Bokeh often creates these sources for us transparently, there are times when it is useful to create one explicitly.\n", "\n", "In later sections we will see features like hover tooltips, computed transforms, and CustomJS interactions that make use of the `ColumnDataSource`, so let's take a quick look now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating with Python Dicts\n", "\n", "The `ColumnDataSource` can be imported from `bokeh.models`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bokeh.models import ColumnDataSource" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `ColumnDataSource` is a mapping of column names (strings) to sequences of values. Here is a simple example. 
The mapping is provided by passing a Python `dict` with string keys and simple Python lists as values. The values could also be NumPy arrays or Pandas Series.\n", "\n", "***NOTE: ALL the columns in a `ColumnDataSource` must always be the SAME length.***\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source = ColumnDataSource(data={\n", "    'x' : [1, 2, 3, 4, 5],\n", "    'y' : [3, 7, 8, 5, 1],\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Up until now we have called functions like `p.circle` by passing in literal lists or arrays of data directly. When we do this, Bokeh automatically creates a `ColumnDataSource` for us. But it is also possible to specify a `ColumnDataSource` explicitly by passing it as the `source` argument to a glyph method. Whenever we do this, if we want a property (like `\"x\"` or `\"y\"` or `\"fill_color\"`) to have a sequence of values, we pass the ***name of the column*** that we would like to use for that property:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = figure(width=400, height=400)\n", "p.circle('x', 'y', size=20, source=source)\n", "show(p)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Exercise: create a column data source with NumPy arrays as column values and plot it\n", "\n", "import numpy as np\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating with Pandas DataFrames\n", "\n", "It's also simple to create `ColumnDataSource` objects directly from Pandas data frames. 
To do this, just pass the data frame to `ColumnDataSource` when you create it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bokeh.sampledata.penguins import data as df\n", "\n", "source = ColumnDataSource(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use it as we did above by passing the column names to glyph methods:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = figure(width=400, height=400)\n", "p.circle('flipper_length_mm', 'body_mass_g', source=source)\n", "show(p)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Exercise: create a column data source with the autompg sample data frame and plot it\n", "\n", "from bokeh.sampledata.autompg import autompg_clean as df\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Automatic Conversion\n", "\n", "If you do not need to share data sources, it may be convenient to pass dicts, Pandas `DataFrame` or `GroupBy` objects directly to glyph methods, without explicitly creating a `ColumnDataSource`. In this case, a `ColumnDataSource` will be created automatically." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bokeh.sampledata.penguins import data as df\n", "\n", "p = figure(width=400, height=400)\n", "p.circle('flipper_length_mm', 'body_mass_g', source=df)\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transformations\n", "\n", "In addition to being configured with names of columns from data sources, glyph properties may also be configured with transform objects that represent transformations of columns. These live in the `bokeh.transform` module. It is important to note that when using these objects, the transformations occur *in the browser, not in Python*. 
\n", "\n", "The first transform we look at is the `cumsum` transform, which can generate a new sequence of values from a data source column by cumulatively summing the values in the column. This can be useful for pie or donut charts, as seen below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from math import pi\n", "import pandas as pd\n", "from bokeh.palettes import Category20c\n", "from bokeh.transform import cumsum\n", "\n", "x = { 'United States': 157, 'United Kingdom': 93, 'Japan': 89, 'China': 63,\n", "      'Germany': 44, 'India': 42, 'Italy': 40, 'Australia': 35, 'Brazil': 32,\n", "      'France': 31, 'Taiwan': 31, 'Spain': 29 }\n", "\n", "data = pd.Series(x).reset_index(name='value').rename(columns={'index':'country'})\n", "data['color'] = Category20c[len(x)]\n", "\n", "# represent each value as an angle = value / total * 2pi\n", "data['angle'] = data['value']/data['value'].sum() * 2*pi\n", "\n", "p = figure(height=350, title=\"Pie Chart\", toolbar_location=None,\n", "           tools=\"hover\", tooltips=\"@country: @value\")\n", "\n", "p.wedge(x=0, y=1, radius=0.4,\n", "\n", "        # use cumsum to cumulatively sum the values for start and end angles\n", "        start_angle=cumsum('angle', include_zero=True), end_angle=cumsum('angle'),\n", "        line_color=\"white\", fill_color='color', legend_field='country', source=data)\n", "\n", "p.axis.axis_label = None\n", "p.axis.visible = False\n", "p.grid.grid_line_color = None\n", "\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next transform we look at is the `linear_cmap` transform, which can generate a new sequence of colors by applying a linear color mapping to a data source column. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from bokeh.transform import linear_cmap\n", "\n", "N = 4000\n", "data = dict(x=np.random.random(size=N) * 100,\n", "            y=np.random.random(size=N) * 100,\n", "            r=np.random.random(size=N) * 1.5)\n", "\n", "p = figure()\n", "\n", "p.circle('x', 'y', radius='r', source=data, fill_alpha=0.6,\n", "\n", "         # color map based on the x-coordinate\n", "         color=linear_cmap('x', 'Viridis256', 0, 100))\n", "\n", "show(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Change the code above to use `log_cmap` and observe the results. Try changing `low` and `high` and specifying `low_color` and `high_color`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Exercise: use the corresponding factor_cmap to color map a scatter plot of the penguin data set\n", "\n", "from bokeh.sampledata.penguins import data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Next Section" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Click on this link to go to the next notebook: [04 - Adding Annotations](04%20-%20Adding%20Annotations.ipynb).\n", "\n", "To go back to the overview, click [here](00%20-%20Introduction%20and%20Setup.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }