{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Annotating Your Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import holoviews as hv\n", "hv.extension('bokeh')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As introduced in the [Getting Started guide](../getting_started/1-Introduction.ipynb), HoloViews relies heavily on semantic *annotations*, i.e., metadata you declare that lets HoloViews interpret what your data represents. With these annotations, HoloViews can perform complex tasks like visualization automatically.\n", "\n", "There are three main kinds of annotation that can be associated with each element:\n", " 1. **Type**, used to declare the sort of data you have, which is required before it can be visualized,\n", " 2. **Dimensions**, used to specify the abstract space in which the data resides, allowing axis labeling and indexing, and\n", " 3. **Group/Label**, used to declare a meaningful category and human-readable description of the element, allowing plot labeling and selecting related sets of elements.\n", "\n", "This user guide explains each of these three types of annotation, describing why you would need or want to use them. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Specifying element type\n", "\n", "Basic Python data structures like dataframes, arrays, lists, and dictionaries can be used to represent an infinite variety of different types of data, and thus they cannot be visualized as any particular type of graphical representation without some additional information from the user that says what sort of data it is meant to be. The user can declare this information by selecting a suitable HoloViews element type from the many different ones available (see the [Reference Gallery](http://holoviews.org/reference/index.html)). \n", "\n", "For instance, let's say you have two lists of numbers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xs = range(-10,11)\n", "ys = [100-x**2 for x in xs]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As far as Python is concerned, ``xs`` and ``ys`` are just two arbitrary lists, which could represent nearly anything imaginable. But we as humans can see that each of the ``ys`` is a value computed from one of the ``xs`` by evaluating the function $y=100-x^2$. We can convey some of that information to HoloViews by choosing a ``Curve`` element type, which is a convenient shorthand for \"a discrete set of real-valued samples from a continuous function of one real-valued variable\":" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "curve = hv.Curve((xs, ys))\n", "curve" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, declaring the element type is the only *required* bit of annotation, instantly making your data visualizable. However, this initial visualization relies on various defaults that may not be appropriate for your data, and you can override these defaults by declaring additional annotations as described below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Specifying element dimensionality\n", "\n", "Each element type can process a certain number and type of *dimensions*, i.e., ways in which the data can vary. For instance, the ``Curve`` object above has two dimensions, $x$ and $y$. If you look at how we generated the data, you can see that these two dimensions are semantically different -- we chose an arbitrary set of values for the ``xs``, and then calculated a corresponding value to make each of the ``ys``. In mathematical terms, $x$ is thus an independent variable (selected by the creator of the data), and $y$ is a dependent variable (typically measured or calculated from the independent variable(s)).\n", "\n", "HoloViews elements call these two different types of variables *key dimensions* (``kdims``) and *value dimensions* (``vdims``). The *key dimensions* are the dimensions you can index *by* to get the values corresponding to the *value* dimensions. You can learn more about indexing data in the later [Indexing and Selecting Data](./10-Indexing_and_Selecting_Data.ipynb) user guide.\n", "\n", "Different elements have different numbers of required key dimensions and value dimensions. For instance, a ``Curve`` always has one key dimension and one value dimension. As we did not explicitly specify anything regarding dimensions when declaring the curve above, the ``kdims`` and ``vidms`` use their default names 'x' and 'y':" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"Object 'curve' has kdims {kdims} and vdims {vdims}\".format(kdims=curve.kdims, vdims=curve.vdims)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The easiest way to override the default dimension names is to provide strings for the dimensions, where the second argument in the Element constructor will always be the ``kdims``, and the third will always be the ``vdims``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "trajectory = hv.Curve((xs, ys), 'distance', 'height')\n", "trajectory" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"Object 'trajectory' has kdims {kdims} and vdims {vdims} \".format(kdims=trajectory.kdims, vdims=trajectory.vdims)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the strings we provided have been 'promoted' to dimension objects. The ``kdims`` and ``vdims`` *always* contain instances of the ``Dimension`` class, described in the following section. Here, the immediate effect is to use the new names for the displayed axis labels." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dimension parameters\n", "\n", "``Dimensions`` are not just names, they are rich objects with numerous parameters that can be used to describe the space in which the data resides. Only two of these are considered *core* parameters the dimension object; the rest are auxiliary metadata. The most important parameters are:\n", "\n", "
\n", "
\n", "
name
(core) A concise name for the dimension, which for convenient usage as a keyword argument should usually be a legal Python identifier. The name also corresponds to the name of the variable in the underlying data, e.g. when providing a dictionary of columns or a DataFrame.
\n", "
label
(core) A optional longer description of the dimension, which is convenient if you want the displayed label to contain arbitrary spaces, symbols, or unicode. By default this is identical to the name but if provided this value uniquely identifies the dimension.
\n", "
range
The minimum and maximum allowable values for the dimension, for error checking and generating widgets when needed.
\n", "
soft_range
Suggested minimum and maximum values within the allowed range, used to specify a useful portion of the range for widgets and animations.
\n", "
step
Suggested interval for sampling a continuous range, if needed for a widget or animation.
\n", "
unit
The name of the unit to be associated with the dimension, if any, for labelling.
\n", "
values
Explicit list of allowed dimension values, for error checking, widgets, and animations.
\n", "
\n", "\n", "\n", "For the full list of parameters, you can call ``hv.help(hv.Dimension)``.\n", "\n", "Similar to how you can just use a string if all you want to specify is the name, you can provide a ``(name,label)`` tuple if you want to specify the ``name`` and the ``label`` to ``kdims`` and ``vdims`` without building an explicit ``Dimension``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wo_unit = hv.Curve((xs, ys), \n", " ('distance','Horizontal distance'), \n", " ('height','Height above sea level'))\n", "\n", "distance = hv.Dimension('distance', label='Horizontal distance', unit='m')\n", "height = hv.Dimension(('height', 'Height above sea level'), unit='m')\n", "with_unit = hv.Curve((xs, ys), distance, height)\n", "\n", "# (using + to compose elements is described in the next guide)\n", "wo_unit + with_unit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that after supplying the longer labels, you can still use the short name to specify the dimension in keyword arguments. For instance, try using ``with_unit.select(distance=(5,8))`` in the cell above." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting properties with redim\n", "\n", "Declaring dimension objects with appropriate parameters can be awkward and verbose if you only want to set a few specific parameters. You can often avoid declaring explicit dimension objects using the ``redim`` method, which returns a *clone* of the element: the same data, wrapped in a new instance of the same element type with the new dimension settings.\n", "\n", "Let's use ``redim`` to swap out the 'height' dimension for an 'altitude' dimension:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "renamed_height = trajectory.redim(height='altitude')\n", "renamed_height" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``redim`` \"method\" is actually a utility that can be used to set any of the dimension parameters, such as the label, unit, range, or values. For instance, the label can be updated on an existing object by specifying the dimension name and then the new value for that parameter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "renamed_height.redim.label(altitude='Altitude above sea-level', distance='Horizontal distance')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Organizing your elements with groups and labels\n", "\n", "A complex visualization you build with HoloViews may include many instances of the same element type, each built from different bits of data and potentially representing categorically distinct types of information to you. To help you keep track of these distinctions when you need to, HoloViews provides a ``group`` parameter you can use to declare semantically distinct categories for elements, and a ``label`` parameter you can use to identify which specific item the element represents within that category:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "low_ys = [25-(0.5*el)**2 for el in xs]\n", "shallow = hv.Curve((xs, low_ys), group='Trajectory', label='Shallow')\n", "medium = hv.Curve((xs, ys), group='Trajectory', label='Medium')\n", "shallow + medium" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the ``group`` and ``label`` information will be used to generate sensible titles, here indicating that both sets of data represent trajectories, and that there are two different specific trajectories being shown. Once the group and/or label have been specified, they can be used for [Applying Customization](./03-Applying_Customizations.ipynb) (e.g. to make all trajectories have the same line width and style, or to customize one particular plot out of many of the same type). The group and label are also used for indexing, as we will see in the following [Composing_Elements](./02-Composing_Elements.ipynb) guide." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }