{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Annotating Your Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"hv.extension('bokeh')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As introduced in the [Getting Started guide](../getting_started/1-Introduction.ipynb), HoloViews relies heavily on semantic *annotations*, i.e., metadata you declare that lets HoloViews interpret what your data represents. With these annotations, HoloViews can perform complex tasks like visualization automatically.\n",
"\n",
"There are three main kinds of annotation that can be associated with each element:\n",
" 1. **Type**, used to declare the sort of data you have, which is required before it can be visualized,\n",
" 2. **Dimensions**, used to specify the abstract space in which the data resides, allowing axis labeling and indexing, and\n",
" 3. **Group/Label**, used to declare a meaningful category and human-readable description of the element, allowing plot labeling and selecting related sets of elements.\n",
"\n",
"This user guide explains each of these three types of annotation, describing why you would need or want to use them. "
]
},
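{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick preview, the following minimal sketch declares all three kinds of annotation on one small dataset: the element type (``Curve``), the dimensions (``'time'`` and ``'fall'``), and a group and label. The data values and the ``'Trajectory'`` and ``'Ball'`` strings are illustrative choices, not anything HoloViews requires. Each kind of annotation is explained in the sections that follow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"\n",
"# 1. Type: a Curve says these are samples of a function of one variable\n",
"# 2. Dimensions: the 2nd and 3rd arguments name the key and value dimensions\n",
"# 3. Group/Label: a category and a human-readable description (illustrative)\n",
"preview = hv.Curve([(0, 0.0), (1, 4.9), (2, 19.6)], 'time', 'fall', group='Trajectory', label='Ball')\n",
"preview"
]
},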
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Specifying element type\n",
"\n",
"Python data structures such as lists and dictionaries, along with NumPy arrays and pandas DataFrames, can represent an unlimited variety of data. On their own, they therefore cannot be visualized as any particular kind of plot: HoloViews first needs to be told what sort of data they hold. You declare this by choosing a suitable HoloViews element type from the many available (see the [Reference Gallery](http://holoviews.org/reference/index.html)).\n",
"\n",
"For instance, let's say you have two lists of numbers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xs = range(-10,11)\n",
"ys = [100-x**2 for x in xs]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As far as Python is concerned, ``xs`` and ``ys`` are just two arbitrary lists, which could represent nearly anything imaginable. But we as humans can see that each of the ``ys`` is a value computed from one of the ``xs`` by evaluating the function $y=100-x^2$. We can convey some of that information to HoloViews by choosing a ``Curve`` element type, which is a convenient shorthand for \"a discrete set of real-valued samples from a continuous function of one real-valued variable\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"curve = hv.Curve((xs, ys))\n",
"curve"
]
},
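{
"cell_type": "markdown",
"metadata": {},
"source": [
"The element type is a statement about what the numbers *mean*, not about the numbers themselves. As a sketch, the same two lists can instead be declared as a ``Scatter``, which HoloViews draws as discrete, unconnected points rather than as samples of a continuous function. The ``+`` operator places the two elements side by side in a ``Layout`` for comparison; the variable names here are only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"\n",
"xs = range(-10, 11)\n",
"ys = [100 - x**2 for x in xs]\n",
"\n",
"# Same data, two different declarations of what it represents\n",
"as_curve = hv.Curve((xs, ys))\n",
"as_scatter = hv.Scatter((xs, ys))\n",
"\n",
"# '+' combines elements into a Layout, displayed side by side\n",
"as_curve + as_scatter"
]
},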
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, declaring the element type is the only *required* bit of annotation, instantly making your data visualizable. However, this initial visualization relies on various defaults that may not be appropriate for your data, and you can override these defaults by declaring additional annotations as described below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Specifying element dimensionality\n",
"\n",
"Each element type can process a certain number and type of *dimensions*, i.e., ways in which the data can vary. For instance, the ``Curve`` object above has two dimensions, $x$ and $y$. If you look at how we generated the data, you can see that these two dimensions are semantically different -- we chose an arbitrary set of values for the ``xs``, and then calculated a corresponding value to make each of the ``ys``. In mathematical terms, $x$ is thus an independent variable (selected by the creator of the data), and $y$ is a dependent variable (typically measured or calculated from the independent variable(s)).\n",
"\n",
"HoloViews elements call these two different types of variables *key dimensions* (``kdims``) and *value dimensions* (``vdims``). The *key dimensions* are the dimensions you can index *by* to get the values corresponding to the *value* dimensions. You can learn more about indexing data in the later [Indexing and Selecting Data](./10-Indexing_and_Selecting_Data.ipynb) user guide.\n",
"\n",
"Different elements have different numbers of required key dimensions and value dimensions. For instance, a ``Curve`` always has one key dimension and one value dimension. Because we did not specify any dimensions when declaring the curve above, its ``kdims`` and ``vdims`` use the default names 'x' and 'y':"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"Object 'curve' has kdims {kdims} and vdims {vdims}\".format(kdims=curve.kdims, vdims=curve.vdims)"
]
},
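{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because ``x`` is the key dimension, it is what you select *by*, and ``y`` is what you get back. A minimal sketch using two element methods: ``dimension_values``, which returns all values along one dimension as an array, and ``select``, which keeps only the samples whose key falls in a given range. The range ``(0, 4)`` and the name ``parabola`` are arbitrary examples."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"\n",
"xs = range(-10, 11)\n",
"ys = [100 - x**2 for x in xs]\n",
"parabola = hv.Curve((xs, ys))  # default dimensions: kdims x, vdims y\n",
"\n",
"# All values along each dimension, as NumPy arrays\n",
"print(parabola.dimension_values('x')[:3])\n",
"print(parabola.dimension_values('y')[:3])\n",
"\n",
"# Select by the key dimension: keep samples whose x lies in the range 0 to 4\n",
"near_peak = parabola.select(x=(0, 4))\n",
"print(near_peak.dimension_values('y'))"
]
},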
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The easiest way to override the default dimension names is to pass strings for the dimensions. In an element constructor, the second positional argument (after the data) is always the ``kdims``, and the third is always the ``vdims``:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"trajectory = hv.Curve((xs, ys), 'distance', 'height')\n",
"trajectory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"Object 'trajectory' has kdims {kdims} and vdims {vdims}\".format(kdims=trajectory.kdims, vdims=trajectory.vdims)"
]
},
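{
"cell_type": "markdown",
"metadata": {},
"source": [
"The positional form is shorthand; the same dimensions can also be passed as the ``kdims`` and ``vdims`` keywords, which can make longer constructor calls easier to read. A sketch, redefining ``xs`` and ``ys`` so the cell runs on its own (``positional`` and ``keyword`` are illustrative names):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"\n",
"xs = range(-10, 11)\n",
"ys = [100 - x**2 for x in xs]\n",
"\n",
"# Positional: data, then kdims, then vdims\n",
"positional = hv.Curve((xs, ys), 'distance', 'height')\n",
"# Keyword: the same declaration with explicit kdims/vdims lists\n",
"keyword = hv.Curve((xs, ys), kdims=['distance'], vdims=['height'])\n",
"\n",
"print(positional.kdims, keyword.kdims)\n",
"print(positional.vdims, keyword.vdims)"
]
},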
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the strings we provided have been 'promoted' to dimension objects. The ``kdims`` and ``vdims`` *always* contain instances of the ``Dimension`` class, described in the following section. Here, the immediate effect is to use the new names for the displayed axis labels."
]
},
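{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the promotion directly, you can construct ``hv.Dimension`` objects yourself and pass them in place of the strings; the element ends up holding the same kind of object either way. A minimal sketch, where the names ``'distance'`` and ``'height'`` are only examples:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import holoviews as hv\n",
"\n",
"xs = range(-10, 11)\n",
"ys = [100 - x**2 for x in xs]\n",
"\n",
"# A string and an explicit Dimension object produce the same key dimension\n",
"from_string = hv.Curve((xs, ys), 'distance', 'height')\n",
"from_object = hv.Curve((xs, ys), [hv.Dimension('distance')], [hv.Dimension('height')])\n",
"\n",
"print(type(from_string.kdims[0]).__name__)  # the string was promoted to a Dimension\n",
"print(from_string.kdims[0].name == from_object.kdims[0].name)"
]
},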
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dimension parameters\n",
"\n",
"``Dimensions`` are not just names; they are rich objects with numerous parameters that can be used to describe the space in which the data resides. Only two of these are *core* parameters that identify the dimension object; the rest are auxiliary metadata. The most important parameters are:\n",
"\n",
"