# Tutorial: Introduction to Altair

This tutorial will guide you through the process of creating visualizations in Altair. For details on installing Altair or its underlying philosophy, please see the [Altair Documentation](http://altair-viz.github.io/).

Outline:

- [The data](#The-data)
- [The `Chart` object](#The-Chart-object)
- [Chart marks](#Chart-Marks)
- [Data encodings](#Data-encodings)
- [Data transformation: Aggregation](#Data-transformation:-Aggregation)
- [Customizing your visualization](#Customizing-your-visualization)
- [Publishing a visualization online](#Publishing-a-visualization-online)

This tutorial is written in the form of a Jupyter Notebook; we suggest downloading the notebook and following along, executing the code yourself as we go. For creating Altair visualizations in the notebook, all that is required is to [install the package and its dependencies](https://altair-viz.github.io/installation.html) and import the Altair namespace:

In [1]:
import altair as alt

## The data

Data in Altair is built around the [Pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).
For the purposes of this tutorial, we'll start by importing Pandas and creating a simple `DataFrame` to visualize, with a categorical variable in column `a` and a numerical variable in column `b`:

In [2]:
import pandas as pd
data = pd.DataFrame({'a': list('CCCDDDEEE'),
                     'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]})
data

| |a|b|
|---|---|---|
|0|C|2|
|1|C|7|
|2|C|4|
|3|D|1|
|4|D|2|
|5|D|6|
|6|E|8|
|7|E|4|
|8|E|7|


In Altair, every dataset should be provided as a `DataFrame`, or as a URL referencing an appropriate dataset (see [Defining Data](https://altair-viz.github.io/user_guide/data.html)).

## The Chart object

The fundamental object in Altair is the ``Chart``. It takes the dataframe as a single argument:

In [3]:
chart = alt.Chart(data)

Fundamentally, a ``Chart`` is an object which knows how to emit a JSON dictionary representing the data and visualization encodings (see below), which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.

Here is what that JSON looks like for the current chart (since the chart is not yet complete, we turn off chart validation):

In [4]:
chart.to_dict(validate=False)

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

At this point the specification contains only the data and the default configuration, but no visualization specification.
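The `datasets` entry above stores the table row by row, as a list of records. As a rough illustration of that layout (plain Python with variable names of our own choosing, not Altair's internal code), the column-oriented dictionary we passed to `pd.DataFrame` can be turned into the same list of records like this:

```python
# Column-oriented data, as passed to pd.DataFrame earlier in the tutorial
columns = {'a': list('CCCDDDEEE'),
           'b': [2, 7, 4, 1, 2, 6, 8, 4, 7]}

# Row-oriented records: one dict per row, the layout seen under 'datasets'
records = [dict(zip(columns, row)) for row in zip(*columns.values())]

print(records[:3])
# [{'a': 'C', 'b': 2}, {'a': 'C', 'b': 7}, {'a': 'C', 'b': 4}]
```

With Pandas, `data.to_dict(orient='records')` gives an equivalent list.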

## Chart Marks

Next we can decide what sort of *mark* we would like to use to represent our data.
For example, we can choose the ``point`` mark to represent each row of the data as a point on the plot:

In [5]:
chart = alt.Chart(data).mark_point()
chart

The result is a visualization with one point per row in the data, though it is not a particularly interesting one: all the points are stacked right on top of each other!
To see how this affects the specification, we can once again examine the dictionary representation:

In [6]:
chart.to_dict()

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'point',
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

Notice that now in addition to the data, the specification includes information about the mark type.

## Data encodings

The next step is to add *visual encodings* (or *encodings* for short) to the chart. A visual encoding specifies how a given data column should be mapped onto the visual properties of the visualization.
Some of the more frequently used visual encodings are listed here:

* X: x-axis value
* Y: y-axis value
* Color: color of the mark
* Opacity: transparency/opacity of the mark
* Shape: shape of the mark
* Size: size of the mark
* Row: row within a grid of facet plots
* Column: column within a grid of facet plots

For a complete list of these encodings, see the [Encodings](https://altair-viz.github.io/user_guide/encoding.html) section of the documentation.

Visual encodings can be created with the `encode()` method of the `Chart` object. For example, we can start by mapping the `y` axis of the chart to column `a`:

In [7]:
chart = alt.Chart(data).mark_point().encode(y='a')
chart

The result is a one-dimensional visualization representing the values taken on by `a`.
As above, we can view the JSON data generated for this visualization:

In [8]:
chart.to_dict()

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'point',
 'encoding': {'y': {'type': 'nominal', 'field': 'a'}},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

The result is the same as above with the addition of the `'encoding'` key, which specifies the visualization channel (`y`), the name of the field (`a`), and the type of the variable (`nominal`).
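To make the pattern concrete, here is a toy sketch of an object that builds up a specification dictionary one method call at a time. `MiniChart` is a hypothetical class written for this tutorial; it is not how Altair is implemented, and it embeds the data inline under `values` rather than in a named dataset.

```python
import copy


class MiniChart:
    """Toy stand-in for a chart object: it only accumulates a spec dict."""

    def __init__(self, records):
        self._spec = {'data': {'values': records}}

    def _with(self, **updates):
        # Return a modified copy so that earlier objects are left untouched
        new = copy.deepcopy(self)
        new._spec.update(updates)
        return new

    def mark_point(self):
        return self._with(mark='point')

    def encode(self, **channels):
        # Merge the new channels into any encoding that is already present
        encoding = dict(self._spec.get('encoding', {}))
        encoding.update(channels)
        return self._with(encoding=encoding)

    def to_dict(self):
        return copy.deepcopy(self._spec)


base = MiniChart([{'a': 'C', 'b': 2}, {'a': 'D', 'b': 1}])
toy = base.mark_point().encode(y={'field': 'a', 'type': 'nominal'})

print(sorted(base.to_dict()))  # ['data']
print(sorted(toy.to_dict()))   # ['data', 'encoding', 'mark']
```

Each call adds one top-level key, which mirrors how the `'mark'` and `'encoding'` entries appeared in the dictionaries above.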

Altair is able to automatically determine the type of the variable using built-in heuristics. Altair and Vega-Lite support four primitive data types:

<table>
  <tr>
    <th>Data Type</th>
    <th>Code</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>quantitative</td>
    <td>Q</td>
    <td>Numerical quantity (real-valued)</td>
  </tr>
  <tr>
    <td>nominal</td>
    <td>N</td>
    <td>Name / Unordered categorical</td>
  </tr>
  <tr>
    <td>ordinal</td>
    <td>O</td>
    <td>Ordered categorical</td>
  </tr>
  <tr>
    <td>temporal</td>
    <td>T</td>
    <td>Date/time</td>
  </tr>
</table>
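The exact inference rules are Altair's own. The function below is only a simplified, hypothetical illustration of the idea, written against plain Python values instead of `DataFrame` columns; `guess_type` is a name we made up, and this sketch never returns `ordinal`.

```python
from datetime import date, datetime
from numbers import Number


def guess_type(values):
    """Guess a Vega-Lite type name from a list of values (illustrative only)."""
    # Booleans are numbers in Python, so treat them as categories first
    if all(isinstance(v, bool) for v in values):
        return 'nominal'
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return 'quantitative'
    if all(isinstance(v, (date, datetime)) for v in values):
        return 'temporal'
    # Strings and anything mixed fall back to unordered categories
    return 'nominal'


print(guess_type(list('CCCDDDEEE')))             # nominal
print(guess_type([2, 7, 4, 1, 2, 6, 8, 4, 7]))   # quantitative
print(guess_type([date(2020, 1, 1)]))            # temporal
```

When a guess like this would be wrong for your data, stating the type explicitly is the safer choice.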

You can set the data type of a column explicitly using a one letter code attached to the column name with a colon:

In [9]:
alt.Chart(data).mark_point().encode(y='a:N')

The visualization can be made more interesting by adding another channel to the encoding: let's encode column `b` as the `x` position:

In [10]:
alt.Chart(data).mark_point().encode(
    y='a',
    x='b'
)

With two visual channels encoded, we can see the raw data points in the `DataFrame`. A different mark type can be chosen using a different `mark_*()` method, such as `mark_bar()`:

In [11]:
alt.Chart(data).mark_bar().encode(
    alt.Y('a'),
    alt.X('b')
)

Notice that we have used a slightly different syntax for specifying the channels, using classes (``alt.X`` and ``alt.Y``) passed as positional arguments. These classes allow additional arguments to be passed to each channel, as we will see below.

Here are some of the more commonly used `mark_*()` methods supported in Altair and Vega-Lite; for more detail see [Marks](https://altair-viz.github.io/user_guide/marks.html) in the Altair documentation:

|Method|
|------|
|``mark_area()``|
|``mark_bar()``|
|``mark_circle()``|
|``mark_line()``|
|``mark_point()``|
|``mark_rule()``|
|``mark_square()``|
|``mark_text()``|
|``mark_tick()``|

## Data transformation: Aggregation

Altair and Vega-Lite also support a variety of built-in data transformations, such as aggregation. The easiest way to specify such aggregations is through a string-function syntax in the argument to the column name. For example, here we will plot not all the values, but a single point representing the mean of the x-values for a given y-value:

In [12]:
alt.Chart(data).mark_point().encode(
    y='a',
    x='mean(b)'
)

Conceptually, this is equivalent to the following groupby operation:

In [13]:
data.groupby('a').mean()

|a|b|
|---|---|
|C|4.333333|
|D|3.0|
|E|6.333333|
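To make the grouping explicit, here is a minimal sketch of the same computation in plain Python, without Pandas. The variable names are ours, and the values are the ones from the `DataFrame` defined at the start of the tutorial:

```python
from collections import defaultdict

# (a, b) pairs, one per row of the tutorial's data
rows = list(zip('CCCDDDEEE', [2, 7, 4, 1, 2, 6, 8, 4, 7]))

# Collect the b-values belonging to each category in column a
groups = defaultdict(list)
for a, b in rows:
    groups[a].append(b)

# Average each group
means = {a: sum(bs) / len(bs) for a, bs in groups.items()}

for a, m in means.items():
    print(a, round(m, 6))
# C 4.333333
# D 3.0
# E 6.333333
```

These are the three values that the `mean(b)` encoding plots, one per category.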


More typically, aggregated values are displayed using bar charts.
Making this change is as simple as replacing `mark_point()` with `mark_bar()`:

In [14]:
chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x='mean(b)'
)
chart

As above, Altair's role in this visualization is to convert the resulting object into an appropriate JSON dict.
Here it is; the `datasets` entry is unchanged from the earlier examples:

In [15]:
chart.to_dict()

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'bar',
 'encoding': {'x': {'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'},
  'y': {'type': 'nominal', 'field': 'a'}},
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

Notice that Altair has taken the string `'mean(b)'` and converted it to a mapping that includes `field`, `type`, and `aggregate`. The full shorthand syntax for the column names in Altair also includes the explicit type code, separated by a colon:

In [16]:
x = alt.X('mean(b):Q')
x.to_dict()

{'type': 'quantitative', 'aggregate': 'mean', 'field': 'b'}

This shorthand is equivalent to spelling out these properties by name (`average` is a Vega-Lite alias for `mean`):

In [17]:
x = alt.X('b', aggregate='average', type='quantitative')
x.to_dict()

{'type': 'quantitative', 'aggregate': 'average', 'field': 'b'}

This is one benefit of using the Altair API over writing the Vega-Lite spec from scratch: valid Vega-Lite specifications can be created very succinctly, with less boilerplate code.
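To show what the shorthand saves you from writing, here is a minimal sketch of a parser for strings such as `'mean(b):Q'`. It is a hypothetical illustration using a regular expression of our own; `parse_shorthand` is not Altair's parser, and it handles only the `field`, `aggregate(field)`, and `:TYPE` forms used in this tutorial.

```python
import re

# One-letter type codes from the table of data types
TYPE_CODES = {'Q': 'quantitative', 'N': 'nominal',
              'O': 'ordinal', 'T': 'temporal'}

# Either "aggregate(field)" or a bare "field", optionally followed by ":CODE"
SHORTHAND = re.compile(
    r'^(?:(?P<aggregate>\w+)\((?P<agg_field>[^)]*)\)|(?P<field>[^:]+))'
    r'(?::(?P<code>[QNOT]))?$'
)


def parse_shorthand(text):
    """Expand a shorthand string into a dict of channel properties."""
    match = SHORTHAND.match(text)
    if match is None:
        raise ValueError(f'cannot parse shorthand: {text!r}')
    spec = {}
    if match['aggregate']:
        spec['aggregate'] = match['aggregate']
        spec['field'] = match['agg_field']
    else:
        spec['field'] = match['field']
    if match['code']:
        spec['type'] = TYPE_CODES[match['code']]
    return spec


print(parse_shorthand('mean(b):Q'))
# {'aggregate': 'mean', 'field': 'b', 'type': 'quantitative'}
print(parse_shorthand('a:N'))
# {'field': 'a', 'type': 'nominal'}
```

Unlike Altair, this sketch leaves the type out when no code is given, because it has no data from which to infer one.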

## Customizing your visualization

To speed the process of data exploration, Altair (via Vega-Lite) makes some choices about default properties of the visualization.
Altair also provides an API to customize the look of the visualization. For example, we can use the `X` object we saw above to override the default x-axis title:

In [18]:
alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)

The properties of marks can be configured by passing keyword arguments to the `mark_*()` methods; for example, any named HTML color is supported:

In [19]:
alt.Chart(data).mark_bar(color='firebrick').encode(
    y='a',
    x=alt.X('mean(b)', axis=alt.Axis(title='Mean of quantity b'))
)

Similarly, we can set properties of the chart such as width and height using the ``properties()`` method:

In [20]:
chart = alt.Chart(data).mark_bar().encode(
    y='a',
    x=alt.X('average(b)', axis=alt.Axis(title='Average of b'))
).properties(
    width=400,
    height=300
)

chart

As above, we can inspect how these configuration options affect the resulting Vega-Lite specification:

In [21]:
chart.to_dict()

{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},
 'data': {'name': 'data-347f1284ea3247c0f55cb966abbdd2d8'},
 'mark': 'bar',
 'encoding': {'x': {'type': 'quantitative',
   'aggregate': 'average',
   'axis': {'title': 'Average of b'},
   'field': 'b'},
  'y': {'type': 'nominal', 'field': 'a'}},
 'height': 300,
 'width': 400,
 '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
 'datasets': {'data-347f1284ea3247c0f55cb966abbdd2d8': [{'a': 'C', 'b': 2},
   {'a': 'C', 'b': 7},
   {'a': 'C', 'b': 4},
   {'a': 'D', 'b': 1},
   {'a': 'D', 'b': 2},
   {'a': 'D', 'b': 6},
   {'a': 'E', 'b': 8},
   {'a': 'E', 'b': 4},
   {'a': 'E', 'b': 7}]}}

To learn more about the various properties of chart objects, you can use Jupyter's help syntax:

In [22]:
alt.Chart?

Init signature:
alt.Chart(
    data=Undefined,
    encoding=Undefined,
    mark=Undefined,
    width=Undefined,
    height=Undefined,
    **kwargs,
)
Docstring:
Create a basic Altair/Vega-Lite chart.

Although it is possible to set all Chart properties as constructor attributes,
it is more idiomatic to use methods such as ``mark_point()``, ``encode()``,
``transform_filter()``, ``properties()``, etc. See Altair's documentation
for details and examples: http://altair-viz.github.io/.

Attributes
----------
data : Data
   

You can also read more in Altair's [Configuration](https://altair-viz.github.io/user_guide/configuration.html) documentation.

## Publishing a visualization online

Because Altair produces Vega-Lite specifications, it is relatively straightforward to export charts and publish them on the web as Vega-Lite plots.
All that is required is to load the Vega-Lite JavaScript library and pass it the JSON plot specification output by Altair.
For convenience, Altair provides a ``save()`` method, which will save any chart to HTML:

In [23]:
chart.save('chart.html')

In [24]:
!cat chart.html

<!DOCTYPE html>
<html>
<head>
  <style>
    .error {
        color: red;
    }
  </style>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@4.0.0"></script>
  <script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    (function(vegaEmbed) {
      var spec = {"config": {"view": {"continuousWidth": 400, "continuousHeight": 300}}, "data": {"name": "data-347f1284ea3247c0f55cb966abbdd2d8"}, "mark": "bar", "encoding": {"x": {"type": "quantitative", "aggregate": "average", "axis": {"title": "Average of b"}, "field": "b"}, "y": {"type": "nominal", "field": "a"}}, "height": 300, "width": 400, "$schema": "https://vega.github.io/schema/vega-lite/v4.0.0.json", "datasets": {"data-347f1284ea3247c0f55cb966abbdd2d8": [{"a": "C", "b": 2}, {"a": "C", "b": 7}, {"a": "C", "b": 4}, {"a": "D", "b": 1}, {"a": "D",

Notice that the chart specification is passed to the ``vegaEmbed`` library in the ``spec`` variable; the rest of the code is a template that is constant regardless of the chart.
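To see how little the template involves, here is a minimal sketch that assembles a similar page by hand using only the standard library. The template text and the small inline spec are our own simplifications (Altair's real template is more elaborate, and this spec embeds its data under `values`); the page loads the same three libraries from the jsDelivr CDN and hands the spec to `vegaEmbed`.

```python
import json

# A small hand-written Vega-Lite spec with inline data (illustrative values)
spec = {
    '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.0.json',
    'data': {'values': [{'a': 'C', 'b': 2},
                        {'a': 'D', 'b': 1},
                        {'a': 'E', 'b': 8}]},
    'mark': 'bar',
    'encoding': {
        'x': {'field': 'b', 'type': 'quantitative'},
        'y': {'field': 'a', 'type': 'nominal'},
    },
}

# Constant page template; SPEC_JSON is a placeholder we substitute below
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@4"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>
    vegaEmbed('#vis', SPEC_JSON);
  </script>
</body>
</html>
"""

# JSON is valid JavaScript object syntax, so it can be pasted in directly
html = TEMPLATE.replace('SPEC_JSON', json.dumps(spec))
print(html)
```

Writing `html` to a file and opening it in a browser should render the bar chart. In practice `chart.save()` does this for you.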

We can view the output in an iframe within the notebook (note that some online notebook viewers will not display iframes):

In [25]:
# Display IFrame in IPython
from IPython.display import IFrame
IFrame('chart.html', width=400, height=200)

Alternatively, you can use your web browser to open the file manually to confirm that it works: [chart.html](chart.html).

## Learning More

For more information on Altair, please refer to Altair's online documentation: http://altair-viz.github.io/

You can also see some of the example plots listed in the [accompanying notebooks](01-Index.ipynb).