{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Simple Charts: Core Concepts\n",
"\n",
"The goal of this section is to teach you the core concepts required to create a basic Altair chart; namely:\n",
"\n",
"- **Data**, **Marks**, and **Encodings**: the three core pieces of an Altair chart\n",
"\n",
"- **Encoding Types**: ``Q`` (quantitative), ``N`` (nominal), ``O`` (ordinal), ``T`` (temporal), which drive the visual representation of the encodings\n",
"\n",
"- **Binning and Aggregation**: which let you control aspects of the data representation within Altair.\n",
"\n",
"With a good understanding of these core pieces, you will be well on your way to making a variety of charts in Altair."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll start by importing Altair, and (if necessary) enabling the appropriate renderer:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import altair as alt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Basic Altair Chart\n",
"\n",
"The essential elements of an Altair chart are the **data**, the **mark**, and the **encoding**.\n",
"\n",
"The format by which these are specified will look something like this:\n",
"\n",
"```python\n",
"alt.Chart(data).mark_point().encode(\n",
" encoding_1='column_1',\n",
" encoding_2='column_2',\n",
" # etc.\n",
")\n",
"```\n",
"\n",
"Let's take a look at these pieces, one at a time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Data\n",
"\n",
"Data in Altair is built around the [pandas DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).\n",
"For this section, we'll use the cars dataset that we saw before, which we can load using the [vega_datasets](https://github.com/altair-viz/vega_datasets) package:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Name Miles_per_Gallon Cylinders Displacement \\\n",
"0 chevrolet chevelle malibu 18.0 8 307.0 \n",
"1 buick skylark 320 15.0 8 350.0 \n",
"2 plymouth satellite 18.0 8 318.0 \n",
"3 amc rebel sst 16.0 8 304.0 \n",
"4 ford torino 17.0 8 302.0 \n",
"\n",
" Horsepower Weight_in_lbs Acceleration Year Origin \n",
"0 130.0 3504 12.0 1970-01-01 USA \n",
"1 165.0 3693 11.5 1970-01-01 USA \n",
"2 150.0 3436 11.0 1970-01-01 USA \n",
"3 150.0 3433 12.0 1970-01-01 USA \n",
"4 140.0 3449 10.5 1970-01-01 USA "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from vega_datasets import data\n",
"cars = data.cars()\n",
"\n",
"cars.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Data in Altair is expected to be in a [tidy format](http://vita.had.co.nz/papers/tidy-data.html); in other words:\n",
"\n",
"- each **row** is an observation\n",
"- each **column** is a variable\n",
"\n",
"See [Altair's Data Documentation](https://altair-viz.github.io/user_guide/data.html) for more information."
]
},
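{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what \"tidy\" means in practice, the snippet below reshapes a small wide-form table into long form with `pandas.melt`. The table and its column names (`city`, `2019`, `2020`, `year`, `sales`) are made up for illustration and are not part of the cars dataset.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical wide-form data: one column per year (not tidy)\n",
"wide = pd.DataFrame({\n",
"    'city': ['Oslo', 'Lima'],\n",
"    '2019': [10, 7],\n",
"    '2020': [12, 9],\n",
"})\n",
"\n",
"# Long (tidy) form: each row is one observation, each column one variable\n",
"tidy = wide.melt(id_vars='city', var_name='year', value_name='sales')\n",
"print(tidy)\n",
"```\n",
"\n",
"In the long form, `year` and `sales` can each be mapped directly to an encoding channel."
]
},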
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The *Chart* object\n",
"\n",
"With the data defined, you can instantiate Altair's fundamental object, the ``Chart``. A ``Chart`` is an object that knows how to emit a JSON dictionary representing the data and visualization encodings, which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.\n",
"Let's take a look at what this JSON representation looks like, using only the first row of the data:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},\n",
" 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n",
" 'mark': 'point',\n",
" '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.2.json',\n",
" 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n",
" 'Miles_per_Gallon': 18.0,\n",
" 'Cylinders': 8,\n",
" 'Displacement': 307.0,\n",
" 'Horsepower': 130.0,\n",
" 'Weight_in_lbs': 3504,\n",
" 'Acceleration': 12.0,\n",
" 'Year': '1970-01-01T00:00:00',\n",
" 'Origin': 'USA'}]}}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cars1 = cars.iloc[:1]\n",
"alt.Chart(cars1).mark_point().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point the chart includes a JSON-formatted representation of the dataframe, the type of mark to use, and some metadata that is included in every chart output."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Mark\n",
"\n",
"Next, we can decide what sort of *mark* we would like to use to represent our data.\n",
"As in the previous example, we can choose the ``point`` mark to represent each row of the data as a point on the plot:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_point()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is a visualization with one point per row in the data, though it is not a particularly interesting one: all the points are stacked right on top of each other!\n",
"\n",
"It is useful to again examine the JSON output here:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},\n",
" 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n",
" 'mark': 'point',\n",
" '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.2.json',\n",
" 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n",
" 'Miles_per_Gallon': 18.0,\n",
" 'Cylinders': 8,\n",
" 'Displacement': 307.0,\n",
" 'Horsepower': 130.0,\n",
" 'Weight_in_lbs': 3504,\n",
" 'Acceleration': 12.0,\n",
" 'Year': '1970-01-01T00:00:00',\n",
" 'Origin': 'USA'}]}}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars1).mark_point().to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that, in addition to the data, the specification includes information about the mark type (`'mark': 'point'`)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are a number of available marks that you can use; some of the more common are the following:\n",
"\n",
"* ``mark_point()`` \n",
"* ``mark_circle()``\n",
"* ``mark_square()``\n",
"* ``mark_line()``\n",
"* ``mark_area()``\n",
"* ``mark_bar()``\n",
"* ``mark_tick()``\n",
"\n",
"You can get a complete list of ``mark_*`` methods using Jupyter's tab-completion feature: in any cell just type:\n",
"\n",
" alt.Chart.mark_\n",
" \n",
"followed by the tab key to see the available options."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Encodings\n",
"\n",
"The next step is to add *visual encoding channels* (or *encodings* for short) to the chart. An encoding channel specifies how a given data column should be mapped onto the visual properties of the visualization.\n",
"Some of the more frequently used visual encodings are listed here:\n",
"\n",
"* ``x``: x-axis value\n",
"* ``y``: y-axis value\n",
"* ``color``: color of the mark\n",
"* ``opacity``: transparency/opacity of the mark\n",
"* ``shape``: shape of the mark\n",
"* ``size``: size of the mark\n",
"* ``row``: row within a grid of facet plots\n",
"* ``column``: column within a grid of facet plots\n",
"\n",
"For a complete list of these encodings, see the [Encodings](https://altair-viz.github.io/user_guide/encoding.html) section of the documentation.\n",
"\n",
"Visual encodings can be created with the `encode()` method of the `Chart` object. For example, we can start by mapping the `y` axis of the chart to the `Origin` column:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_point().encode(\n",
" y='Origin'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is a one-dimensional visualization representing the values taken on by `Origin`, with the points in each category on top of each other.\n",
"As above, we can view the JSON data generated for this visualization:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'config': {'view': {'continuousWidth': 400, 'continuousHeight': 300}},\n",
" 'data': {'name': 'data-36a712fbaefa4d20aa0b32e160cfd83a'},\n",
" 'mark': 'point',\n",
" 'encoding': {'x': {'type': 'nominal', 'field': 'Origin'}},\n",
" '$schema': 'https://vega.github.io/schema/vega-lite/v4.0.2.json',\n",
" 'datasets': {'data-36a712fbaefa4d20aa0b32e160cfd83a': [{'Name': 'chevrolet chevelle malibu',\n",
" 'Miles_per_Gallon': 18.0,\n",
" 'Cylinders': 8,\n",
" 'Displacement': 307.0,\n",
" 'Horsepower': 130.0,\n",
" 'Weight_in_lbs': 3504,\n",
" 'Acceleration': 12.0,\n",
" 'Year': '1970-01-01T00:00:00',\n",
" 'Origin': 'USA'}]}}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars1).mark_point().encode(\n",
" x='Origin'\n",
").to_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is the same as above with the addition of the `'encoding'` key, which specifies the visualization channel (`x` in this snippet), the name of the field (`Origin`), and the type of the variable (`nominal`).\n",
"We'll discuss these data types in a moment."
]
},
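{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same encoding information can also be written out explicitly. As a sketch, assuming a small made-up dataframe in place of the cars data, the snippet below compares the string shorthand with the equivalent `alt.Y` channel object; both should produce the same `'encoding'` dictionary.\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"# Small made-up dataframe standing in for the cars data\n",
"df = pd.DataFrame({'Origin': ['USA', 'Japan', 'Europe']})\n",
"\n",
"# String shorthand: column name plus a one-letter type code\n",
"short = alt.Chart(df).mark_point().encode(y='Origin:N')\n",
"\n",
"# Equivalent long form using a channel object\n",
"explicit = alt.Chart(df).mark_point().encode(\n",
"    y=alt.Y('Origin', type='nominal')\n",
")\n",
"\n",
"print(short.to_dict()['encoding'])\n",
"print(explicit.to_dict()['encoding'])\n",
"```\n",
"\n",
"The shorthand is usually enough; the channel objects become useful once you need to pass further options for a channel."
]
},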
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The visualization can be made more interesting by adding another channel to the encoding: let's encode the `Miles_per_Gallon` as the `x` position:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_point().encode(\n",
" y='Origin',\n",
" x='Miles_per_Gallon'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can add as many encodings as you wish, with each encoding mapped to a column in the data.\n",
"For example, here we will color the points by *Origin*, and plot *Miles_per_Gallon* vs *Year*:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_point().encode(\n",
" color='Origin',\n",
" y='Miles_per_Gallon',\n",
" x='Year'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise: Exploring Data\n",
"\n",
"Now that you know the basics (data, marks, and encodings), take some time and try making a few plots!\n",
"\n",
"In particular, I'd suggest trying various combinations of the following:\n",
"\n",
"- Marks: ``mark_point()``, ``mark_line()``, ``mark_bar()``, ``mark_text()``, ``mark_rect()``...\n",
"- Data Columns: ``'Acceleration'``, ``'Cylinders'``, ``'Displacement'``, ``'Horsepower'``, ``'Miles_per_Gallon'``, ``'Name'``, ``'Origin'``, ``'Weight_in_lbs'``, ``'Year'``\n",
"- Encodings: ``x``, ``y``, ``color``, ``shape``, ``row``, ``column``, ``opacity``, ``text``, ``tooltip``...\n",
"\n",
"Work with a partner to use various combinations of these options, and see what you can learn from the data! In particular, think about the following:\n",
"\n",
"- Which encodings go well with continuous, quantitative values?\n",
"- Which encodings go well with discrete, categorical (i.e. nominal) values?\n",
"\n",
"After about 10 minutes, we'll ask for a couple of volunteers to share their combination of marks, columns, and encodings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encoding Types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the central ideas of Altair is that the library will **choose good defaults for your data type**.\n",
"\n",
"The basic data types supported by Altair are as follows:\n",
"\n",
"| Data Type | Code | Description |\n",
"|---|---|---|\n",
"| quantitative | Q | Numerical quantity (real-valued) |\n",
"| nominal | N | Name / Unordered categorical |\n",
"| ordinal | O | Ordered categorical |\n",
"| temporal | T | Date/time |\n",
"\n",
"When you specify data as a pandas dataframe, these types are **automatically determined** by Altair.\n",
"\n",
"When you specify data as a URL, you must **manually specify** data types for each of your columns."
]
},
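{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the automatic type inference at work, here is a small sketch using a made-up dataframe (the values are illustrative, not taken from the cars data). It builds a chart without any type codes and prints the type Altair assigns to each channel based on the pandas dtypes.\n",
"\n",
"```python\n",
"import altair as alt\n",
"import pandas as pd\n",
"\n",
"# Made-up data with a float, a string, and a datetime column\n",
"df = pd.DataFrame({\n",
"    'Miles_per_Gallon': [18.0, 24.0, 31.5],\n",
"    'Origin': ['USA', 'Europe', 'Japan'],\n",
"    'Year': pd.to_datetime(['1970-01-01', '1975-01-01', '1980-01-01']),\n",
"})\n",
"\n",
"chart = alt.Chart(df).mark_point().encode(\n",
"    x='Year',\n",
"    y='Miles_per_Gallon',\n",
"    color='Origin',\n",
")\n",
"\n",
"# Inspect the types that were inferred for each channel\n",
"for channel, spec in chart.to_dict()['encoding'].items():\n",
"    print(channel, spec['type'])\n",
"```"
]
},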
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at a simple plot containing three of the columns from the cars data:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_tick().encode(\n",
" x='Miles_per_Gallon',\n",
" y='Origin',\n",
" color='Cylinders'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Questions:\n",
"\n",
"- what data type best goes with ``Miles_per_Gallon``?\n",
"- what data type best goes with ``Origin``?\n",
"- what data type best goes with ``Cylinders``?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's add the shorthands for each of these data types to our specification, using the one-letter codes above\n",
"(for example, change ``\"Miles_per_Gallon\"`` to ``\"Miles_per_Gallon:Q\"`` to explicitly specify that it is a quantitative type):"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_tick().encode(\n",
" x='Miles_per_Gallon:Q',\n",
" y='Origin:N',\n",
" color='Cylinders:O'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how the plot changes when we set the data type for ``'Cylinders'`` to ordinal: the color legend switches from a continuous gradient to a discrete, ordered set of colors.\n",
"\n",
"As you use Altair, it is useful to get into the habit of always specifying these types explicitly, because this is *mandatory* when working with data loaded from a file or a URL."
]
},
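{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch of the URL case, the snippet below passes the dataset's URL (via `data.cars.url` from `vega_datasets`) instead of a dataframe. Because Altair cannot inspect the remote file, each encoding carries an explicit type code; leaving the codes out here is expected to raise an error when the chart is converted to a dictionary.\n",
"\n",
"```python\n",
"import altair as alt\n",
"from vega_datasets import data\n",
"\n",
"# Pass a URL rather than a dataframe; the renderer loads the data later\n",
"url_chart = alt.Chart(data.cars.url).mark_tick().encode(\n",
"    x='Miles_per_Gallon:Q',\n",
"    y='Origin:N',\n",
"    color='Cylinders:O',\n",
")\n",
"\n",
"spec = url_chart.to_dict()\n",
"print(spec['data'])      # a URL reference instead of inline records\n",
"print(spec['encoding'])  # types come from the explicit codes\n",
"```\n",
"\n",
"A side benefit of this form is that the specification stays small, since the records are not embedded in it."
]
},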
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise: Adding Explicit Types\n",
"\n",
"Following are a few simple charts made with the cars dataset. For each one, try to add explicit types to the encodings (e.g. change ``\"Horsepower\"`` to ``\"Horsepower:Q\"``) so that the plot doesn't change.\n",
"\n",
"Are there any plots that can be made better by changing the type?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_bar().encode(\n",
" y='Origin',\n",
" x='mean(Horsepower)'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_line().encode(\n",
" x='Year',\n",
" y='mean(Miles_per_Gallon)',\n",
" color='Origin'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_bar().encode(\n",
" y='Cylinders',\n",
" x='count()',\n",
" color='Origin'\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
""
],
"text/plain": [
"alt.Chart(...)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"alt.Chart(cars).mark_rect().encode(\n",
" x='Cylinders',\n",
" y='Origin',\n",
" color='count()'\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}