{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Demo: Exploring the Iris Dataset\n", "\n", "We'll start this tutorial with a demo to whet your appetite for learning more. This section deliberately moves quickly through many of the concepts (e.g. data, marks, encodings, aggregation, data types, and selections).\n", "We will return to treat each of these in more depth later in the tutorial, so don't worry if it all seems to go a bit quickly!\n", "\n", "In the live tutorial, this will be done from scratch in a blank notebook.\n", "However, for the sake of people who want to look back on what we did live, I'll do my best to reproduce the examples and the discussion here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Imports and Data\n", "\n", "We'll start by importing the [Altair package](http://altair-viz.github.io/) and enabling the appropriate renderer (if necessary):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import altair as alt\n", "\n", "# Altair plots render by default in JupyterLab and nteract\n", "\n", "# Uncomment/run this line to enable Altair in the classic notebook (not in JupyterLab)\n", "# alt.renderers.enable('notebook')\n", "\n", "# Uncomment/run this line to enable Altair in Colab\n", "# alt.renderers.enable('colab')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll use the [vega_datasets package](https://github.com/altair-viz/vega_datasets) to load an example dataset:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>petalLength</th><th>petalWidth</th><th>sepalLength</th><th>sepalWidth</th><th>species</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>0</th><td>1.4</td><td>0.2</td><td>5.1</td><td>3.5</td><td>setosa</td></tr>\n", " <tr><th>1</th><td>1.4</td><td>0.2</td><td>4.9</td><td>3.0</td><td>setosa</td></tr>\n", " <tr><th>2</th><td>1.3</td><td>0.2</td><td>4.7</td><td>3.2</td><td>setosa</td></tr>\n", " <tr><th>3</th><td>1.5</td><td>0.2</td><td>4.6</td><td>3.1</td><td>setosa</td></tr>\n", " <tr><th>4</th><td>1.4</td><td>0.2</td><td>5.0</td><td>3.6</td><td>setosa</td></tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " petalLength petalWidth sepalLength sepalWidth species\n", "0 1.4 0.2 5.1 3.5 setosa\n", "1 1.4 0.2 4.9 3.0 setosa\n", "2 1.3 0.2 4.7 3.2 setosa\n", "3 1.5 0.2 4.6 3.1 setosa\n", "4 1.4 0.2 5.0 3.6 setosa" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from vega_datasets import data\n", "\n", "iris = data.iris()\n", "iris.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that this data is in columnar format: that is, each column contains an attribute of a data point, and each row contains a single instance of the data (here, the measurement of a single flower)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Zero, One, and Two-dimensional Charts\n", "\n", "Using Altair, we can begin to explore this data.\n", "\n", "The most basic chart contains the dataset, along with a mark to represent each row:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a pretty silly chart, because it consists of 150 points, all laid out on top of each other. But this *is* technically a representation of the data!\n", "\n", "To make it more interesting, we need to *encode* columns of the data into visual features of the plot (e.g. 
x-position, y-position, size, color, etc.).\n", "\n", "Let's encode petal length on the x-axis using the ``encode()`` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a bit better, but the ``point`` mark is probably not the best for a 1D chart like this.\n", "\n", "Let's try the ``tick`` mark instead:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_tick().encode(\n", " x='petalLength'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another thing we might do with the points is to expand them into a 2D chart by also encoding the y value. We'll return to using ``point`` marks, and put ``petalWidth`` on the y-axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Simple Interactions\n", "\n", "One of the nicest features of Altair is the grammar of interaction that it provides.\n", "The simplest kind of interaction is the ability to pan and zoom within a chart; Altair provides a shortcut to enable this via the ``interactive()`` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth'\n", ").interactive()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lets you click and drag to pan, as well as use your computer's scroll/zoom behavior to zoom in and out on the chart.\n", "\n", "We'll see other interactions later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
A Third Dimension: Color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A 2D plot allows us to encode two dimensions of the data. Let's look at using color to encode a third:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that when we use a categorical value for color, Altair chooses an appropriate color map for categorical data.\n", "\n", "Let's see what happens when we use a continuous color value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color='sepalLength'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A continuous color results in a color scale that is appropriate for continuous data.\n", "\n", "This is one key feature of Altair: it chooses appropriate scales for the data type.\n", "\n", "Similarly, if we put a categorical value on the x or y axis, Altair creates a discrete scale for it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_tick().encode(\n", " x='petalLength',\n", " y='species',\n", " color='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Altair can also choose the orientation of the tick mark appropriately given the types of the x and y encodings:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_tick().encode(\n", " y='petalLength',\n", " x='species',\n", " color='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Binning and Aggregation\n", "\n", "Let's return quickly to our 1D chart of petal length:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_tick().encode(\n", " x='petalLength',\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another way we might represent this data is to create a histogram: to bin the x data and show the count on the y axis.\n", "In many plotting libraries this is done with a special method like ``hist()``. In Altair, such binning and aggregation is part of the declarative API.\n", "\n", "To move beyond a simple field name, we use ``alt.X()`` for the x encoding, and we use ``'count()'`` for the y encoding:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_bar().encode(\n", " x=alt.X('petalLength', bin=True),\n", " y='count()'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want more control over the bins, we can use ``alt.Bin`` to adjust bin parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_bar().encode(\n", " x=alt.X('petalLength', bin=alt.Bin(maxbins=50)),\n", " y='count()'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we apply another encoding (such as ``color``), the data will be automatically grouped within each bin:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_bar().encode(\n", " x=alt.X('petalLength', bin=alt.Bin(maxbins=50)),\n", " y='count()',\n", " color='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you prefer a separate plot for each category, the ``column`` encoding can help:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_bar().encode(\n", " x=alt.X('petalLength', 
bin=alt.Bin(maxbins=50)),\n", " y='count()',\n", " color='species',\n", " column='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Binning and aggregation work in two dimensions as well; we can use the ``rect`` mark and visualize the count using color:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_rect().encode(\n", " x=alt.X('petalLength', bin=True),\n", " y=alt.Y('sepalLength', bin=True),\n", " color='count()'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Aggregations can be more than simple counts; we can also compute the mean of a third quantity within each bin:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_rect().encode(\n", " x=alt.X('petalLength', bin=True),\n", " y=alt.Y('sepalLength', bin=True),\n", " color='mean(petalWidth)'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Stacking and Layering\n", "\n", "Sometimes a multi-panel view is helpful in order to see relationships between points.\n", "Altair provides the ``hconcat()`` and ``vconcat()`` functions, along with the associated ``|`` and ``&`` operators, to concatenate charts. Note that this is different from the column-wise faceting we saw above, because each panel contains the *entire* dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chart1 = alt.Chart(iris).mark_point().encode(\n", " x='petalWidth',\n", " y='sepalWidth',\n", " color='species'\n", ")\n", "\n", "chart2 = alt.Chart(iris).mark_point().encode(\n", " x='sepalLength',\n", " y='sepalWidth',\n", " color='species'\n", ")\n", "\n", "chart1 | chart2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A useful trick for this sort of situation is to create a base chart, and concatenate two slightly different copies of it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "base = alt.Chart(iris).mark_point().encode(\n", " y='sepalWidth',\n", " color='species'\n", ")\n", "\n", "base.encode(x='petalWidth') | base.encode(x='sepalLength')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another type of compound chart is a layered chart, built using ``alt.layer()`` or equivalently the ``+`` operator.\n", "For example, we could layer points on top of the binned counts from before:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "counts = alt.Chart(iris).mark_rect(opacity=0.3).encode(\n", " x=alt.X('petalLength', bin=True),\n", " y=alt.Y('sepalLength', bin=True),\n", " color='count()'\n", ")\n", "\n", "points = alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='sepalLength',\n", " color='species'\n", ")\n", "\n", "counts + points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll see a use of this kind of chart stacking in the 
interactivity section, below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Interactivity: Selections\n", "\n", "Let's return to our scatter plot, and take a look at the other types of interactivity that Altair offers:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color='species'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that you can add ``interactive()`` to the end of a chart to enable the most basic interactive scales:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color='species'\n", ").interactive()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Altair provides a general ``selection`` API for creating interactive plots; for example, here we create an interval selection:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interval = alt.selection_interval()\n", "\n", "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color='species'\n", ").properties(\n", " selection=interval\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Currently this selection doesn't actually do anything, but we can change that by conditioning the color on this selection:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interval = alt.selection_interval()\n", "\n", "alt.Chart(iris).mark_point().encode(\n", " x='petalLength',\n", " y='petalWidth',\n", " color=alt.condition(interval, 'species', alt.value('lightgray'))\n", ").properties(\n", " selection=interval\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The nice thing about this selection API is that it *automatically* 
applies across compound charts; for example, here we can horizontally concatenate two charts, and since they both have the same selection they both respond appropriately:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interval = alt.selection_interval()\n", "\n", "base = alt.Chart(iris).mark_point().encode(\n", " y='petalWidth',\n", " color=alt.condition(interval, 'species', alt.value('lightgray'))\n", ").properties(\n", " selection=interval\n", ")\n", "\n", "base.encode(x='petalLength') | base.encode(x='sepalLength')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The properties of the interval selection can be easily modified; for example, we can specify that it only applies to the x encoding:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interval = alt.selection_interval(encodings=['x'])\n", "\n", "base = alt.Chart(iris).mark_point().encode(\n", " y='petalWidth',\n", " color=alt.condition(interval, 'species', alt.value('lightgray'))\n", ").properties(\n", " selection=interval\n", ")\n", "\n", "base.encode(x='petalLength') | base.encode(x='sepalLength')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do even more sophisticated things with selections as well.\n", "For example, let's make a bar chart of the number of selected records per species, and stack it below our scatter plots:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interval = alt.selection_interval()\n", "\n", "base = alt.Chart(iris).mark_point().encode(\n", " y='petalLength',\n", " color=alt.condition(interval, 'species', alt.value('lightgray'))\n", ").properties(\n", " selection=interval\n", ")\n", "\n", "hist = alt.Chart(iris).mark_bar().encode(\n", " x='count()',\n", " y='species',\n", " color='species'\n", ").properties(\n", " width=800,\n", " height=80\n", ").transform_filter(\n", " interval\n", ")\n", "\n", "scatter = 
base.encode(x='petalWidth') | base.encode(x='sepalWidth')\n", "\n", "scatter & hist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This demo has covered a number of the available components of Altair.\n", "In the following sections, we'll look into each of these a bit more systematically." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }