{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Measles Incidence in Altair" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook reproduces the Wall Street Journal's famous [Measles Incidence Plot](http://graphics.wsj.com/infectious-diseases-and-vaccines/#b02g20t20w15) in Python using [Altair](http://github.com/ellisonbg/altair/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Data\n", "\n", "We'll start by downloading the data. Fortunately, others have already made it available in an easily digestible form; a GitHub search turned up the dataset in CSV format here:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " YEAR WEEK ALABAMA ALASKA ARIZONA ARKANSAS CALIFORNIA COLORADO \\\n", "0 1928 1 3.67 NaN 1.90 4.11 1.38 8.38 \n", "1 1928 2 6.25 NaN 6.40 9.91 1.80 6.02 \n", "2 1928 3 7.95 NaN 4.50 11.15 1.31 2.86 \n", "3 1928 4 12.58 NaN 1.90 13.75 1.87 13.71 \n", "4 1928 5 8.03 NaN 0.47 20.79 2.38 5.13 \n", "\n", " CONNECTICUT DELAWARE ... SOUTH DAKOTA TENNESSEE TEXAS UTAH VERMONT \\\n", "0 4.50 8.58 ... 5.69 22.03 1.18 0.4 0.28 \n", "1 9.00 7.30 ... 6.57 16.96 0.63 NaN 0.56 \n", "2 8.81 15.88 ... 2.04 24.66 0.62 0.2 1.12 \n", "3 10.40 4.29 ... 2.19 18.86 0.37 0.2 6.70 \n", "4 16.80 5.58 ... 3.94 20.05 1.57 0.4 6.70 \n", "\n", " VIRGINIA WASHINGTON WEST VIRGINIA WISCONSIN WYOMING \n", "0 NaN 14.83 3.36 1.54 0.91 \n", "1 NaN 17.34 4.19 0.96 NaN \n", "2 NaN 15.67 4.19 4.79 1.36 \n", "3 NaN 12.77 4.66 1.64 3.64 \n", "4 NaN 18.83 7.37 2.91 0.91 \n", "\n", "[5 rows x 53 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "url = 'https://raw.githubusercontent.com/blmoore/blogR/master/data/measles_incidence.csv'\n", "# Skip the first two lines of the file and treat '-' as a missing value\n", "data = pd.read_csv(url, skiprows=2, na_values='-')\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Munging with Pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This data needs to be cleaned up a bit; we can do this with the Pandas library.\n", "We first need to aggregate the weekly incidence data by year:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " ALABAMA ALASKA ARIZONA ARKANSAS CALIFORNIA COLORADO CONNECTICUT \\\n", "YEAR \n", "1928 334.99 0.0 200.75 481.77 69.22 206.98 634.95 \n", "1929 111.93 0.0 54.88 67.22 72.80 74.24 614.82 \n", "1930 157.00 0.0 466.31 53.44 760.24 1132.76 112.23 \n", "1931 337.29 0.0 497.69 45.91 477.48 453.27 790.46 \n", "1932 10.21 0.0 20.11 5.33 214.08 222.90 348.27 \n", "\n", " DELAWARE DISTRICT OF COLUMBIA FLORIDA ... SOUTH DAKOTA TENNESSEE \\\n", "YEAR ... \n", "1928 256.02 535.63 119.58 ... 160.16 315.43 \n", "1929 239.82 94.20 78.01 ... 167.77 33.04 \n", "1930 109.25 182.10 356.59 ... 346.31 179.91 \n", "1931 1003.28 832.99 260.79 ... 212.36 134.79 \n", "1932 15.98 53.14 13.63 ... 96.37 68.99 \n", "\n", " TEXAS UTAH VERMONT VIRGINIA WASHINGTON WEST VIRGINIA WISCONSIN \\\n", "YEAR \n", "1928 97.35 16.83 334.80 0.0 344.82 195.98 124.61 \n", "1929 71.28 68.90 105.31 0.0 248.60 380.14 1016.54 \n", "1930 73.12 1044.79 236.69 0.0 631.64 157.70 748.58 \n", "1931 39.56 29.72 318.40 0.0 197.43 291.38 506.57 \n", "1932 76.58 13.91 1146.08 53.4 631.93 599.65 935.31 \n", "\n", " WYOMING \n", "YEAR \n", "1928 227.00 \n", "1929 312.16 \n", "1930 341.55 \n", "1931 60.69 \n", "1932 242.10 \n", "\n", "[5 rows x 51 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annual = data.drop('WEEK', axis=1).groupby('YEAR').sum()\n", "annual.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, because Altair is built to handle data where each row corresponds to a single sample, we will stack the data, re-labeling the columns for clarity:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " YEAR state incidence\n", "0 1928 ALABAMA 334.99\n", "1 1929 ALABAMA 111.93\n", "2 1930 ALABAMA 157.00\n", "3 1931 ALABAMA 337.29\n", "4 1932 ALABAMA 10.21" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "measles = annual.reset_index()\n", "measles = measles.melt('YEAR', var_name='state', value_name='incidence')\n", "measles.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial Visualization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use Altair's syntax for generating a heat map:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import altair as alt" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Years on x, states on y, color encodes the annual incidence\n", "alt.Chart(measles).mark_rect().encode(\n", "    x='YEAR:O',\n", "    y='state:N',\n", "    color='incidence'\n", ").properties(\n", "    width=600,\n", "    height=400\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adjusting Aesthetics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All the operative components of the visualization appear above; we now just have to adjust the aesthetic features to reproduce the original plot.\n", "Altair offers a great deal of flexibility for such adjustments, including the size and color of marks, axis labels and titles, and more.\n", "\n", "Here is the data visualized again with a number of these adjustments:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.Chart(...)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define a custom colormap using hex codes & HTML color names\n", "colormap = alt.Scale(domain=[0, 100, 200, 300, 1000, 3000],\n", "                     range=['#F0F8FF', 'cornflowerblue', 'mediumseagreen', '#FFEE00', 'darkorange', 'firebrick'],\n", "                     type='sqrt')\n", "\n", "alt.Chart(measles).mark_rect().encode(\n", "    alt.X('YEAR:O', axis=alt.Axis(title=None, ticks=False)),\n", "    alt.Y('state:N', axis=alt.Axis(title=None, ticks=False)),\n", "    alt.Color('incidence:Q', sort='ascending', scale=colormap, legend=None)\n", ").properties(\n", "    width=800,\n", "    height=500\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result clearly shows the impact of the measles vaccine, which was licensed in the United States in 1963." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Layering & Selections\n", "\n", "Here is another view of the data, using layering and selections to highlight individual states on hover and to allow zooming in:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "" ], "text/plain": [ "alt.LayerChart(...)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the state nearest the mouse; empty='none' selects nothing until the first hover\n", "# (Altair 5 deprecates selection_single in favor of selection_point)\n", "hover = alt.selection_single(on='mouseover', nearest=True, fields=['state'], empty='none')\n", "\n", "# One line per state, faded unless that state is selected\n", "line = alt.Chart().mark_line().encode(\n", "    alt.X('YEAR:Q',\n", "        scale=alt.Scale(zero=False),\n", "        axis=alt.Axis(format='f', title='year')\n", "    ),\n", "    alt.Y('incidence:Q', axis=alt.Axis(title='measles incidence')),\n", "    detail='state:N',\n", "    opacity=alt.condition(hover, alt.value(1.0), alt.value(0.1))\n", ").properties(\n", "    width=800,\n", "    height=300\n", ")\n", "\n", "# Invisible points that carry the selection\n", "point = line.mark_point().encode(\n", "    opacity=alt.value(0.0)\n", ").properties(\n", "    selection=hover\n", ")\n", "\n", "# Mean incidence across all states for each year, drawn in black\n", "mean = alt.Chart().mark_line().encode(\n", "    x=alt.X('YEAR:Q', scale=alt.Scale(zero=False)),\n", "    y='mean(incidence):Q',\n", "    color=alt.value('black')\n", ")\n", "\n", "# Name of the selected state, placed at the first year and at that state's mean incidence\n", "text = alt.Chart().mark_text(align='right').encode(\n", "    x='min(YEAR):Q',\n", "    y='mean(incidence):Q',\n", "    text='state:N',\n", "    detail='state:N',\n", "    opacity=alt.condition(hover, alt.value(1.0), alt.value(0.0))\n", ")\n", "\n", "# Layer the charts over the shared data; pan and zoom along the x axis only\n", "alt.layer(point, line, mean, text, data=measles).interactive(bind_y=False)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }