# Measles Incidence in Altair

This is an example of reproducing the Wall Street Journal's famous [Measles Incidence Plot](http://graphics.wsj.com/infectious-diseases-and-vaccines/#b02g20t20w15) in Python using [Altair](http://github.com/ellisonbg/altair/).

## The Data

We'll start by downloading the data. Fortunately, others have made it available in an easily digestible form; a GitHub search turned up the dataset in CSV format at the URL used below:

In [1]:
import pandas as pd
url = 'https://raw.githubusercontent.com/blmoore/blogR/master/data/measles_incidence.csv'
data = pd.read_csv(url, skiprows=2, na_values='-')
data.head()

,YEAR,WEEK,ALABAMA,ALASKA,ARIZONA,ARKANSAS,CALIFORNIA,COLORADO,CONNECTICUT,DELAWARE,...,SOUTH DAKOTA,TENNESSEE,TEXAS,UTAH,VERMONT,VIRGINIA,WASHINGTON,WEST VIRGINIA,WISCONSIN,WYOMING
0,1928,1,3.67,,1.9,4.11,1.38,8.38,4.5,8.58,...,5.69,22.03,1.18,0.4,0.28,,14.83,3.36,1.54,0.91
1,1928,2,6.25,,6.4,9.91,1.8,6.02,9.0,7.3,...,6.57,16.96,0.63,,0.56,,17.34,4.19,0.96,
2,1928,3,7.95,,4.5,11.15,1.31,2.86,8.81,15.88,...,2.04,24.66,0.62,0.2,1.12,,15.67,4.19,4.79,1.36
3,1928,4,12.58,,1.9,13.75,1.87,13.71,10.4,4.29,...,2.19,18.86,0.37,0.2,6.7,,12.77,4.66,1.64,3.64
4,1928,5,8.03,,0.47,20.79,2.38,5.13,16.8,5.58,...,3.94,20.05,1.57,0.4,6.7,,18.83,7.37,2.91,0.91
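
The two `read_csv` options used above can be tried offline on a tiny inline file. This is only a sketch: the two preamble lines and the values are made up for illustration, on the assumption that the real file also starts with two non-data lines (which is what `skiprows=2` suggests).

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: two preamble lines,
# then a header row, then data that uses '-' for missing values.
raw = io.StringIO(
    "Hypothetical title line\n"
    "Hypothetical source line\n"
    "YEAR,WEEK,ALABAMA,ALASKA\n"
    "1928,1,3.67,-\n"
    "1928,2,6.25,-\n"
)

# skiprows=2 drops the preamble; na_values='-' turns dashes into NaN,
# so the ALASKA column is parsed as numeric instead of as strings.
sample = pd.read_csv(raw, skiprows=2, na_values='-')
print(sample)
print(sample.dtypes)
```

Without `na_values='-'`, a column containing dashes would be read as text and could not be summed later.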


## Data Munging with Pandas

This data needs to be cleaned up a bit; we can do this with the Pandas library.
We first need to aggregate the incidence data by year:

In [2]:
annual = data.drop('WEEK', axis=1).groupby('YEAR').sum()
annual.head()

YEAR,ALABAMA,ALASKA,ARIZONA,ARKANSAS,CALIFORNIA,COLORADO,CONNECTICUT,DELAWARE,DISTRICT OF COLUMBIA,FLORIDA,...,SOUTH DAKOTA,TENNESSEE,TEXAS,UTAH,VERMONT,VIRGINIA,WASHINGTON,WEST VIRGINIA,WISCONSIN,WYOMING
1928,334.99,0.0,200.75,481.77,69.22,206.98,634.95,256.02,535.63,119.58,...,160.16,315.43,97.35,16.83,334.8,0.0,344.82,195.98,124.61,227.0
1929,111.93,0.0,54.88,67.22,72.8,74.24,614.82,239.82,94.2,78.01,...,167.77,33.04,71.28,68.9,105.31,0.0,248.6,380.14,1016.54,312.16
1930,157.0,0.0,466.31,53.44,760.24,1132.76,112.23,109.25,182.1,356.59,...,346.31,179.91,73.12,1044.79,236.69,0.0,631.64,157.7,748.58,341.55
1931,337.29,0.0,497.69,45.91,477.48,453.27,790.46,1003.28,832.99,260.79,...,212.36,134.79,39.56,29.72,318.4,0.0,197.43,291.38,506.57,60.69
1932,10.21,0.0,20.11,5.33,214.08,222.9,348.27,15.98,53.14,13.63,...,96.37,68.99,76.58,13.91,1146.08,53.4,631.93,599.65,935.31,242.1
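
The `0.0` entries in the table above (for example ALASKA) are most likely years in which every weekly value was missing, since `sum()` skips NaN by default and returns 0 for an all-NaN group. The sketch below uses a small made-up frame to show this, along with the `min_count=1` option, which keeps such groups as NaN instead. Whether zeros or NaN are preferable for the plot is a judgment call this example does not settle.

```python
import numpy as np
import pandas as pd

# Made-up weekly data: ALASKA has no reported values at all.
weekly = pd.DataFrame({
    'YEAR': [1928, 1928, 1929, 1929],
    'WEEK': [1, 2, 1, 2],
    'ALABAMA': [3.67, 6.25, 1.0, 2.0],
    'ALASKA': [np.nan, np.nan, np.nan, np.nan],
})

by_year = weekly.drop('WEEK', axis=1).groupby('YEAR')

# Default behaviour: NaN values are skipped, all-NaN groups sum to 0.0.
annual_default = by_year.sum()

# min_count=1 requires at least one real value, otherwise the sum is NaN.
annual_strict = by_year.sum(min_count=1)

print(annual_default)
print(annual_strict)
```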


Next, because Altair is built to handle data where each row corresponds to a single sample, we will stack the data, re-labeling the columns for clarity:

In [3]:
measles = annual.reset_index()
measles = measles.melt('YEAR', var_name='state', value_name='incidence')
measles.head()

,YEAR,state,incidence
0,1928,ALABAMA,334.99
1,1929,ALABAMA,111.93
2,1930,ALABAMA,157.0
3,1931,ALABAMA,337.29
4,1932,ALABAMA,10.21
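
To make the wide-to-long reshaping concrete, here is a minimal sketch on a two-state, two-year frame (the values are copied from the first rows of the tables above; the frame itself is just for illustration). Each state column becomes a block of rows, one per year.

```python
import pandas as pd

# Wide format: one row per year, one column per state.
wide = pd.DataFrame({
    'YEAR': [1928, 1929],
    'ALABAMA': [334.99, 111.93],
    'ALASKA': [0.0, 0.0],
})

# Long format: one row per (year, state) pair.
# 'YEAR' is kept as the identifier; every other column is stacked.
long = wide.melt('YEAR', var_name='state', value_name='incidence')
print(long)
```

The long frame has `len(wide) * number_of_states` rows, which is the one-row-per-sample layout that Altair's encodings expect.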


## Initial Visualization

Now we can use Altair's syntax for generating a heat map:

In [4]:
import altair as alt

In [5]:
alt.Chart(measles).mark_rect().encode(
    x='YEAR:O',
    y='state:N',
    color='incidence'
).properties(
    width=600,
    height=400
)

## Adjusting Aesthetics

All operative components of the visualization appear above; we now just have to adjust the aesthetic features to reproduce the original plot.
Altair allows a wide range of flexibility for such adjustments, including the size and color of marks, axis labels and titles, and more.

Here is the data visualized again with a number of these adjustments:

In [6]:
# Define a custom colormap using hex codes & HTML color names
colormap = alt.Scale(domain=[0, 100, 200, 300, 1000, 3000],
                     range=['#F0F8FF', 'cornflowerblue', 'mediumseagreen', '#FFEE00', 'darkorange', 'firebrick'],
                     type='sqrt')

alt.Chart(measles).mark_rect().encode(
    alt.X('YEAR:O', axis=alt.Axis(title=None, ticks=False)),
    alt.Y('state:N', axis=alt.Axis(title=None, ticks=False)),
    alt.Color('incidence:Q', sort='ascending', scale=colormap, legend=None)
).properties(
    width=800,
    height=500
)

The result clearly shows the impact of the measles vaccine introduced in the mid-1960s.
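
For intuition about the colormap, the sketch below estimates where a given incidence value falls between two neighbouring colors. It assumes that a `sqrt` scale with a multi-stop domain behaves like a piecewise square-root scale (square-root both the value and the domain stops, then interpolate linearly within the segment), which is how d3-style continuous scales work. The `locate` helper and the clamping to the domain are illustrative choices, not part of Altair.

```python
import math
from bisect import bisect_right

domain = [0, 100, 200, 300, 1000, 3000]
colors = ['#F0F8FF', 'cornflowerblue', 'mediumseagreen',
          '#FFEE00', 'darkorange', 'firebrick']

def locate(value):
    """Return (lower color, upper color, fraction between them).

    Hypothetical helper: clamps the value to the domain, finds the
    segment it falls in, and interpolates on the square-root scale.
    """
    v = min(max(value, domain[0]), domain[-1])
    # Index of the segment [domain[i], domain[i + 1]] containing v.
    i = min(bisect_right(domain, v) - 1, len(domain) - 2)
    lo, hi = math.sqrt(domain[i]), math.sqrt(domain[i + 1])
    t = (math.sqrt(v) - lo) / (hi - lo)
    return colors[i], colors[i + 1], t

for incidence in [0, 25, 100, 650, 3000]:
    print(incidence, locate(incidence))
```

Under this assumption an incidence of 25 already sits halfway between the first two colors, which is why low values remain distinguishable even though the domain extends to 3000.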

## Layering & Selections

Here is another view of the data, using layering and selections to allow zooming in:

In [7]:
# Select the state nearest the cursor; nothing is selected until the mouse moves over the chart
hover = alt.selection_single(on='mouseover', nearest=True, fields=['state'], empty='none')

line = alt.Chart().mark_line().encode(
    alt.X('YEAR:Q',
        scale=alt.Scale(zero=False),
        axis=alt.Axis(format='f', title='year')
    ),
    alt.Y('incidence:Q', axis=alt.Axis(title='measles incidence')),
    detail='state:N',
    opacity=alt.condition(hover, alt.value(1.0), alt.value(0.1))
).properties(
    width=800,
    height=300
)

# Invisible points that carry the selection, giving 'nearest' something to snap to
point = line.mark_point().encode(
    opacity=alt.value(0.0)
).properties(
    selection=hover
)

# Black line showing the mean incidence across all states for each year
mean = alt.Chart().mark_line().encode(
    x=alt.X('YEAR:Q', scale=alt.Scale(zero=False)),
    y='mean(incidence):Q',
    color=alt.value('black')
)

# State name label, visible only for the hovered state
text = alt.Chart().mark_text(align='right').encode(
    x='min(YEAR):Q',
    y='mean(incidence):Q',
    text='state:N',
    detail='state:N',
    opacity=alt.condition(hover, alt.value(1.0), alt.value(0.0))
)

alt.layer(point, line, mean, text, data=measles).interactive(bind_y=False)