{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Titanic Examples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook contains examples that show how to visualize the Titanic data set using Altair." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import seaborn.apionly as sns\n", "import altair.api as alt\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
Lightning initialized
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Running local mode, some functionality limited.\n", "\n" ] } ], "source": [ "alt.use_renderer('lightning')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = sns.load_dataset('titanic')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df = df.dropna()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
survived pclass sex age sibsp parch fare embarked class who adult_male deck embark_town alive alone
1 1 1 female 38 1 0 71.2833 C First woman False C Cherbourg yes False
3 1 1 female 35 1 0 53.1000 S First woman False C Southampton yes False
6 0 1 male 54 0 0 51.8625 S First man True E Southampton no True
10 1 3 female 4 1 1 16.7000 S Third child False G Southampton yes False
11 1 1 female 58 0 0 26.5500 S First woman False C Southampton yes True
\n", "
" ], "text/plain": [ " survived pclass sex age sibsp parch fare embarked class \\\n", "1 1 1 female 38 1 0 71.2833 C First \n", "3 1 1 female 35 1 0 53.1000 S First \n", "6 0 1 male 54 0 0 51.8625 S First \n", "10 1 3 female 4 1 1 16.7000 S Third \n", "11 1 1 female 58 0 0 26.5500 S First \n", "\n", " who adult_male deck embark_town alive alone \n", "1 woman False C Cherbourg yes False \n", "3 woman False C Southampton yes False \n", "6 man True E Southampton no True \n", "10 child False G Southampton yes False \n", "11 woman False C Southampton yes True " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a basic scatterplot of the `age` and `fare`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\t
\n", "\t\t
\n", "\t\t\t
\n", "\t\t
\n", "\t
\n", "
\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v1 = alt.Viz(df).encode(x='age', y='fare').point()\n", "v1.render()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Facet the rows by the categorical column `sex`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\t
\n", "\t\t
\n", "\t\t\t
\n", "\t\t
\n", "\t
\n", "
\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v2 = alt.Viz(df).encode(x='age', y='fare', row='sex').point()\n", "v2.render()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add color-based grouping on the `survived` column. Note that we specify that the `survived` column is a *nominal* data type by appending the string `:N` to the column name. In Altair and Vega-Lite, there are two categorical data types:\n", "\n", "1. Nominal: categorical with no implicit ordering (Male, Female).\n", "2. Ordinal: categorical with implicit ordering (Monday, Tuesday, Wednesday, etc.).\n", "\n", "Which of these is used determines how values are mapped onto the visual properties." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\t
\n", "\t\t
\n", "\t\t\t
\n", "\t\t
\n", "\t
\n", "
\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v3 = alt.Viz(df).encode(x='age', y='fare', row='sex', color='survived:N').point()\n", "v3.render()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A histogram is just an encoding with default binning on `x` and a `count` aggregation on `y`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "\t
\n", "\t\t
\n", "\t\t\t
\n", "\t\t
\n", "\t
\n", "
\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v4 = alt.Viz(df).hist(x='fare', color='survived:N', bins=30)\n", "v4.render()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'type': 'Q', 'bin': {'maxbins': 30}, 'name': 'fare'}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v4.encoding.x" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'type': 'Q', 'bin': False, 'aggregate': 'count', 'name': 'fare'}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "v4.encoding.y" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.0" } }, "nbformat": 4, "nbformat_minor": 0 }