{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring a Dataset\n", "\n", "Now that we have seen the basic pieces of Altair's API, it's time to practice using it to explore a new dataset.\n", "With your partner, choose one of the four datasets described below.\n", "\n", "As you explore the data, recall the building blocks we've discussed:\n", "\n", "- various marks: ``mark_point()``, ``mark_line()``, ``mark_tick()``, ``mark_bar()``, ``mark_area()``, ``mark_rect()``, etc.\n", "- various encodings: ``x``, ``y``, ``color``, ``shape``, ``size``, ``row``, ``column``, ``text``, ``tooltip``, etc.\n", "- binning and aggregations: a [list of available aggregations](https://altair-viz.github.io/user_guide/encoding.html#binning-and-aggregation) can be found in Altair's documentation\n", "- layering and concatenation (``alt.layer`` <-> ``+``, ``alt.hconcat`` <-> ``|``, ``alt.vconcat`` <-> ``&``)\n", "\n", "Start simple and build from there. Which encodings work best with quantitative data? With categorical data?\n", "What can you learn about your dataset using these tools?\n", "\n", "We'll set aside about 20 minutes for you to work on this with your partner." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from vega_datasets import data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Seattle Weather\n", "\n", "This data includes daily precipitation, temperature range, wind speed, and weather type in Seattle from 2012 to 2015." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " date precipitation temp_max temp_min wind weather\n", "0 2012-01-01 0.0 12.8 5.0 4.7 drizzle\n", "1 2012-01-02 10.9 10.6 2.8 4.5 rain\n", "2 2012-01-03 0.8 11.7 7.2 2.3 rain\n", "3 2012-01-04 20.3 12.2 5.6 4.7 rain\n", "4 2012-01-05 1.3 8.9 2.8 6.1 rain" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weather = data.seattle_weather()\n", "weather.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gapminder\n", "\n", "This data consists of population, fertility, and life expectancy over time in a number of countries around the world.\n", "\n", "Note that, while you may be tempted to use a temporal encoding for the year, here the year is simply a number, not a date stamp, and so temporal encoding is not the best choice." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " year country cluster pop life_expect fertility\n", "0 1955 Afghanistan 0 8891209 30.332 7.7\n", "1 1960 Afghanistan 0 9829450 31.997 7.7\n", "2 1965 Afghanistan 0 10997885 34.020 7.7\n", "3 1970 Afghanistan 0 12430623 36.088 7.7\n", "4 1975 Afghanistan 0 14132019 38.438 7.7" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gapminder = data.gapminder()\n", "gapminder.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Population\n", "\n", "This data contains the US population sub-divided by age and sex every decade from 1850 to near the present.\n", "\n", "Note that, while you may be tempted to use a temporal encoding for the year, here the year is simply a number, not a date stamp, and so temporal encoding is not the best choice." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " year age sex people\n", "0 1850 0 1 1483789\n", "1 1850 0 2 1450376\n", "2 1850 5 1 1411067\n", "3 1850 5 2 1359668\n", "4 1850 10 1 1260099" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "population = data.population()\n", "population.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Movies\n", "\n", "The movies dataset has data on 3200 movies, including release date, budget, and ratings on IMDB and Rotten Tomatoes." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Title US_Gross Worldwide_Gross US_DVD_Sales \\\n", "0 The Land Girls 146083.0 146083.0 NaN \n", "1 First Love, Last Rites 10876.0 10876.0 NaN \n", "2 I Married a Strange Person 203134.0 203134.0 NaN \n", "3 Let's Talk About Sex 373615.0 373615.0 NaN \n", "4 Slam 1009819.0 1087521.0 NaN \n", "\n", " Production_Budget Release_Date MPAA_Rating Running_Time_min Distributor \\\n", "0 8000000.0 Jun 12 1998 R NaN Gramercy \n", "1 300000.0 Aug 07 1998 R NaN Strand \n", "2 250000.0 Aug 28 1998 None NaN Lionsgate \n", "3 300000.0 Sep 11 1998 None NaN Fine Line \n", "4 1000000.0 Oct 09 1998 R NaN Trimark \n", "\n", " Source Major_Genre Creative_Type Director \\\n", "0 None None None None \n", "1 None Drama None None \n", "2 None Comedy None None \n", "3 None Comedy None None \n", "4 Original Screenplay Drama Contemporary Fiction None \n", "\n", " Rotten_Tomatoes_Rating IMDB_Rating IMDB_Votes \n", "0 NaN 6.1 1071.0 \n", "1 NaN 6.9 207.0 \n", "2 NaN 6.8 865.0 \n", "3 13.0 NaN NaN \n", "4 62.0 3.4 165.0 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "movies = data.movies()\n", "movies.head()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }