{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exploring a DataSet\n",
    "\n",
    "Now that we have seen the basic pieces of Altair's API, it's time to practice using it to explore a new dataset.\n",
    "With your partner, choose one of the following four datasets, detailed below.\n",
    "\n",
    "As you explore the data, recall the building blocks we've discussed:\n",
    "\n",
    "- various marks: ``mark_point()``, ``mark_line()``, ``mark_tick()``, ``mark_bar()``, ``mark_area()``, ``mark_rect()``, etc.\n",
    "- various encodings: ``x``, ``y``, ``color``, ``shape``, ``size``, ``row``, ``column``, ``text``, ``tooltip``, etc.\n",
    "- binning and aggregations: a [List of available aggregations](https://altair-viz.github.io/user_guide/encoding.html#binning-and-aggregation) can be found in Altair's documentation\n",
    "- stacking and layering (``alt.layer`` <-> ``+``, ``alt.hconcat`` <-> ``|``, ``alt.vconcat`` <-> ``&``)\n",
    "\n",
    "Start simple and build from there. Which encodings work best with quantitative data? With categorical data?\n",
    "What can you learn about your dataset using these tools?\n",
    "\n",
    "We'll set aside about 20 minutes for you to work on this with your partner."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from vega_datasets import data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Seattle Weather\n",
    "\n",
    "This data includes daily precipitation, temperature range, wind speed, and weather type as a function of date between 2012 and 2015 in Seattle."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>precipitation</th>\n",
       "      <th>temp_max</th>\n",
       "      <th>temp_min</th>\n",
       "      <th>wind</th>\n",
       "      <th>weather</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2012-01-01</td>\n",
       "      <td>0.0</td>\n",
       "      <td>12.8</td>\n",
       "      <td>5.0</td>\n",
       "      <td>4.7</td>\n",
       "      <td>drizzle</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2012-01-02</td>\n",
       "      <td>10.9</td>\n",
       "      <td>10.6</td>\n",
       "      <td>2.8</td>\n",
       "      <td>4.5</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2012-01-03</td>\n",
       "      <td>0.8</td>\n",
       "      <td>11.7</td>\n",
       "      <td>7.2</td>\n",
       "      <td>2.3</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2012-01-04</td>\n",
       "      <td>20.3</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5.6</td>\n",
       "      <td>4.7</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2012-01-05</td>\n",
       "      <td>1.3</td>\n",
       "      <td>8.9</td>\n",
       "      <td>2.8</td>\n",
       "      <td>6.1</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        date  precipitation  temp_max  temp_min  wind  weather\n",
       "0 2012-01-01            0.0      12.8       5.0   4.7  drizzle\n",
       "1 2012-01-02           10.9      10.6       2.8   4.5     rain\n",
       "2 2012-01-03            0.8      11.7       7.2   2.3     rain\n",
       "3 2012-01-04           20.3      12.2       5.6   4.7     rain\n",
       "4 2012-01-05            1.3       8.9       2.8   6.1     rain"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "weather = data.seattle_weather()\n",
    "weather.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gapminder\n",
    "\n",
    "This data consists of population, fertility, and life expectancy over time in a number of countries around the world.\n",
    "\n",
    "Note that, while you may be tempted to use a temporal encoding for the year, here the year is simply a number, not a date stamp, and so temporal encoding is not the best choice here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>country</th>\n",
       "      <th>cluster</th>\n",
       "      <th>pop</th>\n",
       "      <th>life_expect</th>\n",
       "      <th>fertility</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1955</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>8891209</td>\n",
       "      <td>30.332</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1960</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>9829450</td>\n",
       "      <td>31.997</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1965</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>10997885</td>\n",
       "      <td>34.020</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1970</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>12430623</td>\n",
       "      <td>36.088</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1975</td>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>0</td>\n",
       "      <td>14132019</td>\n",
       "      <td>38.438</td>\n",
       "      <td>7.7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year      country  cluster       pop  life_expect  fertility\n",
       "0  1955  Afghanistan        0   8891209       30.332        7.7\n",
       "1  1960  Afghanistan        0   9829450       31.997        7.7\n",
       "2  1965  Afghanistan        0  10997885       34.020        7.7\n",
       "3  1970  Afghanistan        0  12430623       36.088        7.7\n",
       "4  1975  Afghanistan        0  14132019       38.438        7.7"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gapminder = data.gapminder()\n",
    "gapminder.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Population\n",
    "\n",
    "This data contains the US population sub-divided by age and sex every decade from 1850 to near the present.\n",
    "\n",
    "Note that, while you may be tempted to use a temporal encoding for the year, here the year is simply a number, not a date stamp, and so temporal encoding is not the best choice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>age</th>\n",
       "      <th>sex</th>\n",
       "      <th>people</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1850</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1483789</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1850</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1450376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1850</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1411067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1850</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>1359668</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1850</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>1260099</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  age  sex   people\n",
       "0  1850    0    1  1483789\n",
       "1  1850    0    2  1450376\n",
       "2  1850    5    1  1411067\n",
       "3  1850    5    2  1359668\n",
       "4  1850   10    1  1260099"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "population = data.population()\n",
    "population.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Movies\n",
    "\n",
    "The movies dataset has data on 3200 movies, including release date, budget, and ratings on IMDB and Rotten Tomatoes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>US_Gross</th>\n",
       "      <th>Worldwide_Gross</th>\n",
       "      <th>US_DVD_Sales</th>\n",
       "      <th>Production_Budget</th>\n",
       "      <th>Release_Date</th>\n",
       "      <th>MPAA_Rating</th>\n",
       "      <th>Running_Time_min</th>\n",
       "      <th>Distributor</th>\n",
       "      <th>Source</th>\n",
       "      <th>Major_Genre</th>\n",
       "      <th>Creative_Type</th>\n",
       "      <th>Director</th>\n",
       "      <th>Rotten_Tomatoes_Rating</th>\n",
       "      <th>IMDB_Rating</th>\n",
       "      <th>IMDB_Votes</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The Land Girls</td>\n",
       "      <td>146083.0</td>\n",
       "      <td>146083.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>8000000.0</td>\n",
       "      <td>Jun 12 1998</td>\n",
       "      <td>R</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Gramercy</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.1</td>\n",
       "      <td>1071.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>First Love, Last Rites</td>\n",
       "      <td>10876.0</td>\n",
       "      <td>10876.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>Aug 07 1998</td>\n",
       "      <td>R</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Strand</td>\n",
       "      <td>None</td>\n",
       "      <td>Drama</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.9</td>\n",
       "      <td>207.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>I Married a Strange Person</td>\n",
       "      <td>203134.0</td>\n",
       "      <td>203134.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>250000.0</td>\n",
       "      <td>Aug 28 1998</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Lionsgate</td>\n",
       "      <td>None</td>\n",
       "      <td>Comedy</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>6.8</td>\n",
       "      <td>865.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Let's Talk About Sex</td>\n",
       "      <td>373615.0</td>\n",
       "      <td>373615.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>300000.0</td>\n",
       "      <td>Sep 11 1998</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Fine Line</td>\n",
       "      <td>None</td>\n",
       "      <td>Comedy</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Slam</td>\n",
       "      <td>1009819.0</td>\n",
       "      <td>1087521.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1000000.0</td>\n",
       "      <td>Oct 09 1998</td>\n",
       "      <td>R</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Trimark</td>\n",
       "      <td>Original Screenplay</td>\n",
       "      <td>Drama</td>\n",
       "      <td>Contemporary Fiction</td>\n",
       "      <td>None</td>\n",
       "      <td>62.0</td>\n",
       "      <td>3.4</td>\n",
       "      <td>165.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        Title   US_Gross  Worldwide_Gross  US_DVD_Sales  \\\n",
       "0              The Land Girls   146083.0         146083.0           NaN   \n",
       "1      First Love, Last Rites    10876.0          10876.0           NaN   \n",
       "2  I Married a Strange Person   203134.0         203134.0           NaN   \n",
       "3        Let's Talk About Sex   373615.0         373615.0           NaN   \n",
       "4                        Slam  1009819.0        1087521.0           NaN   \n",
       "\n",
       "   Production_Budget Release_Date MPAA_Rating  Running_Time_min Distributor  \\\n",
       "0          8000000.0  Jun 12 1998           R               NaN    Gramercy   \n",
       "1           300000.0  Aug 07 1998           R               NaN      Strand   \n",
       "2           250000.0  Aug 28 1998        None               NaN   Lionsgate   \n",
       "3           300000.0  Sep 11 1998        None               NaN   Fine Line   \n",
       "4          1000000.0  Oct 09 1998           R               NaN     Trimark   \n",
       "\n",
       "                Source Major_Genre         Creative_Type Director  \\\n",
       "0                 None        None                  None     None   \n",
       "1                 None       Drama                  None     None   \n",
       "2                 None      Comedy                  None     None   \n",
       "3                 None      Comedy                  None     None   \n",
       "4  Original Screenplay       Drama  Contemporary Fiction     None   \n",
       "\n",
       "   Rotten_Tomatoes_Rating  IMDB_Rating  IMDB_Votes  \n",
       "0                     NaN          6.1      1071.0  \n",
       "1                     NaN          6.9       207.0  \n",
       "2                     NaN          6.8       865.0  \n",
       "3                    13.0          NaN         NaN  \n",
       "4                    62.0          3.4       165.0  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "movies = data.movies()\n",
    "movies.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}