{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"from datascience import *\n",
"import numpy as np\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plots\n",
"plots.style.use('fivethirtyeight')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Maps\n",
"\n",
"We'll illustrate the use of maps with the New York Times Covid-19 database. Recall that this is\n",
"a county-level database of confirmed cases and deaths, updated daily,\n",
"compiled from state and local governments and health departments across the United States.\n",
"\n",
"The Times has created many visualizations that effectively communicate important information about the pandemic. Here we will construct some simple circle maps.\n",
"\n",
"The data are publicly available on GitHub: [https://github.com/nytimes/covid-19-data](https://github.com/nytimes/covid-19-data).\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>date</th><th>county</th><th>state</th><th>fips</th><th>cases</th><th>deaths</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>2020-01-21</td><td>Snohomish</td><td>Washington</td><td>53061</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-22</td><td>Snohomish</td><td>Washington</td><td>53061</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-23</td><td>Snohomish</td><td>Washington</td><td>53061</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-24</td><td>Cook</td><td>Illinois</td><td>17031</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-24</td><td>Snohomish</td><td>Washington</td><td>53061</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-25</td><td>Orange</td><td>California</td><td>6059</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-25</td><td>Cook</td><td>Illinois</td><td>17031</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-25</td><td>Snohomish</td><td>Washington</td><td>53061</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-26</td><td>Maricopa</td><td>Arizona</td><td>4013</td><td>1</td><td>0</td></tr>\n",
"    <tr><td>2020-01-26</td><td>Los Angeles</td><td>California</td><td>6037</td><td>1</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (1076214 rows omitted)</p>"
],
"text/plain": [
"date | county | state | fips | cases | deaths\n",
"2020-01-21 | Snohomish | Washington | 53061 | 1 | 0\n",
"2020-01-22 | Snohomish | Washington | 53061 | 1 | 0\n",
"2020-01-23 | Snohomish | Washington | 53061 | 1 | 0\n",
"2020-01-24 | Cook | Illinois | 17031 | 1 | 0\n",
"2020-01-24 | Snohomish | Washington | 53061 | 1 | 0\n",
"2020-01-25 | Orange | California | 6059 | 1 | 0\n",
"2020-01-25 | Cook | Illinois | 17031 | 1 | 0\n",
"2020-01-25 | Snohomish | Washington | 53061 | 1 | 0\n",
"2020-01-26 | Maricopa | Arizona | 4013 | 1 | 0\n",
"2020-01-26 | Los Angeles | California | 6037 | 1 | 0\n",
"... (1076214 rows omitted)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"covid_table = Table.read_table(\"https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv\")\n",
"covid_table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we load in a table of geographical data for counties."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>county</th><th>state</th><th>fips</th><th>lat</th><th>lon</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>New York City</td><td>New York</td><td>nan</td><td>40.7146</td><td>-74.0071</td></tr>\n",
"    <tr><td>Westchester</td><td>New York</td><td>36119</td><td>41.1191</td><td>-73.7887</td></tr>\n",
"    <tr><td>Nassau</td><td>New York</td><td>36059</td><td>42.5164</td><td>-73.6113</td></tr>\n",
"    <tr><td>Suffolk</td><td>New York</td><td>36103</td><td>40.9601</td><td>-72.8343</td></tr>\n",
"    <tr><td>Cook</td><td>Illinois</td><td>17031</td><td>41.8139</td><td>-87.6155</td></tr>\n",
"    <tr><td>King</td><td>Washington</td><td>53033</td><td>47.4325</td><td>-121.959</td></tr>\n",
"    <tr><td>Unknown</td><td>New Jersey</td><td>nan</td><td>0</td><td>0</td></tr>\n",
"    <tr><td>Wayne</td><td>Michigan</td><td>26163</td><td>42.2852</td><td>-83.3836</td></tr>\n",
"    <tr><td>Los Angeles</td><td>California</td><td>6037</td><td>34.0536</td><td>-118.246</td></tr>\n",
"    <tr><td>Bergen</td><td>New Jersey</td><td>34003</td><td>40.9476</td><td>-74.0276</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (1660 rows omitted)</p>"
],
"text/plain": [
"county | state | fips | lat | lon\n",
"New York City | New York | nan | 40.7146 | -74.0071\n",
"Westchester | New York | 36119 | 41.1191 | -73.7887\n",
"Nassau | New York | 36059 | 42.5164 | -73.6113\n",
"Suffolk | New York | 36103 | 40.9601 | -72.8343\n",
"Cook | Illinois | 17031 | 41.8139 | -87.6155\n",
"King | Washington | 53033 | 47.4325 | -121.959\n",
"Unknown | New Jersey | nan | 0 | 0\n",
"Wayne | Michigan | 26163 | 42.2852 | -83.3836\n",
"Los Angeles | California | 6037 | 34.0536 | -118.246\n",
"Bergen | New Jersey | 34003 | 40.9476 | -74.0276\n",
"... (1660 rows omitted)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"county_geo = Table.read_table(\"https://raw.githubusercontent.com/jdlafferty/covid-19/master/data/geo-counties.csv\") \n",
"county_geo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Process the recent data\n",
"\n",
"First we will calculate the average new cases for a subset of states. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>date</th><th>county</th><th>state</th><th>fips</th><th>cases</th><th>deaths</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>2021-02-21</td><td>Fairfield</td><td>Connecticut</td><td>9001</td><td>78181</td><td>2027</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Hartford</td><td>Connecticut</td><td>9003</td><td>68527</td><td>2251</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Litchfield</td><td>Connecticut</td><td>9005</td><td>10835</td><td>270</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Middlesex</td><td>Connecticut</td><td>9007</td><td>10113</td><td>337</td></tr>\n",
"    <tr><td>2021-02-21</td><td>New Haven</td><td>Connecticut</td><td>9009</td><td>69233</td><td>1891</td></tr>\n",
"    <tr><td>2021-02-21</td><td>New London</td><td>Connecticut</td><td>9011</td><td>18635</td><td>403</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Tolland</td><td>Connecticut</td><td>9013</td><td>7612</td><td>159</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Unknown</td><td>Connecticut</td><td>nan</td><td>937</td><td>9</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Windham</td><td>Connecticut</td><td>9015</td><td>9028</td><td>176</td></tr>\n",
"    <tr><td>2021-02-21</td><td>Androscoggin</td><td>Maine</td><td>23001</td><td>4695</td><td>54</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (570 rows omitted)</p>"
],
"text/plain": [
"date | county | state | fips | cases | deaths\n",
"2021-02-21 | Fairfield | Connecticut | 9001 | 78181 | 2027\n",
"2021-02-21 | Hartford | Connecticut | 9003 | 68527 | 2251\n",
"2021-02-21 | Litchfield | Connecticut | 9005 | 10835 | 270\n",
"2021-02-21 | Middlesex | Connecticut | 9007 | 10113 | 337\n",
"2021-02-21 | New Haven | Connecticut | 9009 | 69233 | 1891\n",
"2021-02-21 | New London | Connecticut | 9011 | 18635 | 403\n",
"2021-02-21 | Tolland | Connecticut | 9013 | 7612 | 159\n",
"2021-02-21 | Unknown | Connecticut | nan | 937 | 9\n",
"2021-02-21 | Windham | Connecticut | 9015 | 9028 | 176\n",
"2021-02-21 | Androscoggin | Maine | 23001 | 4695 | 54\n",
"... (570 rows omitted)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_date = '2021-02-20'\n",
"\n",
"# Some subsets of states to visualize:\n",
"continental_states = ['Pennsylvania', 'Arizona', 'Connecticut', 'Florida', 'Wisconsin', 'South Dakota', 'Colorado',\n",
" 'New Jersey', 'California', 'Ohio', 'West Virginia', 'Oregon', 'Alabama', 'Maine', 'Vermont',\n",
" 'Montana', 'Nevada', 'Washington', 'Massachusetts', 'Maryland', 'Iowa', 'North Dakota', 'Kentucky',\n",
" 'Delaware', 'Mississippi', 'Tennessee', 'Virginia', 'Kansas', 'Missouri', 'Utah', 'North Carolina', 'Louisiana',\n",
" 'South Carolina', 'Minnesota', 'Arkansas', 'Indiana', 'New York', 'Wyoming', 'New Mexico', 'Rhode Island',\n",
" 'Michigan', 'Nebraska', 'New Hampshire', 'Georgia', 'Texas', 'Illinois', 'Oklahoma', 'Idaho']\n",
"\n",
"new_england_states = ['Connecticut', 'Massachusetts', 'Vermont', 'New Hampshire', 'Rhode Island', 'Maine']\n",
"west_coast_states = ['California', 'Oregon', 'Washington']\n",
"southeast_states = ['Florida', 'Louisiana', 'Alabama', 'Mississippi', 'Georgia', 'South Carolina']\n",
"\n",
"states = new_england_states\n",
"\n",
"recent_data = covid_table.where('date', are.above(first_date))\n",
"recent_state_data = recent_data.where('state', are.contained_in(states))\n",
"recent_state_data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[FIPS](https://en.wikipedia.org/wiki/FIPS_county_code) is a number assigned to each county. This will be used to merge the Covid-19 data with the geo data.\n",
"\n",
"In the following code, we group the data by the FIPS county code, and compute the new cases for each day during the past week in that county. Then we average. \n"
]
},
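{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick aside (a toy sketch with made-up numbers, not the NYT data): `np.diff` turns a cumulative series into per-day new cases, and `np.mean` then averages them. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical cumulative case counts over five days\n",
"cumulative = np.array([10, 12, 15, 19, 24])\n",
"\n",
"# np.diff gives the day-to-day increases: [2, 3, 4, 5]\n",
"daily_new = np.diff(cumulative)\n",
"\n",
"# np.mean averages them: 3.5\n",
"np.mean(daily_new)"
]
},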
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>fips</th><th>new cases</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>9001</td><td>294.572</td></tr>\n",
"    <tr><td>9003</td><td>207.858</td></tr>\n",
"    <tr><td>9005</td><td>38.2867</td></tr>\n",
"    <tr><td>9007</td><td>39.001</td></tr>\n",
"    <tr><td>9009</td><td>280.144</td></tr>\n",
"    <tr><td>9011</td><td>69.7153</td></tr>\n",
"    <tr><td>9013</td><td>25.001</td></tr>\n",
"    <tr><td>9015</td><td>18.1439</td></tr>\n",
"    <tr><td>23001</td><td>16.2867</td></tr>\n",
"    <tr><td>23003</td><td>2.28671</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (57 rows omitted)</p>"
],
"text/plain": [
"fips | new cases\n",
"9001 | 294.572\n",
"9003 | 207.858\n",
"9005 | 38.2867\n",
"9007 | 39.001\n",
"9009 | 280.144\n",
"9011 | 69.7153\n",
"9013 | 25.001\n",
"9015 | 18.1439\n",
"23001 | 16.2867\n",
"23003 | 2.28671\n",
"... (57 rows omitted)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# remove extra columns\n",
"data = recent_state_data.drop('date').drop('county').drop('state').drop('deaths')\n",
"\n",
"# exclude cases where fips is not known\n",
"data = data.where('fips', are.above(0))\n",
"\n",
"# now, group by fips and form a list of the cumulative cases\n",
"data = data.group('fips', list)\n",
"\n",
"# apply the difference function np.diff to get the new cases\n",
"data = data.with_column('new cases', data.apply(np.diff, 'cases list'))\n",
"data = data.drop('cases list')\n",
"\n",
"# Now average to get the average new cases in each county over the past week\n",
"# We add a small amount (.001) to avoid zeros, which the graphics code handles badly\n",
"new_cases = Table().with_columns('fips', data['fips'], \n",
" 'new cases', data.apply(np.mean, 'new cases') + .001)\n",
"new_cases"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>county</th><th>state</th><th>fips</th><th>lat</th><th>lon</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Fairfield</td><td>Connecticut</td><td>9001</td><td>41.1663</td><td>-73.2288</td></tr>\n",
"    <tr><td>Hartford</td><td>Connecticut</td><td>9003</td><td>41.7638</td><td>-72.6739</td></tr>\n",
"    <tr><td>Litchfield</td><td>Connecticut</td><td>9005</td><td>41.7463</td><td>-73.189</td></tr>\n",
"    <tr><td>Middlesex</td><td>Connecticut</td><td>9007</td><td>41.4094</td><td>-72.529</td></tr>\n",
"    <tr><td>New Haven</td><td>Connecticut</td><td>9009</td><td>41.308</td><td>-72.9243</td></tr>\n",
"    <tr><td>New London</td><td>Connecticut</td><td>9011</td><td>41.3565</td><td>-72.0963</td></tr>\n",
"    <tr><td>Tolland</td><td>Connecticut</td><td>9013</td><td>41.8745</td><td>-72.374</td></tr>\n",
"    <tr><td>Windham</td><td>Connecticut</td><td>9015</td><td>41.6999</td><td>-72.1551</td></tr>\n",
"    <tr><td>Androscoggin</td><td>Maine</td><td>23001</td><td>44.1971</td><td>-70.2027</td></tr>\n",
"    <tr><td>Cumberland</td><td>Maine</td><td>23005</td><td>43.8005</td><td>-70.2545</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (54 rows omitted)</p>"
],
"text/plain": [
"county | state | fips | lat | lon\n",
"Fairfield | Connecticut | 9001 | 41.1663 | -73.2288\n",
"Hartford | Connecticut | 9003 | 41.7638 | -72.6739\n",
"Litchfield | Connecticut | 9005 | 41.7463 | -73.189\n",
"Middlesex | Connecticut | 9007 | 41.4094 | -72.529\n",
"New Haven | Connecticut | 9009 | 41.308 | -72.9243\n",
"New London | Connecticut | 9011 | 41.3565 | -72.0963\n",
"Tolland | Connecticut | 9013 | 41.8745 | -72.374\n",
"Windham | Connecticut | 9015 | 41.6999 | -72.1551\n",
"Androscoggin | Maine | 23001 | 44.1971 | -70.2027\n",
"Cumberland | Maine | 23005 | 43.8005 | -70.2545\n",
"... (54 rows omitted)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"state_geo = county_geo.where('state', are.contained_in(states)).sort('fips')\n",
"state_geo"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we join the table of average new cases with the geo table, so that we have latitude and longitude \n",
"for each county.\n"
]
},
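{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how `join` matches rows (toy tables with made-up values, not the Covid data): rows are paired on equal values in the join column, and keys that appear in only one table are dropped."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Toy example of Table.join with hypothetical values\n",
"left = Table().with_columns('fips', make_array(1, 2), 'name', make_array('A', 'B'))\n",
"right = Table().with_columns('fips', make_array(2, 3), 'avg cases', make_array(10, 20))\n",
"\n",
"# Only fips values present in both tables survive the join (here, just 2)\n",
"left.join('fips', right)"
]
},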
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>county</th><th>state</th><th>lat</th><th>lon</th><th>new cases</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Fairfield</td><td>Connecticut</td><td>41.1663</td><td>-73.2288</td><td>294.572</td></tr>\n",
"    <tr><td>Hartford</td><td>Connecticut</td><td>41.7638</td><td>-72.6739</td><td>207.858</td></tr>\n",
"    <tr><td>Litchfield</td><td>Connecticut</td><td>41.7463</td><td>-73.189</td><td>38.2867</td></tr>\n",
"    <tr><td>Middlesex</td><td>Connecticut</td><td>41.4094</td><td>-72.529</td><td>39.001</td></tr>\n",
"    <tr><td>New Haven</td><td>Connecticut</td><td>41.308</td><td>-72.9243</td><td>280.144</td></tr>\n",
"    <tr><td>New London</td><td>Connecticut</td><td>41.3565</td><td>-72.0963</td><td>69.7153</td></tr>\n",
"    <tr><td>Tolland</td><td>Connecticut</td><td>41.8745</td><td>-72.374</td><td>25.001</td></tr>\n",
"    <tr><td>Windham</td><td>Connecticut</td><td>41.6999</td><td>-72.1551</td><td>18.1439</td></tr>\n",
"    <tr><td>Androscoggin</td><td>Maine</td><td>44.1971</td><td>-70.2027</td><td>16.2867</td></tr>\n",
"    <tr><td>Cumberland</td><td>Maine</td><td>43.8005</td><td>-70.2545</td><td>41.8581</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (49 rows omitted)</p>"
],
"text/plain": [
"county | state | lat | lon | new cases\n",
"Fairfield | Connecticut | 41.1663 | -73.2288 | 294.572\n",
"Hartford | Connecticut | 41.7638 | -72.6739 | 207.858\n",
"Litchfield | Connecticut | 41.7463 | -73.189 | 38.2867\n",
"Middlesex | Connecticut | 41.4094 | -72.529 | 39.001\n",
"New Haven | Connecticut | 41.308 | -72.9243 | 280.144\n",
"New London | Connecticut | 41.3565 | -72.0963 | 69.7153\n",
"Tolland | Connecticut | 41.8745 | -72.374 | 25.001\n",
"Windham | Connecticut | 41.6999 | -72.1551 | 18.1439\n",
"Androscoggin | Maine | 44.1971 | -70.2027 | 16.2867\n",
"Cumberland | Maine | 43.8005 | -70.2545 | 41.8581\n",
"... (49 rows omitted)"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"new_cases_geo = state_geo.join('fips', new_cases)\n",
"new_cases_geo = new_cases_geo.drop('fips')\n",
"new_cases_geo\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can create a map that shows a circle at each county location, with area\n",
"proportional to the average number of new cases over the past week."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>lat</th><th>long</th><th>labels</th><th>areas</th><th>colors</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>41.1663</td><td>-73.2288</td><td>Fairfield</td><td>2945.72</td><td>red</td></tr>\n",
"    <tr><td>41.7638</td><td>-72.6739</td><td>Hartford</td><td>2078.58</td><td>red</td></tr>\n",
"    <tr><td>41.7463</td><td>-73.189</td><td>Litchfield</td><td>382.867</td><td>red</td></tr>\n",
"    <tr><td>41.4094</td><td>-72.529</td><td>Middlesex</td><td>390.01</td><td>red</td></tr>\n",
"    <tr><td>41.308</td><td>-72.9243</td><td>New Haven</td><td>2801.44</td><td>red</td></tr>\n",
"    <tr><td>41.3565</td><td>-72.0963</td><td>New London</td><td>697.153</td><td>red</td></tr>\n",
"    <tr><td>41.8745</td><td>-72.374</td><td>Tolland</td><td>250.01</td><td>red</td></tr>\n",
"    <tr><td>41.6999</td><td>-72.1551</td><td>Windham</td><td>181.439</td><td>red</td></tr>\n",
"    <tr><td>44.1971</td><td>-70.2027</td><td>Androscoggin</td><td>162.867</td><td>red</td></tr>\n",
"    <tr><td>43.8005</td><td>-70.2545</td><td>Cumberland</td><td>418.581</td><td>red</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (49 rows omitted)</p>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"Make this Notebook Trusted to load map: File -> Trust Notebook"
],
"text/plain": [
""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat = Table().with_columns('lat', new_cases_geo['lat'], \n",
" 'long', new_cases_geo['lon'], \n",
" 'labels', new_cases_geo['county'],\n",
" 'areas', 10*new_cases_geo['new cases'],\n",
" 'colors', 'red')\n",
"dat.show(10)\n",
"Circle.map_table(dat, weight=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Table examples"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Drink</th><th>Cafe</th><th>Price</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Milk Tea</td><td>Book Trader Cafe</td><td>4</td></tr>\n",
"    <tr><td>Espresso</td><td>Willoughby's</td><td>2</td></tr>\n",
"    <tr><td>Coffee</td><td>Willoughby's</td><td>3</td></tr>\n",
"    <tr><td>Espresso</td><td>Blue State Coffee</td><td>2</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Drink | Cafe | Price\n",
"Milk Tea | Book Trader Cafe | 4\n",
"Espresso | Willoughby's | 2\n",
"Coffee | Willoughby's | 3\n",
"Espresso | Blue State Coffee | 2"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"drinks = Table(['Drink', 'Cafe', 'Price']).with_rows([\n",
" ['Milk Tea', 'Book Trader Cafe', 4],\n",
" ['Espresso', \"Willoughby's\", 2],\n",
" ['Coffee', \"Willoughby's\", 3],\n",
" ['Espresso', \"Blue State Coffee\", 2]\n",
"])\n",
"drinks"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Coupon % off</th><th>Location</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>5</td><td>Willoughby's</td></tr>\n",
"    <tr><td>50</td><td>Blue State Coffee</td></tr>\n",
"    <tr><td>25</td><td>Willoughby's</td></tr>\n",
"    <tr><td>0</td><td>Book Trader Cafe</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Coupon % off | Location\n",
"5 | Willoughby's\n",
"50 | Blue State Coffee\n",
"25 | Willoughby's\n",
"0 | Book Trader Cafe"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"discounts = Table().with_columns(\n",
" 'Coupon % off', make_array(5, 50, 25, 0),\n",
" 'Location', make_array(\"Willoughby's\", \"Blue State Coffee\", \"Willoughby's\", \"Book Trader Cafe\")\n",
")\n",
"discounts"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Cafe</th><th>Drink</th><th>Price</th><th>Coupon % off</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Blue State Coffee</td><td>Espresso</td><td>2</td><td>50</td></tr>\n",
"    <tr><td>Book Trader Cafe</td><td>Milk Tea</td><td>4</td><td>0</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>5</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>25</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Coffee</td><td>3</td><td>5</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Coffee</td><td>3</td><td>25</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Cafe</th><th>Drink</th><th>Price</th><th>Coupon % off</th><th>Discounted price</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Blue State Coffee</td><td>Espresso</td><td>2</td><td>50</td><td>1</td></tr>\n",
"    <tr><td>Book Trader Cafe</td><td>Milk Tea</td><td>4</td><td>0</td><td>4</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>5</td><td>1.9</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>25</td><td>1.5</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Coffee</td><td>3</td><td>5</td><td>2.85</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Coffee</td><td>3</td><td>25</td><td>2.25</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Cafe | Drink | Price | Coupon % off | Discounted price\n",
"Blue State Coffee | Espresso | 2 | 50 | 1\n",
"Book Trader Cafe | Milk Tea | 4 | 0 | 4\n",
"Willoughby's | Espresso | 2 | 5 | 1.9\n",
"Willoughby's | Espresso | 2 | 25 | 1.5\n",
"Willoughby's | Coffee | 3 | 5 | 2.85\n",
"Willoughby's | Coffee | 3 | 25 | 2.25"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Discussion question: Generate a table with one row per cafe that \n",
"# has the name and discounted price of its cheapest discounted drink\n",
"\n",
"# Link (join) drinks with discounts\n",
"combined = drinks.join('Cafe', discounts, 'Location')\n",
"combined.show()\n",
"\n",
"# Compute discounted prices\n",
"discounted_prices = combined.column('Price') * (1 - combined.column('Coupon % off')/100)\n",
"\n",
"discounted_drinks = combined.with_column('Discounted price', discounted_prices)\n",
"discounted_drinks\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Cafe</th><th>Drink</th><th>Price</th><th>Coupon % off</th><th>Discounted price</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Blue State Coffee</td><td>Espresso</td><td>2</td><td>50</td><td>1</td></tr>\n",
"    <tr><td>Book Trader Cafe</td><td>Milk Tea</td><td>4</td><td>0</td><td>4</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>25</td><td>1.5</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Cafe | Drink | Price | Coupon % off | Discounted price\n",
"Blue State Coffee | Espresso | 2 | 50 | 1\n",
"Book Trader Cafe | Milk Tea | 4 | 0 | 4\n",
"Willoughby's | Espresso | 2 | 25 | 1.5"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Correct: sorting by 'Discounted price' first makes distinct keep each cafe's cheapest drink\n",
"discounted_drinks.sort('Discounted price').sort('Cafe', distinct=True) "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Cafe</th><th>Drink</th><th>Price</th><th>Coupon % off</th><th>Discounted price</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Blue State Coffee</td><td>Espresso</td><td>2</td><td>50</td><td>1</td></tr>\n",
"    <tr><td>Book Trader Cafe</td><td>Milk Tea</td><td>4</td><td>0</td><td>4</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Espresso</td><td>2</td><td>5</td><td>1.9</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Cafe | Drink | Price | Coupon % off | Discounted price\n",
"Blue State Coffee | Espresso | 2 | 50 | 1\n",
"Book Trader Cafe | Milk Tea | 4 | 0 | 4\n",
"Willoughby's | Espresso | 2 | 5 | 1.9"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Incorrect: we need to sort by 'Discounted price' first\n",
"discounted_drinks.sort('Cafe', distinct=True) "
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Cafe</th><th>Drink min</th><th>Price min</th><th>Coupon % off min</th><th>Discounted price min</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Blue State Coffee</td><td>Espresso</td><td>2</td><td>50</td><td>1</td></tr>\n",
"    <tr><td>Book Trader Cafe</td><td>Milk Tea</td><td>4</td><td>0</td><td>4</td></tr>\n",
"    <tr><td>Willoughby's</td><td>Coffee</td><td>2</td><td>5</td><td>1.5</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Cafe | Drink min | Price min | Coupon % off min | Discounted price min\n",
"Blue State Coffee | Espresso | 2 | 50 | 1\n",
"Book Trader Cafe | Milk Tea | 4 | 0 | 4\n",
"Willoughby's | Coffee | 2 | 5 | 1.5"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Incorrect: min is applied to each column separately, so 'Coffee' (first alphabetically) is paired with values from other rows\n",
"discounted_drinks.group('Cafe', min) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sample midterm question"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Challenge yourself and try to solve these on your own before looking at the solutions!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Start</th><th>End</th><th>Duration</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Harry Bridges Plaza (Ferry Building)</td><td>San Francisco Caltrain (Townsend at 4th)</td><td>765</td></tr>\n",
"    <tr><td>San Antonio Shopping Center</td><td>Mountain View City Hall</td><td>1036</td></tr>\n",
"    <tr><td>Post at Kearny</td><td>2nd at South Park</td><td>307</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>... (354149 rows omitted)</p>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"trip0 = Table.read_table(\"trip.csv\")\n",
"trip = Table().with_columns(\n",
"\"Start\", trip0.column(\"Start Station\"),\n",
"\"End\", trip0.column(\"End Station\"),\n",
"\"Duration\", trip0.column(\"Duration\"))\n",
"trip.show(3)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# The name of the station where the most rentals ended \n",
"# (assume no ties)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# The number of stations for which the average duration ending \n",
"# at that station was more than 10 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# The number of stations that have more than 500 starts \n",
"# AND more than 500 ends"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'San Francisco Caltrain (Townsend at 4th)'"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The name of the station where the most rentals ended (assume no ties).\n",
"# First, find end counts\n",
"# Then, find the station with the highest end count\n",
"trip.group('End').sort('count', descending=True).column(0).item(0)\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"68"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The number of stations for which the average duration ending \n",
"# at that station was more than 10 minutes.\n",
"\n",
"# First, find the average duration of trips ending at each station\n",
"# Then, keep the stations whose average is above 10 minutes (600 seconds)\n",
"# Then, count them\n",
"trip.group('End', np.average).where(2, are.above(10*60)).num_rows\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"56"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The number of stations that have more than 500 starts \n",
"# AND more than 500 ends\n",
"# First, find the start counts\n",
"starting = trip.group('Start').relabeled('count', 'Start count').relabeled('Start', 'Station')\n",
"# Then, find the end counts\n",
"ending = trip.group('End').relabeled('count', 'End count').relabeled('End', 'Station')\n",
"# Combine them with join\n",
"starting.join('Station', ending).where('Start count', are.above(500)).where('End count', are.above(500)).num_rows\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparison ##"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 > 1"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"bool"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(3 > 1)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 < 1"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"True"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"3 == 3"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# 3 = 3"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"x = 14\n",
"y = 3"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x > 10"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"12 < x < 18"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"12 < x"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x < 18"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"12 < x-y < 18"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x > 10 and y > 5"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparisons with arrays"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit'], dtype='<U6')"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pets = make_array('cat', 'dog', 'cat', 'cat', 'dog', 'rabbit')\n",
"pets"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sum(pets > 'cat')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\"cat\" < \"catastrophe\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predicates and advanced `where`"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>1</td></tr>\n",
"    <tr><td>2</td></tr>\n",
"    <tr><td>3</td></tr>\n",
"    <tr><td>4</td></tr>\n",
"    <tr><td>5</td></tr>\n",
"    <tr><td>6</td></tr>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"1\n",
"2\n",
"3\n",
"4\n",
"5\n",
"6\n",
"7\n",
"8"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms = Table().with_column('Semester', np.arange(1, 9))\n",
"terms"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"7\n",
"8"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.where('Semester', are.above(6))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"is_senior = are.above(6)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"is_senior(4)"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"def also_is_senior(x):\n",
" return x > 6"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"also_is_senior(5)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([False, False, False, False, False, False, True, True])"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.apply(also_is_senior, 'Semester')"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"7\n",
"8"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.where('Semester', are.above(6))"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"7\n",
"8"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.where('Semester', is_senior)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"7\n",
"8"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.where('Semester', also_is_senior)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>Semester</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>7</td></tr>\n",
"    <tr><td>8</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"Semester\n",
"7\n",
"8"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"terms.where(terms.apply(also_is_senior, 'Semester'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th>lat</th><th>long</th><th>labels</th><th>areas</th><th>colors</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
"lat | long | labels | areas | colors"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dat.where('labels', are.equal_to('Benton'))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}