{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from datascience import *\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "plots.style.use('fivethirtyeight')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Maps\n", "\n", "We'll illustrate the use of maps with the New York Times Covid-19 database. Recall that this is \n", "a county-level database of confirmed cases and deaths, updated daily,\n", "compiled from state and local governments and health departments across the United States.\n", "\n", "The Times has created many visualizations that are effective communications of important information about the pandemic. Here we will construct some simple circle maps.\n", "\n", "The data are publically available via GitHub: [https://github.com/nytimes/covid-19-data](https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html). \n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date county state fips cases deaths
2020-01-21 Snohomish Washington 53061 1 0
2020-01-22 Snohomish Washington 53061 1 0
2020-01-23 Snohomish Washington 53061 1 0
2020-01-24 Cook Illinois 17031 1 0
2020-01-24 Snohomish Washington 53061 1 0
2020-01-25 Orange California 6059 1 0
2020-01-25 Cook Illinois 17031 1 0
2020-01-25 Snohomish Washington 53061 1 0
2020-01-26 Maricopa Arizona 4013 1 0
2020-01-26 Los Angeles California 6037 1 0
\n", "

... (1076214 rows omitted)

" ], "text/plain": [ "date | county | state | fips | cases | deaths\n", "2020-01-21 | Snohomish | Washington | 53061 | 1 | 0\n", "2020-01-22 | Snohomish | Washington | 53061 | 1 | 0\n", "2020-01-23 | Snohomish | Washington | 53061 | 1 | 0\n", "2020-01-24 | Cook | Illinois | 17031 | 1 | 0\n", "2020-01-24 | Snohomish | Washington | 53061 | 1 | 0\n", "2020-01-25 | Orange | California | 6059 | 1 | 0\n", "2020-01-25 | Cook | Illinois | 17031 | 1 | 0\n", "2020-01-25 | Snohomish | Washington | 53061 | 1 | 0\n", "2020-01-26 | Maricopa | Arizona | 4013 | 1 | 0\n", "2020-01-26 | Los Angeles | California | 6037 | 1 | 0\n", "... (1076214 rows omitted)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_table = Table.read_table(\"https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv\")\n", "covid_table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we load in a table of geographical data for counties." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
county state fips lat lon
New York City New York nan 40.7146 -74.0071
Westchester New York 36119 41.1191 -73.7887
Nassau New York 36059 42.5164 -73.6113
Suffolk New York 36103 40.9601 -72.8343
Cook Illinois 17031 41.8139 -87.6155
King Washington 53033 47.4325 -121.959
Unknown New Jersey nan 0 0
Wayne Michigan 26163 42.2852 -83.3836
Los Angeles California 6037 34.0536 -118.246
Bergen New Jersey 34003 40.9476 -74.0276
\n", "

... (1660 rows omitted)

" ], "text/plain": [ "county | state | fips | lat | lon\n", "New York City | New York | nan | 40.7146 | -74.0071\n", "Westchester | New York | 36119 | 41.1191 | -73.7887\n", "Nassau | New York | 36059 | 42.5164 | -73.6113\n", "Suffolk | New York | 36103 | 40.9601 | -72.8343\n", "Cook | Illinois | 17031 | 41.8139 | -87.6155\n", "King | Washington | 53033 | 47.4325 | -121.959\n", "Unknown | New Jersey | nan | 0 | 0\n", "Wayne | Michigan | 26163 | 42.2852 | -83.3836\n", "Los Angeles | California | 6037 | 34.0536 | -118.246\n", "Bergen | New Jersey | 34003 | 40.9476 | -74.0276\n", "... (1660 rows omitted)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "county_geo = Table.read_table(\"https://raw.githubusercontent.com/jdlafferty/covid-19/master/data/geo-counties.csv\") \n", "county_geo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Process the recent data\n", "\n", "First we will calculate the average new cases for a subset of states. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
date county state fips cases deaths
2021-02-21 Fairfield Connecticut 9001 78181 2027
2021-02-21 Hartford Connecticut 9003 68527 2251
2021-02-21 Litchfield Connecticut 9005 10835 270
2021-02-21 Middlesex Connecticut 9007 10113 337
2021-02-21 New Haven Connecticut 9009 69233 1891
2021-02-21 New London Connecticut 9011 18635 403
2021-02-21 Tolland Connecticut 9013 7612 159
2021-02-21 Unknown Connecticut nan 937 9
2021-02-21 Windham Connecticut 9015 9028 176
2021-02-21 Androscoggin Maine 23001 4695 54
\n", "

... (570 rows omitted)

" ], "text/plain": [ "date | county | state | fips | cases | deaths\n", "2021-02-21 | Fairfield | Connecticut | 9001 | 78181 | 2027\n", "2021-02-21 | Hartford | Connecticut | 9003 | 68527 | 2251\n", "2021-02-21 | Litchfield | Connecticut | 9005 | 10835 | 270\n", "2021-02-21 | Middlesex | Connecticut | 9007 | 10113 | 337\n", "2021-02-21 | New Haven | Connecticut | 9009 | 69233 | 1891\n", "2021-02-21 | New London | Connecticut | 9011 | 18635 | 403\n", "2021-02-21 | Tolland | Connecticut | 9013 | 7612 | 159\n", "2021-02-21 | Unknown | Connecticut | nan | 937 | 9\n", "2021-02-21 | Windham | Connecticut | 9015 | 9028 | 176\n", "2021-02-21 | Androscoggin | Maine | 23001 | 4695 | 54\n", "... (570 rows omitted)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "first_date = '2021-02-20'\n", "\n", "# Some subsets of states to visualize:\n", "continental_states = ['Pennsylvania', 'Arizona', 'Connecticut', 'Florida', 'Wisconsin', 'South Dakota', 'Colorado',\n", " 'New Jersey', 'California', 'Ohio', 'West Virginia', 'Oregon', 'Alabama', 'Maine', 'Vermont',\n", " 'Montana', 'Nevada', 'Washington', 'Massachusetts', 'Maryland', 'Iowa', 'North Dakota', 'Kentucky',\n", " 'Delaware', 'Mississippi', 'Tennessee', 'Virginia', 'Kansas', 'Missouri', 'Utah', 'North Carolina', 'Louisiana',\n", " 'South Carolina', 'Minnesota', 'Arkansas', 'Indiana', 'New York', 'Wyoming', 'New Mexico', 'Rhode Island',\n", " 'Michigan', 'Nebraska', 'New Hampshire', 'Georgia', 'Texas', 'Illinois', 'Oklahoma', 'Idaho']\n", "\n", "new_england_states = ['Connecticut', 'Massachusetts', 'Vermont', 'New Hampshire', 'Rhode Island', 'Maine']\n", "west_coast_states = ['California', 'Oregon', 'Washington']\n", "southeast_states = ['Florida', 'Louisiana', 'Alabama', 'Mississippi', 'Georgia', 'South Carolina']\n", "\n", "states = new_england_states\n", "\n", "recent_data = covid_table.where('date', are.above(first_date))\n", "recent_state_data = recent_data.where('state', are.contained_in(states))\n", "recent_state_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[FIPS](https://en.wikipedia.org/wiki/FIPS_county_code) is a number assigned to each county. This will be used to merge the Covid-19 data with the geo data.\n", "\n", "In the following code, we group the data by the FIPS county code, and compute the new cases for each day during the past week in that county. Then we average. \n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fips new cases
9001 294.572
9003 207.858
9005 38.2867
9007 39.001
9009 280.144
9011 69.7153
9013 25.001
9015 18.1439
23001 16.2867
23003 2.28671
\n", "

... (57 rows omitted)

" ], "text/plain": [ "fips | new cases\n", "9001 | 294.572\n", "9003 | 207.858\n", "9005 | 38.2867\n", "9007 | 39.001\n", "9009 | 280.144\n", "9011 | 69.7153\n", "9013 | 25.001\n", "9015 | 18.1439\n", "23001 | 16.2867\n", "23003 | 2.28671\n", "... (57 rows omitted)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# remove extra columns\n", "data = recent_state_data.drop('date').drop('county').drop('state').drop('deaths')\n", "\n", "# exclude cases where fips is not known\n", "data = data.where('fips', are.above(0))\n", "\n", "# now, group by fips and form a list of the cumlative cases\n", "data = data.group('fips', list)\n", "\n", "# apply the difference function np.diff to get the new cases\n", "data = data.with_column('new cases', data.apply(np.diff, 'cases list'))\n", "data = data.drop('cases list')\n", "\n", "# Now average to get the average new cases in each county over the past week\n", "# We add a small amount .001 to avoid zeros, which the graphics handles badly \n", "new_cases = Table().with_columns('fips', data['fips'], \n", " 'new cases', data.apply(np.mean, 'new cases') + .001)\n", "new_cases" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
county state fips lat lon
Fairfield Connecticut 9001 41.1663 -73.2288
Hartford Connecticut 9003 41.7638 -72.6739
Litchfield Connecticut 9005 41.7463 -73.189
Middlesex Connecticut 9007 41.4094 -72.529
New Haven Connecticut 9009 41.308 -72.9243
New London Connecticut 9011 41.3565 -72.0963
Tolland Connecticut 9013 41.8745 -72.374
Windham Connecticut 9015 41.6999 -72.1551
Androscoggin Maine 23001 44.1971 -70.2027
Cumberland Maine 23005 43.8005 -70.2545
\n", "

... (54 rows omitted)

" ], "text/plain": [ "county | state | fips | lat | lon\n", "Fairfield | Connecticut | 9001 | 41.1663 | -73.2288\n", "Hartford | Connecticut | 9003 | 41.7638 | -72.6739\n", "Litchfield | Connecticut | 9005 | 41.7463 | -73.189\n", "Middlesex | Connecticut | 9007 | 41.4094 | -72.529\n", "New Haven | Connecticut | 9009 | 41.308 | -72.9243\n", "New London | Connecticut | 9011 | 41.3565 | -72.0963\n", "Tolland | Connecticut | 9013 | 41.8745 | -72.374\n", "Windham | Connecticut | 9015 | 41.6999 | -72.1551\n", "Androscoggin | Maine | 23001 | 44.1971 | -70.2027\n", "Cumberland | Maine | 23005 | 43.8005 | -70.2545\n", "... (54 rows omitted)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state_geo = county_geo.where('state', are.contained_in(states)).sort('fips')\n", "state_geo" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we join the table of average new cases with the geo table, so that we have latitude and longitude \n", "for each county.\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
county state lat lon new cases
Fairfield Connecticut 41.1663 -73.2288 294.572
Hartford Connecticut 41.7638 -72.6739 207.858
Litchfield Connecticut 41.7463 -73.189 38.2867
Middlesex Connecticut 41.4094 -72.529 39.001
New Haven Connecticut 41.308 -72.9243 280.144
New London Connecticut 41.3565 -72.0963 69.7153
Tolland Connecticut 41.8745 -72.374 25.001
Windham Connecticut 41.6999 -72.1551 18.1439
Androscoggin Maine 44.1971 -70.2027 16.2867
Cumberland Maine 43.8005 -70.2545 41.8581
\n", "

... (49 rows omitted)

" ], "text/plain": [ "county | state | lat | lon | new cases\n", "Fairfield | Connecticut | 41.1663 | -73.2288 | 294.572\n", "Hartford | Connecticut | 41.7638 | -72.6739 | 207.858\n", "Litchfield | Connecticut | 41.7463 | -73.189 | 38.2867\n", "Middlesex | Connecticut | 41.4094 | -72.529 | 39.001\n", "New Haven | Connecticut | 41.308 | -72.9243 | 280.144\n", "New London | Connecticut | 41.3565 | -72.0963 | 69.7153\n", "Tolland | Connecticut | 41.8745 | -72.374 | 25.001\n", "Windham | Connecticut | 41.6999 | -72.1551 | 18.1439\n", "Androscoggin | Maine | 44.1971 | -70.2027 | 16.2867\n", "Cumberland | Maine | 43.8005 | -70.2545 | 41.8581\n", "... (49 rows omitted)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_cases_geo = state_geo.join('fips', new_cases)\n", "new_cases_geo = new_cases_geo.drop('fips')\n", "new_cases_geo\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can create a map where we show a circle at each county location, with area\n", "proportional to the average number of new cases over the past " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lat long labels areas colors
41.1663 -73.2288 Fairfield 2945.72 red
41.7638 -72.6739 Hartford 2078.58 red
41.7463 -73.189 Litchfield 382.867 red
41.4094 -72.529 Middlesex 390.01 red
41.308 -72.9243 New Haven 2801.44 red
41.3565 -72.0963 New London 697.153 red
41.8745 -72.374 Tolland 250.01 red
41.6999 -72.1551 Windham 181.439 red
44.1971 -70.2027 Androscoggin 162.867 red
43.8005 -70.2545 Cumberland 418.581 red
\n", "

... (49 rows omitted)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Make this Notebook Trusted to load map: File -> Trust Notebook
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat = Table().with_columns('lat', new_cases_geo['lat'], \n", " 'long', new_cases_geo['lon'], \n", " 'labels', new_cases_geo['county'],\n", " 'areas', 10*new_cases_geo['new cases'],\n", " 'colors', 'red')\n", "dat.show(10)\n", "Circle.map_table(dat, weight=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table examples" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Drink Cafe Price
Milk Tea Book Trader Cafe 4
Espresso Willoughby's 2
Coffee Willoughby's 3
Espresso Blue State Coffee 2
" ], "text/plain": [ "Drink | Cafe | Price\n", "Milk Tea | Book Trader Cafe | 4\n", "Espresso | Willoughby's | 2\n", "Coffee | Willoughby's | 3\n", "Espresso | Blue State Coffee | 2" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks = Table(['Drink', 'Cafe', 'Price']).with_rows([\n", " ['Milk Tea', 'Book Trader Cafe', 4],\n", " ['Espresso', \"Willoughby's\", 2],\n", " ['Coffee', \"Willoughby's\", 3],\n", " ['Espresso', \"Blue State Coffee\", 2]\n", "])\n", "drinks" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Coupon % off Location
5 Willoughby's
50 Blue State Coffee
25 Willoughby's
0 Book Trader Cafe
" ], "text/plain": [ "Coupon % off | Location\n", "5 | Willoughby's\n", "50 | Blue State Coffee\n", "25 | Willoughby's\n", "0 | Book Trader Cafe" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "discounts = Table().with_columns(\n", " 'Coupon % off', make_array(5, 50, 25, 0),\n", " 'Location', make_array(\"Willoughby's\", \"Blue State Coffee\", \"Willoughby's\", \"Book Trader Cafe\")\n", ")\n", "discounts" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cafe Drink Price Coupon % off
Blue State Coffee Espresso 2 50
Book Trader Cafe Milk Tea 4 0
Willoughby's Espresso 2 5
Willoughby's Espresso 2 25
Willoughby's Coffee 3 5
Willoughby's Coffee 3 25
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cafe Drink Price Coupon % off Discounted price
Blue State Coffee Espresso 2 50 1
Book Trader Cafe Milk Tea 4 0 4
Willoughby's Espresso 2 5 1.9
Willoughby's Espresso 2 25 1.5
Willoughby's Coffee 3 5 2.85
Willoughby's Coffee 3 25 2.25
" ], "text/plain": [ "Cafe | Drink | Price | Coupon % off | Discounted price\n", "Blue State Coffee | Espresso | 2 | 50 | 1\n", "Book Trader Cafe | Milk Tea | 4 | 0 | 4\n", "Willoughby's | Espresso | 2 | 5 | 1.9\n", "Willoughby's | Espresso | 2 | 25 | 1.5\n", "Willoughby's | Coffee | 3 | 5 | 2.85\n", "Willoughby's | Coffee | 3 | 25 | 2.25" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Discussion question: Generate a table with one row per cafe that \n", "# has the name and discounted price of its cheapest discounted drink\n", "\n", "# Link (join) drinks with discounts\n", "combined = drinks.join('Cafe', discounts, 'Location')\n", "combined.show()\n", "\n", "# Compute discounted prices\n", "discounted_prices = combined.column('Price') * (1 - combined.column('Coupon % off')/100)\n", "\n", "discounted_drinks = combined.with_column('Discounted price', discounted_prices)\n", "discounted_drinks\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cafe Drink Price Coupon % off Discounted price
Blue State Coffee Espresso 2 50 1
Book Trader Cafe Milk Tea 4 0 4
Willoughby's Espresso 2 25 1.5
" ], "text/plain": [ "Cafe | Drink | Price | Coupon % off | Discounted price\n", "Blue State Coffee | Espresso | 2 | 50 | 1\n", "Book Trader Cafe | Milk Tea | 4 | 0 | 4\n", "Willoughby's | Espresso | 2 | 25 | 1.5" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Correct, Espresso is cheaper\n", "discounted_drinks.sort('Discounted price').sort('Cafe', distinct=True) " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cafe Drink Price Coupon % off Discounted price
Blue State Coffee Espresso 2 50 1
Book Trader Cafe Milk Tea 4 0 4
Willoughby's Espresso 2 5 1.9
" ], "text/plain": [ "Cafe | Drink | Price | Coupon % off | Discounted price\n", "Blue State Coffee | Espresso | 2 | 50 | 1\n", "Book Trader Cafe | Milk Tea | 4 | 0 | 4\n", "Willoughby's | Espresso | 2 | 5 | 1.9" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Incorrect - need to sort by \"Discounted price\" first\n", "discounted_drinks.sort('Cafe', distinct=True) " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Cafe Drink min Price min Coupon % off min Discounted price min
Blue State Coffee Espresso 2 50 1
Book Trader Cafe Milk Tea 4 0 4
Willoughby's Coffee 2 5 1.5
" ], "text/plain": [ "Cafe | Drink min | Price min | Coupon % off min | Discounted price min\n", "Blue State Coffee | Espresso | 2 | 50 | 1\n", "Book Trader Cafe | Milk Tea | 4 | 0 | 4\n", "Willoughby's | Coffee | 2 | 5 | 1.5" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Incorrect, Coffee is first alphabetically\n", "discounted_drinks.group('Cafe', min) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sample midterm question" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Challenge yourself and try to solve these on your own before looking at the solutions!" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Start End Duration
Harry Bridges Plaza (Ferry Building) San Francisco Caltrain (Townsend at 4th) 765
San Antonio Shopping Center Mountain View City Hall 1036
Post at Kearny 2nd at South Park 307
\n", "

... (354149 rows omitted)

" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "trip0 = Table.read_table(\"trip.csv\")\n", "trip = Table().with_columns(\n", "\"Start\", trip0.column(\"Start Station\"),\n", "\"End\", trip0.column(\"End Station\"),\n", "\"Duration\", trip0.column(\"Duration\"))\n", "trip.show(3)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# The name of the station where the most rentals ended \n", "#(assume no ties)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# The number of stations for which the average duration ending \n", "# at that station was more than 10 minutes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# The number of stations that have more than 500 starts \n", "# AND more than 500 ends" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'San Francisco Caltrain (Townsend at 4th)'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The name of the station where the most rentals ended (assume no ties).\n", "# First, find end counts\n", "# Then, find the station with the highest end count\n", "trip.group('End').sort('count', descending=True).column(0).item(0)\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "68" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The number of stations for which the average duration ending \n", "# at that station was more than 10 minutes.\n", "\n", "# First, find the average end time for each station\n", "# Then, keep the ones above 10 minutes\n", "# Then, count them\n", "trip.group('End', np.average).where(2, are.above(10*60)).num_rows\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "56" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# The number of stations that have more than 500 starts \n", "# AND more than 500 ends\n", "# First, find the start counts\n", "starting = trip.group('Start').relabeled('count', 'Start count').relabeled('Start', 'Station')\n", "# Then, find the end counts\n", "ending = trip.group('End').relabeled('count', 'End count').relabeled('End', 'Station')\n", "# Combine them with join\n", "starting.join('Station', ending).where('Start count', are.above(500)).where('End count', are.above(500)).num_rows\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison ##" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 > 1" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "bool" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(3 > 1)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 < 1" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "True" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 == 3" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# 3 = 3" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "x = 14\n", "y = 3" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x > 10" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "12 < x < 18" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "12 < x" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x < 18" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "12 < x-y < 18" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x > 10 and y > 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparisons with arrays" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit'], dtype=' 'cat'" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(pets > 'cat')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"cat\" < \"catastrophe\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicates and advanced `where`" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
1
2
3
4
5
6
7
8
" ], "text/plain": [ "Semester\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms = Table().with_column('Semester', np.arange(1, 9))\n", "terms" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
7
8
" ], "text/plain": [ "Semester\n", "7\n", "8" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.where('Semester', are.above(6))" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "is_senior = are.above(6)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "is_senior(4)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def also_is_senior(x):\n", " return x > 6" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "also_is_senior(5)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, False, False, False, True, True])" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.apply(also_is_senior, 'Semester')" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
7
8
" ], "text/plain": [ "Semester\n", "7\n", "8" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.where('Semester', are.above(6))" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
7
8
" ], "text/plain": [ "Semester\n", "7\n", "8" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.where('Semester', is_senior)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
7
8
" ], "text/plain": [ "Semester\n", "7\n", "8" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.where('Semester', also_is_senior)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Semester
7
8
" ], "text/plain": [ "Semester\n", "7\n", "8" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "terms.where(terms.apply(also_is_senior, 'Semester'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lat long labels areas colors
" ], "text/plain": [ "lat | long | labels | areas | colors" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat.where('labels', are.equal_to('Benton'))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }