{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Scraping NBA team information from Wikipedia (Revisited)\n", "\n", "__Update__ (March 6, 2018): This simple Wikipedia table is the subject of a lot of editing! Somebody modified it again in early February. In this case, the editor felt that US state postal codes would be unfamiliar to international NBA fans. Instead, the table currently uses the full names of the US states (and Ontario, in the case of Toronto). The URL below was changed to point to a historical snapshot of the table prior to this change.\n", "\n", "In this notebook, we are going to take another look at scraping [NBA team information from Wikipedia](https://en.wikipedia.org/wiki/National_Basketball_Association#Teams). We will also see how to generate a map of NBA arena locations.\n", "\n", "In [an earlier notebook](https://github.com/practicallypredictable/posts/blob/master/notebooks/scrape_wikipedia_nba_team_info-part1.ipynb), we scraped the table using the [Requests](http://docs.python-requests.org/en/master/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) packages.\n", "\n", "Unfortunately, somebody modified the table in mid-December. As a result, the original code in that notebook no longer works.\n", "\n", "Web content changes all the time, which will occasionally break web scraping code. This is particularly true of Wikipedia, where pages are open to edits by the community.\n", "\n", "In this particular case, we could just move on and ignore the table changes. The NBA team data are basically unchanged. We could just use the saved CSV file from the prior scraping. That's a major reason why you should always save the result of web scraping.\n", "\n", "On the other hand, I think this is a good opportunity to try to scrape the table in a more robust and general way. You will also see examples of some useful `pandas` techniques to clean up the Wikipedia data. 
I think you will find these techniques useful in your own sports analytics projects.\n", "\n", "I also wanted to do something useful with the Wikipedia information, beyond using it as an example to learn web scraping. Later in this post, we'll discuss why arena location data can be useful in sports analytics. Drawing a map is a perfect way to learn how to start using geographic data in Python. \n", "\n", "### What Changed\n", "\n", "The change to the table was relatively simple. The editor decided to group together certain cells in the table for the two New York teams (the Knicks and the Nets) and the LA teams (the Clippers and the Lakers). In particular, the editor added HTML `rowspan` attributes in the City column, as well as in the Arena column for the LA teams.\n", "\n", "Think of `rowspan` and `colspan` attributes in an HTML table as being similar to merged cells in a spreadsheet program like Microsoft Excel or Google Sheets.\n", "\n", "You can [look at the Wikipedia page prior to the table change here](https://en.wikipedia.org/w/index.php?title=National_Basketball_Association&direction=prev&oldid=815465296), and compare it to the current table. Try to use your browser's inspection tools to find the `rowspan` attributes that were added.\n", "\n", "We want to figure out how to read these merged cells, and \"unspan\" them to make the table layout simpler." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scraping the Table, Again\n", "\n", "We are going to use a general approach for scraping HTML tables. This approach will work for Wikipedia and other web pages, and will automatically handle the spanning that broke our original code.\n", "\n", "The scraping code is part of the [pracpred package, which you can find on GitHub](https://github.com/practicallypredictable/pracpred) or [on PyPI](https://pypi.python.org/pypi/pracpred). You can install the package using the command `pip install pracpred` in your sports analytics environment." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pracpred.scrape as pps" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As usual, we will do our data analysis using [`pandas`](https://pandas.pydata.org/). We will also use the [Matplotlib Basemap](https://matplotlib.org/basemap/) package for plotting a map at the end of this notebook." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from mpl_toolkits.basemap import Basemap\n", "from matplotlib.patches import Polygon\n", "from matplotlib.collections import PatchCollection\n", "from matplotlib.colors import rgb2hex\n", "%matplotlib notebook" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import warnings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll notice we are going to use the [`warnings`](https://docs.python.org/3/library/warnings.html?highlight=warnings#module-warnings) module from the Python standard library. This is purely cosmetic: as you'll see toward the end of this notebook, Basemap emits some warning messages that I want to suppress." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "PARENT_DIR = Path.cwd().parent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the Raw HTML Table\n", "\n", "If you inspect the HTML for the Wikipedia page, you'll see that it has 5 HTML tables. We really only want the one for the NBA teams. If you inspect this table in your browser, you'll see that it has the HTML tag `<table class=\"navbox wikitable\">`. We can specify this class to make sure we only get back the table we want." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#URL = 'https://en.wikipedia.org/wiki/National_Basketball_Association'\n", "URL = 'https://en.wikipedia.org/w/index.php?title=National_Basketball_Association&oldid=823837048'\n", "NBA_TEAM_INFO = 'navbox wikitable'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "USER_AGENT = (\n", " 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) ' +\n", " 'AppleWebKit/537.36 (KHTML, like Gecko) ' +\n", " 'Chrome/61.0.3163.100 Safari/537.36'\n", ")\n", "\n", "REQUEST_HEADERS = {\n", " 'user-agent': USER_AGENT,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can call the scraping code. You can [find the source code on GitHub here](https://github.com/practicallypredictable/pracpred/blob/master/pracpred/scrape/html_tables.py). The package defines two Python classes, `HTMLTables` and `HTMLTable`. The `HTMLTables` `class` is basically a wrapper on top of Requests and BeautifulSoup. This `class` gets and stores the HTML for one or more tables from a URL. The `HTMLTable` `class` has the code to unspan the table and convert it to a `pandas` `DataFrame`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "tables = pps.HTMLTables(URL, table_class=NBA_TEAM_INFO, headers=REQUEST_HEADERS)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(tables)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(33, 9)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tables[0].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We got back one table, which has 33 rows and 9 columns. 
Notice that the table dimensions are the largest number of rows in any column, and the largest number of columns in any row. This is the key to getting the unspanning to work. We want to view the table as a grid of cells to remove the spanning structure.\n", "\n", "Now, let's convert the HTML table to a `pandas` `DataFrame`. For this particular table, we want to have any spanned cells repeat the values when we unspan the table." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>0</th>\n", " <th>1</th>\n", " <th>2</th>\n", " <th>3</th>\n", " <th>4</th>\n", " <th>5</th>\n", " <th>6</th>\n", " <th>7</th>\n", " <th>8</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Division</td>\n", " <td>Team</td>\n", " <td>City</td>\n", " <td>Arena</td>\n", " <td>Capacity</td>\n", " <td>Coordinates</td>\n", " <td>Founded</td>\n", " <td>Joined</td>\n", " <td>Head coach</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Atlantic</td>\n", " <td>Boston Celtics</td>\n", " <td>Boston, MA</td>\n", " <td>TD Garden</td>\n", " <td>18,624</td>\n", " <td>42°21′59″N 71°03′44″W / 42.366303°N 71.06222...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " 
</tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Atlantic</td>\n", " <td>Brooklyn Nets</td>\n", " <td>New York City, NY</td>\n", " <td>Barclays Center</td>\n", " <td>17,732</td>\n", " <td>40°40′58″N 73°58′29″W / 40.68265°N 73.974689...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Kenny Atkinson</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Atlantic</td>\n", " <td>New York Knicks</td>\n", " <td>New York City, NY</td>\n", " <td>Madison Square Garden</td>\n", " <td>19,812</td>\n", " <td>40°45′02″N 73°59′37″W / 40.750556°N 73.99361...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Atlantic</td>\n", " <td>Philadelphia 76ers</td>\n", " <td>Philadelphia, PA</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21,600</td>\n", " <td>39°54′04″N 75°10′19″W / 39.901111°N 75.17194...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>Atlantic</td>\n", " <td>Toronto Raptors</td>\n", " <td>Toronto, ON</td>\n", " <td>Air Canada Centre</td>\n", " <td>19,800</td>\n", " <td>43°38′36″N 79°22′45″W / 43.643333°N 79.37916...</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>Dwane Casey</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>Central</td>\n", " <td>Chicago Bulls</td>\n", " <td>Chicago, IL</td>\n", " <td>United Center</td>\n", " <td>20,917</td>\n", " <td>41°52′50″N 87°40′27″W / 41.880556°N 87.67416...</td>\n", " <td>1966</td>\n", " <td>1966</td>\n", " <td>Fred Hoiberg</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>Central</td>\n", " <td>Cleveland Cavaliers</td>\n", " <td>Cleveland, OH</td>\n", " <td>Quicken Loans Arena</td>\n", " <td>20,562</td>\n", " <td>41°29′47″N 81°41′17″W / 41.496389°N 81.68805...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Tyronn Lue</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Central</td>\n", " <td>Detroit Pistons</td>\n", " <td>Detroit, MI</td>\n", " 
<td>Little Caesars Arena</td>\n", " <td>20,491</td>\n", " <td>42°41′49″N 83°14′44″W / 42.696944°N 83.24555...</td>\n", " <td>1941*</td>\n", " <td>1948</td>\n", " <td>Stan Van Gundy</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>Central</td>\n", " <td>Indiana Pacers</td>\n", " <td>Indianapolis, IN</td>\n", " <td>Bankers Life Fieldhouse</td>\n", " <td>17,923</td>\n", " <td>39°45′50″N 86°09′20″W / 39.763889°N 86.15555...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Nate McMillan</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Central</td>\n", " <td>Milwaukee Bucks</td>\n", " <td>Milwaukee, WI</td>\n", " <td>Bradley Center</td>\n", " <td>18,717</td>\n", " <td>43°02′37″N 87°55′01″W / 43.043611°N 87.91694...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Joe Prunty</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>Southeast</td>\n", " <td>Atlanta Hawks</td>\n", " <td>Atlanta, GA</td>\n", " <td>Philips Arena</td>\n", " <td>15,711</td>\n", " <td>33°45′26″N 84°23′47″W / 33.757222°N 84.39638...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Mike Budenholzer</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>Southeast</td>\n", " <td>Charlotte Hornets</td>\n", " <td>Charlotte, NC</td>\n", " <td>Spectrum Center</td>\n", " <td>19,077</td>\n", " <td>35°13′30″N 80°50′21″W / 35.225°N 80.839167°W...</td>\n", " <td>1988*</td>\n", " <td>1988*</td>\n", " <td>Steve Clifford</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>Southeast</td>\n", " <td>Miami Heat</td>\n", " <td>Miami, FL</td>\n", " <td>American Airlines Arena</td>\n", " <td>19,600</td>\n", " <td>25°46′53″N 80°11′17″W / 25.781389°N 80.18805...</td>\n", " <td>1988</td>\n", " <td>1988</td>\n", " <td>Erik Spoelstra</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Southeast</td>\n", " <td>Orlando Magic</td>\n", " <td>Orlando, FL</td>\n", " <td>Amway Center</td>\n", " <td>18,846</td>\n", " <td>28°32′21″N 81°23′01″W / 28.539167°N 81.38361...</td>\n", " 
<td>1989</td>\n", " <td>1989</td>\n", " <td>Frank Vogel</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Southeast</td>\n", " <td>Washington Wizards</td>\n", " <td>Washington, D.C.</td>\n", " <td>Capital One Arena</td>\n", " <td>20,356</td>\n", " <td>38°53′53″N 77°01′15″W / 38.898056°N 77.02083...</td>\n", " <td>1961*</td>\n", " <td>1961*</td>\n", " <td>Scott Brooks</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>Northwest</td>\n", " <td>Denver Nuggets</td>\n", " <td>Denver, CO</td>\n", " <td>Pepsi Center</td>\n", " <td>19,520</td>\n", " <td>39°44′55″N 105°00′27″W / 39.748611°N 105.007...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Michael Malone</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Northwest</td>\n", " <td>Minnesota Timberwolves</td>\n", " <td>Minneapolis, MN</td>\n", " <td>Target Center</td>\n", " <td>19,356</td>\n", " <td>44°58′46″N 93°16′34″W / 44.979444°N 93.27611...</td>\n", " <td>1989</td>\n", " <td>1989</td>\n", " <td>Tom Thibodeau</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>Northwest</td>\n", " <td>Oklahoma City Thunder</td>\n", " <td>Oklahoma City, OK</td>\n", " <td>Chesapeake Energy Arena</td>\n", " <td>18,203</td>\n", " <td>35°27′48″N 97°30′54″W / 35.463333°N 97.515°W...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Billy Donovan</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>Northwest</td>\n", " <td>Portland Trail Blazers</td>\n", " <td>Portland, OR</td>\n", " <td>Moda Center</td>\n", " <td>19,441</td>\n", " <td>45°31′54″N 122°40′00″W / 45.531667°N 122.666...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Terry Stotts</td>\n", " 
</tr>\n", " <tr>\n", " <th>22</th>\n", " <td>Northwest</td>\n", " <td>Utah Jazz</td>\n", " <td>Salt Lake City, UT</td>\n", " <td>Vivint Smart Home Arena</td>\n", " <td>19,911</td>\n", " <td>40°46′06″N 111°54′04″W / 40.768333°N 111.901...</td>\n", " <td>1974*</td>\n", " <td>1974*</td>\n", " <td>Quin Snyder</td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>Pacific</td>\n", " <td>Golden State Warriors</td>\n", " <td>Oakland, CA</td>\n", " <td>Oracle Arena</td>\n", " <td>19,596</td>\n", " <td>37°45′01″N 122°12′11″W / 37.750278°N 122.203...</td>\n", " <td>1946*</td>\n", " <td>1946*</td>\n", " <td>Steve Kerr</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Clippers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>19,060</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1970*</td>\n", " <td>1970*</td>\n", " <td>Doc Rivers</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Lakers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>18,997</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1947*</td>\n", " <td>1948</td>\n", " <td>Luke Walton</td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>Pacific</td>\n", " <td>Phoenix Suns</td>\n", " <td>Phoenix, AZ</td>\n", " <td>Talking Stick Resort Arena</td>\n", " <td>18,055</td>\n", " <td>33°26′45″N 112°04′17″W / 33.445833°N 112.071...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Jay Triano</td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>Pacific</td>\n", " <td>Sacramento Kings</td>\n", " <td>Sacramento, CA</td>\n", " <td>Golden 1 Center</td>\n", " <td>17,500</td>\n", " <td>38°38′57″N 121°31′05″W / 38.649167°N 121.518...</td>\n", " <td>1923*</td>\n", " <td>1948</td>\n", " <td>Dave Joerger</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>Southwest</td>\n", " <td>Dallas Mavericks</td>\n", " <td>Dallas, TX</td>\n", " 
<td>American Airlines Center</td>\n", " <td>19,200</td>\n", " <td>32°47′26″N 96°48′37″W / 32.790556°N 96.81027...</td>\n", " <td>1980</td>\n", " <td>1980</td>\n", " <td>Rick Carlisle</td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>Southwest</td>\n", " <td>Houston Rockets</td>\n", " <td>Houston, TX</td>\n", " <td>Toyota Center</td>\n", " <td>18,055</td>\n", " <td>29°45′03″N 95°21′44″W / 29.750833°N 95.36222...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Mike D'Antoni</td>\n", " </tr>\n", " <tr>\n", " <th>30</th>\n", " <td>Southwest</td>\n", " <td>Memphis Grizzlies</td>\n", " <td>Memphis, TN</td>\n", " <td>FedExForum</td>\n", " <td>18,119</td>\n", " <td>35°08′18″N 90°03′02″W / 35.138333°N 90.05055...</td>\n", " <td>1995*</td>\n", " <td>1995*</td>\n", " <td>J. B. Bickerstaff</td>\n", " </tr>\n", " <tr>\n", " <th>31</th>\n", " <td>Southwest</td>\n", " <td>New Orleans Pelicans</td>\n", " <td>New Orleans, LA</td>\n", " <td>Smoothie King Center</td>\n", " <td>16,867</td>\n", " <td>29°56′56″N 90°04′55″W / 29.948889°N 90.08194...</td>\n", " <td>2002*</td>\n", " <td>2002*</td>\n", " <td>Alvin Gentry</td>\n", " </tr>\n", " <tr>\n", " <th>32</th>\n", " <td>Southwest</td>\n", " <td>San Antonio Spurs</td>\n", " <td>San Antonio, TX</td>\n", " <td>AT&T Center</td>\n", " <td>18,418</td>\n", " <td>29°25′37″N 98°26′15″W / 29.426944°N 98.4375°...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Gregg Popovich</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 0 1 2 \\\n", "0 Division Team City \n", "1 Eastern Conference Eastern Conference Eastern Conference \n", "2 Atlantic Boston Celtics Boston, MA \n", "3 Atlantic Brooklyn Nets New York City, NY \n", "4 Atlantic New York Knicks New York City, NY \n", "5 Atlantic Philadelphia 76ers Philadelphia, PA \n", "6 Atlantic Toronto Raptors Toronto, ON \n", "7 Central Chicago Bulls Chicago, IL \n", "8 Central Cleveland Cavaliers Cleveland, OH \n", "9 Central Detroit Pistons Detroit, MI 
\n", "10 Central Indiana Pacers Indianapolis, IN \n", "11 Central Milwaukee Bucks Milwaukee, WI \n", "12 Southeast Atlanta Hawks Atlanta, GA \n", "13 Southeast Charlotte Hornets Charlotte, NC \n", "14 Southeast Miami Heat Miami, FL \n", "15 Southeast Orlando Magic Orlando, FL \n", "16 Southeast Washington Wizards Washington, D.C. \n", "17 Western Conference Western Conference Western Conference \n", "18 Northwest Denver Nuggets Denver, CO \n", "19 Northwest Minnesota Timberwolves Minneapolis, MN \n", "20 Northwest Oklahoma City Thunder Oklahoma City, OK \n", "21 Northwest Portland Trail Blazers Portland, OR \n", "22 Northwest Utah Jazz Salt Lake City, UT \n", "23 Pacific Golden State Warriors Oakland, CA \n", "24 Pacific Los Angeles Clippers Los Angeles, CA \n", "25 Pacific Los Angeles Lakers Los Angeles, CA \n", "26 Pacific Phoenix Suns Phoenix, AZ \n", "27 Pacific Sacramento Kings Sacramento, CA \n", "28 Southwest Dallas Mavericks Dallas, TX \n", "29 Southwest Houston Rockets Houston, TX \n", "30 Southwest Memphis Grizzlies Memphis, TN \n", "31 Southwest New Orleans Pelicans New Orleans, LA \n", "32 Southwest San Antonio Spurs San Antonio, TX \n", "\n", " 3 4 \\\n", "0 Arena Capacity \n", "1 Eastern Conference Eastern Conference \n", "2 TD Garden 18,624 \n", "3 Barclays Center 17,732 \n", "4 Madison Square Garden 19,812 \n", "5 Wells Fargo Center 21,600 \n", "6 Air Canada Centre 19,800 \n", "7 United Center 20,917 \n", "8 Quicken Loans Arena 20,562 \n", "9 Little Caesars Arena 20,491 \n", "10 Bankers Life Fieldhouse 17,923 \n", "11 Bradley Center 18,717 \n", "12 Philips Arena 15,711 \n", "13 Spectrum Center 19,077 \n", "14 American Airlines Arena 19,600 \n", "15 Amway Center 18,846 \n", "16 Capital One Arena 20,356 \n", "17 Western Conference Western Conference \n", "18 Pepsi Center 19,520 \n", "19 Target Center 19,356 \n", "20 Chesapeake Energy Arena 18,203 \n", "21 Moda Center 19,441 \n", "22 Vivint Smart Home Arena 19,911 \n", "23 Oracle Arena 19,596 \n", "24 
Staples Center 19,060 \n", "25 Staples Center 18,997 \n", "26 Talking Stick Resort Arena 18,055 \n", "27 Golden 1 Center 17,500 \n", "28 American Airlines Center 19,200 \n", "29 Toyota Center 18,055 \n", "30 FedExForum 18,119 \n", "31 Smoothie King Center 16,867 \n", "32 AT&T Center 18,418 \n", "\n", " 5 6 \\\n", "0 Coordinates Founded \n", "1 Eastern Conference Eastern Conference \n", "2 42°21′59″N 71°03′44″W / 42.366303°N 71.06222... 1946 \n", "3 40°40′58″N 73°58′29″W / 40.68265°N 73.974689... 1967* \n", "4 40°45′02″N 73°59′37″W / 40.750556°N 73.99361... 1946 \n", "5 39°54′04″N 75°10′19″W / 39.901111°N 75.17194... 1946* \n", "6 43°38′36″N 79°22′45″W / 43.643333°N 79.37916... 1995 \n", "7 41°52′50″N 87°40′27″W / 41.880556°N 87.67416... 1966 \n", "8 41°29′47″N 81°41′17″W / 41.496389°N 81.68805... 1970 \n", "9 42°41′49″N 83°14′44″W / 42.696944°N 83.24555... 1941* \n", "10 39°45′50″N 86°09′20″W / 39.763889°N 86.15555... 1967 \n", "11 43°02′37″N 87°55′01″W / 43.043611°N 87.91694... 1968 \n", "12 33°45′26″N 84°23′47″W / 33.757222°N 84.39638... 1946* \n", "13 35°13′30″N 80°50′21″W / 35.225°N 80.839167°W... 1988* \n", "14 25°46′53″N 80°11′17″W / 25.781389°N 80.18805... 1988 \n", "15 28°32′21″N 81°23′01″W / 28.539167°N 81.38361... 1989 \n", "16 38°53′53″N 77°01′15″W / 38.898056°N 77.02083... 1961* \n", "17 Western Conference Western Conference \n", "18 39°44′55″N 105°00′27″W / 39.748611°N 105.007... 1967 \n", "19 44°58′46″N 93°16′34″W / 44.979444°N 93.27611... 1989 \n", "20 35°27′48″N 97°30′54″W / 35.463333°N 97.515°W... 1967* \n", "21 45°31′54″N 122°40′00″W / 45.531667°N 122.666... 1970 \n", "22 40°46′06″N 111°54′04″W / 40.768333°N 111.901... 1974* \n", "23 37°45′01″N 122°12′11″W / 37.750278°N 122.203... 1946* \n", "24 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1970* \n", "25 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1947* \n", "26 33°26′45″N 112°04′17″W / 33.445833°N 112.071... 1968 \n", "27 38°38′57″N 121°31′05″W / 38.649167°N 121.518... 
1923* \n", "28 32°47′26″N 96°48′37″W / 32.790556°N 96.81027... 1980 \n", "29 29°45′03″N 95°21′44″W / 29.750833°N 95.36222... 1967* \n", "30 35°08′18″N 90°03′02″W / 35.138333°N 90.05055... 1995* \n", "31 29°56′56″N 90°04′55″W / 29.948889°N 90.08194... 2002* \n", "32 29°25′37″N 98°26′15″W / 29.426944°N 98.4375°... 1967* \n", "\n", " 7 8 \n", "0 Joined Head coach \n", "1 Eastern Conference Eastern Conference \n", "2 1946 Brad Stevens \n", "3 1976 Kenny Atkinson \n", "4 1946 Jeff Hornacek \n", "5 1949 Brett Brown \n", "6 1995 Dwane Casey \n", "7 1966 Fred Hoiberg \n", "8 1970 Tyronn Lue \n", "9 1948 Stan Van Gundy \n", "10 1976 Nate McMillan \n", "11 1968 Joe Prunty \n", "12 1949 Mike Budenholzer \n", "13 1988* Steve Clifford \n", "14 1988 Erik Spoelstra \n", "15 1989 Frank Vogel \n", "16 1961* Scott Brooks \n", "17 Western Conference Western Conference \n", "18 1976 Michael Malone \n", "19 1989 Tom Thibodeau \n", "20 1967* Billy Donovan \n", "21 1970 Terry Stotts \n", "22 1974* Quin Snyder \n", "23 1946* Steve Kerr \n", "24 1970* Doc Rivers \n", "25 1948 Luke Walton \n", "26 1968 Jay Triano \n", "27 1948 Dave Joerger \n", "28 1980 Rick Carlisle \n", "29 1967* Mike D'Antoni \n", "30 1995* J. B. Bickerstaff \n", "31 2002* Alvin Gentry \n", "32 1976 Gregg Popovich " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw = tables[0].to_df(repeat_span=True)\n", "raw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cleaning Up the Table\n", "\n", "Now let's clean up the raw information in the table.\n", "\n", "#### Column Headers\n", "\n", "First, notice that our generic scraping function doesn't know anything about what columns are in the table. We need to create useful column headers." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def setup_columns(raw):\n", " df = raw.copy()\n", " df.columns = df.loc[0, :]\n", " return df.drop(df.index[0])" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Division</th>\n", " <th>Team</th>\n", " <th>City</th>\n", " <th>Arena</th>\n", " <th>Capacity</th>\n", " <th>Coordinates</th>\n", " <th>Founded</th>\n", " <th>Joined</th>\n", " <th>Head coach</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>1</th>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " <td>Eastern Conference</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Atlantic</td>\n", " <td>Boston Celtics</td>\n", " <td>Boston, MA</td>\n", " <td>TD Garden</td>\n", " <td>18,624</td>\n", " <td>42°21′59″N 71°03′44″W / 42.366303°N 71.06222...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Atlantic</td>\n", " <td>Brooklyn Nets</td>\n", " <td>New York City, NY</td>\n", " <td>Barclays Center</td>\n", " <td>17,732</td>\n", " <td>40°40′58″N 73°58′29″W / 40.68265°N 73.974689...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Kenny Atkinson</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Atlantic</td>\n", " <td>New York Knicks</td>\n", " <td>New York 
City, NY</td>\n", " <td>Madison Square Garden</td>\n", " <td>19,812</td>\n", " <td>40°45′02″N 73°59′37″W / 40.750556°N 73.99361...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Atlantic</td>\n", " <td>Philadelphia 76ers</td>\n", " <td>Philadelphia, PA</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21,600</td>\n", " <td>39°54′04″N 75°10′19″W / 39.901111°N 75.17194...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>Atlantic</td>\n", " <td>Toronto Raptors</td>\n", " <td>Toronto, ON</td>\n", " <td>Air Canada Centre</td>\n", " <td>19,800</td>\n", " <td>43°38′36″N 79°22′45″W / 43.643333°N 79.37916...</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>Dwane Casey</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>Central</td>\n", " <td>Chicago Bulls</td>\n", " <td>Chicago, IL</td>\n", " <td>United Center</td>\n", " <td>20,917</td>\n", " <td>41°52′50″N 87°40′27″W / 41.880556°N 87.67416...</td>\n", " <td>1966</td>\n", " <td>1966</td>\n", " <td>Fred Hoiberg</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>Central</td>\n", " <td>Cleveland Cavaliers</td>\n", " <td>Cleveland, OH</td>\n", " <td>Quicken Loans Arena</td>\n", " <td>20,562</td>\n", " <td>41°29′47″N 81°41′17″W / 41.496389°N 81.68805...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Tyronn Lue</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Central</td>\n", " <td>Detroit Pistons</td>\n", " <td>Detroit, MI</td>\n", " <td>Little Caesars Arena</td>\n", " <td>20,491</td>\n", " <td>42°41′49″N 83°14′44″W / 42.696944°N 83.24555...</td>\n", " <td>1941*</td>\n", " <td>1948</td>\n", " <td>Stan Van Gundy</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>Central</td>\n", " <td>Indiana Pacers</td>\n", " <td>Indianapolis, IN</td>\n", " <td>Bankers Life Fieldhouse</td>\n", " <td>17,923</td>\n", " <td>39°45′50″N 86°09′20″W / 39.763889°N 
86.15555...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Nate McMillan</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Central</td>\n", " <td>Milwaukee Bucks</td>\n", " <td>Milwaukee, WI</td>\n", " <td>Bradley Center</td>\n", " <td>18,717</td>\n", " <td>43°02′37″N 87°55′01″W / 43.043611°N 87.91694...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Joe Prunty</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>Southeast</td>\n", " <td>Atlanta Hawks</td>\n", " <td>Atlanta, GA</td>\n", " <td>Philips Arena</td>\n", " <td>15,711</td>\n", " <td>33°45′26″N 84°23′47″W / 33.757222°N 84.39638...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Mike Budenholzer</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>Southeast</td>\n", " <td>Charlotte Hornets</td>\n", " <td>Charlotte, NC</td>\n", " <td>Spectrum Center</td>\n", " <td>19,077</td>\n", " <td>35°13′30″N 80°50′21″W / 35.225°N 80.839167°W...</td>\n", " <td>1988*</td>\n", " <td>1988*</td>\n", " <td>Steve Clifford</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>Southeast</td>\n", " <td>Miami Heat</td>\n", " <td>Miami, FL</td>\n", " <td>American Airlines Arena</td>\n", " <td>19,600</td>\n", " <td>25°46′53″N 80°11′17″W / 25.781389°N 80.18805...</td>\n", " <td>1988</td>\n", " <td>1988</td>\n", " <td>Erik Spoelstra</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Southeast</td>\n", " <td>Orlando Magic</td>\n", " <td>Orlando, FL</td>\n", " <td>Amway Center</td>\n", " <td>18,846</td>\n", " <td>28°32′21″N 81°23′01″W / 28.539167°N 81.38361...</td>\n", " <td>1989</td>\n", " <td>1989</td>\n", " <td>Frank Vogel</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Southeast</td>\n", " <td>Washington Wizards</td>\n", " <td>Washington, D.C.</td>\n", " <td>Capital One Arena</td>\n", " <td>20,356</td>\n", " <td>38°53′53″N 77°01′15″W / 38.898056°N 77.02083...</td>\n", " <td>1961*</td>\n", " <td>1961*</td>\n", " <td>Scott Brooks</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " 
<td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " <td>Western Conference</td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>Northwest</td>\n", " <td>Denver Nuggets</td>\n", " <td>Denver, CO</td>\n", " <td>Pepsi Center</td>\n", " <td>19,520</td>\n", " <td>39°44′55″N 105°00′27″W / 39.748611°N 105.007...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Michael Malone</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Northwest</td>\n", " <td>Minnesota Timberwolves</td>\n", " <td>Minneapolis, MN</td>\n", " <td>Target Center</td>\n", " <td>19,356</td>\n", " <td>44°58′46″N 93°16′34″W / 44.979444°N 93.27611...</td>\n", " <td>1989</td>\n", " <td>1989</td>\n", " <td>Tom Thibodeau</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>Northwest</td>\n", " <td>Oklahoma City Thunder</td>\n", " <td>Oklahoma City, OK</td>\n", " <td>Chesapeake Energy Arena</td>\n", " <td>18,203</td>\n", " <td>35°27′48″N 97°30′54″W / 35.463333°N 97.515°W...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Billy Donovan</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>Northwest</td>\n", " <td>Portland Trail Blazers</td>\n", " <td>Portland, OR</td>\n", " <td>Moda Center</td>\n", " <td>19,441</td>\n", " <td>45°31′54″N 122°40′00″W / 45.531667°N 122.666...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Terry Stotts</td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>Northwest</td>\n", " <td>Utah Jazz</td>\n", " <td>Salt Lake City, UT</td>\n", " <td>Vivint Smart Home Arena</td>\n", " <td>19,911</td>\n", " <td>40°46′06″N 111°54′04″W / 40.768333°N 111.901...</td>\n", " <td>1974*</td>\n", " <td>1974*</td>\n", " <td>Quin Snyder</td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>Pacific</td>\n", " <td>Golden State Warriors</td>\n", " 
<td>Oakland, CA</td>\n", " <td>Oracle Arena</td>\n", " <td>19,596</td>\n", " <td>37°45′01″N 122°12′11″W / 37.750278°N 122.203...</td>\n", " <td>1946*</td>\n", " <td>1946*</td>\n", " <td>Steve Kerr</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Clippers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>19,060</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1970*</td>\n", " <td>1970*</td>\n", " <td>Doc Rivers</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Lakers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>18,997</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1947*</td>\n", " <td>1948</td>\n", " <td>Luke Walton</td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>Pacific</td>\n", " <td>Phoenix Suns</td>\n", " <td>Phoenix, AZ</td>\n", " <td>Talking Stick Resort Arena</td>\n", " <td>18,055</td>\n", " <td>33°26′45″N 112°04′17″W / 33.445833°N 112.071...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Jay Triano</td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>Pacific</td>\n", " <td>Sacramento Kings</td>\n", " <td>Sacramento, CA</td>\n", " <td>Golden 1 Center</td>\n", " <td>17,500</td>\n", " <td>38°38′57″N 121°31′05″W / 38.649167°N 121.518...</td>\n", " <td>1923*</td>\n", " <td>1948</td>\n", " <td>Dave Joerger</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>Southwest</td>\n", " <td>Dallas Mavericks</td>\n", " <td>Dallas, TX</td>\n", " <td>American Airlines Center</td>\n", " <td>19,200</td>\n", " <td>32°47′26″N 96°48′37″W / 32.790556°N 96.81027...</td>\n", " <td>1980</td>\n", " <td>1980</td>\n", " <td>Rick Carlisle</td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>Southwest</td>\n", " <td>Houston Rockets</td>\n", " <td>Houston, TX</td>\n", " <td>Toyota Center</td>\n", " <td>18,055</td>\n", " <td>29°45′03″N 95°21′44″W / 29.750833°N 
95.36222...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Mike D'Antoni</td>\n", " </tr>\n", " <tr>\n", " <th>30</th>\n", " <td>Southwest</td>\n", " <td>Memphis Grizzlies</td>\n", " <td>Memphis, TN</td>\n", " <td>FedExForum</td>\n", " <td>18,119</td>\n", " <td>35°08′18″N 90°03′02″W / 35.138333°N 90.05055...</td>\n", " <td>1995*</td>\n", " <td>1995*</td>\n", " <td>J. B. Bickerstaff</td>\n", " </tr>\n", " <tr>\n", " <th>31</th>\n", " <td>Southwest</td>\n", " <td>New Orleans Pelicans</td>\n", " <td>New Orleans, LA</td>\n", " <td>Smoothie King Center</td>\n", " <td>16,867</td>\n", " <td>29°56′56″N 90°04′55″W / 29.948889°N 90.08194...</td>\n", " <td>2002*</td>\n", " <td>2002*</td>\n", " <td>Alvin Gentry</td>\n", " </tr>\n", " <tr>\n", " <th>32</th>\n", " <td>Southwest</td>\n", " <td>San Antonio Spurs</td>\n", " <td>San Antonio, TX</td>\n", " <td>AT&T Center</td>\n", " <td>18,418</td>\n", " <td>29°25′37″N 98°26′15″W / 29.426944°N 98.4375°...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Gregg Popovich</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "0 Division Team City \\\n", "1 Eastern Conference Eastern Conference Eastern Conference \n", "2 Atlantic Boston Celtics Boston, MA \n", "3 Atlantic Brooklyn Nets New York City, NY \n", "4 Atlantic New York Knicks New York City, NY \n", "5 Atlantic Philadelphia 76ers Philadelphia, PA \n", "6 Atlantic Toronto Raptors Toronto, ON \n", "7 Central Chicago Bulls Chicago, IL \n", "8 Central Cleveland Cavaliers Cleveland, OH \n", "9 Central Detroit Pistons Detroit, MI \n", "10 Central Indiana Pacers Indianapolis, IN \n", "11 Central Milwaukee Bucks Milwaukee, WI \n", "12 Southeast Atlanta Hawks Atlanta, GA \n", "13 Southeast Charlotte Hornets Charlotte, NC \n", "14 Southeast Miami Heat Miami, FL \n", "15 Southeast Orlando Magic Orlando, FL \n", "16 Southeast Washington Wizards Washington, D.C. 
\n", "17 Western Conference Western Conference Western Conference \n", "18 Northwest Denver Nuggets Denver, CO \n", "19 Northwest Minnesota Timberwolves Minneapolis, MN \n", "20 Northwest Oklahoma City Thunder Oklahoma City, OK \n", "21 Northwest Portland Trail Blazers Portland, OR \n", "22 Northwest Utah Jazz Salt Lake City, UT \n", "23 Pacific Golden State Warriors Oakland, CA \n", "24 Pacific Los Angeles Clippers Los Angeles, CA \n", "25 Pacific Los Angeles Lakers Los Angeles, CA \n", "26 Pacific Phoenix Suns Phoenix, AZ \n", "27 Pacific Sacramento Kings Sacramento, CA \n", "28 Southwest Dallas Mavericks Dallas, TX \n", "29 Southwest Houston Rockets Houston, TX \n", "30 Southwest Memphis Grizzlies Memphis, TN \n", "31 Southwest New Orleans Pelicans New Orleans, LA \n", "32 Southwest San Antonio Spurs San Antonio, TX \n", "\n", "0 Arena Capacity \\\n", "1 Eastern Conference Eastern Conference \n", "2 TD Garden 18,624 \n", "3 Barclays Center 17,732 \n", "4 Madison Square Garden 19,812 \n", "5 Wells Fargo Center 21,600 \n", "6 Air Canada Centre 19,800 \n", "7 United Center 20,917 \n", "8 Quicken Loans Arena 20,562 \n", "9 Little Caesars Arena 20,491 \n", "10 Bankers Life Fieldhouse 17,923 \n", "11 Bradley Center 18,717 \n", "12 Philips Arena 15,711 \n", "13 Spectrum Center 19,077 \n", "14 American Airlines Arena 19,600 \n", "15 Amway Center 18,846 \n", "16 Capital One Arena 20,356 \n", "17 Western Conference Western Conference \n", "18 Pepsi Center 19,520 \n", "19 Target Center 19,356 \n", "20 Chesapeake Energy Arena 18,203 \n", "21 Moda Center 19,441 \n", "22 Vivint Smart Home Arena 19,911 \n", "23 Oracle Arena 19,596 \n", "24 Staples Center 19,060 \n", "25 Staples Center 18,997 \n", "26 Talking Stick Resort Arena 18,055 \n", "27 Golden 1 Center 17,500 \n", "28 American Airlines Center 19,200 \n", "29 Toyota Center 18,055 \n", "30 FedExForum 18,119 \n", "31 Smoothie King Center 16,867 \n", "32 AT&T Center 18,418 \n", "\n", "0 Coordinates Founded \\\n", "1 Eastern 
Conference Eastern Conference \n", "2 42°21′59″N 71°03′44″W / 42.366303°N 71.06222... 1946 \n", "3 40°40′58″N 73°58′29″W / 40.68265°N 73.974689... 1967* \n", "4 40°45′02″N 73°59′37″W / 40.750556°N 73.99361... 1946 \n", "5 39°54′04″N 75°10′19″W / 39.901111°N 75.17194... 1946* \n", "6 43°38′36″N 79°22′45″W / 43.643333°N 79.37916... 1995 \n", "7 41°52′50″N 87°40′27″W / 41.880556°N 87.67416... 1966 \n", "8 41°29′47″N 81°41′17″W / 41.496389°N 81.68805... 1970 \n", "9 42°41′49″N 83°14′44″W / 42.696944°N 83.24555... 1941* \n", "10 39°45′50″N 86°09′20″W / 39.763889°N 86.15555... 1967 \n", "11 43°02′37″N 87°55′01″W / 43.043611°N 87.91694... 1968 \n", "12 33°45′26″N 84°23′47″W / 33.757222°N 84.39638... 1946* \n", "13 35°13′30″N 80°50′21″W / 35.225°N 80.839167°W... 1988* \n", "14 25°46′53″N 80°11′17″W / 25.781389°N 80.18805... 1988 \n", "15 28°32′21″N 81°23′01″W / 28.539167°N 81.38361... 1989 \n", "16 38°53′53″N 77°01′15″W / 38.898056°N 77.02083... 1961* \n", "17 Western Conference Western Conference \n", "18 39°44′55″N 105°00′27″W / 39.748611°N 105.007... 1967 \n", "19 44°58′46″N 93°16′34″W / 44.979444°N 93.27611... 1989 \n", "20 35°27′48″N 97°30′54″W / 35.463333°N 97.515°W... 1967* \n", "21 45°31′54″N 122°40′00″W / 45.531667°N 122.666... 1970 \n", "22 40°46′06″N 111°54′04″W / 40.768333°N 111.901... 1974* \n", "23 37°45′01″N 122°12′11″W / 37.750278°N 122.203... 1946* \n", "24 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1970* \n", "25 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1947* \n", "26 33°26′45″N 112°04′17″W / 33.445833°N 112.071... 1968 \n", "27 38°38′57″N 121°31′05″W / 38.649167°N 121.518... 1923* \n", "28 32°47′26″N 96°48′37″W / 32.790556°N 96.81027... 1980 \n", "29 29°45′03″N 95°21′44″W / 29.750833°N 95.36222... 1967* \n", "30 35°08′18″N 90°03′02″W / 35.138333°N 90.05055... 1995* \n", "31 29°56′56″N 90°04′55″W / 29.948889°N 90.08194... 2002* \n", "32 29°25′37″N 98°26′15″W / 29.426944°N 98.4375°... 
1967* \n", "\n", "0 Joined Head coach \n", "1 Eastern Conference Eastern Conference \n", "2 1946 Brad Stevens \n", "3 1976 Kenny Atkinson \n", "4 1946 Jeff Hornacek \n", "5 1949 Brett Brown \n", "6 1995 Dwane Casey \n", "7 1966 Fred Hoiberg \n", "8 1970 Tyronn Lue \n", "9 1948 Stan Van Gundy \n", "10 1976 Nate McMillan \n", "11 1968 Joe Prunty \n", "12 1949 Mike Budenholzer \n", "13 1988* Steve Clifford \n", "14 1988 Erik Spoelstra \n", "15 1989 Frank Vogel \n", "16 1961* Scott Brooks \n", "17 Western Conference Western Conference \n", "18 1976 Michael Malone \n", "19 1989 Tom Thibodeau \n", "20 1967* Billy Donovan \n", "21 1970 Terry Stotts \n", "22 1974* Quin Snyder \n", "23 1946* Steve Kerr \n", "24 1970* Doc Rivers \n", "25 1948 Luke Walton \n", "26 1968 Jay Triano \n", "27 1948 Dave Joerger \n", "28 1980 Rick Carlisle \n", "29 1967* Mike D'Antoni \n", "30 1995* J. B. Bickerstaff \n", "31 2002* Alvin Gentry \n", "32 1976 Gregg Popovich " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = setup_columns(raw)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### NBA Conference Information\n", "\n", "Next, notice that the Eastern Conference and Western Conference labels are repeated in every column of their own rows (rows 1 and 17 above). We want to remove those two rows and create a new column showing the conference for each team." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def cleanup_nba_conferences(df):\n", "    df = df.copy()\n", "    # Conference header rows repeat the conference name in every column\n", "    is_header = df['Division'].isin(['Eastern Conference', 'Western Conference'])\n", "    # Label the header rows, then carry each label down to the teams below it\n", "    conference = df['Division'].where(is_header).ffill()\n", "    df['Conference'] = conference.str.split().str.get(0)\n", "    df = df[~is_header].copy()\n", "    df['Conference'] = df['Conference'].astype('category')\n", "    df['Division'] = df['Division'].astype('category')\n", "    return df.reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Division</th>\n", " <th>Team</th>\n", " <th>City</th>\n", " <th>Arena</th>\n", " <th>Capacity</th>\n", " <th>Coordinates</th>\n", " <th>Founded</th>\n", " <th>Joined</th>\n", " <th>Head coach</th>\n", " <th>Conference</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Atlantic</td>\n", " <td>Boston Celtics</td>\n", " <td>Boston, MA</td>\n", " <td>TD Garden</td>\n", " <td>18,624</td>\n", " <td>42°21′59″N 71°03′44″W / 42.366303°N 71.06222...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Atlantic</td>\n", " <td>Brooklyn Nets</td>\n", " <td>New York City, NY</td>\n", " <td>Barclays Center</td>\n", " <td>17,732</td>\n", " <td>40°40′58″N 73°58′29″W / 40.68265°N 73.974689...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " 
<td>Kenny Atkinson</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Atlantic</td>\n", " <td>New York Knicks</td>\n", " <td>New York City, NY</td>\n", " <td>Madison Square Garden</td>\n", " <td>19,812</td>\n", " <td>40°45′02″N 73°59′37″W / 40.750556°N 73.99361...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Atlantic</td>\n", " <td>Philadelphia 76ers</td>\n", " <td>Philadelphia, PA</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21,600</td>\n", " <td>39°54′04″N 75°10′19″W / 39.901111°N 75.17194...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Atlantic</td>\n", " <td>Toronto Raptors</td>\n", " <td>Toronto, ON</td>\n", " <td>Air Canada Centre</td>\n", " <td>19,800</td>\n", " <td>43°38′36″N 79°22′45″W / 43.643333°N 79.37916...</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>Dwane Casey</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Central</td>\n", " <td>Chicago Bulls</td>\n", " <td>Chicago, IL</td>\n", " <td>United Center</td>\n", " <td>20,917</td>\n", " <td>41°52′50″N 87°40′27″W / 41.880556°N 87.67416...</td>\n", " <td>1966</td>\n", " <td>1966</td>\n", " <td>Fred Hoiberg</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>Central</td>\n", " <td>Cleveland Cavaliers</td>\n", " <td>Cleveland, OH</td>\n", " <td>Quicken Loans Arena</td>\n", " <td>20,562</td>\n", " <td>41°29′47″N 81°41′17″W / 41.496389°N 81.68805...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Tyronn Lue</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>Central</td>\n", " <td>Detroit Pistons</td>\n", " <td>Detroit, MI</td>\n", " <td>Little Caesars Arena</td>\n", " <td>20,491</td>\n", " <td>42°41′49″N 83°14′44″W / 42.696944°N 83.24555...</td>\n", " <td>1941*</td>\n", " 
<td>1948</td>\n", " <td>Stan Van Gundy</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>Central</td>\n", " <td>Indiana Pacers</td>\n", " <td>Indianapolis, IN</td>\n", " <td>Bankers Life Fieldhouse</td>\n", " <td>17,923</td>\n", " <td>39°45′50″N 86°09′20″W / 39.763889°N 86.15555...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Nate McMillan</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Central</td>\n", " <td>Milwaukee Bucks</td>\n", " <td>Milwaukee, WI</td>\n", " <td>Bradley Center</td>\n", " <td>18,717</td>\n", " <td>43°02′37″N 87°55′01″W / 43.043611°N 87.91694...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Joe Prunty</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>Southeast</td>\n", " <td>Atlanta Hawks</td>\n", " <td>Atlanta, GA</td>\n", " <td>Philips Arena</td>\n", " <td>15,711</td>\n", " <td>33°45′26″N 84°23′47″W / 33.757222°N 84.39638...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Mike Budenholzer</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Southeast</td>\n", " <td>Charlotte Hornets</td>\n", " <td>Charlotte, NC</td>\n", " <td>Spectrum Center</td>\n", " <td>19,077</td>\n", " <td>35°13′30″N 80°50′21″W / 35.225°N 80.839167°W...</td>\n", " <td>1988*</td>\n", " <td>1988*</td>\n", " <td>Steve Clifford</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>Southeast</td>\n", " <td>Miami Heat</td>\n", " <td>Miami, FL</td>\n", " <td>American Airlines Arena</td>\n", " <td>19,600</td>\n", " <td>25°46′53″N 80°11′17″W / 25.781389°N 80.18805...</td>\n", " <td>1988</td>\n", " <td>1988</td>\n", " <td>Erik Spoelstra</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>Southeast</td>\n", " <td>Orlando Magic</td>\n", " <td>Orlando, FL</td>\n", " <td>Amway Center</td>\n", " <td>18,846</td>\n", " <td>28°32′21″N 81°23′01″W / 28.539167°N 81.38361...</td>\n", " <td>1989</td>\n", 
" <td>1989</td>\n", " <td>Frank Vogel</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>Southeast</td>\n", " <td>Washington Wizards</td>\n", " <td>Washington, D.C.</td>\n", " <td>Capital One Arena</td>\n", " <td>20,356</td>\n", " <td>38°53′53″N 77°01′15″W / 38.898056°N 77.02083...</td>\n", " <td>1961*</td>\n", " <td>1961*</td>\n", " <td>Scott Brooks</td>\n", " <td>Eastern</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Northwest</td>\n", " <td>Denver Nuggets</td>\n", " <td>Denver, CO</td>\n", " <td>Pepsi Center</td>\n", " <td>19,520</td>\n", " <td>39°44′55″N 105°00′27″W / 39.748611°N 105.007...</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Michael Malone</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Northwest</td>\n", " <td>Minnesota Timberwolves</td>\n", " <td>Minneapolis, MN</td>\n", " <td>Target Center</td>\n", " <td>19,356</td>\n", " <td>44°58′46″N 93°16′34″W / 44.979444°N 93.27611...</td>\n", " <td>1989</td>\n", " <td>1989</td>\n", " <td>Tom Thibodeau</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>Northwest</td>\n", " <td>Oklahoma City Thunder</td>\n", " <td>Oklahoma City, OK</td>\n", " <td>Chesapeake Energy Arena</td>\n", " <td>18,203</td>\n", " <td>35°27′48″N 97°30′54″W / 35.463333°N 97.515°W...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Billy Donovan</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>Northwest</td>\n", " <td>Portland Trail Blazers</td>\n", " <td>Portland, OR</td>\n", " <td>Moda Center</td>\n", " <td>19,441</td>\n", " <td>45°31′54″N 122°40′00″W / 45.531667°N 122.666...</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Terry Stotts</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Northwest</td>\n", " <td>Utah Jazz</td>\n", " <td>Salt Lake City, UT</td>\n", " <td>Vivint Smart Home Arena</td>\n", " <td>19,911</td>\n", " <td>40°46′06″N 111°54′04″W / 40.768333°N 
111.901...</td>\n", " <td>1974*</td>\n", " <td>1974*</td>\n", " <td>Quin Snyder</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>Pacific</td>\n", " <td>Golden State Warriors</td>\n", " <td>Oakland, CA</td>\n", " <td>Oracle Arena</td>\n", " <td>19,596</td>\n", " <td>37°45′01″N 122°12′11″W / 37.750278°N 122.203...</td>\n", " <td>1946*</td>\n", " <td>1946*</td>\n", " <td>Steve Kerr</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Clippers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>19,060</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1970*</td>\n", " <td>1970*</td>\n", " <td>Doc Rivers</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>Pacific</td>\n", " <td>Los Angeles Lakers</td>\n", " <td>Los Angeles, CA</td>\n", " <td>Staples Center</td>\n", " <td>18,997</td>\n", " <td>34°02′35″N 118°16′02″W / 34.043056°N 118.267...</td>\n", " <td>1947*</td>\n", " <td>1948</td>\n", " <td>Luke Walton</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>Pacific</td>\n", " <td>Phoenix Suns</td>\n", " <td>Phoenix, AZ</td>\n", " <td>Talking Stick Resort Arena</td>\n", " <td>18,055</td>\n", " <td>33°26′45″N 112°04′17″W / 33.445833°N 112.071...</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Jay Triano</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>Pacific</td>\n", " <td>Sacramento Kings</td>\n", " <td>Sacramento, CA</td>\n", " <td>Golden 1 Center</td>\n", " <td>17,500</td>\n", " <td>38°38′57″N 121°31′05″W / 38.649167°N 121.518...</td>\n", " <td>1923*</td>\n", " <td>1948</td>\n", " <td>Dave Joerger</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>Southwest</td>\n", " <td>Dallas Mavericks</td>\n", " <td>Dallas, TX</td>\n", " <td>American Airlines Center</td>\n", " <td>19,200</td>\n", " <td>32°47′26″N 96°48′37″W 
/ 32.790556°N 96.81027...</td>\n", " <td>1980</td>\n", " <td>1980</td>\n", " <td>Rick Carlisle</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>Southwest</td>\n", " <td>Houston Rockets</td>\n", " <td>Houston, TX</td>\n", " <td>Toyota Center</td>\n", " <td>18,055</td>\n", " <td>29°45′03″N 95°21′44″W / 29.750833°N 95.36222...</td>\n", " <td>1967*</td>\n", " <td>1967*</td>\n", " <td>Mike D'Antoni</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>Southwest</td>\n", " <td>Memphis Grizzlies</td>\n", " <td>Memphis, TN</td>\n", " <td>FedExForum</td>\n", " <td>18,119</td>\n", " <td>35°08′18″N 90°03′02″W / 35.138333°N 90.05055...</td>\n", " <td>1995*</td>\n", " <td>1995*</td>\n", " <td>J. B. Bickerstaff</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>Southwest</td>\n", " <td>New Orleans Pelicans</td>\n", " <td>New Orleans, LA</td>\n", " <td>Smoothie King Center</td>\n", " <td>16,867</td>\n", " <td>29°56′56″N 90°04′55″W / 29.948889°N 90.08194...</td>\n", " <td>2002*</td>\n", " <td>2002*</td>\n", " <td>Alvin Gentry</td>\n", " <td>Western</td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>Southwest</td>\n", " <td>San Antonio Spurs</td>\n", " <td>San Antonio, TX</td>\n", " <td>AT&T Center</td>\n", " <td>18,418</td>\n", " <td>29°25′37″N 98°26′15″W / 29.426944°N 98.4375°...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Gregg Popovich</td>\n", " <td>Western</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "0 Division Team City \\\n", "0 Atlantic Boston Celtics Boston, MA \n", "1 Atlantic Brooklyn Nets New York City, NY \n", "2 Atlantic New York Knicks New York City, NY \n", "3 Atlantic Philadelphia 76ers Philadelphia, PA \n", "4 Atlantic Toronto Raptors Toronto, ON \n", "5 Central Chicago Bulls Chicago, IL \n", "6 Central Cleveland Cavaliers Cleveland, OH \n", "7 Central Detroit Pistons Detroit, MI \n", "8 Central Indiana Pacers Indianapolis, IN \n", 
"9 Central Milwaukee Bucks Milwaukee, WI \n", "10 Southeast Atlanta Hawks Atlanta, GA \n", "11 Southeast Charlotte Hornets Charlotte, NC \n", "12 Southeast Miami Heat Miami, FL \n", "13 Southeast Orlando Magic Orlando, FL \n", "14 Southeast Washington Wizards Washington, D.C. \n", "15 Northwest Denver Nuggets Denver, CO \n", "16 Northwest Minnesota Timberwolves Minneapolis, MN \n", "17 Northwest Oklahoma City Thunder Oklahoma City, OK \n", "18 Northwest Portland Trail Blazers Portland, OR \n", "19 Northwest Utah Jazz Salt Lake City, UT \n", "20 Pacific Golden State Warriors Oakland, CA \n", "21 Pacific Los Angeles Clippers Los Angeles, CA \n", "22 Pacific Los Angeles Lakers Los Angeles, CA \n", "23 Pacific Phoenix Suns Phoenix, AZ \n", "24 Pacific Sacramento Kings Sacramento, CA \n", "25 Southwest Dallas Mavericks Dallas, TX \n", "26 Southwest Houston Rockets Houston, TX \n", "27 Southwest Memphis Grizzlies Memphis, TN \n", "28 Southwest New Orleans Pelicans New Orleans, LA \n", "29 Southwest San Antonio Spurs San Antonio, TX \n", "\n", "0 Arena Capacity \\\n", "0 TD Garden 18,624 \n", "1 Barclays Center 17,732 \n", "2 Madison Square Garden 19,812 \n", "3 Wells Fargo Center 21,600 \n", "4 Air Canada Centre 19,800 \n", "5 United Center 20,917 \n", "6 Quicken Loans Arena 20,562 \n", "7 Little Caesars Arena 20,491 \n", "8 Bankers Life Fieldhouse 17,923 \n", "9 Bradley Center 18,717 \n", "10 Philips Arena 15,711 \n", "11 Spectrum Center 19,077 \n", "12 American Airlines Arena 19,600 \n", "13 Amway Center 18,846 \n", "14 Capital One Arena 20,356 \n", "15 Pepsi Center 19,520 \n", "16 Target Center 19,356 \n", "17 Chesapeake Energy Arena 18,203 \n", "18 Moda Center 19,441 \n", "19 Vivint Smart Home Arena 19,911 \n", "20 Oracle Arena 19,596 \n", "21 Staples Center 19,060 \n", "22 Staples Center 18,997 \n", "23 Talking Stick Resort Arena 18,055 \n", "24 Golden 1 Center 17,500 \n", "25 American Airlines Center 19,200 \n", "26 Toyota Center 18,055 \n", "27 FedExForum 18,119 
\n", "28 Smoothie King Center 16,867 \n", "29 AT&T Center 18,418 \n", "\n", "0 Coordinates Founded Joined \\\n", "0 42°21′59″N 71°03′44″W / 42.366303°N 71.06222... 1946 1946 \n", "1 40°40′58″N 73°58′29″W / 40.68265°N 73.974689... 1967* 1976 \n", "2 40°45′02″N 73°59′37″W / 40.750556°N 73.99361... 1946 1946 \n", "3 39°54′04″N 75°10′19″W / 39.901111°N 75.17194... 1946* 1949 \n", "4 43°38′36″N 79°22′45″W / 43.643333°N 79.37916... 1995 1995 \n", "5 41°52′50″N 87°40′27″W / 41.880556°N 87.67416... 1966 1966 \n", "6 41°29′47″N 81°41′17″W / 41.496389°N 81.68805... 1970 1970 \n", "7 42°41′49″N 83°14′44″W / 42.696944°N 83.24555... 1941* 1948 \n", "8 39°45′50″N 86°09′20″W / 39.763889°N 86.15555... 1967 1976 \n", "9 43°02′37″N 87°55′01″W / 43.043611°N 87.91694... 1968 1968 \n", "10 33°45′26″N 84°23′47″W / 33.757222°N 84.39638... 1946* 1949 \n", "11 35°13′30″N 80°50′21″W / 35.225°N 80.839167°W... 1988* 1988* \n", "12 25°46′53″N 80°11′17″W / 25.781389°N 80.18805... 1988 1988 \n", "13 28°32′21″N 81°23′01″W / 28.539167°N 81.38361... 1989 1989 \n", "14 38°53′53″N 77°01′15″W / 38.898056°N 77.02083... 1961* 1961* \n", "15 39°44′55″N 105°00′27″W / 39.748611°N 105.007... 1967 1976 \n", "16 44°58′46″N 93°16′34″W / 44.979444°N 93.27611... 1989 1989 \n", "17 35°27′48″N 97°30′54″W / 35.463333°N 97.515°W... 1967* 1967* \n", "18 45°31′54″N 122°40′00″W / 45.531667°N 122.666... 1970 1970 \n", "19 40°46′06″N 111°54′04″W / 40.768333°N 111.901... 1974* 1974* \n", "20 37°45′01″N 122°12′11″W / 37.750278°N 122.203... 1946* 1946* \n", "21 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1970* 1970* \n", "22 34°02′35″N 118°16′02″W / 34.043056°N 118.267... 1947* 1948 \n", "23 33°26′45″N 112°04′17″W / 33.445833°N 112.071... 1968 1968 \n", "24 38°38′57″N 121°31′05″W / 38.649167°N 121.518... 1923* 1948 \n", "25 32°47′26″N 96°48′37″W / 32.790556°N 96.81027... 1980 1980 \n", "26 29°45′03″N 95°21′44″W / 29.750833°N 95.36222... 1967* 1967* \n", "27 35°08′18″N 90°03′02″W / 35.138333°N 90.05055... 
1995* 1995* \n", "28 29°56′56″N 90°04′55″W / 29.948889°N 90.08194... 2002* 2002* \n", "29 29°25′37″N 98°26′15″W / 29.426944°N 98.4375°... 1967* 1976 \n", "\n", "0 Head coach Conference \n", "0 Brad Stevens Eastern \n", "1 Kenny Atkinson Eastern \n", "2 Jeff Hornacek Eastern \n", "3 Brett Brown Eastern \n", "4 Dwane Casey Eastern \n", "5 Fred Hoiberg Eastern \n", "6 Tyronn Lue Eastern \n", "7 Stan Van Gundy Eastern \n", "8 Nate McMillan Eastern \n", "9 Joe Prunty Eastern \n", "10 Mike Budenholzer Eastern \n", "11 Steve Clifford Eastern \n", "12 Erik Spoelstra Eastern \n", "13 Frank Vogel Eastern \n", "14 Scott Brooks Eastern \n", "15 Michael Malone Western \n", "16 Tom Thibodeau Western \n", "17 Billy Donovan Western \n", "18 Terry Stotts Western \n", "19 Quin Snyder Western \n", "20 Steve Kerr Western \n", "21 Doc Rivers Western \n", "22 Luke Walton Western \n", "23 Jay Triano Western \n", "24 Dave Joerger Western \n", "25 Rick Carlisle Western \n", "26 Mike D'Antoni Western \n", "27 J. B. Bickerstaff Western \n", "28 Alvin Gentry Western \n", "29 Gregg Popovich Western " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = cleanup_nba_conferences(df)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### City and Postal Code\n", "\n", "Next, we want to split the city and the postal code into two separate columns.\n", "\n", "To do this, we need to use [`pandas` string-handling methods](https://pandas.pydata.org/pandas-docs/stable/text.html)." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def split_city_postal(df):\n", " df['Postal'] = df['City'].str.rsplit(',', n=1).str.get(1).str.replace('.', '').str.strip()\n", " df['City'] = df['City'].str.rsplit(',', n=1).str.get(0)\n", " return df" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Division</th>\n", " <th>Team</th>\n", " <th>City</th>\n", " <th>Arena</th>\n", " <th>Capacity</th>\n", " <th>Coordinates</th>\n", " <th>Founded</th>\n", " <th>Joined</th>\n", " <th>Head coach</th>\n", " <th>Conference</th>\n", " <th>Postal</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Atlantic</td>\n", " <td>Boston Celtics</td>\n", " <td>Boston</td>\n", " <td>TD Garden</td>\n", " <td>18,624</td>\n", " <td>42°21′59″N 71°03′44″W / 42.366303°N 71.06222...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " <td>Eastern</td>\n", " <td>MA</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Atlantic</td>\n", " <td>Brooklyn Nets</td>\n", " <td>New York City</td>\n", " <td>Barclays Center</td>\n", " <td>17,732</td>\n", " <td>40°40′58″N 73°58′29″W / 40.68265°N 73.974689...</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Kenny Atkinson</td>\n", " <td>Eastern</td>\n", " <td>NY</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Atlantic</td>\n", " <td>New York Knicks</td>\n", " <td>New York City</td>\n", " <td>Madison Square Garden</td>\n", " <td>19,812</td>\n", " <td>40°45′02″N 73°59′37″W / 40.750556°N 
73.99361...</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " <td>Eastern</td>\n", " <td>NY</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Atlantic</td>\n", " <td>Philadelphia 76ers</td>\n", " <td>Philadelphia</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21,600</td>\n", " <td>39°54′04″N 75°10′19″W / 39.901111°N 75.17194...</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " <td>Eastern</td>\n", " <td>PA</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Atlantic</td>\n", " <td>Toronto Raptors</td>\n", " <td>Toronto</td>\n", " <td>Air Canada Centre</td>\n", " <td>19,800</td>\n", " <td>43°38′36″N 79°22′45″W / 43.643333°N 79.37916...</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>Dwane Casey</td>\n", " <td>Eastern</td>\n", " <td>ON</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "0 Division Team City Arena \\\n", "0 Atlantic Boston Celtics Boston TD Garden \n", "1 Atlantic Brooklyn Nets New York City Barclays Center \n", "2 Atlantic New York Knicks New York City Madison Square Garden \n", "3 Atlantic Philadelphia 76ers Philadelphia Wells Fargo Center \n", "4 Atlantic Toronto Raptors Toronto Air Canada Centre \n", "\n", "0 Capacity Coordinates Founded Joined \\\n", "0 18,624 42°21′59″N 71°03′44″W / 42.366303°N 71.06222... 1946 1946 \n", "1 17,732 40°40′58″N 73°58′29″W / 40.68265°N 73.974689... 1967* 1976 \n", "2 19,812 40°45′02″N 73°59′37″W / 40.750556°N 73.99361... 1946 1946 \n", "3 21,600 39°54′04″N 75°10′19″W / 39.901111°N 75.17194... 1946* 1949 \n", "4 19,800 43°38′36″N 79°22′45″W / 43.643333°N 79.37916... 
1995 1995 \n", "\n", "0 Head coach Conference Postal \n", "0 Brad Stevens Eastern MA \n", "1 Kenny Atkinson Eastern NY \n", "2 Jeff Hornacek Eastern NY \n", "3 Brett Brown Eastern PA \n", "4 Dwane Casey Eastern ON " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = split_city_postal(df)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Arena Latitude and Longitude\n", "\n", "Lastly, we need to clean up the arena latitude and longitude. This is a little tricky, since there is a lot of content packed into the Coordinates column in the `DataFrame`. Let's focus on one row to see what's going on." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['42°21′59″N 71°03′44″W\\ufeff ',\n", " ' \\ufeff42.366303°N 71.062228°W\\ufeff ',\n", " ' 42.366303; -71.062228\\ufeff (Boston Celtics)']]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "row = list(df.loc[df['Team'] == 'Boston Celtics', 'Coordinates'].str.split('/'))\n", "row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three elements per row, each giving the latitude and longitude in a different format. In case you were wondering, the `\ufeff` appearing in the text strings is a special Unicode character, the [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark). We are going to ignore the first two elements and use only the third.\n", "\n", "We need to split this third element into latitude and longitude at the semicolon (`;`) and extract the numbers. Again, we will use `pandas` string-handling methods, along with [regular expressions](https://docs.python.org/3/howto/regex.html). Regular expressions are a very general way to find and extract text in many computer languages, including Python. In this particular case, the regular expression matches a number with a decimal point, optionally preceded by a negative sign." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def get_arena_lat_lon(df):\n", "    # Keep the third '/'-separated element (decimal degrees) and split it on ';'\n", "    df['Coordinates'] = df['Coordinates'].str.split('/').str.get(2).str.split(';')\n", "    df['Latitude'] = df['Coordinates'].str.get(0).astype(float)\n", "    # Optional minus sign, digits, a decimal point, then more digits\n", "    df['Longitude'] = df['Coordinates'].str.get(1).str.extract(r'(-?\\d+\\.\\d+)', expand=False).astype(float)\n", "    return df" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Division</th>\n", " <th>Team</th>\n", " <th>City</th>\n", " <th>Arena</th>\n", " <th>Capacity</th>\n", " <th>Coordinates</th>\n", " <th>Founded</th>\n", " <th>Joined</th>\n", " <th>Head coach</th>\n", " <th>Conference</th>\n", " <th>Postal</th>\n", " <th>Latitude</th>\n", " <th>Longitude</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Atlantic</td>\n", " <td>Boston Celtics</td>\n", " <td>Boston</td>\n", " <td>TD Garden</td>\n", " <td>18,624</td>\n", " <td>[ 42.366303, -71.062228 (Boston Celtics)]</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " <td>Eastern</td>\n", " <td>MA</td>\n", " <td>42.366303</td>\n", " <td>-71.062228</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Atlantic</td>\n", " <td>Brooklyn Nets</td>\n", " <td>New York City</td>\n", " <td>Barclays Center</td>\n", " <td>17,732</td>\n", " <td>[ 40.68265, -73.974689 (Brooklyn Nets)]</td>\n", " <td>1967*</td>\n", " <td>1976</td>\n", " <td>Kenny Atkinson</td>\n", " <td>Eastern</td>\n", " <td>NY</td>\n", " <td>40.682650</td>\n", " 
<td>-73.974689</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>Atlantic</td>\n", " <td>New York Knicks</td>\n", " <td>New York City</td>\n", " <td>Madison Square Garden</td>\n", " <td>19,812</td>\n", " <td>[ 40.750556, -73.993611 (New York Knicks)]</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " <td>Eastern</td>\n", " <td>NY</td>\n", " <td>40.750556</td>\n", " <td>-73.993611</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Atlantic</td>\n", " <td>Philadelphia 76ers</td>\n", " <td>Philadelphia</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21,600</td>\n", " <td>[ 39.901111, -75.171944 (Philadelphia 76ers)]</td>\n", " <td>1946*</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " <td>Eastern</td>\n", " <td>PA</td>\n", " <td>39.901111</td>\n", " <td>-75.171944</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Atlantic</td>\n", " <td>Toronto Raptors</td>\n", " <td>Toronto</td>\n", " <td>Air Canada Centre</td>\n", " <td>19,800</td>\n", " <td>[ 43.643333, -79.379167 (Toronto Raptors)]</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>Dwane Casey</td>\n", " <td>Eastern</td>\n", " <td>ON</td>\n", " <td>43.643333</td>\n", " <td>-79.379167</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "0 Division Team City Arena \\\n", "0 Atlantic Boston Celtics Boston TD Garden \n", "1 Atlantic Brooklyn Nets New York City Barclays Center \n", "2 Atlantic New York Knicks New York City Madison Square Garden \n", "3 Atlantic Philadelphia 76ers Philadelphia Wells Fargo Center \n", "4 Atlantic Toronto Raptors Toronto Air Canada Centre \n", "\n", "0 Capacity Coordinates Founded Joined \\\n", "0 18,624 [ 42.366303, -71.062228 (Boston Celtics)] 1946 1946 \n", "1 17,732 [ 40.68265, -73.974689 (Brooklyn Nets)] 1967* 1976 \n", "2 19,812 [ 40.750556, -73.993611 (New York Knicks)] 1946 1946 \n", "3 21,600 [ 39.901111, -75.171944 (Philadelphia 76ers)] 1946* 1949 \n", "4 19,800 [ 43.643333, -79.379167 (Toronto 
Raptors)] 1995 1995 \n", "\n", "0 Head coach Conference Postal Latitude Longitude \n", "0 Brad Stevens Eastern MA 42.366303 -71.062228 \n", "1 Kenny Atkinson Eastern NY 40.682650 -73.974689 \n", "2 Jeff Hornacek Eastern NY 40.750556 -73.993611 \n", "3 Brett Brown Eastern PA 39.901111 -75.171944 \n", "4 Dwane Casey Eastern ON 43.643333 -79.379167 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = get_arena_lat_lon(df)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Putting It All Together\n", "\n", "Now we'll combine all these steps into one function. The function also does a few more simple cleanups (converting the Capacity, Founded and Joined columns to integers) and drops the columns that we don't need at the end. We also want to save the final, cleaned-up results." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "def wiki_teams_info(raw):\n", "    df = setup_columns(raw)\n", "    df = cleanup_nba_conferences(df)\n", "    df = split_city_postal(df)\n", "    df = get_arena_lat_lon(df)\n", "    # Strip thousands separators and footnote asterisks as literal text (not regex)\n", "    df['Capacity'] = df['Capacity'].str.replace(',', '', regex=False).astype(int)\n", "    df['Founded'] = df['Founded'].str.replace('*', '', regex=False).astype(int)\n", "    df['Joined'] = df['Joined'].str.replace('*', '', regex=False).astype(int)\n", "    cols = [\n", "        'Team',\n", "        'Conference',\n", "        'Division',\n", "        'City',\n", "        'Postal',\n", "        'Arena',\n", "        'Capacity',\n", "        'Latitude',\n", "        'Longitude',\n", "        'Founded',\n", "        'Joined',\n", "        'Head coach',\n", "    ]\n", "    return df[cols].reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr
style=\"text-align: right;\">\n", " <th></th>\n", " <th>Team</th>\n", " <th>Conference</th>\n", " <th>Division</th>\n", " <th>City</th>\n", " <th>Postal</th>\n", " <th>Arena</th>\n", " <th>Capacity</th>\n", " <th>Latitude</th>\n", " <th>Longitude</th>\n", " <th>Founded</th>\n", " <th>Joined</th>\n", " <th>Head coach</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Boston Celtics</td>\n", " <td>Eastern</td>\n", " <td>Atlantic</td>\n", " <td>Boston</td>\n", " <td>MA</td>\n", " <td>TD Garden</td>\n", " <td>18624</td>\n", " <td>42.366303</td>\n", " <td>-71.062228</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Brad Stevens</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Brooklyn Nets</td>\n", " <td>Eastern</td>\n", " <td>Atlantic</td>\n", " <td>New York City</td>\n", " <td>NY</td>\n", " <td>Barclays Center</td>\n", " <td>17732</td>\n", " <td>40.682650</td>\n", " <td>-73.974689</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Kenny Atkinson</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>New York Knicks</td>\n", " <td>Eastern</td>\n", " <td>Atlantic</td>\n", " <td>New York City</td>\n", " <td>NY</td>\n", " <td>Madison Square Garden</td>\n", " <td>19812</td>\n", " <td>40.750556</td>\n", " <td>-73.993611</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Jeff Hornacek</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>Philadelphia 76ers</td>\n", " <td>Eastern</td>\n", " <td>Atlantic</td>\n", " <td>Philadelphia</td>\n", " <td>PA</td>\n", " <td>Wells Fargo Center</td>\n", " <td>21600</td>\n", " <td>39.901111</td>\n", " <td>-75.171944</td>\n", " <td>1946</td>\n", " <td>1949</td>\n", " <td>Brett Brown</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Toronto Raptors</td>\n", " <td>Eastern</td>\n", " <td>Atlantic</td>\n", " <td>Toronto</td>\n", " <td>ON</td>\n", " <td>Air Canada Centre</td>\n", " <td>19800</td>\n", " <td>43.643333</td>\n", " <td>-79.379167</td>\n", " <td>1995</td>\n", " 
<td>1995</td>\n", " <td>Dwane Casey</td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Chicago Bulls</td>\n", " <td>Eastern</td>\n", " <td>Central</td>\n", " <td>Chicago</td>\n", " <td>IL</td>\n", " <td>United Center</td>\n", " <td>20917</td>\n", " <td>41.880556</td>\n", " <td>-87.674167</td>\n", " <td>1966</td>\n", " <td>1966</td>\n", " <td>Fred Hoiberg</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>Cleveland Cavaliers</td>\n", " <td>Eastern</td>\n", " <td>Central</td>\n", " <td>Cleveland</td>\n", " <td>OH</td>\n", " <td>Quicken Loans Arena</td>\n", " <td>20562</td>\n", " <td>41.496389</td>\n", " <td>-81.688056</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Tyronn Lue</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>Detroit Pistons</td>\n", " <td>Eastern</td>\n", " <td>Central</td>\n", " <td>Detroit</td>\n", " <td>MI</td>\n", " <td>Little Caesars Arena</td>\n", " <td>20491</td>\n", " <td>42.696944</td>\n", " <td>-83.245556</td>\n", " <td>1941</td>\n", " <td>1948</td>\n", " <td>Stan Van Gundy</td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>Indiana Pacers</td>\n", " <td>Eastern</td>\n", " <td>Central</td>\n", " <td>Indianapolis</td>\n", " <td>IN</td>\n", " <td>Bankers Life Fieldhouse</td>\n", " <td>17923</td>\n", " <td>39.763889</td>\n", " <td>-86.155556</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Nate McMillan</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Milwaukee Bucks</td>\n", " <td>Eastern</td>\n", " <td>Central</td>\n", " <td>Milwaukee</td>\n", " <td>WI</td>\n", " <td>Bradley Center</td>\n", " <td>18717</td>\n", " <td>43.043611</td>\n", " <td>-87.916944</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Joe Prunty</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>Atlanta Hawks</td>\n", " <td>Eastern</td>\n", " <td>Southeast</td>\n", " <td>Atlanta</td>\n", " <td>GA</td>\n", " <td>Philips Arena</td>\n", " <td>15711</td>\n", " <td>33.757222</td>\n", " <td>-84.396389</td>\n", " <td>1946</td>\n", " 
<td>1949</td>\n", " <td>Mike Budenholzer</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Charlotte Hornets</td>\n", " <td>Eastern</td>\n", " <td>Southeast</td>\n", " <td>Charlotte</td>\n", " <td>NC</td>\n", " <td>Spectrum Center</td>\n", " <td>19077</td>\n", " <td>35.225000</td>\n", " <td>-80.839167</td>\n", " <td>1988</td>\n", " <td>1988</td>\n", " <td>Steve Clifford</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>Miami Heat</td>\n", " <td>Eastern</td>\n", " <td>Southeast</td>\n", " <td>Miami</td>\n", " <td>FL</td>\n", " <td>American Airlines Arena</td>\n", " <td>19600</td>\n", " <td>25.781389</td>\n", " <td>-80.188056</td>\n", " <td>1988</td>\n", " <td>1988</td>\n", " <td>Erik Spoelstra</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>Orlando Magic</td>\n", " <td>Eastern</td>\n", " <td>Southeast</td>\n", " <td>Orlando</td>\n", " <td>FL</td>\n", " <td>Amway Center</td>\n", " <td>18846</td>\n", " <td>28.539167</td>\n", " <td>-81.383611</td>\n", " <td>1989</td>\n", " <td>1989</td>\n", " <td>Frank Vogel</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>Washington Wizards</td>\n", " <td>Eastern</td>\n", " <td>Southeast</td>\n", " <td>Washington</td>\n", " <td>DC</td>\n", " <td>Capital One Arena</td>\n", " <td>20356</td>\n", " <td>38.898056</td>\n", " <td>-77.020833</td>\n", " <td>1961</td>\n", " <td>1961</td>\n", " <td>Scott Brooks</td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Denver Nuggets</td>\n", " <td>Western</td>\n", " <td>Northwest</td>\n", " <td>Denver</td>\n", " <td>CO</td>\n", " <td>Pepsi Center</td>\n", " <td>19520</td>\n", " <td>39.748611</td>\n", " <td>-105.007500</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Michael Malone</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Minnesota Timberwolves</td>\n", " <td>Western</td>\n", " <td>Northwest</td>\n", " <td>Minneapolis</td>\n", " <td>MN</td>\n", " <td>Target Center</td>\n", " <td>19356</td>\n", " <td>44.979444</td>\n", " <td>-93.276111</td>\n", " 
<td>1989</td>\n", " <td>1989</td>\n", " <td>Tom Thibodeau</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>Oklahoma City Thunder</td>\n", " <td>Western</td>\n", " <td>Northwest</td>\n", " <td>Oklahoma City</td>\n", " <td>OK</td>\n", " <td>Chesapeake Energy Arena</td>\n", " <td>18203</td>\n", " <td>35.463333</td>\n", " <td>-97.515000</td>\n", " <td>1967</td>\n", " <td>1967</td>\n", " <td>Billy Donovan</td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>Portland Trail Blazers</td>\n", " <td>Western</td>\n", " <td>Northwest</td>\n", " <td>Portland</td>\n", " <td>OR</td>\n", " <td>Moda Center</td>\n", " <td>19441</td>\n", " <td>45.531667</td>\n", " <td>-122.666667</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Terry Stotts</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Utah Jazz</td>\n", " <td>Western</td>\n", " <td>Northwest</td>\n", " <td>Salt Lake City</td>\n", " <td>UT</td>\n", " <td>Vivint Smart Home Arena</td>\n", " <td>19911</td>\n", " <td>40.768333</td>\n", " <td>-111.901111</td>\n", " <td>1974</td>\n", " <td>1974</td>\n", " <td>Quin Snyder</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>Golden State Warriors</td>\n", " <td>Western</td>\n", " <td>Pacific</td>\n", " <td>Oakland</td>\n", " <td>CA</td>\n", " <td>Oracle Arena</td>\n", " <td>19596</td>\n", " <td>37.750278</td>\n", " <td>-122.203056</td>\n", " <td>1946</td>\n", " <td>1946</td>\n", " <td>Steve Kerr</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>Los Angeles Clippers</td>\n", " <td>Western</td>\n", " <td>Pacific</td>\n", " <td>Los Angeles</td>\n", " <td>CA</td>\n", " <td>Staples Center</td>\n", " <td>19060</td>\n", " <td>34.043056</td>\n", " <td>-118.267222</td>\n", " <td>1970</td>\n", " <td>1970</td>\n", " <td>Doc Rivers</td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>Los Angeles Lakers</td>\n", " <td>Western</td>\n", " <td>Pacific</td>\n", " <td>Los Angeles</td>\n", " <td>CA</td>\n", " <td>Staples Center</td>\n", " <td>18997</td>\n", " 
<td>34.043056</td>\n", " <td>-118.267222</td>\n", " <td>1947</td>\n", " <td>1948</td>\n", " <td>Luke Walton</td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>Phoenix Suns</td>\n", " <td>Western</td>\n", " <td>Pacific</td>\n", " <td>Phoenix</td>\n", " <td>AZ</td>\n", " <td>Talking Stick Resort Arena</td>\n", " <td>18055</td>\n", " <td>33.445833</td>\n", " <td>-112.071389</td>\n", " <td>1968</td>\n", " <td>1968</td>\n", " <td>Jay Triano</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>Sacramento Kings</td>\n", " <td>Western</td>\n", " <td>Pacific</td>\n", " <td>Sacramento</td>\n", " <td>CA</td>\n", " <td>Golden 1 Center</td>\n", " <td>17500</td>\n", " <td>38.649167</td>\n", " <td>-121.518056</td>\n", " <td>1923</td>\n", " <td>1948</td>\n", " <td>Dave Joerger</td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>Dallas Mavericks</td>\n", " <td>Western</td>\n", " <td>Southwest</td>\n", " <td>Dallas</td>\n", " <td>TX</td>\n", " <td>American Airlines Center</td>\n", " <td>19200</td>\n", " <td>32.790556</td>\n", " <td>-96.810278</td>\n", " <td>1980</td>\n", " <td>1980</td>\n", " <td>Rick Carlisle</td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>Houston Rockets</td>\n", " <td>Western</td>\n", " <td>Southwest</td>\n", " <td>Houston</td>\n", " <td>TX</td>\n", " <td>Toyota Center</td>\n", " <td>18055</td>\n", " <td>29.750833</td>\n", " <td>-95.362222</td>\n", " <td>1967</td>\n", " <td>1967</td>\n", " <td>Mike D'Antoni</td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>Memphis Grizzlies</td>\n", " <td>Western</td>\n", " <td>Southwest</td>\n", " <td>Memphis</td>\n", " <td>TN</td>\n", " <td>FedExForum</td>\n", " <td>18119</td>\n", " <td>35.138333</td>\n", " <td>-90.050556</td>\n", " <td>1995</td>\n", " <td>1995</td>\n", " <td>J. B. 
Bickerstaff</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>New Orleans Pelicans</td>\n", " <td>Western</td>\n", " <td>Southwest</td>\n", " <td>New Orleans</td>\n", " <td>LA</td>\n", " <td>Smoothie King Center</td>\n", " <td>16867</td>\n", " <td>29.948889</td>\n", " <td>-90.081944</td>\n", " <td>2002</td>\n", " <td>2002</td>\n", " <td>Alvin Gentry</td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>San Antonio Spurs</td>\n", " <td>Western</td>\n", " <td>Southwest</td>\n", " <td>San Antonio</td>\n", " <td>TX</td>\n", " <td>AT&T Center</td>\n", " <td>18418</td>\n", " <td>29.426944</td>\n", " <td>-98.437500</td>\n", " <td>1967</td>\n", " <td>1976</td>\n", " <td>Gregg Popovich</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "0 Team Conference Division City Postal \\\n", "0 Boston Celtics Eastern Atlantic Boston MA \n", "1 Brooklyn Nets Eastern Atlantic New York City NY \n", "2 New York Knicks Eastern Atlantic New York City NY \n", "3 Philadelphia 76ers Eastern Atlantic Philadelphia PA \n", "4 Toronto Raptors Eastern Atlantic Toronto ON \n", "5 Chicago Bulls Eastern Central Chicago IL \n", "6 Cleveland Cavaliers Eastern Central Cleveland OH \n", "7 Detroit Pistons Eastern Central Detroit MI \n", "8 Indiana Pacers Eastern Central Indianapolis IN \n", "9 Milwaukee Bucks Eastern Central Milwaukee WI \n", "10 Atlanta Hawks Eastern Southeast Atlanta GA \n", "11 Charlotte Hornets Eastern Southeast Charlotte NC \n", "12 Miami Heat Eastern Southeast Miami FL \n", "13 Orlando Magic Eastern Southeast Orlando FL \n", "14 Washington Wizards Eastern Southeast Washington DC \n", "15 Denver Nuggets Western Northwest Denver CO \n", "16 Minnesota Timberwolves Western Northwest Minneapolis MN \n", "17 Oklahoma City Thunder Western Northwest Oklahoma City OK \n", "18 Portland Trail Blazers Western Northwest Portland OR \n", "19 Utah Jazz Western Northwest Salt Lake City UT \n", "20 Golden State Warriors Western Pacific Oakland CA \n", "21 Los 
Angeles Clippers Western Pacific Los Angeles CA \n", "22 Los Angeles Lakers Western Pacific Los Angeles CA \n", "23 Phoenix Suns Western Pacific Phoenix AZ \n", "24 Sacramento Kings Western Pacific Sacramento CA \n", "25 Dallas Mavericks Western Southwest Dallas TX \n", "26 Houston Rockets Western Southwest Houston TX \n", "27 Memphis Grizzlies Western Southwest Memphis TN \n", "28 New Orleans Pelicans Western Southwest New Orleans LA \n", "29 San Antonio Spurs Western Southwest San Antonio TX \n", "\n", "0 Arena Capacity Latitude Longitude Founded \\\n", "0 TD Garden 18624 42.366303 -71.062228 1946 \n", "1 Barclays Center 17732 40.682650 -73.974689 1967 \n", "2 Madison Square Garden 19812 40.750556 -73.993611 1946 \n", "3 Wells Fargo Center 21600 39.901111 -75.171944 1946 \n", "4 Air Canada Centre 19800 43.643333 -79.379167 1995 \n", "5 United Center 20917 41.880556 -87.674167 1966 \n", "6 Quicken Loans Arena 20562 41.496389 -81.688056 1970 \n", "7 Little Caesars Arena 20491 42.696944 -83.245556 1941 \n", "8 Bankers Life Fieldhouse 17923 39.763889 -86.155556 1967 \n", "9 Bradley Center 18717 43.043611 -87.916944 1968 \n", "10 Philips Arena 15711 33.757222 -84.396389 1946 \n", "11 Spectrum Center 19077 35.225000 -80.839167 1988 \n", "12 American Airlines Arena 19600 25.781389 -80.188056 1988 \n", "13 Amway Center 18846 28.539167 -81.383611 1989 \n", "14 Capital One Arena 20356 38.898056 -77.020833 1961 \n", "15 Pepsi Center 19520 39.748611 -105.007500 1967 \n", "16 Target Center 19356 44.979444 -93.276111 1989 \n", "17 Chesapeake Energy Arena 18203 35.463333 -97.515000 1967 \n", "18 Moda Center 19441 45.531667 -122.666667 1970 \n", "19 Vivint Smart Home Arena 19911 40.768333 -111.901111 1974 \n", "20 Oracle Arena 19596 37.750278 -122.203056 1946 \n", "21 Staples Center 19060 34.043056 -118.267222 1970 \n", "22 Staples Center 18997 34.043056 -118.267222 1947 \n", "23 Talking Stick Resort Arena 18055 33.445833 -112.071389 1968 \n", "24 Golden 1 Center 17500 38.649167 
-121.518056 1923 \n", "25 American Airlines Center 19200 32.790556 -96.810278 1980 \n", "26 Toyota Center 18055 29.750833 -95.362222 1967 \n", "27 FedExForum 18119 35.138333 -90.050556 1995 \n", "28 Smoothie King Center 16867 29.948889 -90.081944 2002 \n", "29 AT&T Center 18418 29.426944 -98.437500 1967 \n", "\n", "0 Joined Head coach \n", "0 1946 Brad Stevens \n", "1 1976 Kenny Atkinson \n", "2 1946 Jeff Hornacek \n", "3 1949 Brett Brown \n", "4 1995 Dwane Casey \n", "5 1966 Fred Hoiberg \n", "6 1970 Tyronn Lue \n", "7 1948 Stan Van Gundy \n", "8 1976 Nate McMillan \n", "9 1968 Joe Prunty \n", "10 1949 Mike Budenholzer \n", "11 1988 Steve Clifford \n", "12 1988 Erik Spoelstra \n", "13 1989 Frank Vogel \n", "14 1961 Scott Brooks \n", "15 1976 Michael Malone \n", "16 1989 Tom Thibodeau \n", "17 1967 Billy Donovan \n", "18 1970 Terry Stotts \n", "19 1974 Quin Snyder \n", "20 1946 Steve Kerr \n", "21 1970 Doc Rivers \n", "22 1948 Luke Walton \n", "23 1968 Jay Triano \n", "24 1948 Dave Joerger \n", "25 1980 Rick Carlisle \n", "26 1967 Mike D'Antoni \n", "27 1995 J. B. Bickerstaff \n", "28 2002 Alvin Gentry \n", "29 1976 Gregg Popovich " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = wiki_teams_info(raw)\n", "df" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "OUTPUT_DIR = PARENT_DIR / 'data' / 'scraped'\n", "OUTPUT_DIR.mkdir(exist_ok=True, parents=True)\n", "csvfile = OUTPUT_DIR.joinpath('wiki-nba_team_info.csv')\n", "df.to_csv(csvfile, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use these HTML scraping tools and techniques in your own sports analytics projects. But we're not done yet.\n", "\n", "### A Map of NBA Arenas\n", "\n", "We haven't done anything useful with the NBA team data from Wikipedia. 
One nice thing we can do is to draw a map of NBA arena locations using the latitude and longitude information.\n", "\n", "There's a more practical use for this arena data. Most serious [strength of schedule](https://en.wikipedia.org/wiki/Strength_of_schedule) analysis in the NBA looks at road games, rest and distance traveled. In a future post, we'll see how to incorporate this geographic information to estimate travel distance between games.\n", "\n", "#### Another HTML Table to Scrape\n", "\n", "We are going to use Python's [Basemap package](https://basemaptutorial.readthedocs.io/en/latest/) to draw a map of North America with NBA arenas. We are also going to fill in the U.S. states that have NBA arenas, using a different color for each NBA division. Sorry, Toronto and Washington fans. Any coloring for Washington, D.C. wouldn't be visible anyway, and this example won't fill in the province of Ontario.\n", "\n", "In order to do this coloring, we need to use [shapefiles](https://basemaptutorial.readthedocs.io/en/latest/shapefile.html). These files contain information about the shapes of various geographic features (in this case, U.S. states). We will overlay these shapes on our map, each filled with the correct color. The shapefiles we will use come from the [U.S. Census Bureau](https://www.census.gov/geo/maps-data/data/prev_cartbndry_names.html).\n", "\n", "In order to use these particular shapefiles, we need to be able to move between state names and postal abbreviations. Our arena data has only the postal abbreviations, and the shapefiles use state names.\n", "\n", "There are plenty of ways to get this information (including typing it into a Python program yourself). However, since this technical guide is about scraping HTML tables, we can use it as another opportunity to scrape a Wikipedia table.\n", "\n", "Let's scrape [Wikipedia's list of U.S. state abbreviations](https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations)."
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "ABBR_URL = 'https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the same HTML table scraping function as before. In this case, we want to use a table class of `'sortable'` to get the right table." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "abbr_tables = pps.HTMLTables(ABBR_URL, headers=REQUEST_HEADERS, table_class='sortable')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(abbr_tables)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>0</th>\n", " <th>1</th>\n", " <th>2</th>\n", " <th>3</th>\n", " <th>4</th>\n", " <th>5</th>\n", " <th>6</th>\n", " <th>7</th>\n", " <th>8</th>\n", " <th>9</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>Codes: ISO ISO 3166 codes (2-letter, 3-l...</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>Name and status of region</td>\n", " <td>NaN</td>\n", " <td>ISO</td>\n", " <td>ANSI</td>\n", " <td>NaN</td>\n", " <td>USPS</td>\n", " <td>USCG</td>\n", " <td>GPO</td>\n", " <td>AP</td>\n", " <td>Other 
abbreviations</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " <td>NaN</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>United States of America</td>\n", " <td>Federal state</td>\n", " <td>US USA 840</td>\n", " <td>US</td>\n", " <td>00</td>\n", " <td></td>\n", " <td></td>\n", " <td>U.S.</td>\n", " <td>U.S.</td>\n", " <td>U.S.A.</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>Alabama</td>\n", " <td>State</td>\n", " <td>US-AL</td>\n", " <td>AL</td>\n", " <td>01</td>\n", " <td>AL</td>\n", " <td>AL</td>\n", " <td>Ala.</td>\n", " <td>Ala.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>5</th>\n", " <td>Alaska</td>\n", " <td>State</td>\n", " <td>US-AK</td>\n", " <td>AK</td>\n", " <td>02</td>\n", " <td>AK</td>\n", " <td>AK</td>\n", " <td>Alaska</td>\n", " <td>Alaska</td>\n", " <td>Alas.</td>\n", " </tr>\n", " <tr>\n", " <th>6</th>\n", " <td>Arizona</td>\n", " <td>State</td>\n", " <td>US-AZ</td>\n", " <td>AZ</td>\n", " <td>04</td>\n", " <td>AZ</td>\n", " <td>AZ</td>\n", " <td>Ariz.</td>\n", " <td>Ariz.</td>\n", " <td>Az.</td>\n", " </tr>\n", " <tr>\n", " <th>7</th>\n", " <td>Arkansas</td>\n", " <td>State</td>\n", " <td>US-AR</td>\n", " <td>AR</td>\n", " <td>05</td>\n", " <td>AR</td>\n", " <td>AR</td>\n", " <td>Ark.</td>\n", " <td>Ark.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>8</th>\n", " <td>California</td>\n", " <td>State</td>\n", " <td>US-CA</td>\n", " <td>CA</td>\n", " <td>06</td>\n", " <td>CA</td>\n", " <td>CF</td>\n", " <td>Calif.</td>\n", " <td>Calif.</td>\n", " <td>Ca., Cal.</td>\n", " </tr>\n", " <tr>\n", " <th>9</th>\n", " <td>Colorado</td>\n", " <td>State</td>\n", " <td>US-CO</td>\n", " <td>CO</td>\n", " <td>08</td>\n", " <td>CO</td>\n", " <td>CL</td>\n", " <td>Colo.</td>\n", " <td>Colo.</td>\n", " <td>Col.</td>\n", " </tr>\n", " <tr>\n", " <th>10</th>\n", " <td>Connecticut</td>\n", 
" <td>State</td>\n", " <td>US-CT</td>\n", " <td>CT</td>\n", " <td>09</td>\n", " <td>CT</td>\n", " <td>CT</td>\n", " <td>Conn.</td>\n", " <td>Conn.</td>\n", " <td>Ct.</td>\n", " </tr>\n", " <tr>\n", " <th>11</th>\n", " <td>Delaware</td>\n", " <td>State</td>\n", " <td>US-DE</td>\n", " <td>DE</td>\n", " <td>10</td>\n", " <td>DE</td>\n", " <td>DL</td>\n", " <td>Del.</td>\n", " <td>Del.</td>\n", " <td>De.</td>\n", " </tr>\n", " <tr>\n", " <th>12</th>\n", " <td>District of Columbia</td>\n", " <td>Federal district</td>\n", " <td>US-DC</td>\n", " <td>DC</td>\n", " <td>11</td>\n", " <td>DC</td>\n", " <td>DC</td>\n", " <td>D.C.</td>\n", " <td>D.C.</td>\n", " <td>Wash. D.C.</td>\n", " </tr>\n", " <tr>\n", " <th>13</th>\n", " <td>Florida</td>\n", " <td>State</td>\n", " <td>US-FL</td>\n", " <td>FL</td>\n", " <td>12</td>\n", " <td>FL</td>\n", " <td>FL</td>\n", " <td>Fla.</td>\n", " <td>Fla.</td>\n", " <td>Fl., Flor.</td>\n", " </tr>\n", " <tr>\n", " <th>14</th>\n", " <td>Georgia</td>\n", " <td>State</td>\n", " <td>US-GA</td>\n", " <td>GA</td>\n", " <td>13</td>\n", " <td>GA</td>\n", " <td>GA</td>\n", " <td>Ga.</td>\n", " <td>Ga.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>15</th>\n", " <td>Hawaii</td>\n", " <td>State</td>\n", " <td>US-HI</td>\n", " <td>HI</td>\n", " <td>15</td>\n", " <td>HI</td>\n", " <td>HA</td>\n", " <td>Hawaii</td>\n", " <td>Hawaii</td>\n", " <td>H.I.</td>\n", " </tr>\n", " <tr>\n", " <th>16</th>\n", " <td>Idaho</td>\n", " <td>State</td>\n", " <td>US-ID</td>\n", " <td>ID</td>\n", " <td>16</td>\n", " <td>ID</td>\n", " <td>ID</td>\n", " <td>Idaho</td>\n", " <td>Idaho</td>\n", " <td>Id., Ida.</td>\n", " </tr>\n", " <tr>\n", " <th>17</th>\n", " <td>Illinois</td>\n", " <td>State</td>\n", " <td>US-IL</td>\n", " <td>IL</td>\n", " <td>17</td>\n", " <td>IL</td>\n", " <td>IL</td>\n", " <td>Ill.</td>\n", " <td>Ill.</td>\n", " <td>Il., Ills., Ill's</td>\n", " </tr>\n", " <tr>\n", " <th>18</th>\n", " <td>Indiana</td>\n", " <td>State</td>\n", " <td>US-IN</td>\n", 
" <td>IN</td>\n", " <td>18</td>\n", " <td>IN</td>\n", " <td>IN</td>\n", " <td>Ind.</td>\n", " <td>Ind.</td>\n", " <td>In.</td>\n", " </tr>\n", " <tr>\n", " <th>19</th>\n", " <td>Iowa</td>\n", " <td>State</td>\n", " <td>US-IA</td>\n", " <td>IA</td>\n", " <td>19</td>\n", " <td>IA</td>\n", " <td>IA</td>\n", " <td>Iowa</td>\n", " <td>Iowa</td>\n", " <td>Ia., Ioa.</td>\n", " </tr>\n", " <tr>\n", " <th>20</th>\n", " <td>Kansas</td>\n", " <td>State</td>\n", " <td>US-KS</td>\n", " <td>KS</td>\n", " <td>20</td>\n", " <td>KS</td>\n", " <td>KA</td>\n", " <td>Kans.</td>\n", " <td>Kan.</td>\n", " <td>Ks., Ka.</td>\n", " </tr>\n", " <tr>\n", " <th>21</th>\n", " <td>Kentucky</td>\n", " <td>State (Commonwealth)</td>\n", " <td>US-KY</td>\n", " <td>KY</td>\n", " <td>21</td>\n", " <td>KY</td>\n", " <td>KY</td>\n", " <td>Ky.</td>\n", " <td>Ky.</td>\n", " <td>Ken., Kent.</td>\n", " </tr>\n", " <tr>\n", " <th>22</th>\n", " <td>Louisiana</td>\n", " <td>State</td>\n", " <td>US-LA</td>\n", " <td>LA</td>\n", " <td>22</td>\n", " <td>LA</td>\n", " <td>LA</td>\n", " <td>La.</td>\n", " <td>La.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>23</th>\n", " <td>Maine</td>\n", " <td>State</td>\n", " <td>US-ME</td>\n", " <td>ME</td>\n", " <td>23</td>\n", " <td>ME</td>\n", " <td>ME</td>\n", " <td>Maine</td>\n", " <td>Maine</td>\n", " <td>Me.</td>\n", " </tr>\n", " <tr>\n", " <th>24</th>\n", " <td>Maryland</td>\n", " <td>State</td>\n", " <td>US-MD</td>\n", " <td>MD</td>\n", " <td>24</td>\n", " <td>MD</td>\n", " <td>MD</td>\n", " <td>Md.</td>\n", " <td>Md.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>25</th>\n", " <td>Massachusetts</td>\n", " <td>State (Commonwealth)</td>\n", " <td>US-MA</td>\n", " <td>MA</td>\n", " <td>25</td>\n", " <td>MA</td>\n", " <td>MS</td>\n", " <td>Mass.</td>\n", " <td>Mass.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>26</th>\n", " <td>Michigan</td>\n", " <td>State</td>\n", " <td>US-MI</td>\n", " <td>MI</td>\n", " <td>26</td>\n", " <td>MI</td>\n", " 
<td>MC</td>\n", " <td>Mich.</td>\n", " <td>Mich.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>27</th>\n", " <td>Minnesota</td>\n", " <td>State</td>\n", " <td>US-MN</td>\n", " <td>MN</td>\n", " <td>27</td>\n", " <td>MN</td>\n", " <td>MN</td>\n", " <td>Minn.</td>\n", " <td>Minn.</td>\n", " <td>Mn.</td>\n", " </tr>\n", " <tr>\n", " <th>28</th>\n", " <td>Mississippi</td>\n", " <td>State</td>\n", " <td>US-MS</td>\n", " <td>MS</td>\n", " <td>28</td>\n", " <td>MS</td>\n", " <td>MI</td>\n", " <td>Miss.</td>\n", " <td>Miss.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>29</th>\n", " <td>Missouri</td>\n", " <td>State</td>\n", " <td>US-MO</td>\n", " <td>MO</td>\n", " <td>29</td>\n", " <td>MO</td>\n", " <td>MO</td>\n", " <td>Mo.</td>\n", " <td>Mo.</td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>...</th>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " <td>...</td>\n", " </tr>\n", " <tr>\n", " <th>51</th>\n", " <td>Washington</td>\n", " <td>State</td>\n", " <td>US-WA</td>\n", " <td>WA</td>\n", " <td>53</td>\n", " <td>WA</td>\n", " <td>WN</td>\n", " <td>Wash.</td>\n", " <td>Wash.</td>\n", " <td>Wa., Wn.</td>\n", " </tr>\n", " <tr>\n", " <th>52</th>\n", " <td>West Virginia</td>\n", " <td>State</td>\n", " <td>US-WV</td>\n", " <td>WV</td>\n", " <td>54</td>\n", " <td>WV</td>\n", " <td>WV</td>\n", " <td>W. Va.</td>\n", " <td>W.Va.</td>\n", " <td>W.V., W. 
Virg.</td>\n", " </tr>\n", " <tr>\n", " <th>53</th>\n", " <td>Wisconsin</td>\n", " <td>State</td>\n", " <td>US-WI</td>\n", " <td>WI</td>\n", " <td>55</td>\n", " <td>WI</td>\n", " <td>WS</td>\n", " <td>Wis.</td>\n", " <td>Wis.</td>\n", " <td>Wi., Wisc.</td>\n", " </tr>\n", " <tr>\n", " <th>54</th>\n", " <td>Wyoming</td>\n", " <td>State</td>\n", " <td>US-WY</td>\n", " <td>WY</td>\n", " <td>56</td>\n", " <td>WY</td>\n", " <td>WY</td>\n", " <td>Wyo.</td>\n", " <td>Wyo.</td>\n", " <td>Wy.</td>\n", " </tr>\n", " <tr>\n", " <th>55</th>\n", " <td>American Samoa</td>\n", " <td>Insular area (Territory)</td>\n", " <td>AS ASM 016 US-AS</td>\n", " <td>AS</td>\n", " <td>60</td>\n", " <td>AS</td>\n", " <td>AS</td>\n", " <td>A.S.</td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>56</th>\n", " <td>Guam</td>\n", " <td>Insular area (Territory)</td>\n", " <td>GU GUM 316 US-GU</td>\n", " <td>GU</td>\n", " <td>66</td>\n", " <td>GU</td>\n", " <td>GU</td>\n", " <td>Guam</td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>57</th>\n", " <td>Northern Mariana Islands</td>\n", " <td>Insular area (Commonwealth)</td>\n", " <td>MP MNP 580 US-MP</td>\n", " <td>MP</td>\n", " <td>69</td>\n", " <td>MP</td>\n", " <td>CM</td>\n", " <td>M.P.</td>\n", " <td></td>\n", " <td>CNMI</td>\n", " </tr>\n", " <tr>\n", " <th>58</th>\n", " <td>Puerto Rico</td>\n", " <td>Insular area (Territory)</td>\n", " <td>PR PRI 630 US-PR</td>\n", " <td>PR</td>\n", " <td>72</td>\n", " <td>PR</td>\n", " <td>PR</td>\n", " <td>P.R.</td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>59</th>\n", " <td>U.S. Virgin Islands</td>\n", " <td>Insular area (Territory)</td>\n", " <td>VI VIR 850 US-VI</td>\n", " <td>VI</td>\n", " <td>78</td>\n", " <td>VI</td>\n", " <td>VI</td>\n", " <td>V.I.</td>\n", " <td></td>\n", " <td>U.S.V.I.</td>\n", " </tr>\n", " <tr>\n", " <th>60</th>\n", " <td>U.S. 
Minor Outlying Islands</td>\n", " <td>Insular areas</td>\n", " <td>UM UMI 581 US-UM</td>\n", " <td>UM</td>\n", " <td>74</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>61</th>\n", " <td>Baker Island</td>\n", " <td>island</td>\n", " <td>UM-81</td>\n", " <td></td>\n", " <td>81</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XB</td>\n", " </tr>\n", " <tr>\n", " <th>62</th>\n", " <td>Howland Island</td>\n", " <td>island</td>\n", " <td>UM-84</td>\n", " <td></td>\n", " <td>84</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XH</td>\n", " </tr>\n", " <tr>\n", " <th>63</th>\n", " <td>Jarvis Island</td>\n", " <td>island</td>\n", " <td>UM-86</td>\n", " <td></td>\n", " <td>86</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XQ</td>\n", " </tr>\n", " <tr>\n", " <th>64</th>\n", " <td>Johnston Atoll</td>\n", " <td>atoll</td>\n", " <td>UM-67</td>\n", " <td></td>\n", " <td>67</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XU</td>\n", " </tr>\n", " <tr>\n", " <th>65</th>\n", " <td>Kingman Reef</td>\n", " <td>atoll</td>\n", " <td>UM-89</td>\n", " <td></td>\n", " <td>89</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XM</td>\n", " </tr>\n", " <tr>\n", " <th>66</th>\n", " <td>Midway Islands</td>\n", " <td>atoll</td>\n", " <td>UM-71</td>\n", " <td></td>\n", " <td>71</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>QM</td>\n", " </tr>\n", " <tr>\n", " <th>67</th>\n", " <td>Navassa Island</td>\n", " <td>island</td>\n", " <td>UM-76</td>\n", " <td></td>\n", " <td>76</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>XV</td>\n", " </tr>\n", " <tr>\n", " <th>68</th>\n", " <td>Palmyra Atoll</td>\n", " <td>atoll</td>\n", " <td>UM-95</td>\n", " <td></td>\n", " <td>95</td>\n", " <td></td>\n", " <td></td>\n", " 
<td></td>\n", " <td></td>\n", " <td>XL</td>\n", " </tr>\n", " <tr>\n", " <th>69</th>\n", " <td>Wake Island</td>\n", " <td>atoll</td>\n", " <td>UM-79</td>\n", " <td></td>\n", " <td>79</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>QW</td>\n", " </tr>\n", " <tr>\n", " <th>70</th>\n", " <td>Micronesia</td>\n", " <td>Freely associated state</td>\n", " <td>FM FSM 583</td>\n", " <td>FM</td>\n", " <td>64</td>\n", " <td>FM</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>71</th>\n", " <td>Marshall Islands</td>\n", " <td>Freely associated state</td>\n", " <td>MH MHL 584</td>\n", " <td>MH</td>\n", " <td>68</td>\n", " <td>MH</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>72</th>\n", " <td>Palau</td>\n", " <td>Freely associated state</td>\n", " <td>PW PLW 585</td>\n", " <td>PW</td>\n", " <td>70</td>\n", " <td>PW</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>73</th>\n", " <td>U.S. Armed Forces – Americas</td>\n", " <td>US military mail code</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>AA</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>74</th>\n", " <td>U.S. Armed Forces – Europe</td>\n", " <td>US military mail code</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>AE</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>75</th>\n", " <td>U.S. 
Armed Forces – Pacific</td>\n", " <td>US military mail code</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>AP</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>76</th>\n", " <td>Northern Mariana Islands</td>\n", " <td>Obsolete postal code</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>CM</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>77</th>\n", " <td>Panama Canal Zone</td>\n", " <td>Obsolete postal code</td>\n", " <td>PZ PCZ 594</td>\n", " <td></td>\n", " <td></td>\n", " <td>CZ</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>78</th>\n", " <td>Nebraska</td>\n", " <td>Obsolete postal code</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td>NB</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>79</th>\n", " <td>Philippine Islands</td>\n", " <td>Obsolete postal code</td>\n", " <td>PH PHL 608</td>\n", " <td></td>\n", " <td></td>\n", " <td>PI</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " <tr>\n", " <th>80</th>\n", " <td>Trust Territory of the Pacific Islands</td>\n", " <td>Obsolete postal code</td>\n", " <td>PC PCI 582</td>\n", " <td></td>\n", " <td></td>\n", " <td>TT</td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " <td></td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "<p>81 rows × 10 columns</p>\n", "</div>" ], "text/plain": [ " 0 \\\n", "0 Codes: ISO ISO 3166 codes (2-letter, 3-l... 
\n", "1 Name and status of region \n", "2 \n", "3 United States of America \n", "4 Alabama \n", "5 Alaska \n", "6 Arizona \n", "7 Arkansas \n", "8 California \n", "9 Colorado \n", "10 Connecticut \n", "11 Delaware \n", "12 District of Columbia \n", "13 Florida \n", "14 Georgia \n", "15 Hawaii \n", "16 Idaho \n", "17 Illinois \n", "18 Indiana \n", "19 Iowa \n", "20 Kansas \n", "21 Kentucky \n", "22 Louisiana \n", "23 Maine \n", "24 Maryland \n", "25 Massachusetts \n", "26 Michigan \n", "27 Minnesota \n", "28 Mississippi \n", "29 Missouri \n", ".. ... \n", "51 Washington \n", "52 West Virginia \n", "53 Wisconsin \n", "54 Wyoming \n", "55 American Samoa \n", "56 Guam \n", "57 Northern Mariana Islands \n", "58 Puerto Rico \n", "59 U.S. Virgin Islands \n", "60 U.S. Minor Outlying Islands \n", "61 Baker Island \n", "62 Howland Island \n", "63 Jarvis Island \n", "64 Johnston Atoll \n", "65 Kingman Reef \n", "66 Midway Islands \n", "67 Navassa Island \n", "68 Palmyra Atoll \n", "69 Wake Island \n", "70 Micronesia \n", "71 Marshall Islands \n", "72 Palau \n", "73 U.S. Armed Forces – Americas \n", "74 U.S. Armed Forces – Europe \n", "75 U.S. 
Armed Forces – Pacific \n", "76 Northern Mariana Islands \n", "77 Panama Canal Zone \n", "78 Nebraska \n", "79 Philippine Islands \n", "80 Trust Territory of the Pacific Islands \n", "\n", " 1 2 3 4 5 6 \\\n", "0 NaN NaN NaN NaN NaN NaN \n", "1 NaN ISO ANSI NaN USPS USCG \n", "2 \n", "3 Federal state US USA 840 US 00 \n", "4 State US-AL AL 01 AL AL \n", "5 State US-AK AK 02 AK AK \n", "6 State US-AZ AZ 04 AZ AZ \n", "7 State US-AR AR 05 AR AR \n", "8 State US-CA CA 06 CA CF \n", "9 State US-CO CO 08 CO CL \n", "10 State US-CT CT 09 CT CT \n", "11 State US-DE DE 10 DE DL \n", "12 Federal district US-DC DC 11 DC DC \n", "13 State US-FL FL 12 FL FL \n", "14 State US-GA GA 13 GA GA \n", "15 State US-HI HI 15 HI HA \n", "16 State US-ID ID 16 ID ID \n", "17 State US-IL IL 17 IL IL \n", "18 State US-IN IN 18 IN IN \n", "19 State US-IA IA 19 IA IA \n", "20 State US-KS KS 20 KS KA \n", "21 State (Commonwealth) US-KY KY 21 KY KY \n", "22 State US-LA LA 22 LA LA \n", "23 State US-ME ME 23 ME ME \n", "24 State US-MD MD 24 MD MD \n", "25 State (Commonwealth) US-MA MA 25 MA MS \n", "26 State US-MI MI 26 MI MC \n", "27 State US-MN MN 27 MN MN \n", "28 State US-MS MS 28 MS MI \n", "29 State US-MO MO 29 MO MO \n", ".. ... ... ... ... ... ... 
\n", "51 State US-WA WA 53 WA WN \n", "52 State US-WV WV 54 WV WV \n", "53 State US-WI WI 55 WI WS \n", "54 State US-WY WY 56 WY WY \n", "55 Insular area (Territory) AS ASM 016 US-AS AS 60 AS AS \n", "56 Insular area (Territory) GU GUM 316 US-GU GU 66 GU GU \n", "57 Insular area (Commonwealth) MP MNP 580 US-MP MP 69 MP CM \n", "58 Insular area (Territory) PR PRI 630 US-PR PR 72 PR PR \n", "59 Insular area (Territory) VI VIR 850 US-VI VI 78 VI VI \n", "60 Insular areas UM UMI 581 US-UM UM 74 \n", "61 island UM-81 81 \n", "62 island UM-84 84 \n", "63 island UM-86 86 \n", "64 atoll UM-67 67 \n", "65 atoll UM-89 89 \n", "66 atoll UM-71 71 \n", "67 island UM-76 76 \n", "68 atoll UM-95 95 \n", "69 atoll UM-79 79 \n", "70 Freely associated state FM FSM 583 FM 64 FM \n", "71 Freely associated state MH MHL 584 MH 68 MH \n", "72 Freely associated state PW PLW 585 PW 70 PW \n", "73 US military mail code AA \n", "74 US military mail code AE \n", "75 US military mail code AP \n", "76 Obsolete postal code CM \n", "77 Obsolete postal code PZ PCZ 594 CZ \n", "78 Obsolete postal code NB \n", "79 Obsolete postal code PH PHL 608 PI \n", "80 Obsolete postal code PC PCI 582 TT \n", "\n", " 7 8 9 \n", "0 NaN NaN NaN \n", "1 GPO AP Other abbreviations \n", "2 NaN NaN NaN \n", "3 U.S. U.S. U.S.A. \n", "4 Ala. Ala. \n", "5 Alaska Alaska Alas. \n", "6 Ariz. Ariz. Az. \n", "7 Ark. Ark. \n", "8 Calif. Calif. Ca., Cal. \n", "9 Colo. Colo. Col. \n", "10 Conn. Conn. Ct. \n", "11 Del. Del. De. \n", "12 D.C. D.C. Wash. D.C. \n", "13 Fla. Fla. Fl., Flor. \n", "14 Ga. Ga. \n", "15 Hawaii Hawaii H.I. \n", "16 Idaho Idaho Id., Ida. \n", "17 Ill. Ill. Il., Ills., Ill's \n", "18 Ind. Ind. In. \n", "19 Iowa Iowa Ia., Ioa. \n", "20 Kans. Kan. Ks., Ka. \n", "21 Ky. Ky. Ken., Kent. \n", "22 La. La. \n", "23 Maine Maine Me. \n", "24 Md. Md. \n", "25 Mass. Mass. \n", "26 Mich. Mich. \n", "27 Minn. Minn. Mn. \n", "28 Miss. Miss. \n", "29 Mo. Mo. \n", ".. ... ... ... \n", "51 Wash. Wash. Wa., Wn. \n", "52 W. 
Va. W.Va. W.V., W. Virg. \n", "53 Wis. Wis. Wi., Wisc. \n", "54 Wyo. Wyo. Wy. \n", "55 A.S. \n", "56 Guam \n", "57 M.P. CNMI \n", "58 P.R. \n", "59 V.I. U.S.V.I. \n", "60 \n", "61 XB \n", "62 XH \n", "63 XQ \n", "64 XU \n", "65 XM \n", "66 QM \n", "67 XV \n", "68 XL \n", "69 QW \n", "70 \n", "71 \n", "72 \n", "73 \n", "74 \n", "75 \n", "76 \n", "77 \n", "78 \n", "79 \n", "80 \n", "\n", "[81 rows x 10 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abbr_df = abbr_tables[0].to_df()\n", "abbr_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cleaning an Ugly Table\n", "\n", "This is a relatively ugly table. Notice that many of the cells are blank. One of the reasons I wanted to use this example is to show how this general web scraping framework works, even for ugly tables.\n", "\n", "In this case, we just want the name, status and USPS columns. We can also filter out any obsolete postal codes." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "def usps_abbrs(raw):\n", " df = raw.drop(raw.index[:4]).reset_index(drop=True)\n", " df = df.iloc[:, [0, 1, 5]]\n", " df.columns = ['Name', 'Status', 'USPS']\n", " df = df.loc[(df['USPS'] != '') & (~df['Status'].str.contains('Obsolete')), ['Name', 'Status', 'USPS']]\n", " return df.reset_index(drop=True)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>Name</th>\n", " <th>Status</th>\n", " <th>USPS</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " 
<th>42</th>\n", " <td>Tennessee</td>\n", " <td>State</td>\n", " <td>TN</td>\n", " </tr>\n", " <tr>\n", " <th>43</th>\n", " <td>Texas</td>\n", " <td>State</td>\n", " <td>TX</td>\n", " </tr>\n", " <tr>\n", " <th>44</th>\n", " <td>Utah</td>\n", " <td>State</td>\n", " <td>UT</td>\n", " </tr>\n", " <tr>\n", " <th>45</th>\n", " <td>Vermont</td>\n", " <td>State</td>\n", " <td>VT</td>\n", " </tr>\n", " <tr>\n", " <th>46</th>\n", " <td>Virginia</td>\n", " <td>State (Commonwealth)</td>\n", " <td>VA</td>\n", " </tr>\n", " <tr>\n", " <th>47</th>\n", " <td>Washington</td>\n", " <td>State</td>\n", " <td>WA</td>\n", " </tr>\n", " <tr>\n", " <th>48</th>\n", " <td>West Virginia</td>\n", " <td>State</td>\n", " <td>WV</td>\n", " </tr>\n", " <tr>\n", " <th>49</th>\n", " <td>Wisconsin</td>\n", " <td>State</td>\n", " <td>WI</td>\n", " </tr>\n", " <tr>\n", " <th>50</th>\n", " <td>Wyoming</td>\n", " <td>State</td>\n", " <td>WY</td>\n", " </tr>\n", " <tr>\n", " <th>51</th>\n", " <td>American Samoa</td>\n", " <td>Insular area (Territory)</td>\n", " <td>AS</td>\n", " </tr>\n", " <tr>\n", " <th>52</th>\n", " <td>Guam</td>\n", " <td>Insular area (Territory)</td>\n", " <td>GU</td>\n", " </tr>\n", " <tr>\n", " <th>53</th>\n", " <td>Northern Mariana Islands</td>\n", " <td>Insular area (Commonwealth)</td>\n", " <td>MP</td>\n", " </tr>\n", " <tr>\n", " <th>54</th>\n", " <td>Puerto Rico</td>\n", " <td>Insular area (Territory)</td>\n", " <td>PR</td>\n", " </tr>\n", " <tr>\n", " <th>55</th>\n", " <td>U.S. 
Virgin Islands</td>\n", " <td>Insular area (Territory)</td>\n", " <td>VI</td>\n", " </tr>\n", " <tr>\n", " <th>56</th>\n", " <td>Micronesia</td>\n", " <td>Freely associated state</td>\n", " <td>FM</td>\n", " </tr>\n", " <tr>\n", " <th>57</th>\n", " <td>Marshall Islands</td>\n", " <td>Freely associated state</td>\n", " <td>MH</td>\n", " </tr>\n", " <tr>\n", " <th>58</th>\n", " <td>Palau</td>\n", " <td>Freely associated state</td>\n", " <td>PW</td>\n", " </tr>\n", " <tr>\n", " <th>59</th>\n", " <td>U.S. Armed Forces – Americas</td>\n", " <td>US military mail code</td>\n", " <td>AA</td>\n", " </tr>\n", " <tr>\n", " <th>60</th>\n", " <td>U.S. Armed Forces – Europe</td>\n", " <td>US military mail code</td>\n", " <td>AE</td>\n", " </tr>\n", " <tr>\n", " <th>61</th>\n", " <td>U.S. Armed Forces – Pacific</td>\n", " <td>US military mail code</td>\n", " <td>AP</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Name Status USPS\n", "42 Tennessee State TN\n", "43 Texas State TX\n", "44 Utah State UT\n", "45 Vermont State VT\n", "46 Virginia State (Commonwealth) VA\n", "47 Washington State WA\n", "48 West Virginia State WV\n", "49 Wisconsin State WI\n", "50 Wyoming State WY\n", "51 American Samoa Insular area (Territory) AS\n", "52 Guam Insular area (Territory) GU\n", "53 Northern Mariana Islands Insular area (Commonwealth) MP\n", "54 Puerto Rico Insular area (Territory) PR\n", "55 U.S. Virgin Islands Insular area (Territory) VI\n", "56 Micronesia Freely associated state FM\n", "57 Marshall Islands Freely associated state MH\n", "58 Palau Freely associated state PW\n", "59 U.S. Armed Forces – Americas US military mail code AA\n", "60 U.S. Armed Forces – Europe US military mail code AE\n", "61 U.S. 
Armed Forces – Pacific US military mail code AP" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "usps_df = usps_abbrs(abbr_df)\n", "usps_df.tail(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This simple table is just what we need. Now we can build a function that returns the postal abbreviation for a given state name." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "def state_abbr_mapper(usps_df):\n", "    # Build a {name: {'USPS': abbr}} lookup dictionary keyed by state name\n", "    name_usps = usps_df[['Name', 'USPS']].set_index('Name').to_dict(orient='index')\n", "    def inner(name):\n", "        return name_usps[name]['USPS']\n", "    return inner" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "name2abbr = state_abbr_mapper(usps_df)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'AL'" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name2abbr('Alabama')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take another look at what this function does. It defines an inner function and returns it. This inner function is a closure: it \"remembers\" the lookup dictionary built from the `DataFrame` that was passed in to the outer function. This makes it very easy to use this simple function as a wrapper around the `DataFrame` in our map-drawing code below.\n", "\n", "#### Drawing the Map\n", "\n", "Now we can start putting the pieces of the map together.\n", "\n", "First, we need a function to create a Basemap of the lower 48 U.S. states, along with portions of Canada and Mexico." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "def draw_basemap():\n", " \"\"\"Lambert Conformal map of lower 48 U.S.
states with portions of Canada and Mexico.\"\"\"\n", " m = Basemap(\n", " llcrnrlon=-119,\n", " llcrnrlat=22,\n", " urcrnrlon=-64,\n", " urcrnrlat=49,\n", " projection='lcc',\n", " lat_1=32,\n", " lat_2=45,\n", " lon_0=-95,\n", " )\n", " m.fillcontinents(color='lightgray')\n", " return m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Reading the Shapefiles\n", "\n", "Next, we read in our shapefiles for the U.S. states." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "def read_shape_files(m):\n", " \"\"\" Get U.S. state shape boundaries.\"\"\"\n", " # Shapefiles downloaded from https://www.census.gov/geo/maps-data/data/prev_cartbndry_names.html\n", " MAP_DATA_DIR = PARENT_DIR / 'data'\n", " SHAPEFILE = MAP_DATA_DIR.joinpath('st99_d00')\n", " return m.readshapefile(\n", " shapefile=str(SHAPEFILE),\n", " name='states',\n", " drawbounds=True,\n", " color='white',\n", " linewidth=1,\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Making a Colormap for NBA Divisions\n", "\n", "Next, we create a colormap with a distinct color for each of the NBA Divisions." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "def make_colormap(divisions, colormap='Set3'):\n", " \"\"\"Create colormap with distinct value for each NBA division.\"\"\"\n", " cmap = plt.get_cmap(colormap, len(divisions)) \n", " return {div: cmap(divisions.index(div))[:3] for div in divisions}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to assign a color to each state that has an NBA arena. Of course, we can't use the U.S. state shapefiles for Toronto." 
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "def get_state_colors(df):\n", "    colors = make_colormap(list(df['Division'].str.strip().unique()))\n", "    state_color = {}\n", "    for abbr in list(df['Postal'].str.strip().unique()):\n", "        # Strip whitespace here too, so the lookups match the stripped keys above\n", "        div = list(df.loc[df['Postal'].str.strip() == abbr, 'Division'].str.strip().unique())\n", "        assert len(div) == 1 # there can only be one Division applicable for teams from one U.S. state\n", "        div = str(div[0])\n", "        color = colors[div]\n", "        state_color[abbr] = rgb2hex(color)\n", "    return state_color" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we need to get the information from the shapefile for each U.S. state. This is where we need to use our function to look up the postal abbreviation given a U.S. state name from the shapefile." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "def get_state_polygons(m):\n", "    state_polygons = {}\n", "    for info, shape in zip(m.states_info, m.states):\n", "        abbr = name2abbr(info['NAME'])\n", "        if abbr in state_polygons:\n", "            state_polygons[abbr].append(Polygon(shape, True))\n", "        else:\n", "            state_polygons[abbr] = [Polygon(shape, True)]\n", "    return state_polygons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Putting It All Together and Drawing the Map\n", "\n", "To draw the map, we need to perform the following steps:\n", "\n", "- Create the Basemap\n", "- Read in the state shapefiles and assign the colors for the states that need to be filled in\n", "- Fill in the states with the correct colors\n", "- Draw markers for the arenas using the latitude and longitude information\n", "- Create text labels for the arenas using the team names\n", "- Show the map\n", "\n", "As I mentioned above, Basemap may emit some warnings when you run this code. The warnings I get are harmless, and I've filtered them out using the `warnings` module.
You can run this code without the `warnings` module if you want, and the map should still be fine." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "def draw_nba_map(df):\n", "    \"\"\"Draw map with locations of NBA arenas.\"\"\"\n", "\n", "    fig, ax = plt.subplots(figsize=(12,8))\n", "    m = draw_basemap()\n", "    state_shapes = read_shape_files(m)\n", "    state_polygons = get_state_polygons(m)\n", "    state_colors = get_state_colors(df)\n", "\n", "    # Color in states (skip Ontario and Washington, DC)\n", "    for abbr in state_colors:\n", "        if abbr not in ['ON', 'DC']:\n", "            ax.add_collection(PatchCollection(\n", "                state_polygons[abbr],\n", "                facecolor=state_colors[abbr],\n", "                edgecolor='white',\n", "                linewidth=1,\n", "                zorder=2)\n", "            )\n", "\n", "    # Display markers and labels for arenas\n", "    cities = set()\n", "    for _, row in df.iterrows():\n", "        city = row['City']\n", "        x, y = m(row['Longitude'], row['Latitude'])\n", "        m.plot(x=x, y=y, color='black', marker='o', markersize=5)\n", "        team = row['Team'].split()[-1]\n", "\n", "        # If a city has already been plotted, offset the text so labels don't overlap\n", "        if city in cities:\n", "            label_x = x+40000\n", "            label_y = y+40000\n", "        else:\n", "            label_x = x+40000\n", "            label_y = y-40000\n", "            cities.add(city)\n", "        plt.text(x=label_x, y=label_y, s=team, fontsize='smaller')\n", "\n", "    # Remove the box surrounding the plot\n", "    for spine in ax.spines.values():\n", "        spine.set_visible(False)\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "<matplotlib.figure.Figure at 0x10f0b7f98>" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "with warnings.catch_warnings():\n", "    warnings.simplefilter(\"ignore\")\n", "    draw_nba_map(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This simple example only scratches the surface of what you can do with geographical
data in Python." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:sports_py36]", "language": "python", "name": "conda-env-sports_py36-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }