{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## NBA Home Court Advantage\n", "\n", "### Part 3: Do Teams Have Different Amounts of Home Court Advantage?\n", "\n", "This notebook looks at whether particular NBA teams have demonstrated significantly different amounts of home court advantage over time. This is challenging, since team quality varies significantly over time.\n", "\n", "The punchline? There is only very weak statistical evidence that teams have persistent and measurable differences in home court advantage relative to the league average.\n", "\n", "As with the [previous](http://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/notebooks/nba_home_court-part1.ipynb) [notebooks](http://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/notebooks/nba_home_court-part2.ipynb) in this series on NBA home court advantage, we will look at 24,797 match ups from the 1996-97 through 2016-17 regular seasons.\n", "\n", "Let's look at the data." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "pd.options.display.max_rows = 999\n", "pd.options.display.max_columns = 999\n", "pd.options.display.float_format = '{:.3f}'.format" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from scipy import stats" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "from matplotlib.colors import rgb2hex\n", "import seaborn as sns\n", "sns.set()\n", "sns.set_context('notebook')\n", "plt.style.use('ggplot')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load the data. 
See the previous notebooks for more information on how to obtain the data." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "PROJECT_DIR = Path.cwd().parent / 'basketball' / 'nba'\n", "DATA_DIR = PROJECT_DIR / 'data' / 'prepared'\n", "DATA_DIR.mkdir(exist_ok=True, parents=True)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def load_nba_historical_matchups(input_dir):\n", " \"\"\"Load pickle file of NBA matchups prepared for analytics.\"\"\"\n", " PKLFILENAME = 'stats_nba_com-matchups-1996_97-2016_17.pkl'\n", " pklfile = input_dir.joinpath(PKLFILENAME)\n", " return pd.read_pickle(pklfile)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(26787, 41)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matchups = load_nba_historical_matchups(DATA_DIR)\n", "matchups.shape" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seasons = sorted(list(matchups['season'].unique()))\n", "len(seasons)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def prepare_regular_season(matchups):\n", " df = matchups.copy()\n", " df = df[df['season_type'] == 'regular']\n", " return df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(24797, 41)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reg = prepare_regular_season(matchups)\n", "reg.shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "30" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "teams = 
sorted(list(reg['team_curr_h'].unique()))\n", "len(teams)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def load_win_loss_information():\n", " csvfile = DATA_DIR.joinpath('stats_nba_com-team_records-1996_97-2016_17.csv')\n", " df = pd.read_csv(csvfile)\n", " return df" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 622 entries, 0 to 621\n", "Data columns (total 14 columns):\n", "season 622 non-null object\n", "team 622 non-null object\n", "games 622 non-null int64\n", "wins 622 non-null int64\n", "losses 622 non-null int64\n", "win_percentage 622 non-null float64\n", "home_games 622 non-null int64\n", "home_wins 622 non-null int64\n", "home_losses 622 non-null int64\n", "home_win_percentage 622 non-null float64\n", "away_games 622 non-null int64\n", "away_wins 622 non-null int64\n", "away_losses 622 non-null int64\n", "away_win_percentage 622 non-null float64\n", "dtypes: float64(3), int64(9), object(2)\n", "memory usage: 68.1+ KB\n" ] } ], "source": [ "wl = load_win_loss_information()\n", "wl.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Home Versus Away Win Percentage\n", "\n", "Now that we have the data, we will do a simple analysis. We want to simply take the difference between home win percentage and away win percentage, grouped by season and team.\n", "\n", "This is a simple way to think about isolating the impact of home court advantage. Weak teams should have relatively weak home and away records, and strong teams should have relatively strong home and away records. 
Taking the difference between the home and away win percentage should average out the differences between teams and focus on the impact of the home court.\n", "\n", "The [NBA schedule](https://www.nbastuffer.com/analytics101/how-the-nba-schedule-is-made/) isn't perfectly balanced, in the sense that the average quality of home opponents may differ randomly from the quality of road opponents. But, there's no reason I'm aware of to think that there is a systematic bias in how the home versus away games are scheduled from season to season. We will analyze the schedule and home versus away records by conference and division in a future post, to see if that has any impact on a team's home court advantage.\n", "\n", "Although teams can vary significantly intra-season due to trades and injuries, the regular season is a reasonable time unit for analyzing team performance." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
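<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>home_win_percentage</th><th>away_win_percentage</th><th>home_away_diff</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>count</th><td>622.000</td><td>622.000</td><td>622.000</td></tr>\n",
"    <tr><th>mean</th><td>0.598</td><td>0.402</td><td>0.197</td></tr>\n",
"    <tr><th>std</th><td>0.170</td><td>0.159</td><td>0.107</td></tr>\n",
"    <tr><th>min</th><td>0.121</td><td>0.040</td><td>-0.121</td></tr>\n",
"    <tr><th>25%</th><td>0.488</td><td>0.293</td><td>0.122</td></tr>\n",
"    <tr><th>50%</th><td>0.610</td><td>0.415</td><td>0.195</td></tr>\n",
"    <tr><th>75%</th><td>0.732</td><td>0.512</td><td>0.268</td></tr>\n",
"    <tr><th>max</th><td>0.976</td><td>0.829</td><td>0.585</td></tr>\n",
"  </tbody>\n",
"</table>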
\n", "
" ], "text/plain": [ " home_win_percentage away_win_percentage home_away_diff\n", "count 622.000 622.000 622.000\n", "mean 0.598 0.402 0.197\n", "std 0.170 0.159 0.107\n", "min 0.121 0.040 -0.121\n", "25% 0.488 0.293 0.122\n", "50% 0.610 0.415 0.195\n", "75% 0.732 0.512 0.268\n", "max 0.976 0.829 0.585" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hca = wl[['season', 'team', 'home_win_percentage', 'away_win_percentage']].copy()\n", "hca['home_away_diff'] = hca['home_win_percentage'] - hca['away_win_percentage']\n", "hca.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 622 distinct team/season pairs. Let's group these data by team and aggregate the statistics over the 21 complete seasons.\n", "\n", "These are the 5 teams with the highest observed average home court advantage, measured by home versus away win percentage differential." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
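<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"    <tr><th>team</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>DEN</th><td>21.000</td><td>0.258</td><td>0.138</td><td>0.024</td><td>0.171</td><td>0.293</td><td>0.366</td><td>0.463</td></tr>\n",
"    <tr><th>UTA</th><td>21.000</td><td>0.245</td><td>0.125</td><td>0.073</td><td>0.171</td><td>0.244</td><td>0.293</td><td>0.488</td></tr>\n",
"    <tr><th>IND</th><td>21.000</td><td>0.239</td><td>0.101</td><td>0.073</td><td>0.146</td><td>0.220</td><td>0.341</td><td>0.390</td></tr>\n",
"    <tr><th>ATL</th><td>21.000</td><td>0.231</td><td>0.110</td><td>0.040</td><td>0.146</td><td>0.244</td><td>0.317</td><td>0.415</td></tr>\n",
"    <tr><th>CLE</th><td>21.000</td><td>0.227</td><td>0.083</td><td>0.030</td><td>0.195</td><td>0.244</td><td>0.268</td><td>0.390</td></tr>\n",
"  </tbody>\n",
"</table>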
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "team \n", "DEN 21.000 0.258 0.138 0.024 0.171 0.293 0.366 0.463\n", "UTA 21.000 0.245 0.125 0.073 0.171 0.244 0.293 0.488\n", "IND 21.000 0.239 0.101 0.073 0.146 0.220 0.341 0.390\n", "ATL 21.000 0.231 0.110 0.040 0.146 0.244 0.317 0.415\n", "CLE 21.000 0.227 0.083 0.030 0.195 0.244 0.268 0.390" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hca.groupby('team')['home_away_diff'].describe().sort_values('mean', ascending=False).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And these 5 teams have the lowest observed average home court advantage." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
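<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n",
"    <tr><th>team</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>MIA</th><td>21.000</td><td>0.172</td><td>0.121</td><td>-0.073</td><td>0.073</td><td>0.195</td><td>0.244</td><td>0.390</td></tr>\n",
"    <tr><th>HOU</th><td>21.000</td><td>0.172</td><td>0.112</td><td>-0.098</td><td>0.098</td><td>0.171</td><td>0.280</td><td>0.317</td></tr>\n",
"    <tr><th>BOS</th><td>21.000</td><td>0.164</td><td>0.114</td><td>-0.049</td><td>0.098</td><td>0.171</td><td>0.220</td><td>0.415</td></tr>\n",
"    <tr><th>TOR</th><td>21.000</td><td>0.160</td><td>0.073</td><td>0.049</td><td>0.098</td><td>0.146</td><td>0.200</td><td>0.317</td></tr>\n",
"    <tr><th>PHI</th><td>21.000</td><td>0.132</td><td>0.095</td><td>-0.073</td><td>0.049</td><td>0.146</td><td>0.195</td><td>0.293</td></tr>\n",
"  </tbody>\n",
"</table>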
\n", "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "team \n", "MIA 21.000 0.172 0.121 -0.073 0.073 0.195 0.244 0.390\n", "HOU 21.000 0.172 0.112 -0.098 0.098 0.171 0.280 0.317\n", "BOS 21.000 0.164 0.114 -0.049 0.098 0.171 0.220 0.415\n", "TOR 21.000 0.160 0.073 0.049 0.098 0.146 0.200 0.317\n", "PHI 21.000 0.132 0.095 -0.073 0.049 0.146 0.195 0.293" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hca.groupby('team')['home_away_diff'].describe().sort_values('mean', ascending=False).tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are some familiar names in these lists. Denver and Utah often show up in [articles about home court advantage](http://www.espn.com/nba/insider/story/_/id/9014283/nba-analyzing-real-home-court-advantage-utah-jazz-denver-nuggets). Some [people attribute the supposed extra home court advantage to the altitude and the time zone](https://www.cbssports.com/nba/news/nba-trying-to-weaken-nuggets-home-court-advantage/) of Denver and Salt Lake City. There's even [an article studying the potential effect of altitude on free throw shooting percentage](https://harvardsportsanalysis.wordpress.com/2011/05/18/does-altitude-affect-free-throw-percentage/). And, lastly, here's [an article that attempts to rank all 30 NBA teams by home court advantage](https://www.foxsports.com/nba/gallery/ranking-every-nba-teams-home-court-advantage-112816), without any data to support the rankings.\n", "\n", "In the words of the great American statistician [W. Edwards Deming](https://en.wikipedia.org/wiki/W._Edwards_Deming), \"In God we trust; all others must bring data.\"\n", "\n", "Needless to say, I'm skeptical.\n", "\n", "### Survey Says...\n", "\n", "Here's a [box plot](https://en.wikipedia.org/wiki/Box_plot) of the home versus away win percentages for all 30 NBA teams. The dashed horizontal line shows the overall median home versus away win percentage differential of roughly 20%." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(12,8))\n", "ax = hca.boxplot('home_away_diff', by='team', ax=ax)\n", "ax.set_title('Home vs. Away Win Percentages between 1996-97 and 2016-17 Seasons')\n", "ax.set_ylim(-0.1, 0.5)\n", "ax.set_ylabel('Home Win Percentage minus Away Win Percentage')\n", "ax.set_xlabel('Team')\n", "ax.axhline(hca['home_away_diff'].median(), linestyle='--', alpha=0.5, color='black')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.19512200000000002" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hca['home_away_diff'].median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main thing to notice about this plot is the significant variability in the data. Denver's away win percentage has ranged from about 2 percentage points below its home win percentage to as much as 46 percentage points below. Of course, a large win percentage differential could occur because a team has an awful road record or because it has an outstanding home record. Similarly, a small win percentage differential could occur because a team is outstanding on the road or is relatively weak at home.\n", "\n", "The rectangles on the plot display the [interquartile range](https://en.wikipedia.org/wiki/Interquartile_range) for each team. The interquartile range is the difference between the 75% quantile and the 25% quantile. Approximately half of the observations for each team fall inside that team's rectangle.\n", "\n", "The grey lines within each rectangle represent the [median](https://en.wikipedia.org/wiki/Median) for each team. 
Approximately half of the win percentage differentials are above this line, and half are below it.\n", "\n", "(For the purists, I say \"approximately\" in the preceding sentences because some fine-tuning may be required to break ties, depending upon the number of data points. See the links above for details on the calculation of median and interquartile range if you're interested.)\n", "\n", "All of the interquartile ranges intersect the line representing the overall median. If a team had a significantly different home court advantage (or disadvantage), you would expect to see that more clearly in the plot.\n", "\n", "This plot doesn't attempt to separate out the reasons why the home versus away win percentage varies so much. That will have to wait for another day. But if the question is whether a team has a consistent and measurably different home court advantage, this plot certainly doesn't support a yes answer.\n", "\n", "### Analysis of Variance\n", "\n", "There are more formal statistical tests of whether several groups have the same mean. An example of this is [one-way analysis of variance](https://en.wikipedia.org/wiki/One-way_analysis_of_variance), or ANOVA. The details don't matter much, but to illustrate the method in Python, here is a one-way ANOVA testing whether all the teams have the same mean home versus away win percentage differential.\n", "\n", "In Python, it's easy to use the `scipy` statistics package to perform this test."
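, "\n", "\n", "To make the test a little less of a black box, here is a minimal sketch of the F statistic that one-way ANOVA is built on, computed by hand with the standard library. The helper name `one_way_f` and the three groups of numbers are made up for illustration and are not taken from the NBA data. The idea is simply to compare the spread between the group means to the spread within each group.\n", "\n", "
```python
from statistics import mean


def one_way_f(groups):
    '''Return the one-way ANOVA F statistic for a list of samples.'''
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand = mean(x for g in groups for x in g)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Ratio of the two mean squares.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical home-minus-away differentials for three made-up teams.
toy = [
    [0.20, 0.25, 0.15, 0.30],
    [0.18, 0.22, 0.10, 0.26],
    [0.24, 0.12, 0.28, 0.16],
]
f_stat = one_way_f(toy)
print(round(f_stat, 3))
```
", "\n", "An F statistic below 1, as in this toy example, means the group means differ by less than the within-group noise alone would suggest. The `stats.f_oneway` call used in this notebook computes the same kind of ratio and also converts it into a p-value."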
] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "data = [hca.loc[hca['team'] == team, 'home_away_diff'] for team in teams]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.090958554085467463" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "f_val, p_val = stats.f_oneway(*data)\n", "p_val" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In one-way ANOVA, the [null hypothesis](http://www.statisticshowto.com/probability-and-statistics/null-hypothesis/) is that all the means are the same. For the NBA data, the resulting [_p-value_](http://www.dummies.com/education/math/statistics/what-a-p-value-tells-you-about-statistical-data/) of 9.1% is only very weak evidence against the null hypothesis that all the home court advantages are the same.\n", "\n", "The statistical test tells the same story as the plot. The data are too noisy to support a strong conclusion.\n", "\n", "### Quantifying Differences in Home Court Advantage\n", "\n", "Even though we've seen that the data don't really support the argument that different teams have consistently different amounts of home court advantage, let's take the analysis just a little bit further.\n", "\n", "Suppose for a moment that teams really did have different home court advantages. How would you use that information?\n", "\n", "Under the NBA schedule, every team plays the same number of home and away games every season. So, with a little bit of algebra, you can show that the overall win percentage is the average of the home win percentage and the away win percentage.\n", "\n", "Also, remember that the way we are estimating home court advantage here is home win percentage minus away win percentage. The average team has roughly a 20% gap between home and away win percentages. 
But, as we've seen in prior posts, the gap between the home win percentage and the overall win percentage is around 10%. That is, half of 20%.\n", "\n", "So, if a particular team had a significantly better home court advantage than the league average, the predicted impact on the overall season win percentage would only be half of the difference between that team's home court advantage and the league average home court advantage.\n", "\n", "Let's estimate the predicted impact on overall win percentage, assuming that each team really did have different home court advantages.\n", "\n", "We'll estimate the home court advantage using the average of the 25% and 75% quantiles. Strictly speaking, this statistic is the [midhinge](https://en.wikipedia.org/wiki/Midhinge); the [_interquartile mean_](https://en.wikipedia.org/wiki/Interquartile_mean) proper averages the observations that lie between the two quartiles. This notebook uses the name \"interquartile mean\" for the quartile average. It's not the same as the median. You can visualize it as the center of each rectangle in the box plot above.\n", "\n", "Here are the 5 best and 5 worst NBA team home court advantages, estimated by the historical interquartile mean over the past 21 completed NBA seasons."
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "def iq_mean(x):\n", " return (x.quantile(0.75) + x.quantile(0.25))/2" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "team\n", "DEN 0.268\n", "IND 0.244\n", "ATL 0.232\n", "CLE 0.232\n", "UTA 0.232\n", "Name: home_away_diff, dtype: float64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y = hca.groupby('team')['home_away_diff'].agg(iq_mean).sort_values(ascending=False)\n", "y.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "team\n", "MIA 0.159\n", "MIN 0.159\n", "TOR 0.149\n", "DAL 0.146\n", "PHI 0.122\n", "Name: home_away_diff, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can just subtract out the league interquartile mean and divide by 2, to look at the impact on each team's estimated overall win percentage." 
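, "\n", "\n", "The division by 2 follows from the schedule being split evenly between home and away games. As a quick sanity check, the sketch below works through the arithmetic for a hypothetical team. The win counts, the league gap of 0.20 and the helper names `win_pct` and `overall_edge` are all assumptions made for illustration, and the calculation holds the away record fixed.\n", "\n", "
```python
def win_pct(wins, games):
    '''Win percentage expressed as a fraction between 0 and 1.'''
    return wins / games


def overall_edge(team_diff, league_diff):
    '''Implied change in overall win percentage from an above-average gap.

    Assumes equal numbers of home and away games and an unchanged away record.
    '''
    return (team_diff - league_diff) / 2


# Hypothetical team: 30-11 at home and 20-21 away over an 82-game season.
home = win_pct(30, 41)
away = win_pct(20, 41)
overall = win_pct(30 + 20, 82)

# With a 41/41 split, the overall percentage is the average of home and away.
avg = (home + away) / 2

# Compare the gap for this team with a hypothetical league-wide gap of 0.20.
team_diff = home - away
edge = overall_edge(team_diff, 0.20)
extra_wins = edge * 82
print(round(overall, 3), round(avg, 3), round(edge, 3), round(extra_wins, 1))
```
", "\n", "The next cell applies the same subtract-and-halve step to every team at once."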
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.19417887500000003" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = sum(y)/len(y)\n", "m" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "team\n", "DEN 0.037\n", "IND 0.025\n", "ATL 0.019\n", "CLE 0.019\n", "UTA 0.019\n", "SAC 0.019\n", "POR 0.013\n", "BKN 0.013\n", "CHA 0.007\n", "WAS 0.007\n", "NOP 0.007\n", "NYK 0.000\n", "SAS 0.000\n", "LAC 0.000\n", "MEM 0.000\n", "ORL 0.000\n", "HOU -0.003\n", "MIL -0.006\n", "GSW -0.006\n", "DET -0.006\n", "PHX -0.006\n", "LAL -0.006\n", "CHI -0.006\n", "OKC -0.012\n", "BOS -0.018\n", "MIA -0.018\n", "MIN -0.018\n", "TOR -0.023\n", "DAL -0.024\n", "PHI -0.036\n", "Name: home_away_diff, dtype: float64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "home_edge = (y - m)/2\n", "home_edge" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that by this measure, Indiana ranks ahead of Utah, and Atlanta and Cleveland are level with Utah to three decimal places, despite the claim that Salt Lake City's altitude helps the Jazz. The Jazz just don't stand out as much using the interquartile mean. This suggests some historical [outliers](https://en.wikipedia.org/wiki/Outlier) may be influencing the common perception that the Jazz enjoy a special degree of home court advantage.\n", "\n", "What's also interesting are the teams at the bottom. People usually don't think of OKC, Boston or Toronto as suffering from significantly below-average home court advantage.\n", "\n", "If Denver really did have such a large relative home court advantage compared to the league average, it would be expected to pick up roughly an extra 3 wins per season on average (0.037 of an 82-game season is about 3 games). 
It's hard to detect that effect in the noisy year-to-year variation of the team's record.\n", "\n", "The general take-away is: treat claims of persistent home court advantage in the sports media with skepticism." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:sports_py36]", "language": "python", "name": "conda-env-sports_py36-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }