{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## NBA Home Court Advantage\n", "\n", "### Part 1: A First Look at Win Percentages and Point Advantages\n", "\n", "In this notebook, we'll take a short look at home court advantage in the NBA. The analysis will use data from the 1996-97 season through the 2016-17 season. We scraped this data from [stats.nba.com](http://stats.nba.com/) in [this notebook](http://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/notebooks/scrape-stats_nba-team_matchups.ipynb).\n", "\n", "Let's first import the packages we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "pd.options.display.max_rows = 999\n", "pd.options.display.max_columns = 999\n", "pd.options.display.float_format = '{:.3f}'.format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use [`seaborn`](https://seaborn.pydata.org/) and [`matplotlib`](https://matplotlib.org/) for plotting." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set()\n", "sns.set_context('notebook')\n", "plt.style.use('ggplot')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use the statistics module from the [`scipy`](https://www.scipy.org/) package in a plot at the end of the notebook." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import scipy.stats as stats" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "PROJECT_DIR = Path.cwd().parent / 'basketball' / 'nba'\n", "INPUT_DIR = PROJECT_DIR / 'data' / 'prepared'\n", "INPUT_DIR.mkdir(exist_ok=True, parents=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get the historical matchups we scraped in the previous notebook. In that notebook, the scraped data were saved using Python's [pickle](https://docs.python.org/3/library/pickle.html) format. Refer to the [earlier notebook](http://nbviewer.jupyter.org/github/practicallypredictable/posts/blob/master/notebooks/scrape-stats_nba-team_matchups.ipynb) for more information if you need to scrape the data." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def load_nba_historical_matchups(input_dir):\n", "    \"\"\"Load pickle file of NBA matchups prepared for analytics.\"\"\"\n", "    PKLFILENAME = 'stats_nba_com-matchups-1996_97-2016_17.pkl'\n", "    pklfile = input_dir.joinpath(PKLFILENAME)\n", "    return pd.read_pickle(pklfile)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(26787, 41)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matchups = load_nba_historical_matchups(INPUT_DIR)\n", "matchups.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 21 complete seasons of matchup data. Let's just use the regular season matchups for this analysis." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "21" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seasons = sorted(list(matchups['season'].unique()))\n", "len(seasons)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def prepare_regular_season(matchups):\n", "    \"\"\"Keep regular season games and add the home-minus-away point differential.\"\"\"\n", "    df = matchups[matchups['season_type'] == 'regular'].copy()\n", "    df['pt_diff'] = df['pts_h'] - df['pts_a']\n", "    return df" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(24797, 42)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reg = prepare_regular_season(matchups)\n", "reg.shape" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def home_win_percentage(df):\n", "    \"\"\"Return the fraction of games in df won by the home team.\"\"\"\n", "    games = len(df)\n", "    if games > 0:\n", "        # Comparing against 'H' avoids a KeyError when df contains no home wins\n", "        return float((df['won'] == 'H').sum() / games)\n", "    else:\n", "        return np.nan" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5980562164777997" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "home_win_percentage(reg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Home teams have won almost 60% of regular season games over the 21 complete seasons in the data. Of course, to win basketball games, you have to score more than the other team. Let's examine what this means in terms of point differentials." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def home_court_analysis(df):\n", "    \"\"\"Summarize home win percentage and point differential by season.\"\"\"\n", "    seasons = sorted(list(df['season'].unique()))\n", "    home_win_pct = [\n", "        home_win_percentage(df.loc[df['season'] == season]) for season in seasons\n", "    ]\n", "    pt_diff_mean = [\n", "        df.loc[df['season'] == season, 'pt_diff'].mean() for season in seasons\n", "    ]\n", "    pt_diff_std = [\n", "        df.loc[df['season'] == season, 'pt_diff'].std() for season in seasons\n", "    ]\n", "    return pd.DataFrame({\n", "        'season': seasons,\n", "        'home_win_pct': home_win_pct,\n", "        'pt_diff_mean': pt_diff_mean,\n", "        'pt_diff_std': pt_diff_std,\n", "    })" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | home_win_pct | pt_diff_mean | pt_diff_std |
|---|---|---|---|
| count | 21.000 | 21.000 | 21.000 |
| mean | 0.598 | 3.105 | 12.989 |
| std | 0.015 | 0.385 | 0.474 |
| min | 0.575 | 2.407 | 12.030 |
| 25% | 0.589 | 2.820 | 12.669 |
| 50% | 0.598 | 3.149 | 13.057 |
| 75% | 0.608 | 3.399 | 13.342 |
| max | 0.628 | 3.884 | 13.686 |
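
As a rough cross-check of the summary table, the sketch below asks what home win probability the average point differential would imply if point differentials were normally distributed. The normal model is an assumption made here for illustration only, not something the data above establishes. The constants are copied from the `mean` row of the table, and `implied_home_win_prob` is a hypothetical helper name. The example uses only the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Averages of the per-season values, taken from the "mean" row of the table.
PT_DIFF_MEAN = 3.105           # home-minus-away point differential
PT_DIFF_STD = 12.989           # standard deviation of the point differential
OBSERVED_HOME_WIN_PCT = 0.598  # average home win percentage


def implied_home_win_prob(mean, std):
    """P(point differential > 0), ASSUMING differentials are exactly normal."""
    return 1.0 - NormalDist(mu=mean, sigma=std).cdf(0.0)


implied = implied_home_win_prob(PT_DIFF_MEAN, PT_DIFF_STD)
print(f"implied home win probability: {implied:.3f}")
print(f"observed home win percentage: {OBSERVED_HOME_WIN_PCT:.3f}")
```

Under this assumption the implied probability comes out a little over 59%, close to the observed home win percentage. It is only a sanity check: actual point differentials are integers and are never zero, so a continuous normal curve can match them only approximately.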